Note: This tutorial mainly introduces the usage of PP-OCR series models, please refer to PP-Structure Quick Start for the quick use of document analysis related functions.
pipinstall"paddleocr>=2.0.1"# Recommend to use version 2.0.1+
For windows users: If you getting this error OSError: [WinError 126] The specified module could not be found when you install shapely on windows. Please try to download Shapely whl file here.
[[[441.0,174.0],[1166.0,176.0],[1165.0,222.0],[441.0,221.0]],('ACKNOWLEDGEMENTS',0.9971134662628174)][[[403.0,346.0],[1204.0,348.0],[1204.0,384.0],[402.0,383.0]],('We would like to thank all the designers and',0.9761400818824768)][[[403.0,396.0],[1204.0,398.0],[1204.0,434.0],[402.0,433.0]],('contributors who have been involved in the',0.9791957139968872)]......
pdf file is also supported, you can infer the first few pages by using the page_num parameter, the default is 0, which means infer all pages
Version
paddleocr uses the PP-OCRv4 model by default(--ocr_version PP-OCRv4). If you want to use other versions, you can set the parameter --ocr_version, the specific version description is as follows:
version name
description
PP-OCRv4
support Chinese and English detection and recognition, direction classifier, support multilingual recognition
PP-OCRv3
support Chinese and English detection and recognition, direction classifier, support multilingual recognition
PP-OCRv2
only supports Chinese and English detection and recognition, direction classifier, multilingual model is not updated
PP-OCR
support Chinese and English detection and recognition, direction classifier, support multilingual recognition
If you want to add your own trained model, you can add model links and keys in paddleocr and recompile.
More whl package usage can be found in whl package
frompaddleocrimportPaddleOCR,draw_ocr# Paddleocr supports Chinese, English, French, German, Korean and Japanese.# You can set the parameter `lang` as `ch`, `en`, `fr`, `german`, `korean`, `japan`# to switch the language model in order.ocr=PaddleOCR(use_angle_cls=True,lang='en')# need to run only once to download and load model into memoryimg_path='./imgs_en/img_12.jpg'result=ocr.ocr(img_path,cls=True)foridxinrange(len(result)):res=result[idx]forlineinres:print(line)# draw resultfromPILimportImageresult=result[0]image=Image.open(img_path).convert('RGB')boxes=[line[0]forlineinresult]txts=[line[1][0]forlineinresult]scores=[line[1][1]forlineinresult]im_show=draw_ocr(image,boxes,txts,scores,font_path='./fonts/simfang.ttf')im_show=Image.fromarray(im_show)im_show.save('result.jpg')
Output will be a list, each item contains bounding box, text and recognition confidence
[[[441.0,174.0],[1166.0,176.0],[1165.0,222.0],[441.0,221.0]],('ACKNOWLEDGEMENTS',0.9971134662628174)][[[403.0,346.0],[1204.0,348.0],[1204.0,384.0],[402.0,383.0]],('We would like to thank all the designers and',0.9761400818824768)][[[403.0,396.0],[1204.0,398.0],[1204.0,434.0],[402.0,433.0]],('contributors who have been involved in the',0.9791957139968872)]......
Visualization of results
If the input is a PDF file, you can refer to the following code for visualization
frompaddleocrimportPaddleOCR,draw_ocr# Paddleocr supports Chinese, English, French, German, Korean and Japanese.# You can set the parameter `lang` as `ch`, `en`, `fr`, `german`, `korean`, `japan`# to switch the language model in order.PAGE_NUM=10# Set the recognition page numberpdf_path='default.pdf'ocr=PaddleOCR(use_angle_cls=True,lang="ch",page_num=PAGE_NUM)# need to run only once to download and load model into memory# ocr = PaddleOCR(use_angle_cls=True, lang="ch", page_num=PAGE_NUM,use_gpu=0) # To Use GPU,uncomment this line and comment the above one.result=ocr.ocr(pdf_path,cls=True)foridxinrange(len(result)):res=result[idx]ifres==None:# Skip when empty result detected to avoid TypeError:NoneTypeprint(f"[DEBUG] Empty page {idx+1} detected, skip it.")continueforlineinres:print(line)# draw the resultimportfitzfromPILimportImageimportcv2importnumpyasnpimgs=[]withfitz.open(pdf_path)aspdf:forpginrange(0,PAGE_NUM):page=pdf[pg]mat=fitz.Matrix(2,2)pm=page.get_pixmap(matrix=mat,alpha=False)# if width or height > 2000 pixels, don't enlarge the imageifpm.width>2000orpm.height>2000:pm=page.get_pixmap(matrix=fitz.Matrix(1,1),alpha=False)img=Image.frombytes("RGB",[pm.width,pm.height],pm.samples)img=cv2.cvtColor(np.array(img),cv2.COLOR_RGB2BGR)imgs.append(img)foridxinrange(len(result)):res=result[idx]ifres==None:continueimage=imgs[idx]boxes=[line[0]forlineinres]txts=[line[1][0]forlineinres]scores=[line[1][1]forlineinres]im_show=draw_ocr(image,boxes,txts,scores,font_path='doc/fonts/simfang.ttf')im_show=Image.fromarray(im_show)im_show.save('result_page_{}.jpg'.format(idx))
Detection and Recognition Using Sliding Windows
To perform OCR using sliding windows, the following code snippet can be employed:
frompaddleocrimportPaddleOCRfromPILimportImage,ImageDraw,ImageFont# Initialize OCR engineocr=PaddleOCR(use_angle_cls=True,lang="en")img_path="./very_large_image.jpg"slice={'horizontal_stride':300,'vertical_stride':500,'merge_x_thres':50,'merge_y_thres':35}results=ocr.ocr(img_path,cls=True,slice=slice)# Load imageimage=Image.open(img_path).convert("RGB")draw=ImageDraw.Draw(image)font=ImageFont.truetype("./doc/fonts/simfang.ttf",size=20)# Adjust size as needed# Process and draw resultsforresinresults:forlineinres:box=[tuple(point)forpointinline[0]]# Finding the bounding boxbox=[(min(point[0]forpointinbox),min(point[1]forpointinbox)),(max(point[0]forpointinbox),max(point[1]forpointinbox))]txt=line[1][0]draw.rectangle(box,outline="red",width=2)# Draw rectangledraw.text((box[0][0],box[0][1]-25),txt,fill="blue",font=font)# Draw text above the box# Save resultimage.save("result.jpg")
This example initializes the PaddleOCR instance with angle classification enabled and sets the language to English. The ocr method is then called with several parameters to customize the detection and recognition process, including the slice parameter for handling image slices.
In this section, you have mastered the use of PaddleOCR whl package.
PaddleX provides a high-quality ecological model of the paddle. It is a one-stop full-process high-efficiency development platform for training, pressing and pushing. Its mission is to help AI technology to be implemented quickly. The vision is to make everyone an AI Developer! Currently PP-OCRv4 has been launched on PaddleX, you can enter General OCR to experience the whole process of model training, compression and inference deployment.