```bash
python3 -m pip install paddleocr
# Install the image direction classification dependency package paddleclas
# (if you do not use image direction classification, you can skip this)
python3 -m pip install paddleclas
```
```bash
# Temporarily disable the new IR feature
export FLAGS_enable_pir_api=0
paddleocr --image_dir=ppstructure/docs/table/1.png --type=structure --image_orientation=true
```
Key information extraction is not currently supported by the whl package. For a detailed usage tutorial, please refer to: Key Information Extraction.
```python
import os
import cv2
from paddleocr import PPStructure, draw_structure_result, save_structure_res

table_engine = PPStructure(show_log=True)

save_folder = './output'
img_path = 'ppstructure/docs/table/1.png'
img = cv2.imread(img_path)
result = table_engine(img)
save_structure_res(result, save_folder, os.path.basename(img_path).split('.')[0])

for line in result:
    line.pop('img')
    print(line)

from PIL import Image

font_path = 'doc/fonts/simfang.ttf'  # font provided in PaddleOCR
image = Image.open(img_path).convert('RGB')
im_show = draw_structure_result(image, result, font_path=font_path)
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```
```python
import os
import cv2
import numpy as np
from paddleocr import PPStructure, save_structure_res
from paddle.utils import try_import
from PIL import Image

ocr_engine = PPStructure(table=False, ocr=True, show_log=True)

save_folder = './output'
img_path = 'ppstructure/docs/recovery/UnrealText.pdf'

fitz = try_import("fitz")
imgs = []
with fitz.open(img_path) as pdf:
    for pg in range(0, pdf.page_count):
        page = pdf[pg]
        mat = fitz.Matrix(2, 2)
        pm = page.get_pixmap(matrix=mat, alpha=False)

        # if width or height > 2000 pixels, don't enlarge the image
        if pm.width > 2000 or pm.height > 2000:
            pm = page.get_pixmap(matrix=fitz.Matrix(1, 1), alpha=False)

        img = Image.frombytes("RGB", [pm.width, pm.height], pm.samples)
        img = cv2.cvtColor(np.array(img), cv2.COLOR_RGB2BGR)
        imgs.append(img)

for index, img in enumerate(imgs):
    result = ocr_engine(img)
    save_structure_res(result, save_folder, os.path.basename(img_path).split('.')[0], index)
    for line in result:
        line.pop('img')
        print(line)
```
```python
import os
import cv2
from paddleocr import PPStructure, save_structure_res
from paddleocr.ppstructure.recovery.recovery_to_doc import sorted_layout_boxes, convert_info_docx

# Chinese image
table_engine = PPStructure(recovery=True)
# English image
# table_engine = PPStructure(recovery=True, lang='en')

save_folder = './output'
img_path = 'ppstructure/docs/table/1.png'
img = cv2.imread(img_path)
result = table_engine(img)
save_structure_res(result, save_folder, os.path.basename(img_path).split('.')[0])

for line in result:
    line.pop('img')
    print(line)

h, w, _ = img.shape
res = sorted_layout_boxes(result, w)
convert_info_docx(img, res, save_folder, os.path.basename(img_path).split('.')[0])
```
```python
import os
import cv2
from paddleocr import PPStructure, save_structure_res
from paddleocr.ppstructure.recovery.recovery_to_doc import sorted_layout_boxes
from paddleocr.ppstructure.recovery.recovery_to_markdown import convert_info_markdown

# Chinese image
table_engine = PPStructure(recovery=True)
# English image
# table_engine = PPStructure(recovery=True, lang='en')

save_folder = './output'
img_path = 'ppstructure/docs/table/1.png'
img = cv2.imread(img_path)
result = table_engine(img)
save_structure_res(result, save_folder, os.path.basename(img_path).split('.')[0])

for line in result:
    line.pop('img')
    print(line)

h, w, _ = img.shape
res = sorted_layout_boxes(result, w)
convert_info_markdown(res, save_folder, os.path.basename(img_path).split('.')[0])
```
```python
[{'type': 'Text',
  'bbox': [34, 432, 345, 462],
  'res': ([[36.0, 437.0, 341.0, 437.0, 341.0, 446.0, 36.0, 447.0], [41.0, 454.0, 125.0, 453.0, 125.0, 459.0, 41.0, 460.0]],
          [('Tigure-6. The performance of CNN and IPT models using difforen', 0.90060663), ('Tent ', 0.465441)])}]
```
Each field in the dict is described as follows:

| field | description |
| --- | --- |
| type | Type of the image area. |
| bbox | Coordinates of the image area in the original image: [upper-left x, upper-left y, lower-right x, lower-right y]. |
| res | OCR or table recognition result of the image area.<br>table: a dict with the following fields:<br>- html: HTML string of the table.<br>- In code usage mode, set `return_ocr_result_in_table=True` when calling to also get the detection and recognition results of each text in the table area, in the fields: boxes (text detection boxes) and rec_res (text recognition results).<br>OCR: a tuple containing the detection boxes and recognition results of each single text. |
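As a small illustration of the `bbox` format described above, each detected area can be cropped out of the original image with plain NumPy slicing. The `result` below is hard-coded in the documented shape (normally it is returned by the `PPStructure` engine), and the image is a blank stand-in for `cv2.imread(img_path)`:

```python
import numpy as np

# A hard-coded result in the format described above (normally returned by PPStructure).
result = [{'type': 'Text',
           'bbox': [34, 432, 345, 462],
           'res': ([[36.0, 437.0, 341.0, 437.0, 341.0, 446.0, 36.0, 447.0]],
                   [('sample text', 0.9)])}]

img = np.zeros((600, 800, 3), dtype=np.uint8)  # stand-in for cv2.imread(img_path)

crops = []
for region in result:
    x1, y1, x2, y2 = region['bbox']  # [upper-left x, upper-left y, lower-right x, lower-right y]
    crops.append(img[y1:y2, x1:x2])  # NumPy slicing: rows are y, columns are x

print(crops[0].shape)  # (height, width, channels) = (30, 311, 3)
```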
After recognition is completed, each image gets a directory with the same name under the directory specified by the output field. Each table in the image is stored as an Excel file, and each picture area is cropped and saved as an image. The filenames of the Excel files and images are their coordinates in the original image.
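As a minimal sketch of that directory rule (the paths and `glob` pattern here are illustrative, not part of the PaddleOCR API), the per-image output directory and any saved table files can be located like this:

```python
import glob
import os

save_folder = './output'
img_path = 'ppstructure/docs/table/1.png'

# Each image gets a sub-directory named after its file stem -- the same
# rule used by the save_structure_res calls in the examples above.
res_dir = os.path.join(save_folder, os.path.basename(img_path).split('.')[0])
print(res_dir)  # ./output/1

# Tables are saved as Excel files whose names encode the region's
# coordinates in the original image (returns [] if nothing has run yet).
excel_files = glob.glob(os.path.join(res_dir, '*.xlsx'))
```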
Through this section, you have learned how to use the PP-Structure related functions via the PaddleOCR whl package. For more detailed usage tutorials, including model training, inference, and deployment, please refer to the documentation tutorial.