PaddleX Pipelines (NPU)
1. Basic Pipelines
Pipeline Name | Pipeline Modules | Baidu AI Studio Community Experience URL | Pipeline Introduction | Applicable Scenarios |
---|---|---|---|---|
Image Classification | Image Classification | Online Experience | Image classification is a technique that assigns images to predefined categories. It is widely used in object recognition, scene understanding, and automatic annotation. Image classification can identify various objects such as animals, plants, traffic signs, etc., and categorize them based on their features. By leveraging deep learning models, image classification can automatically extract image features and perform accurate classification. The General Image Classification Pipeline is designed to solve image classification tasks for given images. |
|
Object Detection | Object Detection | Online Experience | Object detection aims to identify the categories and locations of multiple objects in images or videos by generating bounding boxes to mark these objects. Unlike simple image classification, object detection not only recognizes what objects are in the image, such as people, cars, and animals, but also accurately determines the specific location of each object, usually represented by a rectangular box. This technology is widely used in autonomous driving, surveillance systems, and smart photo albums, relying on deep learning models (e.g., YOLO, Faster R-CNN) that efficiently extract features and perform real-time detection, significantly enhancing the computer's ability to understand image content. |
|
Semantic Segmentation | Semantic Segmentation | Online Experience | Semantic segmentation is a computer vision technique that assigns each pixel in an image to a specific category, enabling detailed understanding of image content. Semantic segmentation not only identifies the types of objects in an image but also classifies each pixel, allowing entire regions of the same category to be marked. For example, in a street scene image, semantic segmentation can distinguish pedestrians, cars, sky, and roads at the pixel level, forming a detailed label map. This technology is widely used in autonomous driving, medical image analysis, and human-computer interaction, often relying on deep learning models (e.g., FCN, U-Net) that use Convolutional Neural Networks (CNNs) to extract features and achieve high-precision pixel-level classification, providing a foundation for further intelligent analysis. |
|
Instance Segmentation | Instance Segmentation | Online Experience | Instance segmentation is a computer vision task that identifies object categories in images and distinguishes the pixels of different instances within the same category, enabling precise segmentation of each object. Instance segmentation can separately mark each car, person, or animal in an image, ensuring they are processed independently at the pixel level. For example, in a street scene image with multiple cars and pedestrians, instance segmentation can clearly separate the contours of each car and person, forming multiple independent region labels. This technology is widely used in autonomous driving, video surveillance, and robot vision, often relying on deep learning models (e.g., Mask R-CNN) that use CNNs for efficient pixel classification and instance differentiation, providing powerful support for understanding complex scenes. |
|
PP-ChatOCRv3 | Table Structure Recognition | Online Experience | Document Image Scene Information Extraction v3 (PP-ChatOCRv3-doc) is a PaddlePaddle-specific intelligent document and image analysis solution that integrates LLM and OCR technologies to solve common complex document information extraction challenges such as layout analysis, rare characters, multi-page PDFs, tables, and seal recognition. By integrating the Wenxin (ERNIE) large model, it combines vast data and knowledge, providing high accuracy and wide applicability. The open-source version supports local use and deployment, as well as fine-tuning of each module. |
|
Layout Detection | ||||
Text Detection | ||||
Text Recognition | ||||
Seal Text Detection | ||||
Text Image Unwarping | ||||
Document Image Orientation Classification | ||||
PP-ChatOCRv4 | Table Structure Recognition | Coming Soon | Document Scene Information Extraction v4 (PP-ChatOCRv4) is a PaddlePaddle-featured intelligent analysis solution for documents and images, combining LLM, MLLM, and OCR technologies. Building on PP-ChatOCRv3, it improves the handling of common complex document information extraction challenges such as layout analysis, rare characters, multi-page PDFs, tables, and seal recognition. It integrates massive data and knowledge with the ERNIE model, achieving high accuracy and wide applicability. This pipeline also provides flexible service deployment methods, supporting deployment on various hardware. Furthermore, it offers secondary development capabilities, allowing you to train and optimize on your own dataset, and the trained model can be seamlessly integrated. |
|
Layout Detection | ||||
Text Detection | ||||
Text Recognition | ||||
Seal Text Detection | ||||
Text Image Unwarping | ||||
Document Image Orientation Classification | ||||
Document-based Vision-Language Model | ||||
General OCR | Text Detection | Online Experience | OCR (Optical Character Recognition) is a technology that converts text in images into editable text. It is widely used in document digitization, information extraction, and data processing. OCR can recognize printed text, handwritten text, and even certain types of fonts and symbols. General OCR is used to solve text recognition tasks, extracting text information from images and outputting it in text form. PP-OCRv4 is an end-to-end OCR system that can achieve millisecond-level accurate text prediction on CPUs, reaching open-source SOTA in general scenarios. Based on this project, many developers from academia, industry, and research have quickly implemented multiple OCR applications, covering various fields such as general, manufacturing, finance, and transportation. |
|
Text Recognition | ||||
Document Image Orientation Classification | ||||
Text Image Unwarping | ||||
Text Line Orientation Classification | ||||
General Table Recognition | Table Structure Recognition | Online Experience | Table recognition is a technology that automatically identifies and extracts table content and structure from documents or images. It is widely used in data entry, information retrieval, and document analysis. By using computer vision and machine learning algorithms, table recognition can convert complex table information into an editable format, facilitating further processing and analysis by users. |
|
Text Detection | ||||
Text Recognition | ||||
Layout Detection | ||||
Document Image Orientation Classification | ||||
Text Image Unwarping | ||||
Time Series Forecasting | Time Series Forecasting Module | Online Experience | Time series forecasting is a technique that utilizes historical data to predict future trends by analyzing patterns in time series data. It is widely applied in financial markets, weather forecasting, and sales prediction. Time series forecasting typically employs statistical methods (such as ARIMA) or deep learning models (such as LSTM), which can handle time dependencies in data to provide accurate predictions, assisting decision-makers in better planning and response. This technology plays a crucial role in many industries, including energy management, supply chain optimization, and market analysis. |
|
Time Series Anomaly Detection | Time Series Anomaly Detection Module | Online Experience | Time series anomaly detection is a technique that identifies abnormal patterns or behaviors in time series data. It is widely used in network security, device monitoring, and financial fraud detection. By analyzing normal trends and patterns in historical data, it discovers events that significantly differ from expected behaviors, such as sudden increases in network traffic or unusual transaction activities. Time series anomaly detection often employs statistical methods or machine learning algorithms (such as Isolation Forest and LSTM), which can automatically identify anomalies in data, providing real-time alerts to enterprises and organizations to help promptly address potential risks and issues. This technology plays a vital role in ensuring system stability and security. |
|
Time Series Classification | Time Series Classification Module | Online Experience | Time series classification is a technique that categorizes time series data into predefined classes. It is widely applied in behavior recognition, speech recognition, and financial trend analysis. By analyzing features that vary over time, it identifies different patterns or events, such as classifying a speech signal as "greeting" or "request," or dividing stock price movements into "rising" or "falling." Time series classification typically utilizes machine learning and deep learning models, effectively capturing time dependencies and variation patterns to provide accurate classification labels for data. This technology plays a key role in intelligent monitoring, voice assistants, and market forecasting applications. |
|
Multi-label Image Classification | Multi-label Image Classification | Online Experience | Multi-label image classification is a technology that assigns an image to multiple related categories simultaneously. It is widely used in image tagging, content recommendation, and social media analysis. It can identify multiple objects or features present in an image, such as both "dog" and "outdoor" labels in a single picture. By using deep learning models, multi-label image classification can automatically extract image features and perform accurate classification to provide more comprehensive information for users. This technology is significant in applications like intelligent search engines and automatic content generation. |
|
Small Object Detection | Small Object Detection | Online Experience | Small object detection is a technology specifically for identifying small objects in images. It is widely used in surveillance, autonomous driving, and satellite image analysis. It can accurately find and classify small-sized objects like pedestrians, traffic signs, or small animals in complex scenes. By using deep learning algorithms and optimized convolutional neural networks, small object detection can effectively enhance the recognition ability of small objects, ensuring that important information is not missed in practical applications. This technology plays an important role in improving safety and automation levels. |
|
Image Anomaly Detection | Image Anomaly Detection | None | Image anomaly detection is a technology that identifies images that deviate from or do not conform to normal patterns by analyzing their content. It is widely used in industrial quality inspection, medical image analysis, and security surveillance. By using machine learning and deep learning algorithms, image anomaly detection can automatically identify potential defects, anomalies, or abnormal behavior in images, helping us detect problems and take appropriate measures promptly. Image anomaly detection systems are designed to automatically detect and label abnormal situations in images to improve work efficiency and accuracy. |
|
General Layout Parsing | Layout Detection | None | Layout parsing is a technology that extracts structured information from document images, primarily used to convert complex document layouts into machine-readable data formats. This technology is widely applied in document management, information extraction, and data digitization. By combining Optical Character Recognition (OCR), image processing, and machine learning algorithms, layout parsing can identify and extract text blocks, headings, paragraphs, images, tables, and other layout elements from documents. The process typically includes three main steps: layout analysis, element analysis, and data formatting, ultimately generating structured document data to enhance the efficiency and accuracy of data processing. |
|
Layout Detection Module | ||||
Text Detection Module | ||||
Text Recognition Module | ||||
Document Image Orientation Classification | ||||
Text Image Unwarping | ||||
Table Structure Recognition | ||||
Text Line Orientation Classification | ||||
Formula Recognition | ||||
Seal Text Detection | ||||
Formula Recognition | Formula Recognition | Online Experience | Formula recognition is a technology that automatically identifies and extracts LaTeX formula content and structure from documents or images. It is widely used in document editing and data analysis in fields such as mathematics, physics, and computer science. By using computer vision and machine learning algorithms, formula recognition can convert complex mathematical formula information into editable LaTeX format, facilitating further processing and analysis by users. |
|
Layout Detection Module | ||||
Document Image Orientation Classification | ||||
Text Image Unwarping | ||||
Seal Text Recognition | Seal Text Detection | Online Experience | Seal text recognition is a technology that automatically extracts and identifies seal content from documents or images. Seal text recognition is a part of document processing and is useful in many scenarios, such as contract comparison, inventory audit, and invoice reimbursement review. |
|
Text Recognition | ||||
Layout Detection | ||||
Document Image Orientation Classification | ||||
Text Image Unwarping | ||||
General Image Recognition | Mainbody Detection | None | The general image recognition pipeline is designed to address open-domain target localization and recognition issues. It can effectively identify and differentiate various target objects in different environments and conditions, making it widely applicable in autonomous driving, intelligent security, medical image analysis, and industrial automation, among other fields. |
|
Image Features | ||||
Pedestrian Attribute Recognition | Pedestrian Detection | None | Pedestrian attribute recognition is a key function in computer vision systems used to locate and tag specific features of pedestrians in images or videos, such as gender, age, clothing color, and style. |
|
Pedestrian Attribute Recognition | ||||
Vehicle Attribute Recognition | Vehicle Detection | None | Vehicle attribute recognition is an important component of computer vision systems. Its main task is to locate and tag specific attributes of vehicles in images or videos, such as vehicle type, color, and license plate number. This task not only requires accurate detection of vehicles but also the recognition of detailed attribute information for each vehicle. |
|
Vehicle Attribute Recognition | ||||
Document Image Preprocessing | Document Image Orientation Classification | Not Available | Document image preprocessing is a key step in document analysis and recognition, aiming to optimize document images through a series of technical means to improve the accuracy and efficiency of subsequent processing. Document image preprocessing includes operations such as orientation classification, text rectification, noise removal, and binarization, which can effectively improve image quality, correct document orientation, and remove interference factors. This technology is widely used in document scanning, OCR, and electronic document generation. |
|
Text Image Unwarping |
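
Several document-oriented pipelines in the table above reuse the same modules. The following is a minimal sketch, not part of PaddleX, that transcribes a few rows of the table into a plain Python dictionary so the shared modules can be looked up. The names `PIPELINE_MODULES`, `pipelines_using`, and `shared_modules` are hypothetical helpers introduced only for illustration, and module names are normalized (for example, "Doc Img Orientation Classification" is written out in full).

```python
# Illustrative only: a subset of the table above, transcribed as plain data.
# This is not a PaddleX API; all names here are hypothetical.
PIPELINE_MODULES = {
    "General OCR": [
        "Text Detection",
        "Text Recognition",
        "Document Image Orientation Classification",
        "Text Image Unwarping",
        "Text Line Orientation Classification",
    ],
    "General Table Recognition": [
        "Table Structure Recognition",
        "Text Detection",
        "Text Recognition",
        "Layout Detection",
        "Document Image Orientation Classification",
        "Text Image Unwarping",
    ],
    "Seal Text Recognition": [
        "Seal Text Detection",
        "Text Recognition",
        "Layout Detection",
        "Document Image Orientation Classification",
        "Text Image Unwarping",
    ],
    "Formula Recognition": [
        "Formula Recognition",
        "Layout Detection",
        "Document Image Orientation Classification",
        "Text Image Unwarping",
    ],
}


def pipelines_using(module_name):
    """Return the sorted names of pipelines whose module list contains module_name."""
    return sorted(
        name
        for name, modules in PIPELINE_MODULES.items()
        if module_name in modules  # exact list membership, not substring match
    )


def shared_modules(first, second):
    """Return the sorted module names that two pipelines have in common."""
    return sorted(set(PIPELINE_MODULES[first]) & set(PIPELINE_MODULES[second]))


if __name__ == "__main__":
    print(pipelines_using("Text Detection"))
    print(shared_modules("General OCR", "Formula Recognition"))
```

A lookup like this can help when planning fine-tuning: a module retrained for one pipeline is a candidate for reuse in every other pipeline that lists it.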
2. Featured Pipelines
Not supported yet. Please stay tuned!