PaddleX Multi-Label Classification Task Data Annotation Tutorial¶
This section will introduce how to use Labelme and PaddleLabel annotation tools to complete data annotation for multi-label classification tasks with a single model. Click on the above links to install the annotation tools and view detailed usage instructions by referring to the homepage documentation.
1. Annotation Data Example¶
This dataset is manually collected, covering two categories: safety helmets and human heads, with photos taken from different angles. Image examples:
2. Labelme Annotation¶
2.1 Introduction to Labelme Annotation Tool¶
Labelme
is a Python-based image annotation software with a graphical user interface. It can be used for image classification, object detection, image segmentation, and other tasks. In object detection annotation tasks, labels are stored as JSON
files.
2.2 Labelme Installation¶
To avoid environment conflicts, it is recommended to install in a conda
environment.
2.3 Labelme Annotation Process¶
2.3.1 Prepare Data for Annotation¶
- Create a root directory for the dataset, e.g.,
helmet
. - Create an
images
directory (must be namedimages
) withinhelmet
and store the images to be annotated in theimages
directory, as shown below:
* Create a category label file label.txt
in the helmet
folder and write the categories of the dataset to be annotated into label.txt
by line. For example, for a helmet detection dataset, label.txt
would look like this:
2.3.2 Start Labelme¶
Navigate to the root directory of the dataset to be annotated in the terminal and start the Labelme
annotation tool:
flags
creates classification labels for images, passing in the label path.
* nodata
stops storing image data in the JSON
file.
* autosave
enables automatic saving.
* output
specifies the storage path for label files.
2.3.3 Begin Image Annotation¶
- After starting
Labelme
, it will look like this:
* Click "Edit" to select the annotation type.
* Choose to create a rectangular box.
* Drag the crosshair to select the target area on the image.
* Click again to select the category of the target box.
* After labeling, click Save. (If the output
field is not specified when starting Labelme
, it will prompt you to select a save path the first time you save. If the autosave
field is used for automatic saving, there is no need to click the Save button).
* Then click Next Image
to label the next image.
* The final labeled tag file looks like this:
* Adjust the directory to obtain the safety helmet detection dataset in the standard Labelme
format
* Create two text files, train_anno_list.txt
and val_anno_list.txt
, in the root directory of the dataset. Write the paths of all json
files in the annotations
directory into train_anno_list.txt
and val_anno_list.txt
at a certain ratio, or write all of them into train_anno_list.txt
and create an empty val_anno_list.txt
file. Use the data splitting function to re-split. The specific filling format of train_anno_list.txt
and val_anno_list.txt
is shown below:
* The final directory structure after organization is as follows:
2.3.4 Format Conversion¶
After labeling with Labelme
, the data format needs to be converted to coco
format. Below is a code example for converting the data labeled using Labelme
according to the above tutorial:
cd /path/to/paddlex
wget https://paddle-model-ecology.bj.bcebos.com/paddlex/data/det_labelme_examples.tar -P ./dataset
tar -xf ./dataset/det_labelme_examples.tar -C ./dataset/
python main.py -c paddlex/configs/obeject_detection/PicoDet-L.yaml \
-o Global.mode=check_dataset \
-o Global.dataset_dir=./dataset/det_labelme_examples \
-o CheckDataset.convert.enable=True \
-o CheckDataset.convert.src_dataset_type=LabelMe
3. PaddleLabel Annotation¶
3.1 Installation and Startup of PaddleLabel¶
- To avoid environment conflicts, it is recommended to create a clean
conda
environment: - It can also be installed with
pip
in one step - After successful installation, you can start PaddleLabel using one of the following commands in the terminal:
PaddleLabel
will automatically open a webpage in the browser after starting. Next, you can start the annotation process based on the task.
3.2 Annotation Process with PaddleLabel¶
- Open the automatically popped-up webpage, click on the sample project, and click on Object Detection.
* Fill in the project name and dataset path. Note that the path is the absolute path on the local machine. Click Create after completion.
* First, define the categories that need to be annotated. Taking layout analysis as an example, provide 10 categories, each with a unique corresponding id. Click Add Category to create the required category names. * Start annotating * Select the label you want to annotate with. * Click the rectangular selection button on the left. * Draw a bounding box around the desired region in the image, ensuring semantic partitioning. If there are multiple columns, annotate each separately. * After completing the annotation, the result will appear in the lower-right corner. Check if the annotation is correct. * Once done, click Project Overview.
-
Export Annotated Files
-
In the Project Overview, segment the dataset as needed and click "Export Dataset".
- Fill in the export path and format. The export path should be an absolute path, and the format should be
coco
.
- After successful export, the annotated files will be available in the specified path.
-
Adjust directories to obtain COCO-formatted dataset for helmet detection
-
Rename the three
json
files and theimage
directory as follows:
Original File/Directory Name | Renamed File/Directory Name |
---|---|
train.json |
instance_train.json |
|val.json
|instance_val.json
|
|test.json
|instance_test.json
|
|image
|images
|
- Create an
annotations
directory in the root of the dataset and move alljson
files into it. The final dataset structure should look like this:
- Compress the
helmet
directory into a.tar
or.zip
file to obtain the COCO-formatted dataset for helmet detection.
4. Image Multi-Label Classification Data Format Conversion¶
After obtaining data in COCO format, you need to convert the data format to MLClsDataset
format. Below is a code example that follows the previous tutorial to use LabelMe
or PaddleLabel
annotated data and perform data format conversion:
# Download and unzip the COCO example dataset
cd /path/to/paddlex
wget https://paddle-model-ecology.bj.bcebos.com/paddlex/data/det_coco_examples.tar -P ./dataset
tar -xf ./dataset/det_coco_examples.tar -C ./dataset/
# Convert the COCO example dataset to MLClsDataset
python main.py -c paddlex/configs/multilabel_classification/PP-LCNet_x1_0_ML.yaml \
-o Global.mode=check_dataset \
-o Global.dataset_dir=./dataset/det_coco_examples \
-o CheckDataset.convert.enable=True \
-o CheckDataset.convert.src_dataset_type=COCO
5. Data Format¶
The dataset defined by PaddleX for image multi-label classification tasks is named MLClsDataset, with the following directory structure and annotation format:
dataset_dir # Root directory of the dataset, the directory name can be changed
├── images # Directory where images are saved, the directory name can be changed, but note the correspondence with the content of train.txt and val.txt
├── label.txt # Correspondence between annotation IDs and category names, the file name cannot be changed. Each line gives the category ID and category name, for example: 45 wallflower
├── train.txt # Annotation file for the training set, the file name cannot be changed. Each line gives the image path and multi-label classification tags for the image, separated by spaces, for example: images/0041_2456602544.jpg 0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0
└── val.txt # Annotation file for the validation set, the file name cannot be changed. Each line gives the image path and multi-label classification tags for the image, separated by spaces, for example: images/0045_845243484.jpg 0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0
The annotation files use the multi-label classification format. Please prepare your data according to the above specifications. Additionally, you can refer to the example dataset.