PaddleX Instance Segmentation Task Module Data Annotation Tutorial¶
This document will introduce how to use the Labelme annotation tool to complete data annotation for a single model related to instance segmentation. Click on the link above to install the data annotation tool and view detailed usage instructions by referring to the homepage documentation.
1. Annotation Data Example¶
This dataset is a fruit instance segmentation dataset, covering five different types of fruits, including photos taken from different angles of the targets. Image examples:
2. Labelme Annotation¶
2.1 Introduction to Labelme Annotation Tool¶
Labelme
is a Python-based image annotation software with a graphical user interface. It can be used for tasks such as image classification, object detection, and image segmentation. For instance segmentation annotation tasks, labels are stored as JSON
files.
2.2 Labelme Installation¶
To avoid environment conflicts, it is recommended to install in a conda
environment.
2.3 Labelme Annotation Process¶
2.3.1 Prepare Data for Annotation¶
- Create a root directory for the dataset, such as
fruit
. - Create an
images
directory (must be namedimages
) withinfruit
and store the images to be annotated in theimages
directory, as shown below:
- Create a category label file
label.txt
in thefruit
folder for the dataset to be annotated, and write the categories of the dataset to be annotated intolabel.txt
by line. Taking the fruit instance segmentation dataset'slabel.txt
as an example, as shown below:
2.3.2 Start Labelme¶
Navigate to the root directory of the dataset to be annotated in the terminal and start the labelme
annotation tool.
labels
is the path to the category labels.
* nodata
stops storing image data in the JSON file.
* autosave
enables automatic saving.
* output
specifies the path for storing label files.
2.3.3 Begin Image Annotation¶
- After starting
labelme
, it will look like this:
* Click Edit
to select the annotation type, choose Create Polygons
.
* Create polygons on the image to outline the boundaries of the segmentation areas.
* Click again to select the category of the segmentation area.
- After annotation, click Save. (If
output
is not specified when startinglabelme
, it will prompt to select a save path upon the first save. Ifautosave
is specified, there is no need to click the Save button).
- Then click
Next Image
to annotate the next image.
* The final annotated label file will look like this.
-
Adjusting Directory Structure to Obtain a Standard
labelme
Format Dataset for Fruit Instance Segmentation -
Prepare the
train_anno_list.txt
andval_anno_list.txt
text files in the root directory of your dataset. Populate these files with the paths of alljson
files in theannotations
directory, distributing them intotrain_anno_list.txt
andval_anno_list.txt
at a specified ratio. Alternatively, you can include all paths intrain_anno_list.txt
and create an emptyval_anno_list.txt
file, intending to use a zero-code data splitting feature for re-partitioning upon upload. The specific format for fillingtrain_anno_list.txt
andval_anno_list.txt
is illustrated as follows:
- The final directory structure after organization should resemble the following:
- Compress the
fruit
directory into a.tar
or.zip
format archive to obtain the standardlabelme
format dataset for fruit instance segmentation.
3. Data Format¶
PaddleX defines a dataset named COCOInstSegDataset for instance segmentation tasks, with the following organizational structure and annotation format:
dataset_dir # Root directory of the dataset, the directory name can be changed
├── annotations # Directory for saving annotation files, the directory name cannot be changed
│ ├── instance_train.json # Training set annotation file, the file name cannot be changed, using COCO annotation format
│ └── instance_val.json # Validation set annotation file, the file name cannot be changed, using COCO annotation format
└── images # Directory for saving images, the directory name cannot be changed
Annotation files adopt the COCO
format. Please refer to the above specifications for data preparation. Additionally, refer to: Example Dataset.
When using PaddleX 2.x version for instance segmentation datasets, please refer to the corresponding format conversion section in Instance Segmentation Module Development Tutorial to convert VOC format datasets to COCO datasets. (Note in module development documentation)
Note:
- Instance segmentation data requires the use of the
COCO
data format to annotate the pixel boundaries and categories of each target area in each image in the dataset. The polygon boundaries (segmentation) of objects are represented as[x1,y1,x2,y2,...,xn,yn]
, where(xn,yn)
denotes the coordinates of each corner point of the polygon. Annotation information is stored injson
files under theannotations
directory, with separate files for the training set (instance_train.json
) and validation set (instance_val.json
). - If you have a batch of unlabeled data, we recommend using
LabelMe
for data annotation. PaddleX Pipelines support data format conversion for datasets annotated withLabelMe
. - To ensure successful format conversion, please strictly follow the file naming and organization of the example dataset: LabelMe Example Dataset.