Layout Analysis Dataset

This page lists common layout analysis datasets and is updated continuously. Contributions of additional datasets are welcome.

Most layout analysis datasets are object detection datasets. In addition to the open source datasets, you can label or synthesize your own datasets with tools such as labelme.

1. PubLayNet dataset

2. CDLA dataset

  • Data source: https://github.com/buptlihang/CDLA
  • Data introduction: The CDLA dataset contains 5000 training images and 1000 validation images in 10 categories: Text, Title, Figure, Figure caption, Table, Table caption, Header, Footer, Reference, and Equation. Some images and their annotations are shown below.

  • Download address: https://github.com/buptlihang/CDLA

  • Note: When you train a detection model on the CDLA dataset with PaddleDetection, you need to remove the labels __ignore__ and _background_.
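
The label removal mentioned in the note could be sketched as follows. This is a minimal sketch, not the official procedure: it assumes the annotations have already been converted to a COCO-style dictionary with "categories" and "annotations" lists, and the function name drop_labels is hypothetical. Category ids are left unchanged; renumber them separately if your training configuration expects contiguous ids.

```python
import copy

# Labels named in the note above; everything else here is an assumption.
UNWANTED_LABELS = {"__ignore__", "_background_"}


def drop_labels(coco, unwanted=UNWANTED_LABELS):
    """Return a copy of a COCO-style dict without the unwanted categories.

    Assumes `coco` has a "categories" list (items with "id" and "name")
    and an "annotations" list (items with "category_id").
    """
    result = copy.deepcopy(coco)
    categories = result.get("categories", [])
    # Collect the ids of the categories whose names should be removed.
    removed_ids = {c["id"] for c in categories if c["name"] in unwanted}
    # Keep only the remaining categories and the annotations that use them.
    result["categories"] = [c for c in categories if c["id"] not in removed_ids]
    result["annotations"] = [
        a for a in result.get("annotations", [])
        if a["category_id"] not in removed_ids
    ]
    return result


if __name__ == "__main__":
    # Tiny hand-made example, not real CDLA data.
    sample = {
        "images": [{"id": 1, "file_name": "page.jpg"}],
        "categories": [
            {"id": -1, "name": "__ignore__"},
            {"id": 0, "name": "_background_"},
            {"id": 1, "name": "Text"},
        ],
        "annotations": [
            {"id": 1, "image_id": 1, "category_id": 1, "bbox": [0, 0, 10, 10]},
            {"id": 2, "image_id": 1, "category_id": 0, "bbox": [0, 0, 5, 5]},
        ],
    }
    cleaned = drop_labels(sample)
    print([c["name"] for c in cleaned["categories"]])
    print(len(cleaned["annotations"]))
```

To apply this to a file, load it with json.load, pass the resulting dictionary to drop_labels, and write the result back with json.dump.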

3. TableBank dataset
