This dataset is provided for research purposes, with annotations that can be used for different computer vision and natural language tasks.

Data & Code

We supply metadata for 60K infographics, as well as a subset of 29K infographics with curated categories and tags split into training and test sets for prediction tasks. For 1.4K infographics, we additionally provide crowdsourced annotations of icon (visual element) locations for detection applications.

Explore

Explore sample infographics from the dataset along with our automatically computed text and icon detections. We demonstrate how looking inside an image can surface relevant results for a given search query.

Team

(affiliations at the time when the contributions were made)

Zoya Bylinskii

Research Assistant (PhD)
MIT

Spandan Madan

Research Assistant (MS)
Harvard

Matthew Tancik

Research Assistant (MEng)
MIT

Sami Alsheikh

Research Assistant (MEng)
MIT

Adrià Recasens

Research Assistant (PhD)
MIT

Kimberli Zhong

Research Assistant (MEng)
MIT

Hanspeter Pfister

Professor
SEAS, Harvard

Frédo Durand

Professor
EECS, MIT

Aude Oliva

Principal Research Scientist
CSAIL, MIT

Dataset

Metadata and annotations corresponding to thousands of infographics

Dataset       Tags     Images per tag             Tags per image         Aspect ratios
Visually63K   19,469   Mean=7.8, Range=1-3784     Mean=3.7, Range=0-10   1:20 to 22:1
Visually29K   391      Mean=151, Range=50-2331    Mean=2.1, Range=1-9    1:5 to 5:1
MTurk annotation
Visually63K

We provide the URLs for downloading 63,738 infographics from http://visual.ly, along with a Python pickle file containing metadata (category labels, tags, titles, # likes, etc.) for research problems in image recognition and natural language processing.
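The metadata pickle can be inspected with a few lines of Python. The filename and record fields below are illustrative assumptions, not the dataset's documented schema; a tiny stand-in file is written first so the sketch runs end to end.

```python
import pickle

# Stand-in for the distributed metadata pickle: a mapping from each
# infographic to its fields (category, tags, title, # likes, ...).
# Filename and keys here are assumptions for illustration only.
sample = {
    "infographic_00001.jpg": {
        "category": "Health",
        "tags": ["fitness", "diet"],
        "title": "Eating Well",
        "likes": 42,
    }
}
with open("metadata_sample.pkl", "wb") as f:
    pickle.dump(sample, f)

# Loading and inspecting works the same way on the real file.
with open("metadata_sample.pkl", "rb") as f:
    metadata = pickle.load(f)

for name, record in metadata.items():
    print(name, record["category"], record["tags"])
```

Swap in the actual pickle filename shipped with the download and print a few records to discover the real field names.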

Visually29K

We curated a subset of 28,973 infographics to cover a fixed set of 391 tags (filtered down from free-form text). There are at least 50 infographic instances for each of 391 different tags. Infographics have an average of 2 tags each (fine-grained topics) and are additionally annotated with 1 of 26 categories (coarse topics). We have split the files into training and test sets for prediction tasks.

Human annotations

For a subset of 1,400 infographics, we collected bounding boxes of all the icons (visual elements) in the infographics. We used these annotations to evaluate our synthetically-trained icon proposal mechanism.
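A standard way to use such bounding-box annotations for evaluation is to score each proposal by intersection-over-union (IoU) against the annotated icons. This sketch uses the conventional 0.5 hit threshold; the paper's exact evaluation protocol may differ.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / float(area_a + area_b - inter)

# Toy example: count proposals that overlap an annotated icon
# above the threshold (boxes here are made up for illustration).
annotated = [(10, 10, 50, 50), (100, 20, 140, 80)]
proposals = [(12, 8, 48, 52), (200, 200, 220, 220)]
hits = sum(any(iou(p, g) >= 0.5 for g in annotated) for p in proposals)
print(hits)  # → 1
```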


Papers

If you use our data or code, please consider citing


@inproceedings{visually2,
      author = {Spandan Madan* and Zoya Bylinskii* and Matthew Tancik* and Adrià Recasens and Kimberli Zhong and Sami Alsheikh and Hanspeter Pfister and Aude Oliva and Frédo Durand},
      title = {Synthetically Trained Icon Proposals for Parsing and Summarizing Infographics},
      booktitle = {arXiv preprint arXiv:1807.10441},
      url = {https://arxiv.org/pdf/1807.10441},
      year = {2018}
}
Main figure from 2018 paper

@inproceedings{visually1,
      author = {Zoya Bylinskii* and Sami Alsheikh* and Spandan Madan* and Adrià Recasens* and Kimberli Zhong and Hanspeter Pfister and Frédo Durand and Aude Oliva},
      title = {Understanding infographics through textual and visual tag prediction},
      booktitle = {arXiv preprint arXiv:1709.09215},
      url = {https://arxiv.org/pdf/1709.09215},
      year = {2017}
}
Main figure from 2017 paper

Icon Detection

Synthetically Trained Icon Proposals for Parsing and Summarizing Infographics

Augmenting icons

We generate synthetic data by augmenting Internet-scraped icons onto patches of existing infographics to train an icon proposal mechanism. The synthetically trained model successfully localizes icons (visual elements) in real-world infographics.
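The core idea can be sketched in a few lines: paste an icon into an infographic patch at a random location, and keep the resulting bounding box as a free detection label. Images are modeled as nested lists of pixel values here to keep the sketch dependency-free; the actual pipeline operates on real scraped icons and infographic patches.

```python
import random

def paste_icon(patch, icon, rng):
    """Paste `icon` into `patch` at a random position.

    Returns the (x1, y1, x2, y2) bounding box, usable as a
    synthetic training label for icon detection.
    """
    ph, pw = len(patch), len(patch[0])
    ih, iw = len(icon), len(icon[0])
    y = rng.randrange(ph - ih + 1)
    x = rng.randrange(pw - iw + 1)
    for dy in range(ih):
        for dx in range(iw):
            patch[y + dy][x + dx] = icon[dy][dx]
    return (x, y, x + iw, y + ih)

# Toy 8x8 "infographic patch" and 3x3 "icon".
rng = random.Random(0)
patch = [[0] * 8 for _ in range(8)]
icon = [[1] * 3 for _ in range(3)]
box = paste_icon(patch, icon, rng)
print(box)
```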


We provide our dataset of 250K icons collected from the web, which we used for data augmentation as well as a classification task. These icons span 391 tag classes (the same ones used for tagging the infographics in our Visually29K dataset).


Dense annotation


Dense annotation of infographics

Multimodal summarization


Multimodal summary

We use our automatic icon proposals in combination with icon classification and text extraction to present a novel multimodal summarization application. Given an infographic as input, our application automatically outputs text tags and visual hashtags that are textually and visually representative of the infographic’s topics, respectively.
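At a high level, the application chains three components: icon proposals feed an icon classifier (producing visual hashtags), and text extraction feeds tag prediction (producing text tags). The function names below are illustrative stubs of that flow, not the authors' released API.

```python
def propose_icons(infographic):
    # Stand-in for the synthetically trained icon proposal mechanism.
    return [(10, 10, 50, 50)]

def classify_icon(infographic, box):
    # Stand-in for a 391-way icon classifier applied to one proposal.
    return "fitness"

def extract_text_tags(infographic):
    # Stand-in for text extraction followed by tag prediction.
    return ["health", "diet"]

def summarize(infographic):
    """Return (text tags, visual hashtags) for an infographic."""
    text_tags = extract_text_tags(infographic)
    visual_hashtags = [
        classify_icon(infographic, box) for box in propose_icons(infographic)
    ]
    return text_tags, visual_hashtags

print(summarize("example_infographic.jpg"))
```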