Multimodal summaries

Text tags and visual hashtags diagnostic of an infographic’s topics

Just as video thumbnails facilitate the sharing, retrieval, and organization of complex media files, our multimodal summaries provide a visual digest of complex infographics. Given an infographic as input, our multimodal summary consists of text tags and visual hashtags representative of the infographic’s topics. We define visual hashtags as the icons that are most representative of a particular text tag.

The multimodal summaries below were computed automatically in four steps: parsing the text on an infographic (OCR), detecting the icons in the infographic (blue insets), classifying the infographic’s topics, and then outputting representative icons for each topic. The examples shown vary in quality.
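The final step, pairing each predicted topic with a representative icon, can be illustrated with a minimal sketch. It is not the implementation behind the results on this page. It assumes the OCR, icon detection, and topic classification steps have already run, and that their outputs are available as plain scores. The names `Icon`, `topic_scores`, and `multimodal_summary`, and the rule of picking the single highest-scoring icon per tag, are hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Icon:
    """A detected icon (hypothetical structure, for illustration only)."""
    box: tuple          # (x, y, width, height) of the detection in the infographic
    topic_scores: dict  # topic name -> how strongly the icon matches that topic


def multimodal_summary(topic_scores, icons, num_tags=2):
    """Pair the top-scoring topics (text tags) with their best-matching icons.

    topic_scores: topic name -> score for the whole infographic,
                  assumed to come from a text-based topic classifier.
    icons:        list of Icon objects, assumed to come from an icon detector.
    Returns a dict mapping each text tag to its visual hashtag (an Icon or None).
    """
    # Text tags: the num_tags topics with the highest infographic-level score.
    tags = sorted(topic_scores, key=topic_scores.get, reverse=True)[:num_tags]

    summary = {}
    for tag in tags:
        # Visual hashtag: the detected icon that scores highest for this tag.
        summary[tag] = max(
            icons,
            key=lambda icon: icon.topic_scores.get(tag, 0.0),
            default=None,  # no icons were detected
        )
    return summary


# Toy inputs standing in for real classifier and detector outputs.
topic_scores = {"health": 0.7, "food": 0.2, "travel": 0.1}
icons = [
    Icon(box=(10, 10, 32, 32), topic_scores={"health": 0.9, "food": 0.1}),
    Icon(box=(60, 10, 32, 32), topic_scores={"health": 0.2, "food": 0.8}),
]
summary = multimodal_summary(topic_scores, icons)
for tag, icon in summary.items():
    print(tag, icon.box if icon else None)
```

In this sketch each text tag receives exactly one visual hashtag. Returning the top few icons per tag instead would be a small change to the selection step.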

[Figures: example multimodal summaries, each showing an infographic with its detected icons (blue insets), text tags, and visual hashtags.]