DS0707 - Interactions between the physical world, humans and the digital world

Analysis and Understanding of Document Images in Network Media – AUDINM

Text detection and recognition in scene images and born-digital document images.

In the AUDINM project, we combine the efforts and expertise of two research labs from the document analysis community, with the goal of mining and retrieving the weakly structured content of social networks. Among the many kinds of images found on social networks, we focus on two: scene images with embedded text and born-digital documents. These two image classes are especially popular on social networks and bring new technical challenges compared to traditional paper documents. Analyzing their contents will support the development of the next generation of search engines, as well as cyber security, commercial data mining and interactive tourist guidance.

Fast image categorization, Scene text detection, Layout analysis

Fast image categorization: Image database, Method for fast categorization of Web images
Scene text detection: Text confidence computation, Text component verification
Layout analysis and graphics recognition: Method(s) for multiple layer separation

Results

Fast image categorization (WP1) – Partner: NLPR
o Image database – Deliverable 1.1 as a database and a publication – Month 12
o Method for fast categorization of Web images – Deliverable 1.2 as a publication and a software prototype – Month 12
Scene text detection (WP2) – Partner: NLPR (with collaboration from L3i)
o Text confidence computation – Deliverable 2.1 as a publication and a software prototype – Month 12
o Text component verification using CRF – Deliverable 2.2 as a publication and a software prototype – Month 18
Layout analysis and graphics recognition (WP4) – Partner: L3i
o Method(s) for multiple layer separation – Deliverable 4.1 as a publication and a software prototype – Month 12

Prospects

Continue research and development in WP2 (text detection), and work on WP3 (text recognition) with a focus on deep learning techniques.
Use research visits to strengthen collaboration between the two partners, in particular through co-authored publications.

Scientific productions and patents

- Numerous conference and journal publications (cited in our report)
- Image database

Submission summary

The amount of multimedia data on social network media is growing rapidly. Current information retrieval methods have difficulty mining such large collections of weakly structured content. The objective of the proposed project is to develop a system for mining and retrieval of heterogeneous documents, focusing mainly on weakly structured documents such as born-digital documents and scene images with embedded text. Analyzing the contents of such documents is very challenging because of complex backgrounds, complex layouts, perspective distortion, lighting variation, defocus, variation in font type, size and color, mixed graphics and text, multiple languages within the same text, and sometimes low resolution.

The research plan is composed of complementary parts that together form the pipeline of a complete system. First, images of different types are received as input and classified by the “fast image categorization” part. Then, scene images are analyzed by the “scene text detection and extraction” part, whereas born-digital documents are analyzed by the “layout analysis and page segmentation” part. The text regions extracted from both image types are then processed by the “multi-lingual text recognition” part. Finally, the “contextual interpretation and information integration” part combines the information produced by the previous parts and integrates it in order to reach a meaningful representation of the document database.
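The routing described in this pipeline can be illustrated with a minimal sketch. It is not the project's software: every name below (`ImageType`, `TextRegion`, `Pipeline`, `build_demo_pipeline`) is hypothetical, and each stage is a placeholder callable standing in for the corresponding research component. The demo stubs work on plain dictionaries rather than real images.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class ImageType(Enum):
    SCENE = "scene"
    BORN_DIGITAL = "born_digital"


@dataclass
class TextRegion:
    """A text area extracted from an image, tagged with the image type."""
    source_type: ImageType
    pixels: object  # placeholder for cropped image data


@dataclass
class Pipeline:
    # Each stage is a hypothetical callable supplied by the caller.
    categorize: Callable[[object], ImageType]
    detect_scene_text: Callable[[object], List[object]]
    analyze_layout: Callable[[object], List[object]]
    recognize: Callable[[TextRegion], str]
    integrate: Callable[[List[str]], dict]

    def run(self, image: object) -> dict:
        image_type = self.categorize(image)
        # Route the image to the branch that matches its category.
        if image_type is ImageType.SCENE:
            crops = self.detect_scene_text(image)
        else:
            crops = self.analyze_layout(image)
        regions = [TextRegion(image_type, crop) for crop in crops]
        # Both branches feed the same recognition stage.
        texts = [self.recognize(region) for region in regions]
        return self.integrate(texts)


def build_demo_pipeline() -> Pipeline:
    """Wire the pipeline with trivial stubs (assumed input: dicts)."""
    return Pipeline(
        categorize=lambda img: (
            ImageType.SCENE if img["kind"] == "photo"
            else ImageType.BORN_DIGITAL
        ),
        detect_scene_text=lambda img: list(img["words"]),
        analyze_layout=lambda img: list(img["words"]),
        recognize=lambda region: f"{region.source_type.value}:{region.pixels}",
        integrate=lambda texts: {"texts": texts},
    )


if __name__ == "__main__":
    demo = build_demo_pipeline()
    print(demo.run({"kind": "photo", "words": ["exit"]}))
```

Keeping each stage behind a plain callable mirrors the division of work packages between the partners: a stage can be replaced by a real detector or recognizer without touching the routing logic.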

The two project partners will collaborate on solving the different problems in accordance with their respective expertise. The expected research achievements in analyzing the contents of document images in network media will provide research experience and visibility to the partners, and will be very useful for social applications such as interactive tourist guidance, as well as for cyber security and commercial data mining.

Jean-Marc OGIER (Laboratoire Informatique, Image, Interactions)

The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.

L3i Laboratoire Informatique, Image, Interactions
NLPR Institute of Automation of Chinese Academy of Sciences

ANR funding: 244,296 euros
Start and duration of the scientific project: September 2014, 48 months
