CONTINT - Contenus et Interactions

Fine-grained Recognition in large image databases – FIRE-ID

Submission summary

The FIRE-ID project addresses the semantic annotation of visual content, such as photos or videos shared on social networks, images captured by video-surveillance devices, or scanned documents. More specifically, we consider the fine-grained recognition problem, where the number of classes is large and the classes are visually similar, for instance animals, products, vehicles or document forms.

This task is useful in many application contexts: detection of logos or brands for marketing, vehicle classification (make/model) for video surveillance, or recognition of document forms for administrative tasks. All of these examples are innovative services built on visual content.

So far, fine-grained recognition has received limited attention. Previous work has mainly been devoted to specific tasks such as bird, flower or leaf recognition. The proposed solutions are highly specialized and do not generalize to new problems. Our goal is to address fine-grained recognition in a generic manner, so that the proposed solutions can be applied to as many tasks as possible with limited human intervention or expertise.

To this end, we propose an original framework that treats fine-grained recognition as a continuum between 1) the classification problem and 2) the query-by-example problem (based on visual similarity). Although these problems are closely related, their respective state-of-the-art techniques differ significantly. Our goal is therefore to unify query-by-example and classification to get the best of both: query-by-example relies on precise matching techniques, while classification exploits powerful machine learning algorithms that can generalize. Particular attention will be devoted to the critical question of building a scalable system, i.e., one able to handle on the order of 10,000 categories and millions of images.
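To make the idea of such a continuum concrete, here is a minimal, purely illustrative sketch (not taken from the project): a query descriptor is scored both by a linear classifier and by matching against a labelled gallery, and the two scores are blended. Every name and parameter below (hybrid_predict, alpha, the cosine-similarity matching, the descriptor dimensions) is an assumption made for the example.

```python
import numpy as np

def classifier_scores(x, W, b):
    # Classification side: per-class scores from a linear model (e.g. one-vs-rest classifiers).
    return W @ x + b

def query_by_example_scores(x, gallery, labels, n_classes, k=5):
    # Query-by-example side: cosine similarity of the query against a labelled
    # gallery, aggregated per class over the k most similar gallery images.
    sims = gallery @ x / (np.linalg.norm(gallery, axis=1) * np.linalg.norm(x) + 1e-12)
    scores = np.zeros(n_classes)
    for i in np.argsort(-sims)[:k]:
        scores[labels[i]] += sims[i]
    return scores

def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

def hybrid_predict(x, W, b, gallery, labels, alpha=0.5):
    # Continuum: alpha = 1 is pure classification, alpha = 0 is pure
    # query-by-example; intermediate values blend the two.
    n_classes = W.shape[0]
    p_clf = softmax(classifier_scores(x, W, b))
    p_qbe = softmax(query_by_example_scores(x, gallery, labels, n_classes))
    return int(np.argmax(alpha * p_clf + (1 - alpha) * p_qbe))

# Toy usage with random descriptors (dimensions chosen arbitrarily).
rng = np.random.default_rng(0)
d, n_classes, n_gallery = 64, 10, 200
W = rng.standard_normal((n_classes, d))
b = rng.standard_normal(n_classes)
gallery = rng.standard_normal((n_gallery, d))
labels = rng.integers(0, n_classes, n_gallery)
x = rng.standard_normal(d)
print(hybrid_predict(x, W, b, gallery, labels, alpha=0.5))
```

At the scale targeted by the project (on the order of 10,000 categories and millions of images), the brute-force matching shown above would naturally be replaced by compressed descriptors and approximate nearest-neighbour search; the sketch only illustrates how the two scoring views can be combined.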

A very important aspect of this project will be the rigorous evaluation of the proposed algorithms. It will be conducted through participation in international evaluation campaigns involving the best vision and multimedia teams. These campaigns will be used to compare our techniques against the state of the art.

The two partners of the project, INRIA and Xerox, are complementary: they combine recognized expertise in classification and query-by-example search, and they have complementary resources (computing infrastructure, access to large document databases) with respect to the goals of the project.

Project coordination

Hervé JÉGOU (Inria, centre de recherche de Rennes - Bretagne Atlantique) – herve.jegou@inria.fr

The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.

Partners

Inria, centre de recherche de Rennes - Bretagne Atlantique
XEROX SAS

ANR grant: 435,227 euros
Beginning and duration of the scientific project: April 2012 - 36 months
