DS0707 - Interactions between the physical world, humans and the digital world

Accurate Human Modeling in Video – ACHMOV

Submission summary

Technological advances over the past decade allow vast amounts of visual information to be acquired with image-capturing devices such as digital cameras and camcorders. Humans are a central subject of interest in video: their motions, actions and expressions, and the way they collaborate and communicate. Analyzing video data of humans collected during complex real-world events (extracting high-fidelity content and turning raw data into knowledge) and detecting, reconstructing and understanding human motion are problems of key importance for a variety of technological fields, including video coding, entertainment, culture, animation and virtual reality, intelligent human-computer interfaces, and protection and security. However, the visual analysis of humans in real-world environments, indoors and outdoors, faces major scientific and computational challenges. The proportions of the human body vary widely across individuals, any single human body has many degrees of freedom due to its articulations, and individual limbs deform due to moving muscles and clothing. Moreover, real-world events involve multiple interacting humans who occlude each other or are occluded by other objects, and scene conditions may vary due to camera motion or lighting changes. All these factors make appropriate models of human structure, motion and action difficult to construct and difficult to estimate from images. The goal of ACHMOV is to extract detailed representations of multiple interacting humans in real-world environments in an integrated fashion, through a synergy between detection, figure-ground segmentation and body part labeling, accurate 3D geometric methods for kinematic and shape modeling, and large-scale statistical learning techniques.
By integrating the complementary expertise of two teams with solid track records in the field (one French, MORPHEO, and one Romanian, CLVP), the project has considerable opportunities to move towards processing complex real-world scenes of multiple interacting people and to extract rich semantic representations with high fidelity. This would enable interpretation, recognition and synthesis at unprecedented levels of accuracy and in considerably more realistic setups than those currently considered.

Edmond Boyer (INRIA CENTRE GRENOBLE RHÔNE-ALPES)

The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.

IMAR - Institute of Mathematics of the Romanian Academy, Bucharest
INRIA - MORPHEO INRIA CENTRE GRENOBLE RHÔNE-ALPES

ANR funding: 286,444 euros
Beginning and duration of the scientific project: September 2014 - 36 months
