The French National Research Agency Projects for science


ANR funded project

Human-machine interaction, connected objects, digital content, big data and knowledge (DS0707) 2015
Project ArtSpeech

Phonetic Articulatory Synthesis

The objective is to synthesize speech from text via the numerical simulation of the human speech production processes, i.e. the articulatory, aerodynamic and acoustic aspects.

Corpus-based approaches have come to dominate text-to-speech synthesis. They exploit speech databases of very good acoustic quality that cover a large number of expressions and phonetic contexts. This is sufficient to produce intelligible speech. However, these approaches face almost insurmountable obstacles as soon as parameters intimately related to the physical process of speech production have to be modified. In contrast, an approach that rests on simulating the physical speech production process makes explicit use of source parameters, the anatomy and geometry of the vocal tract, and a temporal supervision strategy. It thus offers direct control over the nature of the synthetic speech.

The project is organized in 5 work packages:
1. Aerodynamic and acoustic simulations, to produce an acoustic speech signal from the cross-sectional area at every point of all the cavities of the vocal tract,
2. Source and coordination scenarios, to coordinate the sources with the temporal evolution of the vocal tract, which is crucial for producing consonants that human listeners can identify,
3. Supervision of the temporal evolution of the vocal tract geometry, to anticipate the production of upcoming sounds and generate realistic articulatory gestures,
4. Acquisition of speech production data, essential for determining vocal fold activation, aerodynamic parameters, and the geometrical shape of the vocal tract (via MRI at a high sampling rate),
5. General architecture to incorporate the different levels and synthesize an acoustic signal from the text.
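
To give a feel for the kind of relation the first work package deals with (vocal tract geometry in, acoustics out), here is a minimal sketch of its simplest textbook limiting case. It assumes a lossless tube of uniform cross-section, closed at the glottis and open at the lips, whose resonances follow the quarter-wavelength formula F_k = (2k - 1) c / (4 L). This idealization is an illustration only and is not the simulation method of the project; the function name, the 17.5 cm tract length and the 350 m/s speed of sound are assumptions chosen for the example.

```python
def uniform_tube_resonances(length_m, n_resonances=3, sound_speed=350.0):
    """Resonance frequencies (Hz) of a lossless uniform tube.

    Assumption: the tube is closed at one end (glottis) and open at the
    other (lips), so it behaves as a quarter-wavelength resonator:
        F_k = (2k - 1) * c / (4 * L),  k = 1, 2, ...
    """
    if length_m <= 0:
        raise ValueError("length_m must be positive")
    return [
        (2 * k - 1) * sound_speed / (4.0 * length_m)
        for k in range(1, n_resonances + 1)
    ]


# Hypothetical 17.5 cm vocal tract, a common textbook value for an adult.
formants = uniform_tube_resonances(0.175)
for index, frequency in enumerate(formants, start=1):
    print(f"F{index} = {frequency:.0f} Hz")
```

With these assumed values the resonances fall near 500, 1500 and 2500 Hz, the pattern usually associated with a neutral vowel. A realistic simulation has to handle a cross-sectional area that varies along the tract, side cavities, losses and aerodynamic sources, which is what the work packages above address.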

The development of realistic simulations of the speech production processes will be a key asset for understanding the respective contributions of anatomical characteristics, coordination capabilities, and control of the vocal folds to the resulting speech signal. The scope of this project goes far beyond the understanding of speech production phenomena: it concerns phonetics, motor control, and, within the domain of automatic speech processing, at least text-to-speech synthesis.

There are a number of applications. They concern situations for which standard text-to-speech synthesis is not well suited, such as foreign language learning or language acquisition. This project also opens new perspectives in the domain of expressive speech synthesis, and thus within the framework of conversational agents. In the medical field, applications involve MRI acquisition protocols offering a high sampling rate applicable to organs that deform quickly over time, the study of speech production pathologies, and the evaluation of the impact of surgery on the vocal folds or vocal tract.

We firmly believe that ArtSpeech will achieve major scientific and technical advances, and will demonstrate the value of the physical approach, both for opening new research perspectives and for developing highly innovative applications in the domain of speech production in the broadest sense.

The consortium consists of four remarkably complementary research teams with leading international theoretical and practical experience in the domains of:
• aerodynamic and acoustic simulation of speech production, and modeling of the source and the geometry of the vocal tract,
• magnetic resonance imaging and other acquisition techniques of speech production data.

Partners

Gipsa-lab Grenoble Images Parole Signal Automatique - UMR 5216

IADI Imagerie Adaptative Diagnostique et Interventionnelle - INSERM U947

LPP Laboratoire de phonétique et phonologie - UMR 7018

LORIA Laboratoire Lorrain de Recherche en Informatique et ses applications - UMR 7503

ANR grant: 500 117 euros
Beginning and duration: October 2015 - 42 months

 

ANR Programme: Human-machine interaction, connected objects, digital content, big data and knowledge (DS0707) 2015

Project ID: ANR-15-CE23-0024

Project coordinator:
Mr. Yves Laprie (Laboratoire Lorrain de Recherche en Informatique et ses applications - UMR 7503)

 


The project coordinator is the author of this abstract and is therefore responsible for the content of the summary. The ANR disclaims all responsibility in connection with its content.