Based on the hypothesis that the speech code is related to distinctive features, the platform comprises cascaded phonological speech analysis and phonological speech synthesis systems that share the same phonological speech representation. The speech analysis is based on acoustic modelling with deep neural networks, which yields accurate estimates of the features or, more precisely, phonological posterior probabilities: continuous values in the range [0, 1]. We consider the trajectories of these features as surrogates for articulatory gestures.
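
[Figure: articulatory tongue tip measurements compared with the phonological anterior posterior features]
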
For example, the figure above compares articulatory tongue tip measurements (vertical direction with respect to the occlusal plane) with the posterior probabilities of the phonological feature anterior; the relation between the two trajectories is evident.
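
One way to put a number on such a visual relation is a correlation coefficient between the two trajectories. The following is only a minimal sketch: the function name `pearson_correlation` and both trajectories are hypothetical, and the data are synthetic stand-ins, not the measurements shown in the figure. It also assumes that both signals have already been resampled to a common frame rate and time-aligned.

```python
import numpy as np


def pearson_correlation(x, y):
    """Pearson correlation between two equal-length 1-D trajectories."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or x.shape != y.shape:
        raise ValueError("trajectories must be 1-D and of equal length")
    xc = x - x.mean()
    yc = y - y.mean()
    denom = np.sqrt((xc ** 2).sum() * (yc ** 2).sum())
    if denom == 0.0:
        raise ValueError("correlation is undefined for a constant trajectory")
    return float((xc * yc).sum() / denom)


# Synthetic stand-ins (assumption): NOT real articulatory or posterior data.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200)

# Hypothetical vertical tongue tip position, arbitrary units.
tongue_tip_height = np.sin(2 * np.pi * 3 * t)

# Hypothetical posterior of the anterior feature: a squashed, slightly noisy
# copy of the articulatory trajectory, clipped to the range [0, 1].
anterior_posterior = 1.0 / (1.0 + np.exp(-4.0 * tongue_tip_height))
anterior_posterior = np.clip(
    anterior_posterior + 0.02 * rng.standard_normal(t.size), 0.0, 1.0
)

r = pearson_correlation(tongue_tip_height, anterior_posterior)
print(f"Pearson correlation on synthetic data: {r:.3f}")
```

With real recordings, the two inputs would be replaced by the measured tongue tip trajectory and the network's posterior for the same utterance.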

Phonological speech synthesis employs another deep neural network to transform the phonological posteriors back to speech (Cernak et al., 2015). BSD-licensed open-source software with pre-trained English models is available at:

https://github.com/idiap/phonvoc
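
The cascade described above can be illustrated with a toy sketch. It does not use the phonvoc code or its models: the classes `ToyAnalysis` and `ToySynthesis`, the layer sizes, and the random, untrained weights are all assumptions made for illustration. The sketch only shows the data flow, in which analysis maps acoustic frames to one posterior per phonological feature in [0, 1], and synthesis maps those posteriors back to speech parameters.

```python
import numpy as np

# All dimensions below are assumptions chosen for illustration only.
N_ACOUSTIC = 39      # size of one acoustic feature frame
N_PHONOLOGICAL = 13  # number of phonological features
N_VOCODER = 20       # size of one speech (vocoder) parameter frame

rng = np.random.default_rng(0)


def sigmoid(z):
    """Logistic function; maps any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


class ToyAnalysis:
    """Stand-in for the analysis network: acoustic frames -> posteriors."""

    def __init__(self):
        # Random, untrained weights (assumption); a real system is trained.
        self.W = 0.1 * rng.standard_normal((N_ACOUSTIC, N_PHONOLOGICAL))
        self.b = np.zeros(N_PHONOLOGICAL)

    def __call__(self, frames):
        # One independent sigmoid output per phonological feature.
        return sigmoid(frames @ self.W + self.b)


class ToySynthesis:
    """Stand-in for the synthesis network: posteriors -> speech parameters."""

    def __init__(self):
        self.W = 0.1 * rng.standard_normal((N_PHONOLOGICAL, N_VOCODER))
        self.b = np.zeros(N_VOCODER)

    def __call__(self, posteriors):
        return posteriors @ self.W + self.b


# Synthetic input (assumption): 100 frames of random acoustic features.
acoustic_frames = rng.standard_normal((100, N_ACOUSTIC))

analysis = ToyAnalysis()
synthesis = ToySynthesis()

posteriors = analysis(acoustic_frames)   # shared phonological representation
speech_params = synthesis(posteriors)    # parameters a vocoder would consume

print(posteriors.shape, speech_params.shape)
```

Because both stages communicate only through `posteriors`, either stage could be swapped out as long as it respects the same phonological representation.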

To be added (1Q2016):

  • Compositional phonology settings
  • Phonological TTS settings
  • Low bit rate speech coding settings