Responsible

Mireia Farrús

Research group

CLiC

Lead Researcher

Maria Taulé Delor

Institution

Universitat de Barcelona

Text-to-speech and speech-to-text technology for uses in which privacy is essential or in which high performance is required in a specific domain.

Unlike cloud-based solutions, this system operates locally, ensuring that sensitive data are not sent over the internet. It uses deep neural networks such as Tacotron2 and Transformer-based models for speech synthesis, and Wav2Vec or DeepSpeech for transcription. It can be trained with data from a specific domain and personalised with your own voice or with voices in a specific accent or dialect.
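As an illustration of the transcription step, acoustic models in the Wav2Vec 2.0 and DeepSpeech families are commonly trained with CTC (connectionist temporal classification): the network outputs one score per character for each audio frame, and a decoder turns those frames into text. The sketch below shows greedy CTC decoding in plain Python, running entirely on the local machine. It is a minimal example under stated assumptions, not the system's actual implementation: the function name `ctc_greedy_decode`, the toy vocabulary, the blank symbol `_`, and the frame scores are all hypothetical.

```python
from typing import List, Sequence

BLANK = "_"  # hypothetical CTC blank symbol, chosen for this example only


def ctc_greedy_decode(
    frame_scores: Sequence[Sequence[float]],
    vocab: Sequence[str],
    blank: str = BLANK,
) -> str:
    """Greedy CTC decoding: pick the best symbol per frame,
    collapse consecutive repeats, then drop blanks."""
    # Index of the highest-scoring symbol in each frame.
    best = [max(range(len(row)), key=row.__getitem__) for row in frame_scores]

    decoded: List[str] = []
    previous = None
    for index in best:
        # Emit a symbol only when it differs from the previous frame
        # and is not the blank symbol.
        if index != previous and vocab[index] != blank:
            decoded.append(vocab[index])
        previous = index
    return "".join(decoded)


# Toy input (made up for illustration): five frames over a three-symbol vocabulary.
vocab = ["_", "a", "b"]
scores = [
    [0.1, 0.8, 0.1],    # a
    [0.1, 0.7, 0.2],    # a (repeat, collapsed)
    [0.9, 0.05, 0.05],  # blank (separates the two a's)
    [0.1, 0.8, 0.1],    # a
    [0.2, 0.1, 0.7],    # b
]
print(ctc_greedy_decode(scores, vocab))  # prints "aab"
```

In a real deployment the frame scores would come from a locally stored acoustic model rather than a hand-written list, and a domain-specific language model could replace the greedy step to improve accuracy on specialised vocabulary.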

This automatic domain-specific transcription and speech synthesis technology has great potential in digital accessibility, as it provides more precise, natural, and secure tools. Some specific applications are:

  • Screen readers
  • Voice assistants for persons with motor disabilities
  • Transcription programmes for the deaf and hard of hearing
  • Predictive text for persons with motor disabilities
  • Adaptive learning
  • Adaptation to dialects or regional accents
Category
  • Technology
Subject area
  • Easy Reading - Clear Communication
  • Digital Accessibility

If you would like more information, please contact us.

Screenshot with a spectrogram and synthesiser