Research

Machine learning methods, and deep neural networks in particular, have led to dramatic improvements in many fields of technology. We complement these data-driven methods with model-based approaches from statistical signal processing in order to solve a wide range of speech and audio signal processing tasks in innovative ways.

Spoken language is the most important communication medium for humans, especially for (tele)communication over a distance. Speech is also increasingly being used to communicate with machines. For this to work reliably, flexibly and robustly, the recorded speech signal must be freed from interfering influences such as background noise and reverberation. The term speech signal enhancement covers methods for noise suppression, dereverberation and the separation of mixtures of several speakers, while speech recognition refers to transcription, i.e. the conversion of speech into machine-readable text. We are active in all of these fields, often in cooperation with well-known international companies. A special feature of our research is that we skilfully combine machine learning methods with classical methods of statistical signal processing in order to arrive at more robust, energy-efficient and explainable solutions than would be possible with purely data-driven machine learning methods.
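To make the hybrid idea concrete, the sketch below shows perhaps the simplest model-based building block for noise suppression: a Wiener-style gain computed per time-frequency bin from a noise spectrum estimated on an initial noise-only segment. This is a generic textbook illustration rather than our actual method; in a hybrid system, a neural network would typically estimate the mask or refine the gain instead of the simple spectral-subtraction step used here, and all parameter values are illustrative assumptions.

```python
import numpy as np
from scipy.signal import stft, istft

def wiener_denoise(noisy, fs=16000, nperseg=512, n_noise_frames=10):
    """Suppress stationary noise, assuming the first frames are noise-only."""
    _, _, X = stft(noisy, fs=fs, nperseg=nperseg)             # complex STFT
    noise_psd = np.mean(np.abs(X[:, :n_noise_frames]) ** 2,
                        axis=1, keepdims=True)                # noise estimate
    speech_psd = np.maximum(np.abs(X) ** 2 - noise_psd, 1e-10)
    gain = speech_psd / (speech_psd + noise_psd)              # Wiener gain per bin
    _, enhanced = istft(gain * X, fs=fs, nperseg=nperseg)
    return enhanced

# Toy usage: a tone preceded by 0.25 s of silence, buried in white noise.
fs = 16000
t = np.arange(fs) / fs
clean = np.sin(2 * np.pi * 440 * t) * (t > 0.25)
noisy = clean + 0.3 * np.random.default_rng(0).standard_normal(fs)
enhanced = wiener_denoise(noisy, fs=fs)
```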

Speech is a fascinating signal because, in addition to its content, i.e. the information about what is being said, it also contains a great deal of information about who is speaking and in what environment. Phonetics research investigates, among other things, which acoustic characteristics convey certain para- and extralinguistic information that provides insight into the state of the speaker and of the environment. We believe that these research questions can be investigated in a novel way using the possibilities that today's machine learning methods offer for the targeted manipulation of speech signals. To this end, we are collaborating with phoneticians at Bielefeld University in a DFG project and as part of the Transregio TRR 318 "Constructing Explainability".
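One classic example of such an acoustic characteristic is the fundamental frequency (F0, perceived as pitch), a well-known carrier of paralinguistic information such as emphasis or speaker state. The sketch below estimates F0 for a single voiced frame using the textbook autocorrelation method; the search range and the synthetic test frame are illustrative assumptions and are not tied to our specific projects.

```python
import numpy as np

def estimate_f0(frame, fs, fmin=60.0, fmax=400.0):
    """Estimate F0 of a voiced frame from the autocorrelation peak."""
    frame = frame - np.mean(frame)
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)   # lag range for plausible F0
    lag = lo + np.argmax(ac[lo:hi])           # strongest periodicity
    return fs / lag

fs = 16000
t = np.arange(int(0.04 * fs)) / fs            # one 40 ms analysis frame
frame = np.sin(2 * np.pi * 120 * t)           # synthetic "voiced" frame
print(estimate_f0(frame, fs))                 # approximately 120 Hz
```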

In our daily lives, we are surrounded by a multitude of sounds and other acoustic signals. Often unconsciously, we analyse these signals to form an idea of our environment and the activities in it. A technical system with similar capabilities would have a wide range of applications, for example in assistance systems, intelligent control systems or to support environmental perception in autonomous driving. Together with colleagues from other German universities, we are researching acoustic sensor networks as part of a DFG Research Unit; these networks record, enhance and classify acoustic signals via distributed sensor nodes in order to realise the applications above.
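To illustrate the kind of processing a single sensor node might perform, the sketch below implements a simple adaptive energy detector: the node tracks a running noise-floor estimate and flags frames whose energy rises sufficiently above it, so that only candidate event segments need to be transmitted and classified. The detector, its threshold and its smoothing constant are generic illustrations assumed for this example, not the Research Unit's actual design.

```python
import numpy as np

def detect_events(x, frame_len=512, alpha=0.95, threshold_db=10.0):
    """Return a boolean activity flag per frame from an adaptive energy detector."""
    n_frames = len(x) // frame_len
    frames = x[: n_frames * frame_len].reshape(n_frames, frame_len)
    energy = np.sum(frames ** 2, axis=1) + 1e-12
    noise_floor = energy[0]                   # assume the node starts in quiet
    active = np.zeros(n_frames, dtype=bool)
    for i, e in enumerate(energy):
        active[i] = 10.0 * np.log10(e / noise_floor) > threshold_db
        if not active[i]:                     # update the floor in quiet frames only
            noise_floor = alpha * noise_floor + (1 - alpha) * e
    return active

# Toy usage: quiet background with a short 800 Hz event in the middle.
rng = np.random.default_rng(1)
x = 0.01 * rng.standard_normal(4 * 16000)
x[32000:36000] += np.sin(2 * np.pi * 800 * np.arange(4000) / 16000)
print(np.flatnonzero(detect_events(x)))       # frame indices covering the event
```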