In daily life, we are surrounded by a multitude of noises and other acoustic events. Nevertheless we are able to effortlessly converse in such an environment, retrieve a desired voice while disregarding others, or draw conclusions about the composition of the environment and activities therein, given the observed sound scene. A technical system with similar capabilities would find numerous applications in fields as diverse as ambient assisted living, personal communications, and surveillance. With the continuously decreasing cost of acoustic sensors and the pervasiveness of wireless networks and mobile devices, the technological infrastructure of wireless acoustic sensor networks is available, and the bottleneck for unleashing new applications is clearly on the algorithmic side.
This Research Unit aims at rendering acoustic signal processing and classification over acoustic sensor networks more 'intelligent', more adaptive to the variability of acoustic environments and sensor configurations, less dependent on supervision, and at the same time more trustworthy for the users. This will pave the way for a new array of applications which combine advanced acoustic signal processing with semantic analysis of audio. The project objectives will be achieved by adopting a three-layer approach treating communication and synchronization aspects on the lower, signal extraction and enhancement on the middle, and acoustic scene classification and interpretation on the upper layer. We aim at applying a consistent methodology for optimization across these layers and a unification of advanced statistical signal processing with Bayesian learning and other machine learning techniques. Thus, the project is dedicated to pioneering work towards a generic and versatile framework for the integration of acoustic sensor networks in several classes of state-of-the-art and emerging applications.
We have prepared a set of demonstrations for the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA'21). All demonstrations utilize recent technologies developed within the RU.
Since Jan. 1, 2017, the German Research Foundation (Deutsche Forschungsgemeinschaft) has been supporting the Research Unit DFG FOR 2457 “Acoustic Sensor Networks”, a collaboration of researchers from the Universities of Paderborn (Häb-Umbach, Karl, Schmalenströer), Bochum (Enzner, Martin), and Erlangen-Nürnberg (Kellermann). We, the PIs and research fellows, invite you, the signal processing community, to a midterm workshop to share our enthusiasm for the fascinating and challenging topic of signal processing and classification over acoustic sensor networks. The workshop will be held as a satellite event to the ITG Conference on Speech Communication in Oldenburg, Germany. (Project Workshop Information)
An acoustic sensor network comprises multiple nodes, each with acoustic sensing, sampling, and computing capabilities, connected in a wireless, cooperative digital communication network. Being spatially distributed, the network can cover a large space and yet have a sensor close to relevant sound sources. The purpose of the network is the meaningful aggregation of sound from various sensors in order to deliver high-fidelity speech output or to classify environmental sounds or acoustic scenes. Applications range from support for smart rooms to enhanced teleconferencing and large-scale environmental monitoring. The Research Unit addresses the fundamental challenges involved, such as communication, synchronization, distributed signal enhancement, classification, and privacy preservation. The workshop gives an overview of the work carried out in the consortium and the hard- and software frameworks used, and offers us an opportunity to interface with you – the speech and audio experts!
The first coding workshop on the software framework MARVELO took place in Paderborn from February 26 to 28. All project partners worked together on the Linux-based, distributed software to realize first (tiny) prototypes and mock-ups. Project P1 addressed a diarization scenario with a 16-channel sound card, using a C/C++ implementation of energy-based voice activity detection and angle-of-arrival information from SRP-PHAT; the signal processing itself ran on two Raspberry Pi 3 boards. Project P2 implemented a Farrow filter for resampling in Python. Project P3 implemented a new localization algorithm, including the required pre-training, in Python. Project P4 implemented distributed feature extraction and a three-class sound classification (noise, music, speech) running on two Raspberry Pis. Project P5 tackled a small acoustic scene analysis task, employing two neural networks for feature extraction (convolutional layers) and classification (feed-forward); these TensorFlow graphs ran on Raspberry Pis and were able to distinguish between music and speech. In a pleasant atmosphere, all participants enjoyed hands-on training on the MARVELO software framework, which supports distributing signal processing tasks over a Raspberry Pi network.
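To illustrate the kind of resampling Project P2 worked on: a Farrow filter evaluates an interpolation polynomial in the fractional delay mu, with the polynomial coefficients produced by fixed FIR branches. The sketch below is a minimal, hypothetical cubic-Lagrange variant for illustration only, not the project's implementation; the function name and interface are our own.

```python
import numpy as np

def farrow_resample(x, ratio):
    """Resample x by `ratio` (output rate / input rate) with a
    cubic-Lagrange Farrow interpolator: four FIR branches compute
    polynomial coefficients c0..c3, and the fractional delay mu is
    applied via a Horner evaluation."""
    y = []
    n = 0
    while True:
        t = n / ratio              # position of output sample n on the input grid
        m = int(t)                 # integer part
        mu = t - m                 # fractional delay in [0, 1)
        if m + 2 >= len(x):        # interpolation needs x[m-1] .. x[m+2]
            break
        if m >= 1:
            c0 = x[m]
            c1 = -x[m-1]/3 - x[m]/2 + x[m+1] - x[m+2]/6
            c2 = x[m-1]/2 - x[m] + x[m+1]/2
            c3 = -x[m-1]/6 + x[m]/2 - x[m+1]/2 + x[m+2]/6
            # Horner evaluation of the interpolation polynomial at mu
            y.append(((c3*mu + c2)*mu + c1)*mu + c0)
        n += 1
    return np.array(y)

# upsampling a ramp by 2: cubic interpolation reproduces linear data exactly
y = farrow_resample(np.arange(20.0), ratio=2.0)
```

Because the fractional delay enters only through mu, the same structure handles the slowly drifting, non-rational resampling ratios that arise from sampling rate offsets between nodes.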
This RU is dedicated to addressing the key scientific challenges of next-generation audio processing based on acoustic sensor networks. It will not restrict itself to a single application but rather consider the challenges common to many applications. The goals are summarized as follows:
- Communication and audio processing over acoustic sensor networks: Signal processing algorithms will be developed that are aware of the limitations of the communication network and processing units and that strike an optimal balance between centralized and distributed processing. Further, the communication protocol will be made aware of the signal processing needs to optimally organize data streams and allocate resources, trading off energy efficiency, communication bandwidth requirements, and signal extraction and enhancement performance.
- Time synchronization of distributed sensor nodes: We will develop waveform-based sampling clock synchronization algorithms that derive estimates of the sampling rate offsets from the observed acoustic signals. As an alternative for challenging acoustic environments we will achieve clock synchronization by exchanging time stamps over the wireless communication link.
- Acoustic signal extraction and enhancement from natural acoustic environments: Novel concepts for semi-blind signal extraction and enhancement will be developed that combine the generic potential of blind, especially ICA-based algorithms to extract signals from unspecified scenes with the efficiency and robustness of supervised algorithms for spatiotemporal filtering. The estimated information rendering the blind signal extraction algorithms "informed" will be represented and processed -- as much as possible and reasonable -- under the paradigm of Bayesian learning, such that it can be easily exchanged with the other tasks addressed in the RU.
- Learning and classification of acoustic events and scenes: We will develop unsupervised learning algorithms for acoustic event detection and scene classification to cater for a broad range of signals and the absence of labeled training data. The issue of overlapping sounds will be approached from a signal processing perspective, employing the aforementioned advanced signal extraction algorithms. Unsupervised feature learning techniques will be employed to find appropriate representations for a broad range of possible audio signals.
- Ensuring privacy: While we may assume that encryption schemes and protocols are available for the local exchange of audio data between adjacent nodes, the diffusion of data across larger areas or into the cloud may pose more serious privacy concerns. Therefore, in this RU we investigate an approach based on the data-minimization paradigm. We will investigate audio features that provide a scalable amount of information with respect to temporal, spectral, and spatial dimensions. While these features should carry sufficient information for the signal analysis or classification task at hand, they will be tuned to be less informative about assumed private aspects. For example, suprasegmental features, which will be employed to classify an acoustic scene, will be designed not to allow the reconstruction of speech. Obviously, there is a trade-off involved, which constitutes the central part of our research question.
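The waveform-based synchronization goal above can be illustrated with a toy approach (a sketch under simplifying assumptions, not one of the RU's algorithms): if two nodes record the same source, the lag of the cross-correlation peak between time-aligned frames drifts linearly with time, and the slope of that drift estimates the relative sampling rate offset. The function name and parameters below are our own.

```python
import numpy as np

def estimate_sro(x, y, frame_len=2048, hop=2048):
    """Estimate the relative sampling rate offset (SRO) between two
    recordings of the same waveform by tracking the drift of the
    cross-correlation peak over time-aligned frames."""
    lags, centers = [], []
    n_frames = (min(len(x), len(y)) - frame_len) // hop
    for i in range(n_frames):
        s = i * hop
        xc = np.correlate(x[s:s + frame_len], y[s:s + frame_len], mode='full')
        lags.append(np.argmax(xc) - (frame_len - 1))   # lag of the peak
        centers.append(s + frame_len / 2)              # frame center time
    # the peak lag grows linearly with time; its slope is the relative SRO
    return np.polyfit(centers, lags, 1)[0]

# toy check: simulate a 200 ppm offset by nearest-neighbor resampling of noise
rng = np.random.default_rng(0)
x = rng.standard_normal(60000)
eps = 2e-4
idx = np.round(np.arange(int(len(x) / (1 + eps))) * (1 + eps)).astype(int)
sro_hat = estimate_sro(x, x[idx])   # should be close to eps
```

In practice, reverberation, noise, and source movement make the peak tracking far less clean than in this simulation, which is what motivates the more robust estimators and the time-stamp-based alternative pursued in the RU.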
The research unit is divided into five projects:
- Project P1: Distributed Acoustic Signal Processing over Wireless Sensor Networks
- Project P2: Time-Synchronization for Coherent Digital Signal Processing in Wireless Acoustic Sensor Networks
- Project P3: Acoustic Signal Extraction and Enhancement
- Project P4: Scalable Audio Features for Clustering and Classification with Privacy Constraints
- Project P5: Unsupervised Acoustic Event Detection and Scene Classification over Sensor Networks