Acoustic Source Localisation in Constrained Environments
Abstract
Acoustic Source Localisation (ASL) is a problem with real-world applications
across multiple domains, from smart assistants to acoustic detection and tracking.
Yet, despite considerable attention in recent years, a technique for rapid and
robust ASL remains elusive, not least in the constrained environments in which
such techniques are most likely to be deployed.
In this work, we seek to address some of these limitations by presenting
improvements to ASL methods under three commonly encountered constraints: the
number and configuration of sensors; the potentially limited signal sampling available;
and the nature and volume of training data required to accurately estimate Direction
of Arrival (DOA) when deploying a particular supervised machine learning technique.
In regard to the number and configuration of sensors, we find that accuracy can be
maintained at the level of the state-of-the-art Steered Response Power (SRP) technique
while reducing computation sixfold, based on direct optimisation of well-known ASL formulations.
Moreover, we find that the circular microphone configuration is the least desirable
as it yields the highest localisation error.
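For context, SRP steers the microphone array towards a grid of candidate directions and accumulates cross-correlation power across microphone pairs, taking the direction of maximum power as the source direction. The sketch below is a minimal illustration of far-field SRP with the common PHAT weighting for a linear array, not the optimised formulation developed in this work; the function name, geometry, and parameter values are illustrative assumptions.

```python
import numpy as np
from itertools import combinations

def srp_phat_linear(frames, mic_x, fs, angles_deg, c=343.0):
    """Illustrative far-field SRP-PHAT over candidate azimuths for a linear array.
    frames: (num_mics, num_samples) time-aligned snapshot; mic_x: positions in metres."""
    num_mics, n = frames.shape
    X = np.fft.rfft(frames, axis=1)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    power = np.zeros(len(angles_deg))
    for a, theta in enumerate(np.radians(angles_deg)):
        for i, j in combinations(range(num_mics), 2):
            # expected TDOA between mics i and j for this steering direction
            tau = (mic_x[i] - mic_x[j]) * np.sin(theta) / c
            R = X[i] * np.conj(X[j])
            R /= np.abs(R) + 1e-12                  # PHAT weighting: keep phase only
            # steer the cross-spectrum and accumulate its real part
            power[a] += np.real(np.sum(R * np.exp(2j * np.pi * freqs * tau)))
    return angles_deg[int(np.argmax(power))], power

# Example (hypothetical): 4-mic linear array with 8 cm spacing, 1-degree azimuth grid
# est, _ = srp_phat_linear(frames, np.array([0.0, 0.08, 0.16, 0.24]),
#                          fs=16000, angles_deg=np.arange(-90, 91))
```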
In regard to signal sampling, we demonstrate that the computer-vision-inspired
algorithm presented in this work, which extracts keypoints from the signal spectrogram
and uses them to select signal samples, outperforms an audio
fingerprinting baseline while maintaining a compression ratio of 40:1.
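As a rough illustration of the keypoint idea, assuming local-maxima detection on the spectrogram magnitude (one common choice, not necessarily the exact detector used in this work), the sketch below picks the strongest time-frequency peaks and maps them back to sample positions; the function name, neighbourhood size, and peak count are assumed parameters.

```python
import numpy as np
from scipy.signal import stft
from scipy.ndimage import maximum_filter

def spectrogram_keypoints(x, fs, nperseg=512, neighbourhood=(15, 15), top_k=200):
    """Select spectrogram keypoints as local magnitude maxima and return the
    sample indices of the frames in which they occur."""
    f, t, Z = stft(x, fs=fs, nperseg=nperseg)
    mag = np.abs(Z)
    # a time-frequency bin is a keypoint if it is the maximum of its neighbourhood
    local_max = (mag == maximum_filter(mag, size=neighbourhood))
    peaks = np.argwhere(local_max)                # (freq_bin, frame) index pairs
    strengths = mag[local_max]                    # same C-order as argwhere
    peaks = peaks[np.argsort(strengths)[::-1][:top_k]]   # keep the strongest top_k
    hop = nperseg // 2                            # scipy's default 50% overlap
    sample_idx = np.unique(peaks[:, 1] * hop)     # start sample of each selected frame
    return peaks, sample_idx
```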
In regard to the training data employed in machine learning ASL techniques,
we show that the use of music training data yields an improvement of 19% over
a noise-data baseline while maintaining accuracy using only 25% of the training
data. Furthermore, training with speech as opposed to noise improves DOA estimation by
an average of 17%, outperforming the Generalised Cross-Correlation technique by
125% in scenarios in which the test and training acoustic environments are matched.
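For reference, the Generalised Cross-Correlation baseline estimates the time difference of arrival (TDOA) between a microphone pair and converts it to a DOA. Below is a minimal sketch of GCC with the widely used PHAT weighting under a two-microphone, far-field assumption; the function names and parameters are illustrative, not the exact baseline configuration of this work.

```python
import numpy as np

def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate the TDOA between two microphone signals via GCC-PHAT."""
    n = len(sig) + len(ref)
    # cross-power spectrum with PHAT weighting (phase only)
    R = np.fft.rfft(sig, n=n) * np.conj(np.fft.rfft(ref, n=n))
    R /= np.abs(R) + 1e-12
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    # centre the correlation around zero lag and pick the peak
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / float(fs)                       # estimated TDOA in seconds

def doa_from_tdoa(tau, d, c=343.0):
    """Far-field DOA (degrees) for a microphone pair with spacing d metres."""
    return float(np.degrees(np.arcsin(np.clip(tau * c / d, -1.0, 1.0))))
```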