Deep learning has made significant progress in a wide range of application areas. An important contributing factor has been the availability of increasingly large datasets and models. However, a downside of this trend is that training state-of-the-art models has also become increasingly expensive, leading to environmental concerns and accessibility issues for some practitioners. Additionally, directly reusing pre-trained models can result in performance degradation when they face distribution shifts during deployment. Researchers have explored Source-Free Domain Adaptation (SFDA) to address these challenges. This technique adapts pre-trained models to new target domains without access to the original training data. This article focuses on the problem of SFDA and introduces a novel method, NOTELA, designed to tackle distribution shifts in the audio domain, specifically in bioacoustics.
The bioacoustics dataset Xeno-Canto (XC), which is widely used for bird species classification, includes both focal recordings, which target individual birds in natural conditions, and soundscape recordings obtained through omnidirectional microphones.
It poses unique challenges, as soundscape recordings have a lower signal-to-noise ratio, multiple birds vocalizing simultaneously, and significant distractors like environmental noise. Furthermore, soundscape recordings are collected from different geographical locations, leading to extreme label shifts since only a small subset of species in XC may appear in a specific area. Additionally, both the source and target domains exhibit class imbalance, and the problem is a multi-label classification task due to the presence of multiple bird species within each recording.
In this study, Google researchers first evaluate several existing SFDA methods on the bioacoustics dataset, including entropy minimization, pseudo-labeling, denoising teacher-student, and manifold regularization. The evaluation results show that while these methods have demonstrated success in traditional vision tasks, their performance in bioacoustics varies significantly. In some cases, they perform worse than having no adaptation at all. This result highlights the need for specialized methods to handle the bioacoustics domain’s unique challenges.
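To make two of these baselines concrete, here is a minimal sketch of entropy minimization and pseudo-labeling adapted to a multi-label setting. It assumes the model emits one independent sigmoid probability per species; the function names, the 0.5 threshold, and the toy probabilities are illustrative assumptions, not details taken from the paper.

```python
import math

def binary_entropy(p, eps=1e-12):
    """Entropy (in nats) of a single Bernoulli probability p."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))

def mean_entropy(probs):
    """Average per-class binary entropy over a batch of sigmoid outputs.

    Entropy minimization would use this quantity as the adaptation loss,
    pushing predictions on unlabeled target data towards 0 or 1.
    """
    values = [binary_entropy(p) for row in probs for p in row]
    return sum(values) / len(values)

def pseudo_labels(probs, threshold=0.5):
    """Hard multi-label pseudo-labels: 1 where the probability meets the threshold."""
    return [[1 if p >= threshold else 0 for p in row] for row in probs]

# Hypothetical sigmoid outputs for 2 recordings and 3 species.
probs = [[0.9, 0.2, 0.5],
         [0.1, 0.8, 0.6]]
print(mean_entropy(probs))
print(pseudo_labels(probs))
```

Both objectives trust the model's own predictions, which may help explain why they can degrade a model when those predictions are unreliable under a strong distribution shift.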
To address this limitation, the researchers propose a new method named NOisy student TEacher with Laplacian Adjustment (NOTELA). It combines principles from denoising teacher-student (DTS) methods and manifold regularization (MR) techniques: NOTELA adds noise to the student model (inspired by DTS) while enforcing the cluster assumption in the feature space (similar to MR), so that examples lying close together in feature space are encouraged to receive similar labels. This combination helps stabilize the adaptation process and enhances the model’s generalizability across different domains. The method leverages the model’s feature space as an additional source of truth, allowing it to succeed on the challenging bioacoustics dataset and achieve state-of-the-art performance.
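The two ingredients can be illustrated with a deliberately simplified sketch. It is not the authors' update rule: the linear blending below is a swapped-in stand-in for the Laplacian adjustment, and `smooth_pseudo_labels`, `dropout`, `k`, and `weight` are hypothetical names and parameters chosen for illustration. The idea shown is that the teacher's pseudo-labels are pulled towards those of each example's nearest neighbours in feature space, while the student sees a noised (dropout) view of the input.

```python
import math
import random

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def knn_indices(features, i, k):
    """Indices of the k nearest neighbours of example i by cosine similarity."""
    others = [j for j in range(len(features)) if j != i]
    others.sort(key=lambda j: cosine(features[i], features[j]), reverse=True)
    return others[:k]

def smooth_pseudo_labels(features, teacher_probs, k=2, weight=0.5):
    """Blend each example's teacher probabilities with its neighbours' mean.

    Simplified stand-in for the cluster assumption: nearby points in
    feature space end up with more similar pseudo-labels.
    """
    n_classes = len(teacher_probs[0])
    refined = []
    for i, own in enumerate(teacher_probs):
        nbrs = knn_indices(features, i, k)
        if not nbrs:  # a single example has no neighbours to borrow from
            refined.append(list(own))
            continue
        nbr_mean = [sum(teacher_probs[j][c] for j in nbrs) / len(nbrs)
                    for c in range(n_classes)]
        refined.append([(1 - weight) * own[c] + weight * nbr_mean[c]
                        for c in range(n_classes)])
    return refined

def dropout(vector, rate, rng):
    """Student-side noise: zero each entry with probability `rate` (inverted dropout)."""
    return [0.0 if rng.random() < rate else x / (1.0 - rate) for x in vector]

# Toy data: two tight clusters in a 2-D feature space, one class.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
teacher_probs = [[0.9], [0.4], [0.1], [0.2]]
targets = smooth_pseudo_labels(features, teacher_probs, k=1, weight=0.5)
noisy_input = dropout(features[0], rate=0.3, rng=random.Random(0))
print(targets)
print(noisy_input)
```

In a full training loop, the student would be trained on the noised inputs to match the refined targets; that loop is omitted here because its details depend on the model and framework.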
In the bioacoustics domain, NOTELA demonstrated substantial improvements over the source model and outperformed other SFDA methods across multiple test target domains. It achieved impressive mean average precision (mAP) and class-wise mean average precision (cmAP) values, standard metrics for multi-label classification. Its notable performances on various target domains, such as S. Nevada (mAP 66.0, cmAP 40.0), Powdermill (mAP 62.0, cmAP 34.7), and SSW (mAP 67.1, cmAP 42.7), highlight its effectiveness in handling the challenges of the bioacoustics dataset.
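As a rough guide to how these two metrics differ, the sketch below computes both from a small score matrix. It assumes that mAP ranks the classes within each recording and averages over recordings, while cmAP ranks the recordings within each class and averages over classes; the article does not spell out the definitions, so treat this convention, the function names, and the toy data as assumptions.

```python
def average_precision(scores, labels):
    """Average precision of a ranking: mean of precision at each positive's rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            total += hits / rank
    return total / hits if hits else None  # undefined when there are no positives

def mean_defined(values):
    """Mean over the entries that are defined (not None)."""
    kept = [v for v in values if v is not None]
    return sum(kept) / len(kept) if kept else None

def sample_map(scores, labels):
    """Rank classes within each recording, then average over recordings."""
    return mean_defined(average_precision(s, l) for s, l in zip(scores, labels))

def classwise_map(scores, labels):
    """Rank recordings within each class, then average over classes."""
    n_classes = len(scores[0])
    return mean_defined(
        average_precision([row[c] for row in scores], [row[c] for row in labels])
        for c in range(n_classes)
    )

# Hypothetical scores and ground truth: 3 recordings (rows) x 3 species (columns).
scores = [[0.9, 0.2, 0.6],
          [0.3, 0.8, 0.1],
          [0.7, 0.4, 0.5]]
labels = [[1, 0, 0],
          [0, 1, 1],
          [0, 0, 1]]
print(sample_map(scores, labels))
print(classwise_map(scores, labels))
```

Under this convention, the class-wise variant gives every species equal weight regardless of how often it occurs, which matters when the classes are heavily imbalanced.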
In the context of vision tasks, NOTELA consistently demonstrated strong performance, outperforming other SFDA baselines. It achieved notable top-1 accuracy results on various vision datasets, including CIFAR-10 (90.5%) and S. Nevada (73.5%). Although it showed slightly lower performance on ImageNet-Sketch (29.1%) and VisDA-C (43.9%), NOTELA’s overall effectiveness and stability in handling the SFDA problem across bioacoustics and vision domains are evident.
The accompanying figure shows the evolution of test mean average precision (mAP) for multi-label classification on six soundscape datasets. It compares NOTELA and Dropout Student (DS) with SHOT, AdaBN, Tent, NRC, DUST, and Pseudo-Labelling, and shows that NOTELA is the only method that consistently improves on the source model.
Overall, this research highlights the importance of considering different modalities and problem settings when evaluating and designing SFDA methods. The authors propose the bioacoustics task as a valuable avenue for studying SFDA and emphasize the need for consistent and generalizable performance, especially when no domain-specific validation data is available. Their findings suggest that NOTELA is a compelling baseline for SFDA, delivering reliable performance across diverse domains. These insights may help advance SFDA techniques and enable more effective and versatile deep-learning applications.
Check out the Paper and Google Blog. All credit for this research goes to the researchers on this project.
Madhur Garg is a consulting intern at MarktechPost. He is currently pursuing his B.Tech in Civil and Environmental Engineering from the Indian Institute of Technology (IIT), Patna. He shares a strong passion for Machine Learning and enjoys exploring the latest advancements in technologies and their practical applications. With a keen interest in artificial intelligence and its diverse applications, Madhur is determined to contribute to the field of Data Science and leverage its potential impact in various industries.