Mitigating Corpus Bias in Speech Emotion Recognition: A Robust Hybrid Framework using Generalization-Aware Metaheuristic Feature Selection

Chaugule, Irfan and R Sankaye, Dr. Satish (2025) Mitigating Corpus Bias in Speech Emotion Recognition: A Robust Hybrid Framework using Generalization-Aware Metaheuristic Feature Selection. International Journal of Innovative Science and Research Technology, 10 (6): 25jun755. pp. 799-807. ISSN 2456-2165


Abstract

A formidable challenge impeding the real-world deployment of Speech Emotion Recognition (SER) systems is the problem of corpus bias. Models trained on a specific speech dataset often experience a significant degradation in performance when tested on new, unseen data, which may differ in language, speaker demographics, recording conditions, and emotional expression styles. This lack of generalization severely limits the practical applicability of SER technology. This paper proposes a novel hybrid framework specifically designed to enhance cross-corpus robustness by integrating deep learning for feature extraction with a sophisticated, generalization-aware metaheuristic for feature selection. We posit that while deep learning models, particularly those pre-trained on large-scale data (e.g., HuBERT, Wav2Vec2), can learn powerful and abstract feature representations, these features may still retain biases from their training data. Our core contribution is the design of a metaheuristic feature selection process guided by a novel fitness function that explicitly optimizes for generalization. This function evaluates candidate feature subsets not only on their accuracy on a source validation set but also on their performance stability across multiple, diverse validation sets, thereby promoting the selection of features that are invariant to inter-dataset variations. We outline a rigorous cross-corpus experimental protocol using datasets with diverse characteristics (e.g., IEMOCAP, EMO-DB, RAVDESS) to demonstrate the framework's ability to mitigate the performance drop in cross-language and cross-condition scenarios. This research aims to provide a new pathway towards developing truly robust SER systems that can maintain reliable performance in the varied and unpredictable acoustic environments of the real world.
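
The abstract does not give the exact form of the fitness function, so the following is only a minimal sketch of one way such a generalization-aware score could be written. It assumes the fitness is the source validation accuracy minus a weighted standard deviation of accuracy across all validation sets. The function names, the `stability_weight` parameter, and the nearest-centroid classifier are hypothetical stand-ins chosen for illustration, not the authors' method.

```python
import numpy as np


def nearest_centroid_accuracy(X_train, y_train, X_val, y_val):
    """Accuracy of a nearest-centroid classifier (stand-in for any SER classifier)."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    # Squared Euclidean distance from each validation sample to each centroid.
    dists = ((X_val[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    preds = classes[dists.argmin(axis=1)]
    return float((preds == y_val).mean())


def generalization_fitness(mask, train, source_val, extra_vals, stability_weight=0.5):
    """Score a binary feature mask (assumed form, not taken from the paper).

    fitness = source accuracy - stability_weight * std(accuracy over all validation sets)
    """
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        return 0.0  # an empty feature subset cannot be evaluated
    X_tr, y_tr = train
    accs = [
        nearest_centroid_accuracy(X_tr[:, mask], y_tr, X_val[:, mask], y_val)
        for X_val, y_val in [source_val, *extra_vals]
    ]
    source_acc = accs[0]
    instability = float(np.std(accs))  # spread of accuracy across corpora
    return source_acc - stability_weight * instability
```

A metaheuristic (for example a genetic algorithm or particle swarm) would call `generalization_fitness` on each candidate mask and keep the highest-scoring ones. Under this assumed form, a subset that is accurate on the source corpus but unstable on the others scores lower than one that performs consistently everywhere.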
