A lightweight speaker verification approach for autonomous vehicles

Yousef Salah, Omar Shalash, Esraa Khatab

Abstract


Speaker verification is the process of verifying an individual’s identity by comparing their test speech signals against their previously recorded voice samples. It has various practical applications, such as verifying customer identities in call centers, enabling contactless facility access, and supporting some medical applications. With the advances in autonomous vehicles, speaker verification has become an essential feature that provides security, access control, personalization, command authentication, driver monitoring, and compliance. Recent technological advancements have led to the rise of voice-based authentication systems, which are considered a more convenient alternative to traditional security systems. However, improving their accuracy remains an ongoing research aim. In this research, four models are proposed and compared with previous work on speaker verification. Each model pairs one of two networks (BiLSTM or Transformer) with one of two loss functions (triplet or quadruplet loss). The models are trained and tested on the LibriSpeech dataset. The results show an improvement in equal error rate for the four proposed models over previous models that used the LibriSpeech dataset: 0.068 compared to 0.11.
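To make the two loss functions and the evaluation metric named above concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the squared Euclidean distance, the margin values, the function names, and the threshold-sweep estimate of the equal error rate are all assumptions chosen for illustration, following the common textbook forms of triplet and quadruplet loss.

```python
# Illustrative sketch only. Distance metric, margins, and function names
# are assumptions, not details taken from the paper.

def sq_dist(u, v):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Push the anchor closer to a same-speaker embedding (positive)
    than to a different-speaker embedding (negative) by at least `margin`."""
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)


def quadruplet_loss(anchor, positive, negative1, negative2,
                    margin1=0.2, margin2=0.1):
    """Triplet term plus a second term that compares the anchor-positive
    distance with the distance between two different-speaker embeddings."""
    d_ap = sq_dist(anchor, positive)
    term1 = max(0.0, d_ap - sq_dist(anchor, negative1) + margin1)
    term2 = max(0.0, d_ap - sq_dist(negative1, negative2) + margin2)
    return term1 + term2


def equal_error_rate(genuine_scores, impostor_scores):
    """Estimate the EER by sweeping every observed score as a threshold and
    taking the point where false acceptance and false rejection are closest."""
    best_gap, eer = float("inf"), 1.0
    for t in sorted(set(genuine_scores) | set(impostor_scores)):
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer
```

Here higher scores are assumed to mean "more likely the same speaker". In practice the embeddings would come from the BiLSTM or Transformer network, and the losses would be computed on batches inside a deep learning framework rather than on Python lists.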

 

Received: 29 October 2024

Accepted: 03 December 2024

Published: 22 December 2024


Keywords


Autonomous Vehicles Security, Speaker Verification, Transformer Network, BiLSTM Network, Driver Personalization, Command Authentication






DOI: http://dx.doi.org/10.21622/RIMC.2024.01.2.1112



Copyright (c) 2024 Yousef Salah, Omar Shalash, Esraa Khatab


Robotics: Integration, Manufacturing and Control

E-ISSN: 3009-7967

P-ISSN: 3009-6987

 

Published by:

Academy Publishing Center (APC)

Arab Academy for Science, Technology and Maritime Transport (AASTMT)

Alexandria, Egypt

rimc@aast.edu