Research on multimodal identity authentication based on voiceprint recognition technology

To overcome the limitations of unimodal biometric authentication, such as degraded recognition accuracy and vulnerability to spoofing in noisy or adversarial environments, this paper proposes a multimodal authentication framework built on a Cross-modal Attention-based Voiceprint and Face Fusion (CAVF) mechanism. The core innovation is a cross-modal self-attention module that adaptively aligns and dynamically fuses heterogeneous features from voiceprint and facial inputs. Going beyond conventional fusion schemes, the method integrates a CNN-based liveness detection and spoofing-attack recognition module, which hardens the system against replay, synthesis, and cross-modal forgery attacks. The system comprises four functional modules: deep biometric feature extraction, attention-driven fusion, anti-spoofing detection, and decision optimization. Experiments on the VoxCeleb and CASIA-WebFace benchmarks under multiple noise and attack conditions show that CAVF achieves higher recognition accuracy (92.56%), a lower Equal Error Rate (4.25%), and improved AUC values, significantly outperforming unimodal and conventional fusion baselines. These results confirm the robustness and effectiveness of the proposed approach and offer a feasible, scalable solution for high-security identity verification in real-world deployments.
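The paper does not include an implementation, but the fusion step it describes can be sketched concretely. The following is a minimal, hypothetical PyTorch sketch of cross-modal attention fusion, assuming frame-level voiceprint embeddings and patch-level face embeddings; all dimensions, layer choices, and names (e.g. `CrossModalAttentionFusion`, `d_model=256`) are illustrative assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class CrossModalAttentionFusion(nn.Module):
    """Hypothetical sketch of a CAVF-style fusion step: each modality
    attends over the other, and the two attended views are pooled and
    concatenated into a joint identity embedding."""

    def __init__(self, voice_dim=192, face_dim=512, d_model=256, n_heads=4):
        super().__init__()
        # Project heterogeneous features into a shared space (assumed sizes)
        self.voice_proj = nn.Linear(voice_dim, d_model)
        self.face_proj = nn.Linear(face_dim, d_model)
        # One cross-attention block per direction
        self.face_to_voice = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.voice_to_face = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm_v = nn.LayerNorm(d_model)
        self.norm_f = nn.LayerNorm(d_model)

    def forward(self, voice_feats, face_feats):
        # voice_feats: (B, T_v, voice_dim), e.g. frame-level speaker embeddings
        # face_feats:  (B, T_f, face_dim), e.g. patch-level face embeddings
        v = self.voice_proj(voice_feats)
        f = self.face_proj(face_feats)
        # Each modality queries the other, aligning and reweighting features
        v_attn, _ = self.face_to_voice(query=v, key=f, value=f)
        f_attn, _ = self.voice_to_face(query=f, key=v, value=v)
        v_fused = self.norm_v(v + v_attn).mean(dim=1)  # pooled fused voice view
        f_fused = self.norm_f(f + f_attn).mean(dim=1)  # pooled fused face view
        return torch.cat([v_fused, f_fused], dim=-1)   # joint identity embedding

# Example: batch of 8, 100 voice frames, 49 face patches
fusion = CrossModalAttentionFusion()
joint = fusion(torch.randn(8, 100, 192), torch.randn(8, 49, 512))
print(joint.shape)  # torch.Size([8, 512])
```

In a full system of the kind the abstract outlines, the joint embedding would feed both the decision-optimization head and the CNN-based anti-spoofing module, though how those stages are wired together is not specified in this section.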
