Journal of Dali University ›› 2025, Vol. 10 ›› Issue (6): 38-46.

• Physics •

A Multi-Feature Fusion Bird Song Recognition Method Based on MobileViT

  (1. College of Engineering, Dali University, Dali, Yunnan 671003, China; 2. Institute of Eastern-Himalaya Biodiversity Research,
    Dali University, Dali, Yunnan 671003, China; 3. State Key Laboratory of Regional and Urban Ecology, Research Center for
    Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China)
  • Online: 2025-06-15 Published: 2025-06-24
  • Corresponding author: Zhao Enming (赵恩铭), professor, PhD, E-mail: zhaoem163@163.com.
  • About the author: Luo Chuang (罗创), master's student, mainly engaged in research on deep learning algorithms and voiceprint recognition technology.
  • Funding:
    National Natural Science Foundation of China (62065001); Yunnan Province Reserve Talent Program for Young and Middle-aged Academic and Technical Leaders (202205AC160001);
    Scientific Research Fund of the Yunnan Provincial Department of Education (2024Y852; 2024Y847)

Abstract: Objective: Monitoring bird diversity is a major challenge, and birds are more easily heard than seen. To improve the accuracy of bird song recognition, a multi-feature fusion bird song recognition method based on MobileViT is proposed. Methods: Using the Beijing BirdsData database as the research object, three different spectrogram sample sets were extracted from the preprocessed bird song signals, and each set was used as input to train one of three single-feature models based on MobileViT. Adaptive weighted feature fusion was then applied to the three single-feature models. Results: The multi-feature fusion model reached a recognition accuracy of 97.57% on the spectrogram test set, an improvement of 1.77% to 2.24% over the single-feature models. Conclusion: Different spectrograms extracted from the same bird song signal characterize the song differently, and the multi-feature fusion model can learn broader information from them, thereby significantly improving recognition accuracy.

Key words: bird song recognition, MobileViT, multi-feature fusion
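
The abstract does not say how the adaptive fusion weights are computed, so the following is only a minimal sketch of one common form of weighted feature fusion, not the authors' implementation. It assumes each of the three single-feature branches yields a feature vector of the same size, and that the fusion weights come from a softmax over trainable scalars. The names `fuse_features` and `fusion_logits` are hypothetical, and only the forward computation is shown, without training.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array."""
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()


def fuse_features(features, fusion_logits):
    """Weighted sum of per-branch feature vectors (illustrative only).

    features: list of arrays of shape (batch, dim), one per spectrogram branch.
    fusion_logits: sequence of length len(features); hypothetical trainable
        scalars that a training loop would update (an assumption, not from
        the paper).
    """
    weights = softmax(np.asarray(fusion_logits, dtype=float))
    stacked = np.stack(features, axis=0)            # (branches, batch, dim)
    fused = np.tensordot(weights, stacked, axes=1)  # (batch, dim)
    return fused, weights


# Random stand-ins for the outputs of three single-feature models.
rng = np.random.default_rng(0)
branch_features = [rng.normal(size=(4, 8)) for _ in range(3)]

# Equal logits give equal weights, so the fusion reduces to a plain average.
fused, weights = fuse_features(branch_features, fusion_logits=[0.0, 0.0, 0.0])
```

Normalizing with a softmax keeps the weights positive and summing to one, so the fused vector stays on the same scale as the branch outputs. In a trainable setting, a larger logit would let one spectrogram type dominate where it is more informative.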

CLC number: