Application of Song Scope Software to the Automatic Identification of Animal Species by Sound
Abstract
Commercially available autonomous recorders for monitoring vocal wildlife populations such as birds and frogs now make it possible to collect thousands of hours of audio data in a field season. Given limited resources, it is not practical to manually review this volume of data “by ear”. Automatic processing of sound recordings to detect and identify specific species from their vocalizations, even if not perfectly accurate, makes efficient use of researchers' time, because they need to review only those samples most likely to contain vocalizations of interest. This yields significant gains in sample coverage and operating efficiency, and reduces cost.
Developing generalized computer algorithms capable of accurate species identification in real-world field conditions poses difficult challenges. First, autonomous recorders typically receive sounds from all directions, scattered and reflected by trees and obscured by an unpredictable mix of random noise, wind, rustling leaves, airplanes, road traffic, and other species of birds, frogs, insects and mammals. Second, the vocalizations of many species vary widely from one individual to the next. Any algorithm must be prepared to accept vocalizations that are similar, but not identical, to known references in order to detect a previously unobserved individual. In so doing, however, the algorithm becomes susceptible to misclassifying a vocalization from a different species with similar components. This is especially true for species with narrowband vocalizations lacking distinctive spectral properties and for species with short-duration vocalizations lacking distinctive temporal properties.
The bulk of prior research has generally differentiated among only a handful of simple mono-syllabic vocalizations at a time. While the results have been promising, we found that many approaches degrade significantly as the number of species increases, especially when more complex multi-syllabic and highly variable vocalizations are also included.
In this paper, we discuss an algorithm based on Hidden Markov Models automatically constructed so as to consider not just the spectral and temporal features of individual syllables, but also how syllables are organized into more complex songs. Additionally, several techniques are employed to reduce the effects of noise present in recordings made by autonomous recorders.
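To illustrate why modeling the organization of syllables matters, the following is a minimal sketch of scoring a syllable sequence with a Hidden Markov Model using the scaled forward algorithm. It is not the Song Scope implementation: the two-state left-to-right model, the probabilities in `START`, `TRANS` and `EMIT`, and the integer syllable labels are all hypothetical values chosen for illustration, and a real recognizer would work on spectral features rather than pre-labeled syllables.

```python
import math

# Hypothetical toy model: state 0 favors syllable 0, state 1 favors syllable 1,
# and the song must progress from state 0 to state 1 (left-to-right topology).
START = [1.0, 0.0]
TRANS = [[0.6, 0.4],
         [0.0, 1.0]]
EMIT = [[0.9, 0.1],
        [0.1, 0.9]]


def forward_log_likelihood(obs, start, trans, emit):
    """Return log P(obs | model) for a discrete HMM via the scaled forward algorithm."""
    if not obs:
        raise ValueError("observation sequence must not be empty")
    n_states = len(start)
    # Initialise with the start distribution and the first observed syllable.
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate one step through the transitions, then weight by the emission.
            alpha = [
                sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][obs[t]]
                for s in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        # Rescale to avoid underflow and accumulate the log of the scale factors.
        log_prob += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_prob


# Two sequences with the same syllables but a different ordering.
song_like = [0, 0, 1, 1]
shuffled = [1, 0, 1, 0]
score_song = forward_log_likelihood(song_like, START, TRANS, EMIT)
score_shuffled = forward_log_likelihood(shuffled, START, TRANS, EMIT)
```

Both sequences contain the same syllables in the same proportions, yet the model assigns a higher likelihood to the one whose ordering matches the song structure it encodes. A classifier that looked only at individual syllables could not separate the two.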
Keywords: Song Scope software, sound acquisition software, wildlife sound monitoring, birdsong monitoring and recording