Abstract
This paper proposes a real-time person activity detection framework that operates in the presence of multiple sources in reverberant environments. The framework is composed of two main parts: the speech enhancement front-end and the activity detector. The front-end aims to automatically reduce the distortions that room reverberation introduces into the available distant speech signals, and thus to achieve a significant improvement in speech quality for each speaker. It is composed of three cooperating blocks, each fulfilling a specific task: speaker diarization, room impulse response identification, and speech dereverberation. In particular, the speaker diarization algorithm is essential for driving the operations performed in the other two stages according to the speakers' activity in the room. The activity estimation algorithm is based on bidirectional Long Short-Term Memory networks, which allow for context-sensitive activity classification from audio feature functionals extracted via the real-time speech feature extraction toolkit openSMILE. Extensive computer simulations have been performed using a subset of the AMI database for activity evaluation in meetings; the results obtained confirm the effectiveness of the approach.
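
The "audio feature functionals" mentioned in the abstract are, in general terms, statistics that summarize a frame-level descriptor contour over a window, so that a variable number of frames maps to a fixed-length vector a classifier can consume. The following is a minimal illustrative sketch of that idea only. It is not the openSMILE API and not the paper's feature set; the function names, the choice of statistics, the window and hop sizes, and the sample energy values are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def functionals(contour):
    """Summarize a frame-level contour with a few simple statistics.

    The statistics chosen here are illustrative assumptions, not the
    functionals used in the paper.
    """
    return {
        "mean": mean(contour),
        "std": pstdev(contour),  # population standard deviation
        "min": min(contour),
        "max": max(contour),
        "range": max(contour) - min(contour),
    }


def windowed_functionals(contour, win, hop):
    """Apply functionals over sliding windows of `win` frames, advancing by `hop`."""
    return [
        functionals(contour[start:start + win])
        for start in range(0, len(contour) - win + 1, hop)
    ]


# Hypothetical per-frame energy values (made up for this example).
energy = [0.1, 0.4, 0.3, 0.9, 0.8, 0.2]
feats = windowed_functionals(energy, win=4, hop=2)
```

Each element of `feats` is one fixed-length summary per window. A sequence of such vectors is the kind of input a context-sensitive sequence classifier, such as the bidirectional LSTM described above, would label window by window.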
Original language | English |
---|---|
Pages (from-to) | 386-397 |
Number of pages | 12 |
Journal | Cognitive Computation |
Volume | 4 |
Issue number | 4 |
DOIs | |
State | Published - Dec 2012 |
Keywords
- Activity detection
- Blind channel identification
- Real-time signal processing
- Speaker diarization
- Speech dereverberation
- Speech enhancement