The AV16.3 corpus is an audio-visual corpus of 43 real indoor multispeaker recordings, designed to test algorithms for audio-only, video-only and audio-visual speaker localization and tracking. Real human speakers were used. The variety of recordings was chosen to test algorithms to their limits and to cover a wide range of application scenarios (meetings, surveillance). The emphasis is on overlapped speech and multiple moving speakers. Most recordings are dynamic scenarios with single or multiple moving speakers. A few meeting scenarios, with mostly seated speakers, are also included.
2011-06-28