Ivan Magrin-Chagnolleau, Aaron E. Rosenberg and S. Parthasarathy.
Detection of Target Speakers in Audio Databases.
Proceedings of ICASSP 99, Phoenix, Arizona, 1999.
Abstract: The problem of speaker detection in audio databases is addressed
in this paper. Gaussian mixture modeling is used to build target speaker
and background models. A detection algorithm based on a likelihood ratio
calculation is applied to estimate target speaker segments. Evaluation
procedures are defined in detail for this task. Results are given for
different subsets of the HUB4 broadcast news database.
For one target speaker, with the data restricted to high-quality
speech segments, the segment miss rate is approximately 7%.
For unrestricted data, the segment miss rate is approximately 27%.
In both cases, the segment false alarm rate is 4 or 5 per hour.
For two target speakers with unrestricted data, the segment miss
rate is approximately 63%, with about 27 segment false alarms
per hour. The decrease in performance for two target speakers is largely
associated with short speech segments in the two-target-speaker
test data, which are undetectable in the current configuration of
the detection algorithm.
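
Illustration (not from the paper): the abstract does not give the details
of the detection algorithm, so the following is only a minimal sketch of
likelihood-ratio detection with Gaussian mixture models. The function
names, the diagonal-covariance assumption, the smoothing window, the
threshold, and the minimum segment length are all hypothetical choices
made for this example. Each frame is scored by the log-likelihood ratio
between a target model and a background model, the scores are averaged
over a short window, and runs of frames above the threshold are reported
as target segments.

```python
import math

def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of one feature vector under a diagonal-covariance GMM."""
    component_logs = []
    for w, mu, var in zip(weights, means, variances):
        log_p = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            log_p += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
        component_logs.append(log_p)
    # Log-sum-exp over mixture components for numerical stability.
    m = max(component_logs)
    return m + math.log(sum(math.exp(c - m) for c in component_logs))

def detect_segments(frames, target, background,
                    window=3, threshold=0.0, min_frames=2):
    """Return (start, end) frame ranges, end exclusive, scored as target.

    window, threshold and min_frames are hypothetical tuning parameters.
    """
    # Frame-level log-likelihood ratio: target model versus background model.
    llr = [gmm_log_likelihood(x, *target) - gmm_log_likelihood(x, *background)
           for x in frames]
    # Moving average of the ratio over a centered window.
    half = window // 2
    smoothed = []
    for t in range(len(llr)):
        lo, hi = max(0, t - half), min(len(llr), t + half + 1)
        smoothed.append(sum(llr[lo:hi]) / (hi - lo))
    # Collect runs of frames whose smoothed score exceeds the threshold.
    segments, start = [], None
    for t, score in enumerate(smoothed):
        if score > threshold and start is None:
            start = t
        elif score <= threshold and start is not None:
            if t - start >= min_frames:
                segments.append((start, t))
            start = None
    if start is not None and len(smoothed) - start >= min_frames:
        segments.append((start, len(smoothed)))
    return segments

# Toy one-dimensional models as (weights, means, variances); made-up values.
target = ([1.0], [[2.0]], [[1.0]])
background = ([1.0], [[0.0]], [[1.0]])

# Four background-like frames, five target-like frames, four background-like.
frames = [[0.0]] * 4 + [[2.0]] * 5 + [[0.0]] * 4
print(detect_segments(frames, target, background))        # [(4, 9)]

# A single target-like frame is averaged away by the smoothing window.
short_frames = [[0.0], [0.0], [2.0], [0.0], [0.0]]
print(detect_segments(short_frames, target, background))  # []
```

In this toy setup the second call returns no segments because the
averaging window outweighs an isolated target-like frame. This only
illustrates how a fixed analysis window can hide short segments; it is
not a description of the configuration used in the paper.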
Contact: ivan@ieee.org