Aaron E. Rosenberg, Ivan Magrin-Chagnolleau, S. Parthasarathy and Qian Huang.
Speaker Detection in Broadcast Speech Databases.
Proceedings of ICSLP 98, Sydney, Australia, 1998.

Abstract: Experiments have been carried out to assess the feasibility of
detecting target speaker segments in multi-speaker broadcast databases. The
experimental database consists of NBC Nightly News broadcasts. The target
speaker is the news anchor, Tom Brokaw. Gaussian mixture models are constructed
from labelled training data for the target speaker as well as background models
for other speakers, commercials, and music. Four labelled 30-min. broadcasts
are used for testing. Mel-frequency cepstral features, augmented by delta
cepstral features, are calculated over 20 msec. windows shifted every 10 msec.
through a broadcast. Likelihood ratio scores are calculated for each test frame
and averaged over blocks of frames with a specified duration. The block scores are
input to a detection routine which returns estimates of target segment boundaries.
The range of best results obtained over the test broadcasts is 82% to 100%
detection of target segments with segment frame accuracy ranging from 86% to 95%.
Between 0 and 2 false alarm segments are detected over each 30-min. broadcast.
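
Illustration: the block scoring and detection steps described in the abstract
can be sketched roughly as follows. This is a minimal sketch, not the authors'
implementation. It assumes per-frame log-likelihood ratios (target model score
minus background model score) have already been computed, that blocks do not
overlap, and that the detection routine is a plain threshold on block scores;
the abstract does not specify any of these details. The names block_average,
detect_segments, block_len, and threshold are hypothetical.

```python
from typing import List, Tuple


def block_average(frame_llr: List[float], block_len: int) -> List[float]:
    """Average per-frame log-likelihood ratios over consecutive blocks.

    Assumption: blocks are non-overlapping and block_len is given in frames.
    The last block may be shorter than block_len.
    """
    blocks = []
    for start in range(0, len(frame_llr), block_len):
        chunk = frame_llr[start:start + block_len]
        blocks.append(sum(chunk) / len(chunk))
    return blocks


def detect_segments(
    block_scores: List[float], block_len: int, threshold: float = 0.0
) -> List[Tuple[int, int]]:
    """Return (start_frame, end_frame) pairs for runs of blocks above threshold.

    Assumption: simple thresholding stands in for the paper's detection
    routine. end_frame is exclusive and is a multiple of block_len.
    """
    segments = []
    run_start = None  # index of the first block in the current target run
    for i, score in enumerate(block_scores):
        if score > threshold and run_start is None:
            run_start = i
        elif score <= threshold and run_start is not None:
            segments.append((run_start * block_len, i * block_len))
            run_start = None
    if run_start is not None:
        # The run extends to the end of the broadcast.
        segments.append((run_start * block_len, len(block_scores) * block_len))
    return segments
```

With a 10 msec. frame shift, a block_len of 100 frames would correspond to
roughly one-second blocks, and the returned frame indices can be converted to
seconds by multiplying by 0.010.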

Contact: ivan@ieee.org