TY - JOUR
T1 - Robust query-by-singing/humming system against background noise environments
AU - Kim, Kichul
AU - Park, Kang Ryoung
AU - Park, Sung Joo
AU - Lee, Soek Pil
AU - Kim, Moo Young
PY - 2011/5
Y1 - 2011/5
N2 - Under background noise environments, the performance of the Query-by-Singing/Humming (QbSH) system is considerably degraded. Since human pitch information is used as a feature vector for the QbSH system, a noise-robust pitch-estimation algorithm is essential. Thus, a novel pitch-estimation method is proposed by integrating temporal-autocorrelation and spectral-salience methods. As a pre-processing block, spectral smoothing is applied to enhance the stationarity of the noisy input signal. To calculate the similarity between the MIDI database and the input humming signal, the dynamic time warping (DTW) algorithm is used. Jang's corpus and the AURORA2 database are selected as humming and background noise signals, respectively. Compared with the standard pitch-estimation algorithm in the ITU-T G.729 speech codec, the proposed pitch-estimation method improves the average accuracy by 11.7% for the 0 dB signal-to-noise ratio (SNR) noise case. It also improves the top-20 ratio and mean reciprocal rank (MRR) of the proposed QbSH system, on average, by 7.4% and 0.13, respectively.
AB - Under background noise environments, the performance of the Query-by-Singing/Humming (QbSH) system is considerably degraded. Since human pitch information is used as a feature vector for the QbSH system, a noise-robust pitch-estimation algorithm is essential. Thus, a novel pitch-estimation method is proposed by integrating temporal-autocorrelation and spectral-salience methods. As a pre-processing block, spectral smoothing is applied to enhance the stationarity of the noisy input signal. To calculate the similarity between the MIDI database and the input humming signal, the dynamic time warping (DTW) algorithm is used. Jang's corpus and the AURORA2 database are selected as humming and background noise signals, respectively. Compared with the standard pitch-estimation algorithm in the ITU-T G.729 speech codec, the proposed pitch-estimation method improves the average accuracy by 11.7% for the 0 dB signal-to-noise ratio (SNR) noise case. It also improves the top-20 ratio and mean reciprocal rank (MRR) of the proposed QbSH system, on average, by 7.4% and 0.13, respectively.
KW - Query-by-Singing/Humming
KW - background noise
KW - dynamic time warping
KW - pitch estimation
UR - http://www.scopus.com/inward/record.url?scp=79960895771&partnerID=8YFLogxK
U2 - 10.1109/TCE.2011.5955213
DO - 10.1109/TCE.2011.5955213
M3 - Article
AN - SCOPUS:79960895771
SN - 0098-3063
VL - 57
SP - 720
EP - 725
JO - IEEE Transactions on Consumer Electronics
JF - IEEE Transactions on Consumer Electronics
IS - 2
M1 - 5955213
ER -