Vocal and Non-vocal Segmentation based on the Analysis of Formant Structure

Murthy, Y.V.S.; Koolagudi, S.G.; Swaroop, V.G.

Please use this identifier to cite or link to this item: https://idr.l1.nitk.ac.in/jspui/handle/123456789/6906

Title:	Vocal and Non-vocal Segmentation based on the Analysis of Formant Structure
Authors:	Murthy, Y.V.S.; Koolagudi, S.G.; Swaroop, V.G.
Issue Date:	2018
Citation:	2017 9th International Conference on Advances in Pattern Recognition, ICAPR 2017, 2018, pp. 304-309
Abstract:	Classifying the vocal and non-vocal regions of an audio clip underlies many Music Information Retrieval (MIR) tasks. In this work, we compute novel features based on formant structure to segment the vocal and non-vocal regions of a given music clip. After thorough analysis, features such as obtuse angles at formant peak, valley locations, convexity, and concavity are proposed for this task. The obtuse angles are computed for the second, third, and fourth formants, since the first formant shows little discrimination. The formant-related features are added to the baseline Mel-frequency cepstral coefficients (MFCCs) to improve performance. The singer formant (F5) is also computed, giving a 19-dimensional feature vector. Artificial neural networks (ANNs) are used as the classifier because they are well suited to nonlinear data. Further, an 11-point moving window is applied to avoid intermittent misclassifications. The proposed approach achieves an accuracy of 88% with the 19-dimensional feature vector. © 2017 IEEE.
URI:	http://idr.nitk.ac.in/jspui/handle/123456789/6906
Appears in Collections:	2. Conference Papers
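
The abstract mentions an 11-point moving window applied to avoid intermittent misclassifications but does not say how the window is applied. The sketch below is one plausible reading, not the authors' implementation: it assumes binary frame-wise labels (1 = vocal, 0 = non-vocal) and a centered majority vote with edge padding. The function name `smooth_labels` and its `window` parameter are hypothetical.

```python
import numpy as np


def smooth_labels(labels, window=11):
    """Majority-vote smoothing of binary frame labels (assumed: 1 = vocal, 0 = non-vocal)."""
    labels = np.asarray(labels, dtype=int)
    if window % 2 == 0:
        raise ValueError("window must be odd so that it can be centered on a frame")
    if labels.size == 0:
        return labels
    half = window // 2
    # Repeat the first and last labels so every frame has a full window around it.
    padded = np.pad(labels, half, mode="edge")
    # Count the vocal frames inside each centered window.
    counts = np.convolve(padded, np.ones(window, dtype=int), mode="valid")
    # A frame is labeled vocal when more than half of its window is vocal.
    return (counts > half).astype(int)


# An isolated "vocal" frame inside a non-vocal region is removed.
noisy = [0] * 10 + [1] + [0] * 10
print(smooth_labels(noisy))
```

Under these assumptions, runs shorter than about half the window are absorbed into the surrounding region, while a genuine boundary between long vocal and non-vocal runs stays where it is.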

