This study determined which feature set best classifies heartbeat audio data using Gaussian Naïve Bayes and K-Nearest Neighbor (KNN) classifiers. Three feature sets (low-level features, Mel-frequency cepstral coefficients or MFCCs, and their combination) were used with each classifier. Results were evaluated with precision, recall, and f-score, both micro-averaged (counts pooled over all classes) and macro-averaged (per-class scores averaged across classes). The balanced accuracy of the trials for each feature set was also measured. The two data sets (Set A with 84 samples and Set B with 432 samples) were processed separately, each with an 80:20 training:testing partition. For data set A, Naïve Bayes with the MFCC feature set garnered the highest macro-averaged recall (40%) and balanced accuracy (38.9%), while the same method with the combined feature set yielded the highest precision (57.3%) and f-score (42.3%). For data set B, Naïve Bayes with the combined feature set yielded the highest macro-averaged precision (62.69%) and balanced accuracy (49.97%), while KNN with the low-level feature set yielded the highest recall (61.54%) and f-score (59.38%). These results suggest that Naïve Bayes with the combined feature set most often achieved the highest macro-averages because combining low-level and MFCC features suits its statistical approach. For KNN, a supervised, instance-based method that labels each sample according to its nearest training samples, similarities between instances were easier to identify from the low-level characteristics.
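The abstract does not state which low-level features were extracted or how the two sets were combined. As one possible reading, the minimal sketch below summarizes two common low-level descriptors, zero-crossing rate and RMS energy, by their mean and standard deviation over frames, and forms a combined set by concatenating that vector with an MFCC summary vector. The frame length, hop size, choice of descriptors, summary statistics, and concatenation are all assumptions, and the helper names are hypothetical. In practice the MFCCs would come from a library routine such as LibROSA's `librosa.feature.mfcc`, which is not reimplemented here.

```python
import math

def frames(signal, frame_len=1024, hop=512):
    """Split a 1-D signal into overlapping frames (assumed sizes; trailing partial frame dropped)."""
    return [signal[i:i + frame_len] for i in range(0, len(signal) - frame_len + 1, hop)]

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (zero is treated as positive)."""
    crossings = sum((a >= 0) != (b >= 0) for a, b in zip(frame, frame[1:]))
    return crossings / (len(frame) - 1)

def rms(frame):
    """Root-mean-square energy of one frame."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def mean_std(values):
    """Population mean and standard deviation of a list of frame-wise values."""
    m = sum(values) / len(values)
    return [m, math.sqrt(sum((v - m) ** 2 for v in values) / len(values))]

def low_level_features(signal):
    """Hypothetical low-level set: mean and std of frame-wise ZCR and RMS."""
    fs = frames(signal)
    if not fs:
        raise ValueError("signal is shorter than one frame")
    return mean_std([zero_crossing_rate(f) for f in fs]) + mean_std([rms(f) for f in fs])

def combined_features(low_level, mfcc_summary):
    """Assumed 'combined' set: the low-level vector followed by the MFCC summary vector."""
    return list(low_level) + list(mfcc_summary)

# Illustrative input: an alternating +1/-1 signal crosses zero between every pair of samples.
toy = [1.0 if i % 2 == 0 else -1.0 for i in range(2048)]
low = low_level_features(toy)
combined = combined_features(low, [0.5, -0.2])  # placeholder MFCC summary values
```

Concatenation keeps each descriptor as its own dimension, so a classifier sees the low-level and MFCC information side by side; whether the study scaled or weighted the two parts is not stated.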
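The abstract compares a probabilistic classifier (Gaussian Naïve Bayes) with an instance-based one (KNN). The minimal from-scratch sketch below illustrates how each makes a prediction and how an 80:20 hold-out split can be drawn. It is not the study's implementation: the helper names (`holdout_split`, `GaussianNB`, `KNN`), k = 3, the variance-smoothing constant, the random seeds, and the synthetic two-class feature vectors are all illustrative assumptions.

```python
import math
import random
from collections import Counter, defaultdict

def holdout_split(X, y, test_ratio=0.2, seed=0):
    """Hypothetical helper: shuffle with a fixed seed and hold out test_ratio (80:20 by default)."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, round(len(X) * test_ratio))
    test, train = idx[:n_test], idx[n_test:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])

class GaussianNB:
    """Gaussian Naive Bayes: class prior times an independent normal density per feature."""

    def fit(self, X, y, eps=1e-9):
        groups = defaultdict(list)
        for x, label in zip(X, y):
            groups[label].append(x)
        self.params = {}
        for label, rows in groups.items():
            n = len(rows)
            means = [sum(col) / n for col in zip(*rows)]
            # eps keeps a variance from being exactly zero (assumed smoothing constant).
            variances = [sum((v - m) ** 2 for v in col) / n + eps
                         for col, m in zip(zip(*rows), means)]
            self.params[label] = (math.log(n / len(X)), means, variances)
        return self

    def _log_score(self, x, label):
        # Unnormalized log posterior: log prior plus the sum of per-feature log densities.
        log_prior, means, variances = self.params[label]
        return log_prior + sum(-0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
                               for xi, m, v in zip(x, means, variances))

    def predict(self, X):
        return [max(self.params, key=lambda c: self._log_score(x, c)) for x in X]

class KNN:
    """k-nearest neighbours: a supervised, instance-based classifier (no clustering step)."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # "Training" only stores the labelled samples.
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, X):
        out = []
        for x in X:
            nearest = sorted(range(len(self.X)), key=lambda i: math.dist(x, self.X[i]))[: self.k]
            # Majority vote; ties go to the label seen first, i.e. the closest neighbour.
            out.append(Counter(self.y[i] for i in nearest).most_common(1)[0][0])
        return out

# Synthetic 2-D vectors standing in for extracted audio features (hypothetical data).
rng = random.Random(1)
X = ([[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(20)]
     + [[rng.gauss(5, 1), rng.gauss(5, 1)] for _ in range(20)])
y = ["normal"] * 20 + ["murmur"] * 20
X_tr, y_tr, X_te, y_te = holdout_split(X, y)
accuracies = {}
for model in (GaussianNB(), KNN(k=3)):
    preds = model.fit(X_tr, y_tr).predict(X_te)
    accuracies[type(model).__name__] = sum(p == t for p, t in zip(preds, y_te)) / len(y_te)
print(accuracies)
```

The contrast mirrors the abstract's explanation: Naïve Bayes summarizes each class by per-feature statistics, while KNN keeps every labelled training sample and decides by distance, so it depends directly on how well the chosen features place similar recordings close together.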
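The difference between micro- and macro-averaging, and one common definition of balanced accuracy, can be made concrete with a small sketch. It assumes single-label predictions and treats an undefined ratio (zero denominator) as 0; the macro f-score is computed from macro precision and macro recall, whereas some libraries (e.g. scikit-learn's `f1_score` with `average="macro"`) instead average per-class f-scores. Balanced accuracy is taken here as the mean per-class recall, as in scikit-learn's `balanced_accuracy_score`; the abstract reports different values for macro recall and balanced accuracy in the same configuration (40% vs 38.9%), so the study's definition may differ. The function names and toy labels are hypothetical.

```python
def safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero (an assumed convention)."""
    return num / den if den else 0.0

def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    return safe_div(2 * precision * recall, precision + recall)

def multiclass_metrics(y_true, y_pred):
    labels = sorted(set(y_true) | set(y_pred))
    # One-vs-rest counts per class: (true positives, false positives, false negatives).
    counts = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        counts[c] = (tp, fp, fn)

    # Micro-averaging: pool the counts over all classes, then compute each ratio once.
    tp_all = sum(tp for tp, _, _ in counts.values())
    fp_all = sum(fp for _, fp, _ in counts.values())
    fn_all = sum(fn for _, _, fn in counts.values())
    micro_p = safe_div(tp_all, tp_all + fp_all)
    micro_r = safe_div(tp_all, tp_all + fn_all)

    # Macro-averaging: compute each ratio per class, then take the unweighted mean.
    macro_p = sum(safe_div(tp, tp + fp) for tp, fp, _ in counts.values()) / len(labels)
    macro_r = sum(safe_div(tp, tp + fn) for tp, _, fn in counts.values()) / len(labels)

    # Balanced accuracy (assumed definition): mean recall over classes present in y_true.
    present = sorted(set(y_true))
    bal_acc = sum(safe_div(counts[c][0], counts[c][0] + counts[c][2])
                  for c in present) / len(present)

    return {
        "micro_precision": micro_p, "micro_recall": micro_r,
        "micro_f": f_score(micro_p, micro_r),
        "macro_precision": macro_p, "macro_recall": macro_r,
        "macro_f": f_score(macro_p, macro_r),  # F from macro P and R (assumed variant)
        "balanced_accuracy": bal_acc,
    }

# Toy labels for illustration only (not the study's data).
y_true = ["normal", "normal", "murmur", "extrahls", "normal", "murmur"]
y_pred = ["normal", "murmur", "murmur", "normal", "normal", "murmur"]
metrics = multiclass_metrics(y_true, y_pred)
```

On these toy labels, micro precision and micro recall both equal 4/6, the plain accuracy, as they always do when every sample receives exactly one label. Macro precision is 4/9 and macro recall is 5/9, because the never-predicted `extrahls` class contributes zeros that pull the unweighted mean down; this sensitivity to minority classes is why macro-averages suit imbalanced heartbeat classes.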
Keywords: Machine learning techniques for classifying heartbeat, feature extraction from auditory data, multi-class metrics, feature set comparison, LibROSA