This paper presents a study that assesses four machine learning techniques for classifying tweets: Naive Bayes, Linear Support Vector Classifier, Stochastic Gradient Descent Classifier and Logistic Regression, all tested on the same dataset. Identifying which machine learning model performs best in sentiment analysis remains a challenge. The main objective of this paper is to find the best machine learning technique for sentiment analysis of tweets written in English, Filipino and Taglish. The models were integrated with Twitter's API to collect tweets. The tweets were preprocessed to make them analyzable, and features were then extracted using Natural Language Processing. The performance of each algorithm was then computed. The Linear Support Vector Classifier, Stochastic Gradient Descent, Naive Bayes and Logistic Regression achieved accuracies of 69%, 71%, 77% and 81%, respectively. Logistic Regression had the highest accuracy and is the best-fitting of the four algorithms for predicting tweets that indicate a potential mental health crisis.
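The pipeline summarized above (preprocess tweets, extract features, train a classifier, score its accuracy) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The cleaning rules, the bag-of-words features, the "crisis"/"non-crisis" labels, the toy tweets and the helper names (`preprocess`, `train_naive_bayes`, `predict`, `accuracy`) are all hypothetical. Multinomial Naive Bayes stands in for any of the four compared classifiers because it is the simplest to write without external libraries.

```python
import math
import re
from collections import Counter, defaultdict

# Assumed cleaning rules; the paper does not list its exact preprocessing steps.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
NON_WORD_RE = re.compile(r"[^\w\s]|[\d_]")  # punctuation, emoji, digits, underscores


def preprocess(tweet):
    """Lowercase a tweet, drop URLs and @mentions, strip non-letters, tokenize."""
    text = tweet.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_WORD_RE.sub(" ", text)  # also removes '#' but keeps the hashtag word
    return text.split()


def train_naive_bayes(docs, labels, alpha=1.0):
    """Fit multinomial Naive Bayes on token lists with add-alpha (Laplace) smoothing."""
    vocab = {tok for doc in docs for tok in doc}
    doc_counts = Counter(labels)
    token_counts = defaultdict(Counter)  # label -> token frequencies
    for tokens, label in zip(docs, labels):
        token_counts[label].update(tokens)
    model = {}
    for label, n_docs in doc_counts.items():
        denom = sum(token_counts[label].values()) + alpha * len(vocab)
        log_prior = math.log(n_docs / len(labels))
        log_lik = {tok: math.log((token_counts[label][tok] + alpha) / denom) for tok in vocab}
        model[label] = (log_prior, log_lik)
    return model


def predict(model, tokens):
    """Return the label with the highest log posterior; unseen tokens are ignored."""
    def log_posterior(label):
        log_prior, log_lik = model[label]
        return log_prior + sum(log_lik[t] for t in tokens if t in log_lik)
    return max(model, key=log_posterior)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy tweets (English, Filipino, Taglish); not the paper's dataset.
TRAIN = [
    ("I feel so hopeless and alone tonight", "crisis"),
    ("Pagod na ako, wala nang saysay lahat", "crisis"),
    ("Sobrang lungkot ko, I can't anymore", "crisis"),
    ("Great coffee with friends this morning! https://t.co/x", "non-crisis"),
    ("Ang saya ng weekend namin @kaibigan", "non-crisis"),
    ("Excited for the game later, let's go!", "non-crisis"),
]
TEST = [
    ("wala nang saysay, so hopeless", "crisis"),
    ("Saya ng coffee with friends", "non-crisis"),
]

if __name__ == "__main__":
    model = train_naive_bayes([preprocess(t) for t, _ in TRAIN], [y for _, y in TRAIN])
    preds = [predict(model, preprocess(t)) for t, _ in TEST]
    print(preds, accuracy([y for _, y in TEST], preds))
```

The same steps apply to the other three classifiers; only the training and prediction functions change. The paper does not say which library it used. In practice, scikit-learn provides all four models (`MultinomialNB`, `LinearSVC`, `SGDClassifier`, `LogisticRegression`) behind a common `fit`/`predict` interface, which makes comparing them on one dataset straightforward.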
Keywords: Sentiment Analysis, Machine Learning Algorithms, Twitter, Tweepy, Mental Health Crisis