Leveraging Machine Learning Models in Developing a Web-Based Multilingual Hate Speech Detection System for Cebuano, Tagalog, and English on Social Media
Keywords:
Hate Speech Detection, Multilingual NLP, Machine Learning, Hyperparameter Tuning

Abstract
In this study, we explore the development of a web-based, multilingual hate speech detection system that supports Cebuano, Tagalog, and English. We integrated both traditional machine learning models and transformer-based deep learning approaches to assess their effectiveness in identifying hate speech in social media comments across various contexts. Specifically, we evaluated Naïve Bayes, Decision Tree, Support Vector Machine (SVM), Random Forest, mBERT, and XLM-RoBERTa. To prepare the data, we applied a series of preprocessing steps including tokenization, stemming, stopword removal, and TF-IDF vectorization. Feature relevance was enhanced through Chi-Square filtering, and we addressed class imbalance using the Synthetic Minority Over-sampling Technique (SMOTE), which improved recall rates for underrepresented classes. Among the traditional models, the fine-tuned SVM achieved 92.1% accuracy, while Random Forest reached 93.3%, with particularly strong recall for Cebuano and English texts. Meanwhile, the transformer-based models yielded superior performance after hyperparameter tuning: mBERT achieved 96.1% accuracy with an F1-score of 0.97, and XLM-RoBERTa obtained 95.4% accuracy with an F1-score of 0.96. These results highlight the value of combining Chi-Square feature selection, SMOTE balancing, and fine-tuning strategies to optimize multilingual hate speech detection. Despite these advancements, our findings also reveal ongoing challenges related to class imbalance, as reflected in the macro F1-scores, even for the transformer-based models. Overall, we demonstrate that a well-tuned hybrid approach can provide an efficient and scalable solution for multilingual hate speech detection in diverse digital environments.