
Evaluating Transformer-Based Models for Sentiment Analysis in the Low-Resource Kazakh Language | IJCT Volume 13 – Issue 2 | IJCT-V13I2P40

International Journal of Computer Techniques
ISSN 2394-2231
Volume 13, Issue 2 | Published: March – April 2026
Author
Alyara Abilbashar, Almas Kenes
Abstract
Sentiment analysis has become a key tool for understanding opinions on digital platforms, yet low-resource languages such as Kazakh remain understudied in this area. Although transformer models perform well in sentiment analysis, it is not clear how well they work for Kazakh. We compare three transformer models on sentiment classification of Kazakh texts. The models were trained and tested on KazSAnDRA, a Kazakh sentiment dataset, and evaluated with standard metrics. The results show that performance differs between models, and we point out which model works best in low-resource settings. The custom transformer-based model achieved the highest accuracy on the KazSAnDRA dataset, reaching 88.1%. These findings add new evidence to Kazakh NLP research.
Keywords
Sentiment analysis, Transformer-based models, Kazakh language, Low-resource language
Conclusion
This study evaluated three transformer-based models: RemBERT, XLM-RoBERTa, and a custom transformer-based model. The models were applied to sentiment analysis in the low-resource Kazakh language using a single dataset, KazSAnDRA, which is based on reviews. The results show that, owing to class imbalance, all models perform consistently well on positive sentiment, while predictions on negative sentiment remain challenging. This reflects a broader problem observed in low-resource sentiment analysis: limited annotated examples reduce a model's discriminative power.
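The per-class effect described above can be made concrete with a small sketch. The label names and counts below are illustrative, not drawn from KazSAnDRA: a class imbalance of 8:2 is enough to show how the majority class can look strong while minority-class recall and F1 suffer.

```python
def per_class_prf(y_true, y_pred, labels):
    """Per-class precision, recall, and F1 from parallel label lists."""
    scores = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        scores[c] = (prec, rec, f1)
    return scores

# Illustrative imbalanced split: 8 positive, 2 negative examples,
# with one negative example mislabelled as positive.
y_true = ["pos"] * 8 + ["neg"] * 2
y_pred = ["pos"] * 8 + ["neg", "pos"]
scores = per_class_prf(y_true, y_pred, ["pos", "neg"])
```

Here positive recall stays at 1.0, while a single error halves negative recall to 0.5, mirroring how imbalance masks minority-class weakness in aggregate accuracy.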
On this dataset, the custom transformer-based model outperformed the others overall, achieving high precision, recall, and F1-score and showing strong generalization despite being trained from scratch. RemBERT reliably performed best on negative sentiment, benefiting mostly from its multilingual pretraining. XLM-RoBERTa, on the other hand, produced more balanced predictions overall, even though they were less accurate; this is likely because it was pretrained on social media data rather than on review data, as RemBERT was.
The findings carry two key implications. First, a custom lightweight model trained on domain-specific data can compete with, and even outperform, large pretrained models applied to low-resource languages. Second, dataset quality and class distribution greatly influence model performance, particularly when predicting neutral sentiment.
Future research may explore class-balancing strategies and domain-adaptive pretraining to improve performance on underrepresented sentiment categories, and may expand the available Kazakh-language datasets to support the development of robust NLP tools for low-resource languages.
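One common class-balancing strategy that such future work might build on is inverse-frequency loss weighting, where each class c with n_c examples out of N total (K classes) gets weight N / (K * n_c). The sketch below is illustrative only (label names and counts are invented, and it shows just the weight computation, not a training loop):

```python
from collections import Counter

def balanced_class_weights(labels):
    """Inverse-frequency weights, w_c = N / (K * n_c): rarer classes
    receive larger weights, so their errors count more in the loss."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * n_c) for c, n_c in counts.items()}

# Illustrative distribution skewed toward positive reviews.
labels = ["pos"] * 70 + ["neg"] * 20 + ["neu"] * 10
weights = balanced_class_weights(labels)
# The minority "neu" class receives the largest weight.
```

In a PyTorch-style setup these weights would typically be passed to a weighted cross-entropy loss; the weight formula matches the "balanced" heuristic used by common ML toolkits.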
References
[1] Maite Taboada. Sentiment analysis: An overview from linguistics. Annual Review of Linguistics, 2(1):325–347, 2016.
[2] Bing Liu. Sentiment Analysis and Opinion Mining. Springer Nature, 2022.
[3] Yanying Mao, Qun Liu, and Yu Zhang. Sentiment analysis methods, applications, and challenges: A systematic literature review. Journal of King Saud University-Computer and Information Sciences, 36(4):102048, 2024.
[4] Yusuf Aliyu, Aliza Sarlan, Kamaluddeen Usman Danyaro, Abdullahi Sani BA Rahman, and Mujaheed Abdullahi. Sentiment analysis in low-resource settings: a comprehensive review of approaches, languages, and data sources. IEEE Access, 12:66883–66909, 2024.
[5] Banu Yergesh, Gulmira Bekmanova, and Altynbek Sharipbay. Sentiment analysis of Kazakh text and their polarity. In Web Intelligence, volume 17, pages 9–15. SAGE Publications, London, England, 2019.
[6] Aslanbek Murzakhmetov, Maxatbek Satymbekov, Arseniy Bapanov, and Nurbol Beisov. Sentiment analysis of tourist reviews about Kazakhstan using a hybrid stacking ensemble approach. Computation, 13(10):240, 2025.
[7] Wael Alosaimi, Hager Saleh, Ali A Hamzah, Nora El-Rashidy, Abdullah Alharb, Ahmed Elaraby, and Sherif Mostafa. ArabBERT-LSTM: improving Arabic sentiment analysis based on transformer model and long short-term memory. Frontiers in Artificial Intelligence, 7:1408845, 2024.
[8] Isidoros Perikos and Athanasios Diamantopoulos. Explainable aspect-based sentiment analysis using transformer models. Big Data and Cognitive Computing, 8(11):141, 2024.
[9] Md Nesarul Hoque, Umme Salma, Md Jamal Uddin, Md Martuza Ahamad, and Sakifa Aktar. Exploring transformer models in the sentiment analysis task for the under-resource Bengali language. Natural Language Processing Journal, 8:100091, 2024.
[10] Rustem Yeshpanov and Huseyin Atakan Varol. KazSAnDRA: Kazakh sentiment analysis dataset of reviews and attitudes. arXiv preprint arXiv:2403.19335, 2024.
[11] Francesco Barbieri, Luis Espinosa Anke, and Jose Camacho-Collados. XLM-T: Multilingual language models in Twitter for sentiment analysis and beyond. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 258–266, 2022.
How to Cite This Paper
Alyara Abilbashar, Almas Kenes (2026). Evaluating Transformer-Based Models for Sentiment Analysis in the Low-Resource Kazakh Language. International Journal of Computer Techniques, 13(2). ISSN: 2394-2231.








