Marzouk, N., Nayel, H., Elsawy, A. (2024). Advancing Arabic Scientific Text Analysis: Evaluating Machine Learning Models for Named Entity Recognition. Benha Journal of Applied Sciences, 9(5), 45-48. doi: 10.21608/bjas.2024.279914.1377
Nourhan Marzouk; Hamada Nayel; Ahmed Elsawy. "Advancing Arabic Scientific Text Analysis: Evaluating Machine Learning Models for Named Entity Recognition". Benha Journal of Applied Sciences, 9, 5, 2024, 45-48. doi: 10.21608/bjas.2024.279914.1377
Advancing Arabic Scientific Text Analysis: Evaluating Machine Learning Models for Named Entity Recognition
1 Department of Computer Science, Faculty of Computers and Artificial Intelligence, Benha University, Benha, Egypt
2 Computer Science Department, Faculty of Computers and Artificial Intelligence, Benha University, Benha, Egypt
Abstract
Named entity recognition in Arabic text, particularly in the scientific and medical domains, presents unique challenges due to the language's rich morphology, the scarcity of resources, and dialectal diversity. This study evaluates the efficacy of Conditional Random Fields (CRF), Support Vector Machines (SVM), and Stochastic Gradient Descent (SGD) models for named entity recognition in Arabic scientific texts. The models were applied to a self-collected dataset of Arabic thesis abstracts. The named entities identified in the dataset include proteins, DNA, RNA, cell types, and cell lines. Focusing on the scientific domain, our comparative analysis reveals significant performance differences among the models, with hybrid approaches showing promising results. SGD, SVM, and CRF achieved F1-scores of 0.96, 0.91, and 0.80, respectively. The results demonstrate the effectiveness of the proposed models. The research contributes to Arabic natural language processing by highlighting model strengths and guiding the future selection and development of named entity recognition models.
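For readers unfamiliar with the metric reported above, the F1-score is the harmonic mean of precision and recall. The following is a minimal sketch of a token-level, micro-averaged computation in Python. The abstract does not state whether the authors scored at the token or entity level or how they averaged, so the function `micro_f1`, the outside label `"O"`, and the toy label sequences are all illustrative assumptions, not the paper's evaluation procedure or data.

```python
from typing import List, Tuple


def micro_f1(gold: List[str], pred: List[str], outside: str = "O") -> Tuple[float, float, float]:
    """Token-level micro-averaged precision, recall and F1 over entity labels.

    Assumption: tokens tagged with `outside` are non-entities and are not
    counted as true positives.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    pairs = list(zip(gold, pred))
    # Correctly labelled entity tokens.
    tp = sum(1 for g, p in pairs if g == p and g != outside)
    # Predicted entity tokens whose label does not match the gold label.
    fp = sum(1 for g, p in pairs if p != outside and g != p)
    # Gold entity tokens that were missed or given the wrong label.
    fn = sum(1 for g, p in pairs if g != outside and g != p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical toy labels using the five entity types named in the abstract.
gold = ["protein", "O", "DNA", "O", "cell_type", "RNA", "O", "cell_line"]
pred = ["protein", "O", "DNA", "protein", "O", "RNA", "O", "cell_line"]

precision, recall, f1 = micro_f1(gold, pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# prints: precision=0.80 recall=0.80 f1=0.80
```

In this toy case, four entity tokens are labelled correctly, one non-entity token is wrongly tagged as a protein, and one cell type is missed, giving precision and recall of 4/5 each. Entity-level scoring over multi-token spans would generally give different numbers from this token-level sketch.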