Predicting student performance using machine learning algorithms and educational data mining (case study: Shahed University) (Ministry of Science scientific article)

Academic rank: Scientific journal (Ministry of Science)

Authors: Mojdeh Salari, Reza Radfar, Mahdi Faghihi

Source: Business Intelligence Management Studies, Year 12, Spring 1403, No. 47

Keywords: student performance prediction, data mining, machine learning, modeling, improving the quality of education

Subject areas:

Management > Business Management > Entrepreneurship Management

doi: 10.22054/ims.2023.75523.2375

Pages: 315 - 366

Abstract

The aim of this research is to examine the factors that are effective in predicting the academic performance of undergraduate students in a four-class classification. To achieve this aim, the study follows the CRISP data mining methodology. The data set was extracted from the NAD educational system for undergraduate students at Shahed University who entered in the years 1390 to 1400. A total of 1468 records were used in data mining. First, the indicators affecting students' academic performance were extracted. Modeling was carried out with RapidMiner 9.9. To improve classification performance and reach a satisfactory prediction accuracy, we use a combination of principal component analysis with machine learning algorithms, feature selection techniques, and optimization algorithms. The performance of the prediction models was validated with 10-fold cross-validation. The results showed that the decision tree is the best algorithm for predicting student performance, with an accuracy of 84.71%. Based on final GPA, this algorithm correctly predicted the graduation of 77.88% of excellent students, 85.26% of good students, 84.69% of average students, and 85.96% of poor students. The diploma GPA variable has the greatest influence on predicting student performance.

Predicting students' performance using machine learning algorithms and educational data mining (a case study of Shahed University)

The purpose of this research is to investigate the factors that are effective in predicting the academic performance of undergraduate students in a four-class classification. To achieve this goal, the study follows the CRISP data mining methodology. The data set was extracted from the NAD educational system for undergraduate students at Shahed University who entered between 2011 and 2021. A total of 1468 records were used in data mining. First, the features affecting students' academic performance were extracted. Modeling was done using RapidMiner 9.9. To improve classification performance and reach a satisfactory prediction accuracy, we use a combination of principal component analysis with machine learning algorithms, feature selection techniques, and optimization algorithms. The performance of the prediction models is verified using 10-fold cross-validation. The results showed that the decision tree is the best algorithm for predicting students' performance, with an accuracy of 84.71%. Based on the final GPA, this algorithm correctly predicted the graduation of 77.88% of excellent students, 85.26% of good students, 84.69% of average students, and 85.96% of weak students.

Introduction

The main problem in this research is to identify the factors that are effective in predicting the academic performance of undergraduate students at Shahed University. A second problem is choosing, from the different modeling methods, the machine learning algorithm that best predicts academic performance, based on validation and evaluation of the models.
The purpose of this research is to investigate the factors that are effective in predicting the academic performance of undergraduate students at Shahed University, using educational data mining based on classification models.

Research questions

The main question in this research is: what factors affect the prediction of undergraduate students' performance and the improvement of that performance?

Sub questions

1- Which modeling algorithms give better results in predicting student performance?
2- What methods have been used to predict students' performance?
3- How valid is the developed model for Shahed University students?

2- Research background

2-1- Theoretical foundations

Educational data mining

The processing of educational data improves the prediction of student behavior and supports new approaches to educational policy (Capuano & Toti, 2019; Viberg et al., 2018).

Academic performance

The academic performance of students is the extent to which they achieve educational goals (Banik & Kumar, 2019).

2-2- Review of past studies

The highlighted cells in Table 1 show, for each past study, the classification algorithms that were most accurate and effective in predicting students' performance. The decision tree algorithm has been used most often in previous research, followed by the NB algorithm, then the RF and ANN algorithms, and then the SVM and KNN algorithms.

Table 1. Results of the research literature by classification algorithm used

Study | Algorithms marked (columns: DT, RF, NB, KNN, SVM, ANN, Line R, LLR) | Accuracy
(Batool et al., 2023) | * * | -
(Marjan et al., 2023) | ****** | -
(Abdelmagid & Qahmash, 2023) | * ** * | -
(Manoharan et al., 2023) | ** * * * | -
(Alghamdi & Rahman, 2023) | *** | 99.34%
(Alboaneen et al., 2022) | * **** | -
(Yağcı, 2022) | * *** * | 70-75%
(Dabhade et al., 2021) | * * * | 83.44%
(Najafi et al., 2021) | * | 95%
(Soltani et al., 2021) | * ** | -
(Cruz-Jesus et al., 2020) | * ** * | 50-81%
(Sokkhey & Okazaki, 2020) | *** * | -
(Rebai et al., 2020) | ** | -
(Jayaprakash et al., 2020) | *** | -
(Zulfiker et al., 2020) | ** * | -
(Musso et al., 2020) | * | -
(Waheed et al., 2020) | * | 85%
(Salal & Abdullaev, 2019) | * **** | -
(Turabieh, 2019) | * ** * | -
(Xu et al., 2019) | * ** | -
(Ghodoosi et al., 2019) | * * | -
(Fadavi et al., 2019) | * | 95.84%
(Ajibade et al., 2019) | * *** | 91.5%
(Ahmad & Shahzadi, 2018) | * | 85%
(Hasani & Bazrafshan, 2018) | * * | -
(Hussain et al., 2018) | *** * | -
(Umer et al., 2017) | **** * | -
(Khasanah, 2017) | * * | -
(Asif et al., 2017) | * | -
(Hoffait & Schyns, 2017) | * * * | 92.34%
(Khosravi et al., 2017) | * * | -
(Mueen et al., 2016) | * * * | 86%
(Amrieh et al., 2015) | * ** | -
(Yehuala, 2015) | * * | 92.34%
(Zahedi et al., 2015) | * * * | -
(Punlumjeak & Rachburee, 2015) | * | -
(Osmanbegović et al., 2014) | ** | 71%
(Shamloo et al., 2014) | * | -
(Asadi et al., 2013) | * | -
(Kabakchieva, 2013) | * ** | 60-75%
(Oskouei & Askari, 2014) | *** * | 96%
(Nghe et al., 2007) | * * | -
Present research | ****** | 94.17%

3- Method

This study follows CRISP, a widely used data mining methodology. The data set was extracted from the NAD educational system for undergraduate students in non-medical fields at Shahed University, covering 2011 to 2021. We used the Label Encoder technique to encode the features. The C4.5 and ID3 decision tree classification algorithms, random forest, Naïve Bayes, k-nearest neighbor, artificial neural network, and gradient boosted trees were used to analyze and classify students and to predict the final GPA. Modeling was done using RapidMiner 9.9.
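The label encoding step named in the Method section can be illustrated with a minimal sketch. It is written in plain Python rather than RapidMiner, which the study actually used, and the feature name and category values are hypothetical because the abstract does not list the NAD fields. The sketch assigns integer codes in sorted category order, which is one common convention and not necessarily the one the authors applied.

```python
from typing import Dict, List, Tuple


def label_encode(values: List[str]) -> Tuple[List[int], Dict[str, int]]:
    """Map each distinct category to an integer code, in sorted category order."""
    mapping = {category: code for code, category in enumerate(sorted(set(values)))}
    return [mapping[value] for value in values], mapping


# Hypothetical categorical feature; the real NAD columns are not given in the abstract.
admission_type = ["regular", "quota", "regular", "transfer", "quota"]

codes, mapping = label_encode(admission_type)
print(codes)    # integer codes, one per record
print(mapping)  # category -> code lookup, needed to decode predictions later
```

Integer codes of this kind suit tree-based models such as C4.5, which split on thresholds and do not treat the codes as distances. For distance-based models such as k-nearest neighbor, the arbitrary ordering of the codes can distort results.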
To improve classification performance and reduce misclassification, we use a combination of principal component analysis, feature selection techniques, and optimization algorithms. Prediction accuracy was evaluated with 10-fold cross-validation for all algorithms. The algorithms were also compared descriptively and analytically on the evaluation criteria, and the best prediction model was identified.

4- Data analysis

4-1- Introduction

The best model is the one with the best values for the selected performance measures (Lever et al., 2016). Figure 1 compares the accuracy of the algorithms used in this research.

Figure 1. Comparative chart of the accuracy of the algorithms

According to Table 2, the DT C4.5 algorithm correctly predicts the class of 1235 of 1458 objects, which gives it an accuracy of 84.71%.

Table 2. Confusion matrix of the DT C4.5-GI&OSE research model

Prediction | Students with excellent performance | Students with good performance | Students with average performance | Students with poor performance | Precision
Prediction 1 | 81 | 22 | 0 | 0 | 78.64%
Prediction 2 | 22 | 295 | 49 | 9 | 78.67%
Prediction 3 | 1 | 27 | 498 | 50 | 86.46%
Prediction 4 | 0 | 2 | 41 | 361 | 89.36%
Recall | 77.88% | 85.26% | 84.69% | 85.95% |

4-2- Important features

The predictive variables, ranked by weight, are as follows:

Diploma GPA: 0.262
Semester 1 GPA: 0.201
Semester 2 GPA: 0.197
Number of honors semesters: 0.122
Number of conditional (academic probation) semesters: 0.114
Year of entry: 0.104

4-3- Results of implementing the student performance prediction model

The results of the prediction model are shown in Table 3.

Table 3. Results of the DT C4.5-GI&OSE model implementation

5- Discussion

In the main research method, DT C4.5-GI&OSE, in four-class classification, the diploma GPA has the greatest effect on predicting student performance.
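The evaluation figures attached to Table 2 can be re-derived from its cell counts. The sketch below is plain Python, not the RapidMiner workflow used in the study. It assumes that rows are predicted classes and columns are true classes, both ordered excellent, good, average, poor. The function names are illustrative.

```python
from typing import List

# Counts from Table 2. Rows: predicted class; columns: true class.
# Assumed order on both axes: excellent, good, average, poor.
CONFUSION: List[List[int]] = [
    [81, 22, 0, 0],
    [22, 295, 49, 9],
    [1, 27, 498, 50],
    [0, 2, 41, 361],
]


def column_total(m: List[List[int]], j: int) -> int:
    """Number of records whose true class is j."""
    return sum(row[j] for row in m)


def accuracy(m: List[List[int]]) -> float:
    """Share of records on the diagonal (correctly classified)."""
    return sum(m[i][i] for i in range(len(m))) / sum(map(sum, m))


def precision(m: List[List[int]]) -> List[float]:
    """Per predicted class: correct predictions / all predictions of that class."""
    return [m[i][i] / sum(m[i]) for i in range(len(m))]


def recall(m: List[List[int]]) -> List[float]:
    """Per true class: correct predictions / all records of that class."""
    return [m[i][i] / column_total(m, i) for i in range(len(m))]


def cohen_kappa(m: List[List[int]]) -> float:
    """Agreement beyond chance: (observed - expected) / (1 - expected)."""
    n = sum(map(sum, m))
    expected = sum(sum(m[i]) * column_total(m, i) for i in range(len(m))) / (n * n)
    return (accuracy(m) - expected) / (1 - expected)


print(f"accuracy : {accuracy(CONFUSION):.4f}")
print("precision:", [f"{p:.4f}" for p in precision(CONFUSION)])
print("recall   :", [f"{r:.4f}" for r in recall(CONFUSION)])
print(f"kappa    : {cohen_kappa(CONFUSION):.3f}")
```

Under that reading of the table, the counts give 1235 correct out of 1458 (84.71% accuracy), the precision and recall values printed in Table 2, and a kappa of 0.780.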
In response to the research sub-question on algorithms, the best algorithm in the four-class mode is Decision Tree C4.5-GI&OSE, with a prediction accuracy of 84.71%. This model showed 84.17% accuracy, 83.42% sensitivity, and a kappa of 0.780. The DT C4.5-GI&OSE technique correctly predicted the graduation of 77.88% of excellent students, 85.26% of good students, 84.69% of average students, and 85.96% of poor students.

6- Conclusion

The results show that there is a relationship between students' social and academic characteristics and their academic performance. The DT C4.5-GI&OSE algorithm was the best algorithm for predicting students' final GPA at the end of their studies, with a prediction accuracy of 84.71%. In this model, the diploma GPA has the greatest effect on the prediction process. Using machine learning models as a decision support tool can help raise students' academic level and reduce the number of students at risk of failing or dropping out. This study was carried out at the undergraduate level; future research can extend the approach to the master's and doctoral levels.

Keywords: student performance prediction, data mining, machine learning, modeling, improving the quality of education