Comparing the Efficiency of Statistical Models and Machine-Learning Models and Choosing the Optimal Model for Predicting Net Profit and Operating Cash Flows (scientific article, Ministry of Science)
Academic rank: Scientific journal (Ministry of Science)
Abstract
Objective: The present study compares the performance of machine-learning models and statistical models in predicting profit and operating cash flow using a set of accrual and cash variables.
Method: The research methodology is divided into three stages: selection of the data set and variables, modeling, and comparison. The statistical population consists of companies listed on the Tehran Stock Exchange; data from 184 companies over the period 1391 to 1400 in the Solar Hijri calendar (2012-2021) were examined.
Findings: The results showed that accrual variables have greater explanatory power than cash variables for predicting future net profit and operating cash flow. In addition, the comparison of machine-learning and statistical models in predicting future net profit and operating cash flow showed that the artificial-intelligence approach is more capable; symbolic regression performs best among the machine-learning models, and the probit model performs best among the statistical models. The results also showed that, although machine-learning models on average outperform statistical models, some statistical models outperform certain machine-learning models.

Comparing the Efficiency of Statistical Models and Machine-Learning Models and Choosing the Optimal Model for Predicting Net Profit and Operating Cash Flows
The present study compared the predictive performance of machine-learning models and statistical models for forecasting profit and operating cash flow by using a combination of accrual and cash variables. The research method encompassed three main stages: data set and variable selection, modeling, and comparison. The study focused on companies listed on the Tehran Stock Exchange (TSE), analyzing data from 184 companies over the period 2012-2021. The findings indicated that accrual variables exhibited greater explanatory power than cash variables in predicting future net profit and operating cash flow. Furthermore, the comparison of machine-learning and statistical models for forecasting future net profit and operating cash flow revealed that the artificial intelligence approach exhibited superior capability. Specifically, symbolic regression among the machine-learning models and the probit model among the statistical models demonstrated the highest performance. Additionally, the results indicated that, while machine-learning models outperformed statistical models on average, certain statistical models outperformed some machine-learning models.

Keywords: Classification, Data Mining, Machine Learning, Net Profit Forecasting, Operating Cash Flow Forecasting.

Introduction
In the current intensely competitive business environment, precise prediction of financial outcomes has become a pivotal element of organizational success. Projecting crucial financial indicators, such as net profit and operating cash flows, equips businesses with the insight needed to make well-informed choices regarding investment strategies, resource allocation, and overall financial planning. The capacity to anticipate future financial performance enables organizations to streamline operations and mitigate risks. Consequently, there is a growing need for effective forecasting models.

This study had two primary objectives: first, to assess the predictive capability of accrual and cash variables for forecasting future profit and cash flows; and second, to compare the efficacy of statistical models and machine-learning models in predicting net profit and operating cash flows. Statistical models analyze historical data patterns and underlying relationships to anticipate future financial outcomes. Machine-learning models, by contrast, have emerged as a potent alternative, employing advanced computational techniques to learn from data and make predictions without explicit programming. This research was guided by four hypotheses:
First hypothesis: The predictive capability of accrual variables for future net profit significantly exceeds that of cash variables.
Second hypothesis: The predictive capability of accrual variables for future operating cash flow significantly exceeds that of cash variables.
Third hypothesis: Machine-learning models significantly outperform statistical models in predicting net profit.
Fourth hypothesis: Machine-learning models significantly outperform statistical models in predicting operating cash flows.

Materials & Methods
This study used the Bourseview software database, Rahavard Novin, and the Codal website to analyze the data and draw conclusions regarding the hypotheses. Additionally, data-mining software, such as Weka, SPM, RapidMiner, SPSS Modeler, and Eureqa, was employed for modeling, while the Stata econometric and statistical software was used for the Vuong test, EViews for descriptive statistics, SPSS for the mean comparison test, and Excel for data sorting and categorization. Using these tools, 184 companies listed on the Tehran Stock Exchange (TSE) were examined. Initially, the study investigated the explanatory power of each category of variables (cash and accrual) for future net profit and operating cash flow through panel-data regression estimation and the Vuong test. Subsequently, the superior model was used for modeling, and the average performance of the machine-learning models was compared with that of the statistical models.

Findings
The Vuong statistic for predicting net profit was significant at the 1% level, indicating a notable difference in the explanatory power of the two models, with the model of accrual variables demonstrating higher explanatory power than the model of cash flow statement variables. Conversely, the Vuong statistic for predicting operating cash flow was not significant at the 5% level, indicating no significant difference in the explanatory power of the two models. The performance results of both statistical and machine-learning models indicated that the symbolic regression classifier, which uses a genetic algorithm, exhibited the best overall performance in predicting net profit and provided valuable results in the longitudinal test sample. Following symbolic regression, the linear support vector machine and MARS ranked second and third, respectively, in overall performance. Similarly, the symbolic regression classifier exhibited the best overall performance in predicting operating cash flow in the longitudinal test samples. After symbolic regression, the deep learning classifier and MARS ranked second and third, respectively, in overall performance.

Discussion & Conclusions
To test the first and second hypotheses, which posited that accrual variables have greater explanatory capacity than cash variables for future net profit and operating cash flow, the coefficients of determination of the models were compared after estimating the appropriate panel data approach. The results indicated that accrual variables indeed possessed greater explanatory power for net profit, thus providing no grounds for rejecting the first hypothesis of the study. However, in the case of operating cash flow, while the explanatory value of accrual variables surpassed that of cash variables, the difference between accrual and cash variables was not statistically significant. Consequently, the second hypothesis of the research was rejected.

To test the third and fourth hypotheses, which posited that machine-learning models outperform statistical models in predicting net profit and operating cash flow, the AUC criterion was computed for both the statistical and the machine-learning models. A comparison of the success rates showed that the machine-learning models significantly outperformed the statistical models in predicting net profit and operating cash flow. Therefore, there was no basis for rejecting the third and fourth hypotheses of the study.
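
The AUC-based comparison described above can be illustrated with a minimal Python sketch. It is not the authors' procedure, which relied on Weka, SPM, RapidMiner, SPSS Modeler, Eureqa, and SPSS. The sketch assumes a binary target (for example, whether profit rises or falls), and the function names, model names, and scores are hypothetical toy values. It computes the AUC as the share of positive-negative pairs that a model ranks correctly, then averages the AUC within each model family. The significance test on the difference in means is not shown.

```python
from typing import Dict, Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_auc(labels: Sequence[int], family: Dict[str, Sequence[float]]) -> float:
    """Average AUC over all models in one family (e.g. all ML models)."""
    return sum(auc(labels, s) for s in family.values()) / len(family)


# Hypothetical toy data: 1 = outcome occurred, 0 = it did not.
labels = [0, 0, 1, 1]
ml_models = {
    "ml_a": [0.10, 0.40, 0.35, 0.80],
    "ml_b": [0.20, 0.30, 0.60, 0.90],
}
stat_models = {
    "stat_a": [0.50, 0.40, 0.45, 0.60],
}

ml_mean = mean_auc(labels, ml_models)
stat_mean = mean_auc(labels, stat_models)
print(f"mean AUC, machine learning: {ml_mean:.3f}")
print(f"mean AUC, statistical:      {stat_mean:.3f}")
```

The pairwise form is quadratic in the sample size, which is acceptable for a few thousand firm-year observations. A rank-based implementation would be preferable for larger data sets.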