Customer Churn Analysis Based on the Data-mining Approach: Hybrid Algorithm Incorporates Decision Tree and Bayesian Network (Case Study: Chain Stores) (scientific article, Ministry of Science)
Academic rank: Scientific journal (Ministry of Science)
Abstract
Today, organizations have come to recognize that retaining customers leads to greater profitability. Growing competition also increases the rate of customer churn; studying the factors that influence customers' tendency to churn or not to churn is therefore important for researchers and business practitioners. This study presents a hybrid model based on a data mining approach for analyzing the factors behind customer churn. In the first step, a feature selection node is used to identify the more important factors and remove redundant ones; in the second step, the C5.0 decision tree and a Bayesian network are used to classify and predict customers in two groups, churning and non-churning. Finally, the proposed model is implemented in the chain store industry as a case study. The findings indicate that both the decision tree model and the Bayesian network model are adequately able to predict customer churn, and that the area under the receiver operating characteristic curve is larger for the decision tree model than for the Bayesian network model; the decision tree model therefore performs better. The results also show that three factors from the demographic characteristics (gender, marital status, and age) and five factors related to customers' transaction records (average monthly income level, number of purchases per month, share of online purchases, how the customer became acquainted with the store, and type of market) are among the most important factors influencing customer churn.
Customer Churn Analysis Based on the Data-mining Approach: Hybrid Algorithm Incorporates Decision Tree and Bayesian Network
Today, companies and organizations are aware that customer retention leads to greater profitability, while increasing competition raises the rate of customer churn. Studying the features that influence customers' tendency to churn is therefore important. This study presents a hybrid model based on a data mining approach to analyze the features of churning customers. In the first step, a feature selection node is used to identify the features with higher importance and remove redundant ones. Then, the C5.0 decision tree and a Bayesian network are used to classify customers into two groups: churning and non-churning customers. Both are data mining techniques that can support forecasting. Finally, the proposed model is implemented in the chain store industry as a case study. Key findings indicate that both the decision tree model and the Bayesian network model can predict churning customers, with different accuracies; the area under the receiver operating characteristic (ROC) curve is greater for the decision tree model than for the Bayesian network model, so the decision tree performs better. The results indicate the efficiency of the proposed method. In addition, the results show that three demographic features (gender, marital status, and age) and five factors related to customer transaction records (average monthly income level, number of purchases per month, share of online shopping, how the customer became acquainted with the store, and type of market) are among the most influential factors.
Introduction
Customers are among the most important assets of a business. Customer relationship management has been introduced as a comprehensive key strategy for staying focused on customers' needs and integrating the ways an organization deals with its customers. Within customer relationship management, the importance of customer churn is evident.
Customer churn means that customers leave the organization and turn to competing organizations to receive services. Customers may leave for various obvious or hidden reasons. Organizations aim to keep their existing customers by using customer retention methods, because losing customers gives competitors an opportunity to attract them. This has made predicting customer churn all the more important. Researchers have concluded that a small change in the customer retention rate can have a large impact on overall business performance. Churn prediction is a suitable tool for describing an organization's customer retention process; its purpose is to identify the group of customers who are prone to churn. Knowing this group and taking preventive measures can play an important role in preventing customers from leaving. As a result, customer churn prediction has attracted considerable attention in management and marketing studies. To manage churn prediction efficiently within an organization, an effective and highly accurate prediction model is essential.
Methodology
In terms of purpose, the research is applied, because its results can be used in practice; in nature, it is ex post facto (post-event). Of the three types of research methods (quantitative, qualitative, and mixed), this research is quantitative. Since data mining operations require a specific methodology, the standard CRISP-DM methodology has been used. Various data mining techniques, including feature selection and classification, have also been applied. Finally, the research is conducted as a case study.
Based on real data on customers in the chain store industry, this research predicts customer churn and identifies the factors that drive it. A hybrid model based on a data mining approach is presented to analyze the factors of customer churn. In the first step, the Bayesian network algorithm was used to identify factors with higher importance. A Bayesian network is a directed acyclic graph in which each node represents a variable, arcs represent direct causal relationships between the connected nodes, and conditional probability tables are assigned to nodes that have conditional dependencies. In the second step, the C5.0 decision tree technique was used to classify customers by their churn status. The C5.0 decision tree is an improvement over the ID3 and C4.5 decision trees. The split at each node is calculated based on information gain, which is used to select the splitting variable as the tree grows.
Findings
The results indicate that customers' demographic characteristics and purchase records influence their behavior. Based on the classification results and the rules produced by the decision tree, the eight key factors identified as having a significant effect on customer churn in the studied case are analyzed below. The classification results show that people with a monthly income of 50 million Rials and above are among the customers who are likely to buy again, whereas people with a monthly income below 50 million Rials are not. People who made more than 2 purchases a month are among the loyal customers who are likely to buy again, while those who made fewer than 2 purchases a month are unlikely to repurchase and are counted among the churning customers.
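The methodology describes node splits as chosen by information gain. As a rough illustration of that criterion, and not the study's implementation, the following Python sketch compares two candidate thresholds on purchases per month. The data, the function names `entropy` and `information_gain`, and the thresholds are all hypothetical; a full C5.0 implementation also involves steps such as pruning that are not shown here.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels, predicate):
    """Entropy reduction obtained by splitting the samples with a boolean predicate."""
    left = [y for x, y in zip(values, labels) if predicate(x)]
    right = [y for x, y in zip(values, labels) if not predicate(x)]
    if not left or not right:
        return 0.0  # a split that separates nothing gains nothing
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - weighted


# Hypothetical toy data (NOT the study's data): purchases per month and churn label.
purchases = [1, 1, 2, 3, 4, 5, 1, 6]
churned = ["churn", "churn", "churn", "stay", "stay", "stay", "churn", "stay"]

# Two candidate splits; a tree grower would keep the one with the larger gain.
gain_over_2 = information_gain(purchases, churned, lambda n: n > 2)
gain_over_4 = information_gain(purchases, churned, lambda n: n > 4)
best = "purchases > 2" if gain_over_2 >= gain_over_4 else "purchases > 4"
print(f"gain(>2)={gain_over_2:.3f}  gain(>4)={gain_over_4:.3f}  best split: {best}")
```

On this toy data the threshold at 2 purchases separates the two classes completely, so it yields the larger gain and would be selected as the split.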
The results show that people whose share of online shopping exceeds 20% are loyal customers, while people with a lower share are churning customers. Women account for a higher percentage of purchases than men and are more loyal than men. Married people make up a higher percentage of the store's customers and are more loyal than single people. People's overall purchases from a chain store are much higher than from a traditional supermarket.
Conclusions
Analyzing and predicting customer behavior is very important because losing a customer is very costly for an organization. In this research, a method combining the Bayesian network and the C5.0 decision tree has been developed for analyzing and predicting customer churn. For this purpose, data on the customers of a chain store in Mashhad City were examined as a case study. The probability of customer repurchase is taken as the dependent variable, and the most important independent variables for building the C5.0 decision tree and the Bayesian network are identified by the feature selection node. The results show that applying the feature selection algorithm can help decision makers classify customers accurately, choose the best model, and focus on the variables that matter most in predicting customer churn. The results also indicate that eight factors (age, marital status, average monthly income, number of purchases per month, familiarity with the store, type of market, share of online shopping, and special sales) are among the most important factors affecting churn. Comparing the Bayesian network and the C5.0 decision tree on the basis of their ROC curves shows that the C5.0 decision tree, which has the higher accuracy, performs better in identifying churning customers.
Finally, a set of managerial insights is presented for formulating marketing plans and dealing with different types of customers.
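The model comparison in this study rests on the area under the ROC curve. A minimal sketch of how that quantity can be computed and compared for two classifiers is given below, assuming each model outputs a churn score per customer. The labels and scores are invented for illustration and are not the study's results; `roc_auc`, `tree_scores`, and `bayes_scores` are hypothetical names.

```python
def roc_auc(y_true, scores):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive example receives a higher score than a randomly chosen
    negative one (ties count as one half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = churned, 0 = stayed) and per-customer churn scores.
y_true = [1, 1, 1, 0, 0, 0]
tree_scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.7]   # assumed decision tree outputs
bayes_scores = [0.9, 0.5, 0.4, 0.6, 0.3, 0.7]  # assumed Bayesian network outputs

auc_tree = roc_auc(y_true, tree_scores)
auc_bayes = roc_auc(y_true, bayes_scores)
better = "decision tree" if auc_tree > auc_bayes else "Bayesian network"
print(f"AUC tree={auc_tree:.3f}  AUC Bayes={auc_bayes:.3f}  better: {better}")
```

On this toy data the first score list ranks churners above non-churners more often, so its AUC is higher. Applying the same function to held-out predictions from two fitted models would mirror the ROC-based comparison described above.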