Articles related to the keyword
Data mining
Source:
Iranian Journal of Finance, Volume 3, Issue 1, Winter 2019
90 - 109
To survive in the modern world, organizations must be equipped with mechanisms that not only maintain their competitive advantage but also drive their progress and improvement. Predicting bank performance is an important issue, because poor performance may lead to a bank's bankruptcy and thereby affect the national economy. A bank performance prediction model uses scientific and systematic approaches to diagnose the financial operations of institutions. Through precise and strict evaluation, the model can detect institutions' weaknesses in advance and provide early warning signals to the relevant financial authorities. In the present study, we used three data mining models to predict the future performance of the banks listed on the Tehran Stock Exchange (TSE) and Iran Fara Bourse. Initially, 53 financial ratios were selected and subsequently reduced to 28 using the fuzzy Delphi technique. The statistical population included 18 banks listed on the TSE and Iran Fara Bourse that published their financial statements during the period 2011 to 2017. Data on the 28 financial ratios were collected from the Codal site and analyzed using the C4.5 decision tree, AdaBoost, and Naïve Bayes algorithms. According to the findings, the Naïve Bayes algorithm was the best predictive model, with an accuracy of 88.89%.
Prediction of Osteoporosis by K-NN Algorithm and Prescribing Physical Activity for Elderly Women
Mining a Set of Rules for Determining the Waiting Time for Selling Residential Units (Scientific article, Ministry of Science)
Source:
Journal of System Management, Volume 7, Issue 1, Spring 2021
171 - 203
Knowing the waiting time for selling residential units is an important issue in the housing sector for most people, especially investors. Several factors affect this waiting time. Determining the factors that influence how long real estate takes to sell can support informed decision making by real estate consultants, sellers, and prospective buyers. Using a real estate database in Iran, the present paper proposes a two-module procedure. The first module implements association rule mining. Using the well-known association rule mining technique FP-Growth, several association rules are extracted that indicate the factors affecting the waiting time for selling residential units. The generated rules are evaluated on metrics such as support, confidence, and lift, and the best rules are selected. The main objective of the second module is to develop a fuzzy inference system that can learn the factors influencing the waiting time from historical data, so that the model can be used to estimate the time it takes a real estate agency to sell a property. Several IF-THEN rules are extracted from this module. The extracted rules can be used by real estate agencies as well as buyers and sellers of residential units to make better investment decisions. The concluding section presents a number of suggestions for future studies; for example, machine learning algorithms such as neural networks and decision trees can also be used to predict how long residential units take to sell.
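The rule-evaluation metrics named in this abstract (support, confidence, and lift) can be illustrated with a minimal sketch. The transactions and item names below are hypothetical and are not taken from the paper's real estate database; the code only computes the standard definitions and does not implement FP-Growth itself.

```python
# Toy transactions; item names are hypothetical, not from the paper's dataset.
transactions = [
    {"two_bedroom", "elevator", "sold_fast"},
    {"two_bedroom", "sold_fast"},
    {"elevator", "sold_slow"},
    {"two_bedroom", "elevator", "sold_fast"},
    {"two_bedroom", "sold_slow"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def rule_metrics(antecedent, consequent, transactions):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    supp_both = support(set(antecedent) | set(consequent), transactions)
    supp_ante = support(antecedent, transactions)
    supp_cons = support(consequent, transactions)
    confidence = supp_both / supp_ante
    lift = confidence / supp_cons
    return supp_both, confidence, lift


supp, conf, lift = rule_metrics({"two_bedroom"}, {"sold_fast"}, transactions)
print(f"support={supp:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

A lift above 1 suggests that the antecedent and consequent occur together more often than they would if they were independent, which is one common reason to keep a rule.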
A Novel Scheme for Improving Accuracy of KNN Classification Algorithm Based on the New Weighting Technique and Stepwise Feature Selection (Scientific article, Ministry of Science)
The k-nearest neighbor (KNN) algorithm is one of the most frequently used techniques in data mining because of its integrity and performance. Although the KNN algorithm is highly effective in many cases, it has some essential deficiencies that affect its classification accuracy. First, its effectiveness is reduced by redundant and irrelevant features. Furthermore, the algorithm does not consider the differences between samples, which leads to inaccurate predictions. In this paper, we propose a novel scheme for improving the accuracy of the KNN classification algorithm based on a new weighting technique and stepwise feature selection. First, we use a stepwise feature selection method to eliminate irrelevant features and select features highly correlated with the class category. Then a new weighting method gives an authority value to each sample in the training dataset based on its neighbors' categories and Euclidean distances. This weighting approach gives higher preference to samples whose neighbors are close in Euclidean distance and in the same category, which can effectively increase the classification accuracy of the algorithm. We evaluated the accuracy of the proposed method and compared it with the traditional KNN algorithm and some similar works on five real-world UCI datasets. The experimental results showed that the proposed scheme (denoted WAD-KNN) performed better than the traditional KNN algorithm and the other approaches considered, with an accuracy improvement of approximately 10%.
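To show why weighting neighbors can change a KNN prediction, here is a minimal sketch that uses plain inverse-distance weighting. This is a common textbook scheme used as a stand-in; it is not the paper's WAD-KNN weighting, whose exact formula the abstract does not give. The training points are made up for illustration.

```python
import math
from collections import defaultdict


def weighted_knn_predict(train, query, k=3):
    """Classify query by an inverse-distance-weighted vote of its k nearest neighbors.

    train is a list of (feature_tuple, label) pairs. Inverse-distance weighting
    is an assumption made for illustration, not the WAD-KNN scheme.
    """
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = defaultdict(float)
    for features, label in neighbors:
        distance = math.dist(features, query)
        votes[label] += 1.0 / (distance + 1e-9)  # small constant avoids division by zero
    return max(votes, key=votes.get)


# One very close "a" sample and two distant "b" samples (hypothetical data).
train = [((0.0, 0.0), "a"), ((1.0, 0.0), "b"), ((1.0, 0.1), "b")]
label = weighted_knn_predict(train, (0.1, 0.0), k=3)
print(label)
```

With these points an unweighted majority vote over the three neighbors would pick "b" (two votes to one), while the distance-weighted vote favors the much closer "a" sample.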
An Investigation on the User Behavior in Social Commerce Platforms: A Text Analytics Approach (Scientific article, Ministry of Science)
Nowadays, the tourism industry accounts for approximately 10% of global GDP, while it contributes only 3% of the economy in Iran. As the pressure of US sanctions on the Iranian economy increases day by day, the need to pay attention to this industry as a source of foreign currency is felt more than ever. The purpose of this research is to analyze the reviews of users of social commerce websites using a combination of text mining and data mining techniques. For this purpose, the database of the TripAdvisor website (TripAdvisor.com) was evaluated, and all profile information of users who commented on hotels in Iran was collected. These users' comments on all content of the website, such as hotels, restaurants, and attractions, were then extracted and analyzed. Based on the Davies-Bouldin index, the optimal number of clusters was four, namely water therapy tourists; boutique-hotel-style and Iran urban tourists; travelholics and food tourists; and business and health tourists. Each cluster has unique attributes and features. Afterward, association rules were identified for each cluster according to its characteristics and the information in the users' profiles. Finally, a solution is proposed to increase user participation on the website, and targeted promotional plans are presented in accordance with the known features of each cluster.
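The Davies-Bouldin index mentioned above scores a clustering by comparing within-cluster scatter to the distance between cluster centroids; lower values indicate tighter, better-separated clusters, so the number of clusters with the lowest index is typically chosen. The following is a minimal sketch of the standard definition on made-up 2-D points, not the paper's TripAdvisor pipeline.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def davies_bouldin(clusters):
    """Davies-Bouldin index for a list of clusters (each a list of points)."""
    centers = [centroid(c) for c in clusters]
    # Scatter: mean distance from each point to its own cluster centroid.
    scatter = [
        sum(math.dist(p, center) for p in cluster) / len(cluster)
        for cluster, center in zip(clusters, centers)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst-case (largest) similarity ratio between cluster i and any other.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in range(k)
            if j != i
        )
    return total / k


# Hypothetical data: the same two clusters, far apart versus close together.
well_separated = [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 0.0), (10.0, 1.0)]]
poorly_separated = [[(0.0, 0.0), (0.0, 1.0)], [(2.0, 0.0), (2.0, 1.0)]]
print(davies_bouldin(well_separated), davies_bouldin(poorly_separated))
```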
Exploring the Limitations of Quality Metrics in Detecting and Evaluating Community Structures (Scientific article, Ministry of Science)
The discovery and analysis of community structures in networks has attracted increasing attention in recent years. Although there are several well-known quality metrics for detecting and evaluating communities, each has its own limitations. In this paper, we first discuss these limitations in depth for community detection and evaluation, based on the definitions and formulations of the metrics. Then we perform experiments on artificial and real-world networks to demonstrate these limitations. The quality metrics analyzed in this paper are modularity, performance, coverage, normalized mutual information (NMI), conductance, internal density, triangle participation ratio, and cut ratio. Compared with previous works, we examine the limitations of modularity in much greater detail. Moreover, for the first time, we present some limitations of NMI. In addition, although performance is known to tend toward high values in large graphs, we explore this limitation through its formulation and discuss several specific cases in which performance gets high scores even on small graphs.
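Two of the metrics listed above, modularity and coverage, are simple enough to compute by hand on a small graph. The sketch below assumes an undirected, unweighted graph without self-loops and uses the standard definitions; the toy graph (two triangles joined by a bridge) is made up for illustration and is not one of the paper's test networks.

```python
def modularity_and_coverage(edges, community_of):
    """Return (modularity, coverage) for an undirected, unweighted graph.

    edges is a list of (u, v) pairs; community_of maps each node to a community id.
    """
    m = len(edges)
    intra = {}   # community -> number of edges with both endpoints inside it
    degree = {}  # community -> sum of the degrees of its nodes
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            intra[cu] = intra.get(cu, 0) + 1
    modularity = sum(
        intra.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree.items()
    )
    coverage = sum(intra.values()) / m
    return modularity, coverage


# Two triangles joined by one bridge edge (hypothetical example).
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
split = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
merged = {node: "A" for node in range(1, 7)}

q_split, cov_split = modularity_and_coverage(edges, split)
q_merged, cov_merged = modularity_and_coverage(edges, merged)
print(q_split, cov_split, q_merged, cov_merged)
```

Putting every node into a single community gives the maximum possible coverage even though it reveals no structure, which is one simple example of the kind of limitation such metrics can have; modularity for that trivial partition is zero.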
An Analysis on Characteristics of Negative Association Rules (Scientific article, Ministry of Science)
Association rules are one of the data and web mining techniques that aim to discover frequent patterns among itemsets in a transactional database. These rules extract frequent patterns and correlations between itemsets in datasets and databases. Association rules are either positive or negative, and each type has its own characteristics and definitions. Algorithms for discovering association rules face several challenges, including the extraction of only positive rules, although negative rules in databases are also important for managers' decision making. In addition, the thresholds for the support and confidence criteria are always set manually by the user through trial and error, and the appropriate setting or dataset characteristics for these rules are not clear. This research analyzes the behavior of negative association rules based on trial and error. After the available algorithms are analyzed, the most efficient one is implemented and the negative rules are extracted. The test is repeated on several standard datasets to evaluate the behavior of the negative rules. Analysis of the outputs reveals that the negative rules detect some interesting patterns that the positive rules could not detect. This study emphasizes that extracting only positive rules is not enough to cover association rules.
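As a small illustration of what a negative rule can capture, the sketch below computes support and confidence for a rule of the form "A implies NOT B" by direct counting. The transactions are hypothetical, and this is only the basic definition, not the algorithm implemented in the paper.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def negative_rule(antecedent, absent, transactions):
    """Return (support, confidence) of the rule: antecedent -> NOT absent."""
    a, b = set(antecedent), set(absent)
    supp_a = support(a, transactions)
    # Transactions that contain the antecedent but not the (whole) absent itemset.
    supp_a_not_b = sum(a <= t and not b <= t for t in transactions) / len(transactions)
    return supp_a_not_b, supp_a_not_b / supp_a


# Hypothetical market-basket data.
transactions = [
    {"tea"},
    {"tea", "sugar"},
    {"coffee"},
    {"tea"},
    {"coffee", "sugar"},
    {"tea"},
]

neg_supp, neg_conf = negative_rule({"tea"}, {"coffee"}, transactions)
pos_supp = support({"tea", "coffee"}, transactions)
print(neg_supp, neg_conf, pos_supp)
```

Here the positive itemset {tea, coffee} has zero support, so a positive-only miner reports nothing about the pair, while the negative rule "tea implies no coffee" holds in every transaction containing tea.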
Scientific Map of Papers Related to Data Mining in Civilica Database Based on Co-Word Analysis (Scientific article, Ministry of Science)
Today, because of the large volume of data and the high speed of data production, it is practically impossible to analyze data using traditional methods. Meanwhile, data mining, as one of the most popular topics of the present century, has contributed to the advancement of science and technology in a number of areas. In the recent decade, researchers have made extensive use of data mining to analyze data. One of the most important issues for researchers in this field is to identify the common mainstreams in data mining and to find active research areas for future work. At the same time, social network analysis has in recent years attracted researchers' attention as a suitable tool for studying present and future relationships between the entities of a network structure. In this paper, using co-word (word co-occurrence) analysis and social network analysis, the scientific structure and map of data mining topics in Iran is drawn based on papers indexed in the Civilica database during the years 1388 to 1398, and the thematic trend governing research in this area is reviewed. The results of the analysis show that, in the category of data mining, concepts such as clustering, classification, decision tree, and neural network account for the largest volume, and that applications such as medicine, fraud detection, and customer relationship management have made the greatest use of data mining techniques.
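The core counting step of co-word analysis can be sketched as follows: for each paper, every unordered pair of its keywords is counted once, and the resulting counts become edge weights in a keyword network. The keyword lists below are invented for illustration and are not taken from the Civilica data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical keyword lists, one per paper.
papers = [
    ["clustering", "data mining", "medicine"],
    ["classification", "data mining", "fraud detection"],
    ["clustering", "data mining", "customer relationship management"],
    ["classification", "decision tree", "data mining"],
]


def cooccurrence(keyword_lists):
    """Count how many papers mention each unordered keyword pair together."""
    counts = Counter()
    for keywords in keyword_lists:
        # sorted(set(...)) removes duplicates and gives each pair a canonical order.
        for a, b in combinations(sorted(set(keywords)), 2):
            counts[(a, b)] += 1
    return counts


pairs = cooccurrence(papers)
for pair, count in pairs.most_common(3):
    print(pair, count)
```

The pair counts can then be passed to a network analysis tool, for example to compute centrality measures or to cluster the keywords into themes.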
Recommended System for Controlling Malnutrition in Iranian Children 6 to 12 Years Old using Machine Learning Algorithms (Scientific article, Ministry of Science)
Iran faces, at low levels, all three types of child nutrition problems: nutrient deficiency, micronutrient deficiency, and overeating. The most common causes of nutritional problems and child deaths are vitamin deficiencies and poor food quality. The purpose of this research is to design a food recommender system to control malnutrition in children 6 to 12 years old using hybrid machine learning algorithms. In terms of purpose, this is applied research. In terms of implementation method, it is a descriptive survey, and the data gathered are quantitative. The dataset includes 1001 data points collected from the health centers of Mianeh city, located in East Azerbaijan in Iran, through the integrated apple web system. In this research, the Python programming language was used to analyze the child nutrition dataset, and hybrid AdaBoost and Decision Tree algorithms were used for the child nutrient recommender system. We concluded that the number-of-meals feature, identified using the Decision Tree algorithm with 98.5% accuracy, was more important than the other nutritional features in recognizing malnutrition in children. Of the 1001 records in the child nutrition dataset, 807 children are underweight and malnourished, 170 are of normal weight, 20 are obese, and four are overweight. The high accuracy of the hybrid algorithms in this study thus aligns closely with the opinions of nutritionists from 2019 to 2020.
Energy Consumption Prediction in Iran: A Hybrid Machine Learning and Genetic Algorithm Method with Sustainable Development Considerations (Scientific article, Ministry of Science)
Source:
Environmental Energy and Economic Research, Volume 6, Issue 2, May 2022
Ensuring energy security is a major concern of policymakers and economic planners. This objective can be achieved by managing energy supply and demand. The latter has received less attention, especially in developing countries. Neglecting energy consumption and its accurate forecasting leads to potential outages and to unsustainable development. Nonlinear methods, which are consistent with the nature of energy consumption, have led to better results. Therefore, the present study uses both sustainable development aspects in the determinants of energy demand and a nonlinear hybrid method. We introduce a model based on sustainable development indicators to forecast energy consumption in Iran, in which the relevant indicators are specified in the determination phase. To forecast energy consumption, we provide a new standard dataset for energy consumption in Iran (IREC), based on data extracted from the World Bank and the Ministry of Energy of Iran. The highlight of this research is that it identifies the most effective features in the dataset using a genetic algorithm and five forecasting approaches based on machine learning methods. Over 500 repetitions, the algorithm selected 14 of the 104 features in IREC as the most effective indicators for predicting energy consumption. The empirical results indicate that the model can provide important indicators for energy consumption forecasting. The experimental results with GA-based feature selection indicate that the hybrid model performs better, with GA-SVM and GA-MLP giving the best and second-best results, respectively.
Presenting a Model for Financial Reporting Fraud Detection using Genetic Algorithm (Scientific article, Ministry of Science)
Both academics and auditing firms have been searching for ways to detect corporate fraud. The main objective of this study was to present a model for detecting financial reporting fraud by companies listed on the Tehran Stock Exchange (TSE) using a genetic algorithm. For this purpose, consistent with theoretical foundations, 21 variables were selected to predict fraud in financial reporting; from these, statistical tests finally identified 9 variables (SALE/EMP, RECT/SALE, LT/CEQ, INVT/SALE, SALE/TA, NI/CEQ, NI/SALE, LT/XINT, and AT/LT) as potential financial reporting fraud indicators. Then, using the genetic algorithm, the final model of fraud detection in financial reporting was presented. The statistical population comprised 66 companies, 33 fraudulent and 33 non-fraudulent, from 2011 to 2016. The results showed that the presented model can detect fraudulent companies with an accuracy of 91.5%. These findings extend financial statement fraud research and can be used by practitioners and regulators to improve fraud risk models.
HFC: Towards an Effective Model for the Improvement of heart Diagnosis with Clustering Techniques (Scientific article, Ministry of Science)
Heart disease poses a great danger to people and has recently become a serious threat to human health. It affects all age groups, from young to old. The biggest challenge addressed in this paper is data pre-processing and handling failures in the clinical heart failure records, for which an effective high-performance model is proposed to improve heart disease diagnosis. We applied clustering with k-means, expectation-maximization clustering, DBSCAN, support vector clustering, and random clustering. Using these clustering techniques, we obtained results good enough to significantly improve the prediction of heart disease. The goal of the model is to suggest a reduction method for finding features of heart disease by applying several techniques. Our most important result is faster and better prediction, which indicates that the proposed model gives excellent results. The model demonstrated clear superiority over its counterparts in the results obtained in this research. We obtained values of 130, 980, 183, 125.133, 133, 203, and 125.800. This confirms that the model predicts well and improves performance on the data we worked with.
A Combined Model for Prediction of Financial Software Learning Rate based on the Accounting Students’ Characteristics (Scientific article, Ministry of Science)
Accounting software is considered one of the most critical components of an accounting information system, with particular significance for accounting and financial systems. One of the most important problems with accounting education systems is that students do not adequately learn the financial software required by the accounting profession, which in turn reduces the credibility and standing of the profession. The main objective of accounting software education is to train skilled, expert accountants to enter the accounting profession, which is considered one of the success factors of a country's economy. This study employs data mining techniques to investigate the accuracy, precision, and recall performance measures and to predict the rate of financial software learning based on accounting students' emotional intelligence (EI), gender, and education level. Accordingly, a machine-learning-based multivariate statistical analysis is performed on 100 Iranian accounting students. The results show that, among the variables, emotional intelligence has the greatest impact on the rate of financial software learning. Gender and education level were also influential. Among the five algorithms, Decision Tree and XGBoost both achieve the highest precision and recall and are presented as the most appropriate models for predicting the rate of financial software learning.
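For readers unfamiliar with the three performance measures named in this abstract, the sketch below computes accuracy, precision, and recall from true and predicted labels for one chosen positive class. The labels are invented for illustration and have no connection to the study's 100 students.

```python
def classification_metrics(y_true, y_pred, positive):
    """Return (accuracy, precision, recall) treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Hypothetical labels: "fast" versus "slow" learners.
y_true = ["fast", "fast", "slow", "slow", "fast", "slow"]
y_pred = ["fast", "slow", "slow", "fast", "fast", "slow"]

accuracy, precision, recall = classification_metrics(y_true, y_pred, positive="fast")
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```

Precision answers "of the samples predicted positive, how many were right", while recall answers "of the truly positive samples, how many were found", so a model can score well on one and poorly on the other.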
A data Mining Approach using CNN and LSTM to Predict Divorce before Marriage (Scientific article, Ministry of Science)
Divorce has destructive spiritual and material effects, and unfortunately recent statistics show that the solutions provided for its prevention and reduction have not been effective. One effective way to reduce divorce in society is to review the background of the couple, which can provide valuable experience to experts and be used by experts and family counselors. This article proposes a method that uses data mining and deep learning as a practical tool to help family counselors predict the outcome of a marriage. Reviewing the backgrounds of thousands of couples provides a model for analyzing couple behavior. The primary data of this study were collected from the records of 35,000 couples registered with the National Organization for Civil Registration of Iran during 2018-2019. In the current work, we propose a method to predict divorce by combining a convolutional neural network (CNN) and long short-term memory (LSTM). In this hybrid method, key features in the dataset are selected using CNN layers, and the prediction is then made using LSTM layers, with an accuracy of 99.67 percent. A comparison of this method with a Multilayer Perceptron (MLP) and a CNN suggests that it has a higher degree of accuracy.
Parallel Machine Scheduling with Controllable Processing Time Considering Energy Cost and Machine Failure Prediction (Scientific article, Ministry of Science)
Predicting unexpected incidents and reducing energy consumption is one of the current problems in industry. The present study addresses parallel machine scheduling while considering failures and the reduction of energy consumption. It also aims at minimizing early and late delivery penalties and enhancing tasks. This research designed a mathematical model for the problem that considers processing times, delivery time, rotation speed and torque, failure time, and machine availability after repair and maintenance. Failure times were predicted using machine learning algorithms. The results indicated that the proposed model can be suitably solved for instances of 10 jobs and five machines. The research addresses the problem in two parts: the first part predicts failures, and the second part determines the sequence of parallel machine scheduling operations. In the first step, after historical data were received, machine failure was predicted using machine learning algorithms, and a set of rules was obtained to correct the process. The obtained rules were used in the model to improve the machining process. In the second step, the scheduling model was used to determine the operation sequence, taking these failures and machine unavailability into account, to achieve the optimal sequence. The approach is also intended to reduce energy consumption and failures. This study used the LightGBM algorithm and achieved 85% precision in failure prediction. The rules obtained from this algorithm contributed to cost reduction.
Identification of influencing factors on implementation of smart city plans based on approach of technical and social system (Scientific article, Ministry of Science)
Source:
Journal of System Management, Volume 9, Issue 2, Spring 2023
197 - 212
The current research seeks to identify the factors influencing the implementation of smart city plans based on the technical and social systems approach. To this end, a library study was conducted, and based on it a research plan was drawn up that uses expert opinions together with data mining techniques (feature selection and clustering) and the Delphi technique to identify and screen factors; the final factors are then leveled using clustering. The aim here is leveling, not ranking. Because of the large number of factors, they are screened in two steps, using Delphi and feature selection. Delphi is a classic method in qualitative approaches, and feature selection is a data mining technique. Finally, the leveled factors comprise technical and social factors, and the most influential ones are determined. The technical factors placed in level one are digital infrastructure, ICT-based transportation, ICT-based logistics, building alarm systems, energy consumption adjustment, and ICT-based processes. The social factors placed in level one are digital and smart innovation, knowledge sharing, smart education, participation in sustainable development, access to educational plans, waste recycling, pollution control, and productivity and flexibility of the labor market.
Analysis of the Behavior of Tourists in Iran Based on Data Mining and Search Rate on Google (Scientific article, Ministry of Science)
Source:
International Journal of Digital Content Management, Vol. 4, No. 6, Winter & Spring 2023
253 - 270
Understanding tourist behavior is a requirement for planning the marketing and supply of goods and services that satisfy tourists. Nowadays, many tourists decide where to travel by searching the internet. The present study was conducted with the aim of analyzing the behavior of tourists in Iran based on data mining (search rate on Google Trends). The research is applied and was carried out with a causal descriptive method. Data were collected by searching keywords on Google. To collect the keywords needed for the research, we consulted experts and authorities in the field as well as the relevant scientific articles. In the data collection process, a series of tourism keywords was selected, and it was then determined how much people in different places in Iran used these words. A correlation matrix (covariance) model was used for data analysis. The structural information of the tourism-related keywords was extracted from Google Trends as time series and processed with the help of a Pearson correlation matrix. Nine keywords were selected for search: hotel, entertainment, pilgrimage, tourism, nature, places of interest, archeology, travel, and travel tour. The keywords are not searched at random; they are related and correlated and stem from structural thinking. The results of the data analysis showed the type and intensity of the connections between the words, which had a communication structure. It was also found that the words pilgrimage, recreation, and archeology have weaker connections with the other words.
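The Pearson correlation matrix used in this study can be illustrated with a minimal sketch. The weekly search-interest numbers below are invented; they are not Google Trends data, and only three of the nine keywords are shown.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def correlation_matrix(series):
    """Nested dict of pairwise Pearson correlations between named series."""
    names = list(series)
    return {a: {b: pearson(series[a], series[b]) for b in names} for a in names}


# Hypothetical search-interest time series per keyword.
series = {
    "hotel": [10, 20, 30, 40],
    "travel": [12, 22, 29, 41],
    "pilgrimage": [40, 30, 20, 10],
}

matrix = correlation_matrix(series)
for name, row in matrix.items():
    print(name, {other: round(value, 3) for other, value in row.items()})
```

Values near 1 indicate keywords whose search interest rises and falls together, values near -1 indicate opposite movement, and values near 0 indicate a weak linear connection.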
Providing a Framework of Effective Components in Iranian Championship Sports with a Data Mining Approach (Scientific article, Ministry of Science)
The main purpose of this study was to provide a framework of effective components in Iranian championship sports with a data mining approach. The research method was qualitative. The advanced search was performed in the general framework of factors affecting the sports industry and development and integration with data-driven technologies. Based on the literature review, 15 frameworks in the field of sports and 13 frameworks for the development and integration of the sports industry with technology were identified. After the data analysis, a researcher-developed framework for the championship sports industry for the use of data-driven technologies and data mining was presented in three parts. In the first part, nine influential factors including athletes/champion teams, leagues/clubs, stadiums/sports venues, fans/spectators, brands, media, government, academic/research institutions, and technology companies, were identified. In the second part, a strategic plan based on the development and integration of sports industry with data-driven technologies and data mining was presented, which includes four stages: Identifying and selecting talents, pre-game and match preparation, in-game and match activities, and post-match and match analysis. All data-driven activities and data mining in the first and second sections were performed by the IBM data science analysis methodology presented in the third section. Then, a conceptual framework was provided to 7 experts and their opinions were collected through semi-structured interviews and focus group methods. This conceptual framework enables sports managers to plan for their organization and adopt appropriate strategies using data-driven technologies and data mining.
Presenting a Model for Recognizing Phishing Sites and Privacy Violations in the Tourism Industry (Scientific article, Ministry of Science)
Purpose: Electronic tourism is one of the important components of expanding tourism by synchronizing the industry with information technology. It has not been long since its emergence. Methodology: This field combines tourism and information technology and is one of the most common types of income-generating businesses, producing job opportunities in the modern world. The advancement of science alongside communication and information technologies has presented many opportunities and threats to this field through technologies such as smartphones and sensors, virtual and augmented reality tools, NFC, and RFID. Findings: The disclosure of tourists' information and its possible abuse is one such threat. Therefore, privacy and non-disclosure of information should be important factors. Recognizing reputable sites is an important factor in solving this problem. In this study, we present a model for recognizing fake and phishing sites that uses CFS+PSO and a combination of Info+Ranger, together with their results, to reduce the features of the test dataset, so that a model can be presented that classifies and recognizes phishing sites with higher accuracy using the Multilayer Perceptron method. The proposed model succeeded in recognizing 95.5% of phishing sites. Conclusion: The effect of information technology on the tourism industry and the use of internet websites for selling and providing tourism services have created new security challenges. Protecting the privacy and personal information of people and tourists is one of these challenges, and the disclosure of such information could lead to abuse by unqualified people and to dissatisfaction with and distrust of such systems.
Investigating the Impact of Learning Orientation on Market Orientation Based on Data Mining and Association Rules (Scientific article, Ministry of Science)
Source:
رهیافتی در مدیریت بازرگانی (An Approach to Business Management), Volume 4, Issue 3 (Serial No. 15), Autumn 1402
374 - 392
Market orientation is the ability to respond appropriately to complex market conditions, and it is the most fundamental issue in the marketing and business literature. In fact, market orientation is considered a practical marketing application. Market orientation enables companies to learn continuously about customers, competitors, and environmental factors within the existing and potential market. One of the best ways to extract significant relationships among data is to use data mining algorithms. The purpose of this research is to investigate the relationship between learning orientation and market orientation using association rules and data mining. After sampling, 132 questionnaires were used for data analysis. After data collection, the relationship between the components of learning orientation (commitment to learning, shared vision, and open mindedness) and market orientation (customer orientation, competitor orientation, and inter-functional coordination) was explored. Learning orientation is one of the factors that plays a key role in organizations' market orientation. The relationship between learning orientation components, including commitment to learning, open mindedness, and inter-functional coordination, was investigated using data mining. The findings showed that commitment to learning, shared vision, and open mindedness lead to customer orientation; commitment to learning and open mindedness lead to competitor orientation; and commitment to learning and shared vision lead to inter-functional coordination.