Related content for the keyword:
Machine Learning
The accurate imputation of missing values in time series data is paramount for maintaining the integrity and reliability of analyses and predictions. This article investigates the efficacy of various missing-value imputation methods, encompassing well-known machine learning and statistical techniques. The methods were applied to two daily financial time series, the S&P 500 and Bitcoin markets, spanning 2016 to 2023. Starting from the complete datasets, controlled missingness was introduced by randomly removing 45 data points, and each imputation strategy was then used to estimate and substitute these missing values. The experimental evaluation yielded insightful findings regarding the performance of the different methods. The examined machine learning methods, including k-Nearest Neighbors (k-NN), Random Forest, Deep Learning, and Decision Trees, consistently outperformed their statistical counterparts, such as Mean Imputation, Regression Imputation, Hot-Deck Imputation, and Expectation-Maximization Imputation. Notably, Random Forest emerged as the most effective method, showing superior accuracy and robustness. Conversely, Mean Imputation produced comparatively inferior results, suggesting its limited suitability for financial time series data. This research contributes to the ongoing discourse on data integrity within finance analytics and serves as a guide for practitioners seeking suitable missing-value imputation methods. The empirical evidence advances the understanding of the relative performance of imputation techniques on financial data, supporting better decision-making and more reliable predictions.
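The evaluation protocol above (remove known values, impute them, then score the estimates against the truth) can be sketched in a few lines. This is a minimal sketch, assuming a synthetic random-walk series in place of the real S&P 500 and Bitcoin data, and comparing only two of the methods: mean imputation and a simple k-NN imputer whose distance is the gap in time index. The abstract does not say which feature space the study's k-NN used, so that choice and all function names here are hypothetical.

```python
import math
import random
import statistics


def make_random_walk(n, seed=0):
    """Synthetic stand-in for a daily price series (not real S&P 500 or Bitcoin data)."""
    rng = random.Random(seed)
    series = [100.0]
    for _ in range(n - 1):
        series.append(series[-1] + rng.gauss(0.0, 1.0))
    return series


def mask_random(series, n_missing, seed=1):
    """Return a copy with n_missing randomly chosen points replaced by None."""
    rng = random.Random(seed)
    missing_idx = sorted(rng.sample(range(len(series)), n_missing))
    masked = list(series)
    for i in missing_idx:
        masked[i] = None
    return masked, missing_idx


def mean_impute(masked):
    """Replace every gap with the mean of the observed values."""
    mu = statistics.fmean(v for v in masked if v is not None)
    return [mu if v is None else v for v in masked]


def knn_time_impute(masked, k=2):
    """Replace each gap with the average of the k observed points closest in time.

    Hypothetical k-NN variant: distance is |difference in time index|.
    """
    observed = [i for i, v in enumerate(masked) if v is not None]
    filled = list(masked)
    for i, v in enumerate(masked):
        if v is None:
            nearest = sorted(observed, key=lambda j: abs(j - i))[:k]
            filled[i] = statistics.fmean(masked[j] for j in nearest)
    return filled


def rmse_at(truth, estimate, idx):
    """Root mean squared error restricted to the artificially removed positions."""
    return math.sqrt(statistics.fmean((truth[i] - estimate[i]) ** 2 for i in idx))


if __name__ == "__main__":
    truth = make_random_walk(2000)  # roughly eight years of daily observations
    masked, idx = mask_random(truth, n_missing=45)  # 45 removed points, as in the study
    for name, method in [("mean", mean_impute), ("k-NN (time)", knn_time_impute)]:
        print(f"{name:12s} RMSE = {rmse_at(truth, method(masked), idx):.3f}")
```

Because mean imputation ignores the local level of a trending series, it tends to produce large errors on data like this, while neighbour-based estimates track the series closely.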
The Influence of Predictive Maintenance Technologies on Operational Efficiency in Manufacturing Startups
The objective of this study is to explore the influence of predictive maintenance technologies on operational efficiency in manufacturing startups, focusing on implementation processes, operational impacts, and the challenges encountered. This qualitative study employed semi-structured interviews to gather data from key stakeholders in manufacturing startups, including founders, operations managers, and maintenance engineers. A total of 22 participants were interviewed, with the sample size determined by theoretical saturation. The interviews were transcribed verbatim and analyzed using NVivo software. Thematic analysis was conducted to identify and categorize key themes and subthemes related to the implementation and impact of predictive maintenance technologies. The analysis revealed three main themes: Implementation Process, Operational Impact, and Challenges and Barriers. Within these themes, several categories and concepts emerged. The Implementation Process theme highlighted the importance of planning, technology selection, system integration, employee involvement, pilot testing, change management, and post-implementation review. The Operational Impact theme identified efficiency gains, predictive analytics, maintenance scheduling, resource optimization, and quality improvement as significant outcomes. The Challenges and Barriers theme underscored technological challenges, financial constraints, organizational resistance, skill gaps, data management issues, and the necessity of vendor support. The findings indicate that predictive maintenance technologies significantly enhance operational efficiency in manufacturing startups by reducing downtime, increasing productivity, and optimizing resource utilization.
Authentic and Fake Reviews Recognition on E-Commerce Websites through Sentiment Analysis and Machine Learning Techniques (scientific article, Ministry of Science)
The proliferation of e-commerce has led to an overwhelming volume of customer reviews, posing challenges for consumers who seek reliable product evaluations and for businesses concerned with the integrity of their online reputation. This study addresses the critical problem of detecting fake reviews by developing a comprehensive framework that integrates Natural Language Processing (NLP) and machine learning techniques. Our methodology centers on sentiment analysis to discern the emotional valence of reviews, coupled with Part-of-Speech (PoS) tagging to analyze linguistic patterns that may signal deception. We meticulously extract a rich set of textual and statistical features, providing a robust basis for our predictive models. To enhance classification performance, we strategically employ both traditional machine learning algorithms and powerful ensemble techniques. Experimental results underscore the efficacy of our approach in detecting fraudulent reviews. We achieved a notable F1-Score of 82.9% and an accuracy of 82.6%, demonstrating the potential to safeguard consumers from misleading information and protect businesses from unfair practices.
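To make the feature-extraction step concrete, the following sketch turns one review into a few numeric features and computes the two metrics reported above. It is only an illustration under stated assumptions: the word lists are hypothetical toy lexicons, and counting first-person pronouns from a fixed list is a crude stand-in for the Part-of-Speech tagging the study used.

```python
import re

# Hypothetical toy lexicons; a real system would use a sentiment lexicon or model
# and a proper Part-of-Speech tagger instead of these word lists.
POSITIVE = {"great", "amazing", "excellent", "love", "perfect", "best"}
NEGATIVE = {"bad", "terrible", "awful", "hate", "worst", "broken"}
FIRST_PERSON = {"i", "me", "my", "mine", "we", "our"}  # crude stand-in for PoS pronoun counts


def review_features(text):
    """Map one review to a small numeric feature dictionary."""
    tokens = re.findall(r"[a-z']+", text.lower())
    n = max(len(tokens), 1)  # avoid division by zero on empty text
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return {
        "sentiment": (pos - neg) / n,  # emotional valence in [-1, 1]
        "first_person_ratio": sum(t in FIRST_PERSON for t in tokens) / n,
        "exclamations": text.count("!"),
        "length": len(tokens),
    }


def accuracy_and_f1(y_true, y_pred, positive=1):
    """Accuracy and F1 score, treating `positive` (fake) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    acc = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return acc, f1


print(review_features("I love it! Best purchase ever!!!"))
```

In a full pipeline, these feature dictionaries would be vectorized and passed to the traditional and ensemble classifiers, and `accuracy_and_f1` would be applied to held-out predictions.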
Tools for Consumer Preference Analysis Based in Machine Learning (scientific article, Ministry of Science)
Today, users increasingly generate data on the Internet when choosing products or services. This produces data about the purchases made and services used by various consumers. In addition, consumers often leave feedback about their purchases and discuss their attitudes toward goods and services on social networks, messengers, thematic sites, and so on. The result is large volumes of data containing useful information about various manufacturers of goods and services, which can benefit both ordinary users and large companies. In practice, however, this information is almost impossible to use directly, because it is scattered across many sources and is raw and unstructured. Moreover, depending on the target group of users, what is needed is not the entire data set but a specific target sample. Solving this problem requires a tool that structures information arrays and analyzes them according to the set goal, which can be built with frameworks that provide machine learning methods and data-handling capabilities. This work addresses the problem of creating tools for evaluating consumer preferences by analyzing large volumes of data for subsequent use by the target audience. The goal of developing big data analysis systems is to obtain new, previously unknown information. The methodology applies algorithms for working with large data sets together with machine learning methods, namely the pandas library for data set operations and logistic regression for classifying information. As a result, a system was built that analyzes lexical information, translates it into numerical form, and creates the necessary statistical samples from it. The originality of the work lies in using specialized data processing and machine learning libraries to create data analysis systems.
The practical value of the work lies in the possibility of creating data analysis systems built using specialized machine learning libraries.
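The pipeline described above (turn lexical information into numbers, then classify it with logistic regression) can be illustrated with a bag-of-words representation and a logistic regression trained by batch gradient descent. The study used pandas for data handling; this sketch sticks to the Python standard library so it runs without dependencies, and in practice pandas plus scikit-learn's `LogisticRegression` would be the more typical choice. The training texts, labels, and hyperparameters (`lr`, `epochs`) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def build_vocabulary(texts):
    """Assign an integer column to every word seen in the training texts."""
    vocab = {}
    for text in texts:
        for tok in tokenize(text):
            vocab.setdefault(tok, len(vocab))
    return vocab


def vectorize(text, vocab):
    """Bag-of-words counts: lexical information translated into a numeric vector."""
    vec = [0.0] * len(vocab)
    for tok, count in Counter(tokenize(text)).items():
        if tok in vocab:  # words unseen during training are ignored
            vec[vocab[tok]] = float(count)
    return vec


def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clip to avoid overflow in math.exp
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(X, y, lr=0.5, epochs=500):
    """Batch gradient descent on the logistic (cross-entropy) loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict(text, vocab, w, b):
    x = vectorize(text, vocab)
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)


# Hypothetical labelled feedback: 1 = positive attitude, 0 = negative attitude.
texts = ["great quality, fast delivery", "love it, great price",
         "terrible, arrived broken", "awful support, broken again"]
labels = [1, 1, 0, 0]
vocab = build_vocabulary(texts)
w, b = train_logistic_regression([vectorize(t, vocab) for t in texts], labels)
print(predict("great delivery", vocab, w, b))
```

The predicted labels can then be grouped by product or manufacturer to build the target-specific samples the abstract describes.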
Developing Financial Distress Prediction Models Based on Imbalanced Dataset: Random Undersampling and Clustering Based Undersampling Approaches (scientific article, Ministry of Science)
So far, distress prediction models have been built on balanced data, but such sampling does not reflect the actual population of companies. When the data are balanced, sample-selection bias may lead to underestimating the Type I error and overestimating the Type II error of the models. Although models based on imbalanced data are consistent with reality, they have a higher Type I error than models based on balanced data, and for stakeholders the cost of a Type I error outweighs that of a Type II error. To reduce the Type I error of imbalanced data-based models, this study applied random undersampling and clustering-based undersampling. The test data comprised 760 companies over 2007-2007 at four different degrees of imbalance, and hypotheses H1 to H3 were tested on them. In all cases, balanced data-based models had a lower Type I error and a higher Type II error than imbalanced data-based models; in most cases, their geometric mean was also higher. The results of testing H4 to H6 show that, in most cases, models based on modified (undersampled) imbalanced data had a lower Type I error, a higher Type II error, and a higher geometric mean than models based on the original imbalanced data. In other words, applying undersampling to imbalanced training data decreased the Type I error and increased both the Type II error and the geometric mean. Stakeholders are therefore advised to use models based on modified imbalanced data.
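A minimal sketch of random undersampling and of the three evaluation criteria used above. It assumes the distressed firms are the positive (minority) class and follows the common convention that a Type I error means classifying a distressed firm as healthy; the study may define the errors differently. The `ratio` parameter is a hypothetical knob for the degree of imbalance left after undersampling. Clustering-based undersampling would replace the random draw with a cluster-aware selection, for example clustering the majority class and keeping the sample nearest each cluster centre; that variant is not shown.

```python
import math
import random


def random_undersample(samples, labels, majority=0, ratio=1.0, seed=0):
    """Randomly drop majority-class samples until majority/minority is about `ratio`."""
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y != majority]
    majority_idx = [i for i, y in enumerate(labels) if y == majority]
    keep_n = min(len(majority_idx), int(round(ratio * len(minority_idx))))
    kept = sorted(minority_idx + rng.sample(majority_idx, keep_n))
    return [samples[i] for i in kept], [labels[i] for i in kept]


def error_rates(y_true, y_pred, distressed=1):
    """Type I error, Type II error and geometric mean.

    Assumed convention: Type I = distressed firm predicted healthy,
    Type II = healthy firm predicted distressed.
    """
    pos = [p for t, p in zip(y_true, y_pred) if t == distressed]
    neg = [p for t, p in zip(y_true, y_pred) if t != distressed]
    type1 = sum(p != distressed for p in pos) / len(pos)
    type2 = sum(p == distressed for p in neg) / len(neg)
    g_mean = math.sqrt((1 - type1) * (1 - type2))  # sqrt(sensitivity * specificity)
    return type1, type2, g_mean


# Hypothetical example: 10 distressed (1) and 90 healthy (0) firms.
X = list(range(100))
y = [1] * 10 + [0] * 90
X_bal, y_bal = random_undersample(X, y, majority=0, ratio=1.0)
print(y_bal.count(1), y_bal.count(0))
```

Only the training data should be undersampled; the errors are then measured on an untouched, still-imbalanced test set.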
Early Warning Model for Solvency of Insurance Companies Using Machine Learning: Case Study of Iranian Insurance Companies (scientific article, Ministry of Science)
Stakeholders of an organization seek to avoid undesirable outcomes caused by ignoring risks. Various models and tools can be used to predict future outcomes so that undesirable ones can be avoided, and early warning models are one such approach. This study develops an early warning system that uses machine learning algorithms to predict solvency in the insurance industry. It analyzes 23 financial ratios of Iranian general insurance companies listed on the Tehran Stock Exchange between 2015 and 2020. The model uses Decision Tree, Random Forest, Artificial Neural Network, Gradient Boosting Machine, and XGBoost algorithms, with Boruta as the feature selection method. The dependent variable is the solvency margin ratio, and the other 22 ratios are the independent variables, which Boruta reduces to 7. First, the performance of the machine learning models is compared on two datasets, one with all 22 independent variables and one with the 7 selected variables, based on RMSE; XGBoost performs best on both datasets. Next, the study predicts the 2020 values for 19 insurance companies, classifies them into stages, and compares the actual stages with the predicted ones; here Random Forest has the best estimation accuracy on both datasets, while Gradient Boosting Machine has the best estimation accuracy on the Boruta dataset. Finally, the study compares the models' results in terms of capital adequacy classification, where Random Forest performs best on both datasets and Gradient Boosting Machine on the Boruta dataset.
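The two evaluation steps above, RMSE on the predicted solvency margin ratio and a comparison of actual against predicted stages, can be sketched as follows. The stage boundaries below are hypothetical placeholders, not the regulator's official thresholds, and the function names and sample values are illustrative.

```python
import math

# Hypothetical stage boundaries on the solvency margin ratio (percent), strongest first.
# Replace with the regulator's actual thresholds before any real use.
STAGE_LOWER_BOUNDS = [(100.0, 1), (70.0, 2), (50.0, 3), (10.0, 4)]


def rmse(actual, predicted):
    """Root mean squared error between actual and predicted ratios."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def solvency_stage(ratio_percent):
    """Map a solvency margin ratio to a stage (1 = strongest, 5 = weakest)."""
    for lower, stage in STAGE_LOWER_BOUNDS:
        if ratio_percent >= lower:
            return stage
    return 5


def stage_accuracy(actual, predicted):
    """Share of companies whose predicted stage equals their actual stage."""
    hits = sum(solvency_stage(a) == solvency_stage(p) for a, p in zip(actual, predicted))
    return hits / len(actual)


# Hypothetical actual vs. predicted ratios for three companies.
actual_ratios = [120.0, 75.0, 30.0]
predicted_ratios = [110.0, 60.0, 40.0]
print(rmse(actual_ratios, predicted_ratios), stage_accuracy(actual_ratios, predicted_ratios))
```

A model can have a low RMSE yet still misclassify companies that sit near a stage boundary, which is one reason the two comparisons can rank the algorithms differently.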
Examining Financial Performance and Corporate Governance in Tehran Stock Exchange: A Hybrid Machine Learning and Data Envelopment Analysis Approach (scientific article, Ministry of Science)
In an ever-evolving global business landscape with intense market competition, companies must strategically manage the factors that influence their financial performance. This research examines the relationship between corporate governance and financial performance enhancement, with particular attention to the mediating role of human capital. The statistical population consists of 140 top-level managers of companies listed on the Tehran Stock Exchange, from which a sample of 103 participants was selected using simple random sampling and Morgan's table. The research is applied in purpose and uses a descriptive survey design with a quantitative approach. Primary data were collected through standardized questionnaires and analyzed with Partial Least Squares (PLS) software. The hypothesis tests indicate a significant positive effect of both corporate governance and human capital on financial performance enhancement among companies listed on the Tehran Stock Exchange. The evidence further suggests that human capital plays a key mediating role in the relationship between corporate governance practices and improvements in financial performance.
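The sample size of 103 is consistent with Morgan's table, which tabulates the Krejcie and Morgan (1970) formula s = X²·N·P(1−P) / (d²·(N−1) + X²·P(1−P)) with X² = 3.841, P = 0.5, and d = 0.05. The sketch below reproduces the table entry for N = 140; rounding to the nearest integer is an assumption that matches the published table for N = 100, 140, and 150.

```python
import math


def krejcie_morgan_sample_size(population, chi_sq=3.841, p=0.5, margin=0.05):
    """Required sample size from the Krejcie and Morgan formula behind Morgan's table.

    chi_sq: chi-square value for 1 degree of freedom at 95% confidence
    p:      assumed population proportion (0.5 gives the largest sample)
    margin: acceptable margin of error
    """
    numerator = chi_sq * population * p * (1 - p)
    denominator = margin ** 2 * (population - 1) + chi_sq * p * (1 - p)
    return round(numerator / denominator)


print(krejcie_morgan_sample_size(140))  # 103, matching the sample reported above
```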
The study also highlights the usefulness of a hybrid approach that integrates machine learning and data envelopment analysis to examine the interplay between financial performance enhancement and corporate governance in companies listed on the Tehran Stock Exchange. It contributes to the literature in this domain and offers insights for practitioners, policymakers, and researchers.
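For readers unfamiliar with data envelopment analysis, the simplest case shows what a DEA efficiency score is. With one input and one output under constant returns to scale (the CCR model), each company's score reduces to its output/input ratio divided by the best ratio observed; the general multi-input, multi-output case needs one linear program per company. The data below are hypothetical, and the abstract does not specify which DEA model or variables the study used.

```python
def ccr_efficiency_single(inputs, outputs):
    """CCR DEA efficiency for the special case of one input and one output.

    Under constant returns to scale, each unit's score is its output/input
    ratio divided by the best observed ratio, so frontier units score 1.0.
    """
    ratios = [y / x for x, y in zip(inputs, outputs)]
    best = max(ratios)
    return [r / best for r in ratios]


# Hypothetical data: one input (e.g. total assets) and one output (e.g. net profit)
# for three companies; not taken from the study.
scores = ccr_efficiency_single([10.0, 20.0, 40.0], [5.0, 15.0, 20.0])
print(scores)  # the second company lies on the efficient frontier (score 1.0)
```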
Designing a Trading Strategy to Buy and Sell the Stock of Companies Listed on the New York Stock Exchange Based on Classification Learning Algorithms (scientific article, Ministry of Science)
This research develops a stock trading strategy for companies listed on the New York Stock Exchange (NYSE), a prominent global market. Data were obtained from established libraries and the Yahoo Finance database. The model uses technical analysis indicators and oscillators as input features. Machine learning classification algorithms were used to design the trading strategies, and the optimal model was identified using statistical performance metrics; accuracy, recall, and F-measure were used to evaluate the classification algorithms. The analysis relied on several software tools, including Python, Spyder, SPSS, and Excel, and the Kruskal-Wallis test was used to assess statistical differences between the designed strategies. A sample of 41 actively traded NYSE companies from diverse sectors, such as financial services, healthcare, technology, communication services, consumer cyclicals, consumer staples, and energy, was selected with a filter-based approach on June 28th, 2021. The selection criteria were a market capitalization above $200 billion and an average daily trading volume above 1 million shares. The evaluation metrics showed that the random forest trading strategy fit the data well and differed significantly from the strategies based on the other classification learning algorithms.
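The Kruskal-Wallis test mentioned above ranks all observations from all strategies together and checks whether some strategy's ranks are systematically higher. Below is a minimal sketch of the H statistic, assuming each group holds one strategy's per-period performance values (the example values are hypothetical). It omits the tie-correction factor that `scipy.stats.kruskal` applies; H is then compared with a chi-square distribution with k − 1 degrees of freedom, which is not shown.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic without the tie-correction factor."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    h, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n * (n + 1)) * h - 3 * (n + 1)


# Hypothetical per-period returns (%) of three strategies.
print(kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9]))
```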
Enhancing Oncological Diagnosis by Single-Cell ATAC-seq Data for Internet of Medical Things (scientific article, Ministry of Science)
Early cancer detection is crucial for improving patient survival rates, as timely intervention greatly enhances treatment efficacy. One promising approach is to identify cancerous cells by detecting protein-level modifications, which serve as early indicators of malignancy. These modifications often result from complex biochemical processes that occur before visible cellular abnormalities appear, making them critical targets for diagnostic technologies. In recent years, wireless biomedical sensors have advanced significantly, enabling precise detection of these protein-level changes. Such sensors could detect cancer at its earliest stages by monitoring the subtle alterations in protein structure and function that distinguish healthy cells from cancerous ones. As the cost of genetic analysis continues to fall, Medical Internet of Things (MIoT) devices have become increasingly feasible. These devices are designed to analyze biological specimens, such as blood and urine, in real time by detecting protein-level changes indicative of cancer. In this paper, a new machine learning method based on Extremely Randomized Trees (ERT) is developed to speed up the classification of cancerous cells from single-cell Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq) data. The method accelerates classification of limited and noisy ATAC-seq data because it needs less computation to determine the split at each node of the decision trees. It can substantially improve near real-time cancer risk assessment using samples collected by MIoT devices. Our proposed method achieves classification accuracy comparable to state-of-the-art single-cell ATAC-seq analysis techniques while improving processing time by 259% across various low-data scenarios. This approach offers an efficient solution for rapid cancer monitoring within the MIoT framework.
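The speed argument above rests on how Extremely Randomized Trees choose splits: instead of sorting each feature and searching every threshold, they draw one random threshold per candidate feature and keep the best of those. Below is a minimal sketch of that node-splitting step, assuming a generic numeric feature matrix (for single-cell ATAC-seq, rows would be cells and columns accessibility features). scikit-learn's `ExtraTreesClassifier` implements the full method; the function here is a hypothetical illustration, not the authors' code.

```python
import random


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def extra_trees_split(X, y, n_candidates, rng):
    """Choose a node split the Extremely Randomized Trees way.

    For each of n_candidates randomly chosen features, draw ONE threshold
    uniformly between that feature's min and max at this node (no sorting,
    no exhaustive search), then keep the lowest weighted Gini impurity.
    """
    best = None
    for f in rng.sample(range(len(X[0])), n_candidates):
        column = [row[f] for row in X]
        lo, hi = min(column), max(column)
        if lo == hi:
            continue  # constant feature at this node: cannot split on it
        t = rng.uniform(lo, hi)
        left = [label for v, label in zip(column, y) if v < t]
        right = [label for v, label in zip(column, y) if v >= t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if best is None or score < best[2]:
            best = (f, t, score)
    return best  # (feature index, threshold, impurity), or None if nothing split


# Hypothetical toy node: feature 0 is informative, feature 1 is constant.
rng = random.Random(0)
X = [[0.0, 5.0], [1.0, 5.0], [2.0, 5.0], [3.0, 5.0]]
y = [0, 0, 1, 1]
print(extra_trees_split(X, y, n_candidates=2, rng=rng))
```

Each candidate costs one pass over the node's samples, whereas an exhaustive search sorts the feature and evaluates every threshold, which is where the computational saving comes from.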
A Combined Approach Of Adasyn And Tomeklink For Anomaly Network Intrusion Detection System Using Some Selected Machine Learning Algorithms (scientific article, Ministry of Science)
Securing computer networks against malicious attacks requires an efficient Network Intrusion Detection System (IDS). Machine learning techniques are commonly used for anomaly-based intrusion detection, but imbalanced data challenges conventional algorithms, leading to biased predictions and reduced accuracy. This study introduces a novel approach that combines ADASYN and Tomek links to address this issue and evaluates it with several machine learning algorithms. ADASYN generates synthetic samples for the minority class to balance the dataset, and Tomek links are used to remove majority-class instances that sit on the class boundary (each Tomek link is a pair of opposite-class instances that are each other's nearest neighbours). Four supervised machine learning algorithms (Random Forest, J48, Multilayer Perceptron, and Bagging) were assessed on both the imbalanced and the balanced datasets. Results show that Random Forest achieved 99.67% accuracy, J48 and Bagging 99.30%, and MLP 98.53%. Random Forest thus emerges as a highly effective algorithm for intrusion detection, with near-perfect accuracy on balanced data. These outcomes, validated through a comparative analysis with state-of-the-art solutions, indicate that the proposed approach improves prediction accuracy in network intrusion detection compared with using the imbalanced dataset.
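ADASYN and Tomek links can be sketched in plain Python to show what each step does: ADASYN creates more synthetic minority points around minority samples whose neighbourhoods are dominated by the majority class, and a Tomek link is a pair of opposite-class points that are each other's nearest neighbour, whose majority member is then dropped. This is a minimal sketch with Euclidean distance and hypothetical helper names; the imbalanced-learn library provides production implementations (`ADASYN` and `TomekLinks`), and the study's exact parameters are not given in the abstract.

```python
import math
import random


def _knn(points, query_idx, k):
    """Indices of the k nearest points to points[query_idx] (Euclidean), excluding itself."""
    q = points[query_idx]
    others = [i for i in range(len(points)) if i != query_idx]
    others.sort(key=lambda i: math.dist(q, points[i]))
    return others[:k]


def tomek_links(X, y):
    """Pairs (i, j), i < j, of opposite classes that are mutual nearest neighbours."""
    nn = [_knn(X, i, 1)[0] for i in range(len(X))]
    return [(i, j) for i, j in enumerate(nn) if y[i] != y[j] and nn[j] == i and i < j]


def remove_tomek_majority(X, y, majority=0):
    """Drop the majority-class member of every Tomek link."""
    drop = {i if y[i] == majority else j for i, j in tomek_links(X, y)}
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


def adasyn(X, y, minority=1, k=5, beta=1.0, seed=0):
    """ADASYN oversampling: more synthetic points near harder minority samples."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    n_majority = len(y) - len(minority_idx)
    G = int((n_majority - len(minority_idx)) * beta)  # total synthetic samples wanted
    # r_i: fraction of majority points among each minority sample's k nearest neighbours
    r = [sum(y[j] != minority for j in _knn(X, i, k)) / k for i in minority_idx]
    if G <= 0 or sum(r) == 0 or len(minority_idx) < 2:
        return list(X), list(y)
    minority_points = [X[i] for i in minority_idx]
    synthetic = []
    for pos, r_i in enumerate(r):
        g_i = round(r_i / sum(r) * G)  # harder samples receive more synthetic points
        neighbours = _knn(minority_points, pos, min(k, len(minority_points) - 1))
        for _ in range(g_i):
            other = minority_points[rng.choice(neighbours)]
            lam = rng.random()  # interpolation factor in [0, 1)
            synthetic.append([a + lam * (b - a) for a, b in zip(minority_points[pos], other)])
    return list(X) + synthetic, list(y) + [minority] * len(synthetic)
```

Applying the two steps in the order the abstract describes would look like `X_bal, y_bal = remove_tomek_majority(*adasyn(X, y))`.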