Developing a Feature Selection Method Based on Information Theory and Genetic Algorithm (scientific article, Ministry of Science)

Scientific rank: Scientific journal (Ministry of Science)

Authors: Mehdi Jabbari, Jalal Rezaeenour, Amirhossein Akbari

Source: Sciences and Techniques of Information Management (علوم و فنون مدیریت اطلاعات), Volume 9, Autumn 1402 (SH), Issue 3 (Serial No. 32)

Keywords: feature selection, data preprocessing, information theory, genetic algorithm, classifier

Subject areas:

Information science and knowledge studies

doi: 10.22091/stim.2023.8708.1877

Pages: 7-32

Abstract

Purpose: When dealing with high-dimensional datasets, dimensionality reduction is an important preprocessing step for achieving high accuracy, efficiency, and scalability in classification problems. The aim of this study is to present a feature selection method for high-dimensional datasets using dimensionality reduction and a genetic algorithm. Method: A heuristic algorithm is developed that uses a new criterion to determine the mutual information between the features and the target class. In this method, new features are generated by combining or transforming the original features, so that the multi-dimensional space is mapped to a new space with fewer dimensions. In addition to the new mutual information criterion, a genetic algorithm is used to improve the speed of the proposed method. Findings: The method was evaluated on datasets of different dimensionality, with the number of features ranging from 13 to 60. The proposed method was compared with similar methods in terms of classifier accuracy, and promising results were obtained. Conclusion: The proposed method was applied alongside the MRMR, DISR, JMI, and NJMIM methods on different datasets. The average accuracies obtained with the proposed method are 65.32, 74.51, 70.88, and 58.2 percent, which indicates the effectiveness of the proposed method. According to the results, except for the sonar dataset, where a better result than that of the proposed method was obtained, the average performance of the proposed method was better than that of DISR, JMI, NJMIM, and MRMR.

A Feature Selection Method Based on Information Theory and Genetic Algorithm

Purpose: When dealing with high-dimensional datasets, dimensionality reduction is a crucial preprocessing step for achieving high accuracy, efficiency, and scalability in classification problems. This research introduces a feature selection method for high-dimensional datasets that employs dimensionality reduction and a genetic algorithm. Method: A heuristic algorithm was developed that uses a new criterion to determine the mutual information between the features and the target class. In this method, new features are generated by combining or transforming the original features, so that the multi-dimensional space is mapped to a new space with fewer dimensions. In addition to the new mutual information criterion, a genetic algorithm is employed to improve the speed of the proposed method. Findings: The method was evaluated on datasets of varying dimensionality, with the number of features ranging from 13 to 60. It was compared with similar methods in terms of classification accuracy, and the results were promising. Conclusion: The proposed method was compared with the MRMR, DISR, JMI, and NJMIM methods on various datasets. The average accuracies obtained with the proposed method are 65.32%, 74.51%, 70.88%, and 58.2%, indicating its effectiveness. According to the results, the proposed method outperformed DISR, JMI, NJMIM, and MRMR on average, except on the sonar dataset, where the proposed method did not achieve the best result.
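To make the general idea concrete, the following is a minimal sketch of mutual-information-guided feature selection driven by a genetic algorithm. It is not the authors' method: the abstract does not define the new criterion, so the sketch substitutes a simple MRMR-style score (mean relevance to the class minus mean pairwise redundancy) as a stand-in. All function names, parameters, and the toy data are hypothetical, and the features are assumed to be discrete.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) in bits for two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def fitness(mask, columns, target):
    """Stand-in MRMR-style score (an assumption, not the paper's criterion):
    mean relevance to the target minus mean pairwise redundancy."""
    chosen = [i for i, bit in enumerate(mask) if bit]
    if not chosen:
        return float("-inf")  # an empty feature subset is never acceptable
    relevance = sum(mutual_information(columns[i], target) for i in chosen) / len(chosen)
    pairs = [(i, j) for i in chosen for j in chosen if i < j]
    redundancy = (
        sum(mutual_information(columns[i], columns[j]) for i, j in pairs) / len(pairs)
        if pairs
        else 0.0
    )
    return relevance - redundancy


def genetic_select(columns, target, pop_size=20, generations=30, p_mut=0.1, seed=0):
    """Search over binary feature masks with a basic genetic algorithm.
    Requires at least two feature columns."""
    rng = random.Random(seed)
    n = len(columns)

    def score(mask):
        return fitness(mask, columns, target)

    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]

    def pick():
        # Tournament selection between two distinct random individuals.
        a, b = rng.sample(population, 2)
        return a if score(a) >= score(b) else b

    best = max(population, key=score)
    for _ in range(generations):
        children = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            cut = rng.randint(1, n - 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            children.append(child)
        population = children
        best = max(population, key=score)
    return [i for i, bit in enumerate(best) if bit]


# Toy data: column 0 equals the target, column 1 duplicates column 0,
# column 2 is independent of the target.
columns = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 0, 1]]
target = [0, 0, 1, 1]
selected = genetic_select(columns, target)
print(selected)
```

Under this stand-in score, a subset containing both duplicated columns is penalized for redundancy, so a single informative column scores higher than the pair. A real study would replace `fitness` with the chosen criterion and evaluate the selected subset with a classifier, as the abstract describes.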