Investigating the Effect of Semantic Tagging on the Disambiguation of Specialized Homographs in Terms of Retrieval Quality (F-Measure) in Scientific Text Retrieval (Scientific Article, Ministry of Science)

Scientific rank: Scientific journal (Ministry of Science)

Authors: Mina Rezaei Dinani (مینا رضایی دینانی)

Source: Iranian Journal of Information Processing and Management (پژوهشنامه پردازش و مدیریت اطلاعات), Volume 38, Winter 1401 (Solar Hijri), Issue 2 (Serial No. 112)

Keywords: specialized homograph; information retrieval; text corpus tagging; F-measure

doi: 10.35050/JIPM010.2022.031

Pages: 123-148

Abstract

Given the important and decisive role of specialized vocabulary in navigating scientific research precisely and completely, the aim of the present study was to determine how effective semantic tagging is in disambiguating specialized homographs and what retrieval quality results from it. The study is applied in purpose and, methodologically, is an experimental (corpus-based) pragmatics study that counts as a supervised method. The natural language processing techniques used to reach the research goal included morphological analysis and semantic tagging of specialized homographs. The research population consisted of 442 scientific articles in two groups, control and experimental. The control group contained 221 untagged full-text articles, and the experimental group contained the same 221 articles, this time tagged; both were tested in the information retrieval system to determine the effectiveness of semantic tagging in disambiguating specialized homographs and its effect on the retrieval quality of scientific texts. The significance level of the Wilcoxon test showed that retrieval quality after the tagged specialized corpus was used differs significantly from retrieval quality before. Examination of the negative and positive ranks showed that this value increased significantly and reached its maximum of 1. In other words, in the method tested in this study, recall and precision, which both determine retrieval quality (F-measure), reached their optimum value of 1. The findings suggest that there is not necessarily an inverse relationship between recall and precision, and that the two can reach their maximum in parallel. The better performance of the retrieval system under this approach is due to equipping the system with subject tags and thereby enabling it to distinguish specialized homographs by subject. Embedding the training set in the structure of the retrieval system provides additional information that helps the system distinguish among the multiple meanings of specialized homographs. This tool is one of the elements that produce optimal retrieval quality, and it moves the information retrieval system, when retrieving texts that contain specialized homographs, from word-based retrieval toward content-based retrieval.
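The quantities named in the abstract can be illustrated with a minimal sketch. It assumes the balanced F-measure (the harmonic mean of precision and recall) and computes only the positive and negative rank sums of a Wilcoxon signed-rank comparison, not a p-value. The function names and the per-query values are hypothetical and are not data from the study.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def signed_rank_sums(before, after):
    """Return (W_plus, W_minus) for paired samples.

    Zero differences are dropped and tied absolute differences share
    their average rank, as in the usual Wilcoxon signed-rank procedure.
    """
    diffs = [a - b for b, a in zip(before, after) if a != b]
    ordered = sorted(diffs, key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        # Find the end of the run of tied absolute differences.
        j = i
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        average_rank = ((i + 1) + (j + 1)) / 2
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    w_plus = sum(r for d, r in zip(ordered, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return w_plus, w_minus


# Hypothetical per-query (precision, recall) pairs, NOT results from the study.
untagged_scores = [(0.50, 1.00), (0.25, 1.00), (0.40, 0.80), (0.50, 0.50)]
tagged_scores = [(1.00, 1.00)] * 4

f_before = [f_measure(p, r) for p, r in untagged_scores]
f_after = [f_measure(p, r) for p, r in tagged_scores]
w_plus, w_minus = signed_rank_sums(f_before, f_after)
print("F before:", f_before)
print("F after: ", f_after)
print("positive rank sum:", w_plus, "negative rank sum:", w_minus)
```

Because the F-measure is a harmonic mean, it equals 1 only when precision and recall are both 1, which is why a maximal F-measure implies that neither was traded against the other. In this toy data every query improves, so all ranks are positive.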

Investigating the Effectiveness of Semantic Tagging in Sense Disambiguation of Specialized Homographs from the Perspective of F-Measure in Retrieving Scientific Texts

The aim of this study was to explain how text corpus tagging can be applied to the sense disambiguation of specialized homographs and to increasing the retrieval F-measure of scientific texts containing such homographs. This is an experimental study. Specialized homographs were identified by direct observation and morphological analysis of the words. The research sample consisted of 442 scientific articles in two groups, experimental and control. The control group contained 221 full-text articles without tags, and the experimental group contained the same 221 articles with tags; both were tested in the information retrieval system to measure the effectiveness of tagging in the word sense disambiguation of specialized homographs. The significance level of the Wilcoxon signed-rank test showed that the F-measure of retrieval results for specialized homographs after the tagged specialized text corpus was used in the information retrieval system differs significantly from the F-measure before. Examination of the negative and positive ranks showed that the F-measure of the results increased significantly after the tagged corpus was used and reached its maximum value of 1. The findings showed that there is not necessarily an inverse relationship between recall and precision, and that the two can both reach their maximum value of 1. The better performance of the retrieval system under this approach is due to its improved ability to distinguish between specialized homographs and identify their semantic roles, using semantic tags as training data in the test and training sets. Embedding the training set in the structure of the retrieval system provides additional information that helps the system distinguish between the various meanings of specialized homographs. This tool is one of the elements that produce optimal retrieval quality, and it moves the information retrieval system from word-driven retrieval to content-driven retrieval when retrieving texts containing specialized homographs.
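The contrast between word-driven and content-driven retrieval can be sketched with a toy example. This is an illustration under stated assumptions, not the system used in the study: the four English documents, the homograph "cell", the `term|subject` tag format, and the function names are all hypothetical.

```python
def tokenize(text):
    """Lowercase a text and split it on whitespace."""
    return text.lower().split()


def retrieve(corpus, query_term):
    """Return the ids of documents that contain query_term as a whole token."""
    return {doc_id for doc_id, text in corpus.items()
            if query_term in tokenize(text)}


def precision_recall(retrieved, relevant):
    """Compute (precision, recall) for a retrieved set against a relevant set."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical untagged corpus: "cell" appears in two different subject senses.
untagged = {
    "d1": "the cell membrane regulates ion transport",
    "d2": "each battery cell stores electrical charge",
    "d3": "stem cell differentiation in tissue culture",
    "d4": "a fuel cell converts hydrogen to electricity",
}

# The same documents with the homograph carrying a subject tag.
# The `term|subject` format is an assumption made for this sketch.
tagged = {
    "d1": "the cell|biology membrane regulates ion transport",
    "d2": "each battery cell|energy stores electrical charge",
    "d3": "stem cell|biology differentiation in tissue culture",
    "d4": "a fuel cell|energy converts hydrogen to electricity",
}

# The user is assumed to want the biological sense only.
relevant = {"d1", "d3"}

word_based = retrieve(untagged, "cell")
sense_based = retrieve(tagged, "cell|biology")

print("word-based: ", sorted(word_based), precision_recall(word_based, relevant))
print("sense-based:", sorted(sense_based), precision_recall(sense_based, relevant))
```

In this toy setting, matching on the bare word returns every document that contains the homograph, so recall is complete but precision is halved. Matching on the tagged form returns only the intended sense, so precision and recall are both 1.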