Articles related to the keyword

Natural Language Processing


1.

Text complexity of reading comprehension passages in the National Matriculation English Test in China: The development from 1996 to 2020

Keywords: corpus linguistics, High Stakes Exam, Natural Language Processing, Reading Comprehension, Text Complexity

This study examined how the text complexity of reading comprehension passages in the National Matriculation English Test (NMET) in China developed over the past 25 years. The text complexity of 206 reading passages was measured longitudinally at the lexical, syntactic, and discourse levels and compared across years. The natural language processing tools used in the study were TAALES, TAALED, TAASSC, and TAACO. ANOVA and MANOVA tests were conducted to compare differences across years at each level of text complexity. The results suggested that lexical-level text complexity changed most evidently over the years: the lexical sophistication, density, and diversity of the most recent reading passages increased markedly compared with the early years. Syntactic-level text complexity showed a moderate increase toward the recent years. At the discourse level, cohesion fluctuations across the years were not significant, and the general trend was not consistently upward. Taken together, the results indicated that the text complexity of NMET reading passages increased steadily over the past 25 years through the inclusion of more low-frequency and academic vocabulary, greater vocabulary diversity, and more complex sentence and grammatical structures. The results were further examined against the general curriculum standards and guidelines to determine whether these changes were reflected in policy. This comparison showed that the exams required a much larger vocabulary than the guidelines indicate; suggestions for test designers and pedagogical practice are provided accordingly.
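Lexical diversity, one of the dimensions reported above, is often operationalized with type-token ratios. The sketch below computes a plain type-token ratio (TTR) and a moving-average TTR (MATTR), indices of the kind TAALED reports; it is an illustration, not TAALED's implementation. The whitespace tokenizer and the 50-token default window are simplifying assumptions.

```python
from typing import List


def tokenize(text: str) -> List[str]:
    """Naive tokenizer (assumption): lowercase, split on whitespace, strip edge punctuation."""
    tokens = []
    for raw in text.lower().split():
        word = raw.strip(".,;:!?\"'()[]")
        if word:
            tokens.append(word)
    return tokens


def ttr(tokens: List[str]) -> float:
    """Type-token ratio: number of distinct word types divided by number of tokens."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def mattr(tokens: List[str], window: int = 50) -> float:
    """Moving-average TTR: mean TTR over every run of `window` consecutive tokens.

    Falls back to plain TTR when the text is shorter than the window.
    """
    if len(tokens) < window:
        return ttr(tokens)
    scores = [ttr(tokens[i:i + window]) for i in range(len(tokens) - window + 1)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    toks = tokenize("The cat sat on the mat. The cat slept.")
    print(round(ttr(toks), 3))       # 0.667 (6 types / 9 tokens)
    print(round(mattr(toks, 4), 3))  # 0.917 (mean over six 4-token windows)
```

Because plain TTR falls as a text gets longer, the windowed variant is the safer choice when passages of different lengths are compared across years.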
2.

Generation of Syntax Parser on South Indian Language using Bottom-Up Parsing Technique and PCFG (Scientific article, Ministry of Science)

Keywords: Natural Language Processing, Artificial Intelligence, Syntax Parser, CYK Parsing Algorithm, Probabilistic Context Free Grammar

In this research, we present a statistical syntax parsing method evaluated on Kannada, an official language of Karnataka, India. The dataset was downloaded from the TDIL website. In the first stage, we used the Cocke-Younger-Kasami (CYK) parsing technique to generate a Kannada treebank from 1,000 annotated sentences. The resulting treebank contains 1,000 syntactically structured sentences and serves as input for training the syntax parser model in the second stage. We adopted a Probabilistic Context-Free Grammar (PCFG) for training the parser model and extracted a Chomsky Normal Form (CNF) grammar from the treebank. The developed parser was tested on 150 raw Kannada sentences; it outputs the most likely parse tree for each sentence, which is verified against the gold-standard treebank. The parser achieved 74.2% precision, 79.4% recall, and a 75.3% F1-score. A similar technique may be adopted for other low-resource languages.
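A PCFG in Chomsky Normal Form can be decoded with the probabilistic (Viterbi) variant of CYK, which keeps, for every span and nonterminal, the highest-probability way to build it. The sketch below assumes that pairing; the abstract does not spell out the decoding details. The toy English grammar, its probabilities, and the function name `viterbi_cyk` are hypothetical stand-ins for a grammar extracted from the Kannada treebank.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy PCFG in Chomsky Normal Form (not the paper's Kannada grammar).
# Binary rules A -> B C, keyed by (A, B, C).
BINARY: Dict[Tuple[str, str, str], float] = {
    ("S", "NP", "VP"): 1.0,
    ("VP", "V", "NP"): 1.0,
    ("NP", "Det", "N"): 0.6,
}
# Lexical rules A -> word, keyed by (A, word). Probabilities per left-hand side sum to 1.
LEXICAL: Dict[Tuple[str, str], float] = {
    ("NP", "she"): 0.4,
    ("Det", "a"): 1.0,
    ("N", "book"): 1.0,
    ("V", "reads"): 1.0,
}


def viterbi_cyk(words: List[str], start: str = "S") -> Optional[Tuple[float, tuple]]:
    """Return (probability, tree) of the most likely parse, or None if no parse exists."""
    n = len(words)
    # chart[i][j][A] = (best probability, backpointer) for A spanning words[i:j]
    chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for (a, word), p in LEXICAL.items():
            if word == w:
                chart[i][i + 1][a] = (p, w)
    for length in range(2, n + 1):  # span length, filled bottom-up
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):  # split point
                for (a, b, c), p in BINARY.items():
                    if b in chart[i][k] and c in chart[k][j]:
                        prob = p * chart[i][k][b][0] * chart[k][j][c][0]
                        if a not in chart[i][j] or prob > chart[i][j][a][0]:
                            chart[i][j][a] = (prob, (k, b, c))
    if start not in chart[0][n]:
        return None

    def build(i: int, j: int, a: str) -> tuple:
        # Follow backpointers to rebuild the tree as nested tuples.
        _, back = chart[i][j][a]
        if isinstance(back, str):
            return (a, back)
        k, b, c = back
        return (a, build(i, k, b), build(k, j, c))

    return chart[0][n][start][0], build(0, n, start)


if __name__ == "__main__":
    print(viterbi_cyk("she reads a book".split()))
```

For "she reads a book" the only complete parse has probability of about 0.24 (1.0 × 0.4 × 0.6); with an ambiguous grammar, the `prob >` comparison is what selects the most likely tree.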
3.

Preprocessing of Aspect-based English Telugu Code Mixed Sentiment Analysis (Scientific article, Ministry of Science)

Extracting sentiment from English-Telugu code-mixed data is challenging and remains a relatively new research area. The data, obtained through the Twitter API, must be English-Telugu code-mixed text. Such data is free-form and noisy, and it contains lexical borrowings, code-mixing, phonetic typing, and misspellings. The first step is language identification and the assignment of a sentiment class label to each tweet in the dataset. The second step is data normalization, and the final step is classification, which can be performed with three different methods: lexicon-based, machine learning, and deep learning. In the lexicon-based approach, each tweet is tokenized and each token is tagged with its language. Tokens tagged as Telugu are transliterated from Roman script into native Telugu script. These words are then checked against TeluguSentiWordNet to extract their sentiment, while English SentiWordNet is used to extract sentiment from the English tokens. In this paper, an aspect-based sentiment analysis approach is proposed and applied to the normalized data. In addition, deep learning and machine learning techniques are applied to extract sentiment ratings, and the results are compared with those of prior work.
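The lexicon-based step described above (tag each token's language, transliterate Telugu tokens, then look each token up in the lexicon for its language) can be sketched as follows. Language tags are assumed to be already assigned, and the tiny transliteration table and polarity lexicons are hypothetical placeholders for a real transliterator, TeluguSentiWordNet, and English SentiWordNet. Summing polarities is one simple aggregation choice, not necessarily the paper's.

```python
from typing import Dict, List, Tuple

# Hypothetical miniature resources, for illustration only.
ROMAN_TO_TELUGU: Dict[str, str] = {"manchi": "మంచి", "chedu": "చెడు"}  # "good", "bad"
TELUGU_POLARITY: Dict[str, float] = {"మంచి": 1.0, "చెడు": -1.0}
ENGLISH_POLARITY: Dict[str, float] = {"good": 1.0, "awesome": 1.0, "boring": -1.0}


def score_tweet(tagged: List[Tuple[str, str]]) -> float:
    """Sum polarity over (token, language_tag) pairs; tags 'te' and 'en' are assumed."""
    total = 0.0
    for token, lang in tagged:
        word = token.lower()
        if lang == "te":
            native = ROMAN_TO_TELUGU.get(word)  # Roman script -> Telugu script
            if native is not None:
                total += TELUGU_POLARITY.get(native, 0.0)
        elif lang == "en":
            total += ENGLISH_POLARITY.get(word, 0.0)
        # tokens with any other tag (e.g. emoji, URLs) are ignored in this sketch
    return total


def to_label(score: float) -> str:
    """Map an aggregate polarity score to a sentiment class."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    tweet = [("movie", "en"), ("chala", "te"), ("manchi", "te")]
    print(to_label(score_tweet(tweet)))  # positive
```

Tokens missing from the transliteration table (here "chala") simply contribute nothing, which is why the normalization step matters for noisy, phonetically typed input.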
4.

The Study on Qur'anic surahs' Structured-ness and their Order Organization Using NLP Techniques

Keywords: Natural Language Processing, Word2vec, Quran, Topic Sameness, Surahs' Structuredness, TF-IDF

The study of surah structure has attracted researchers' attention in recent years. One theory in this area is the theory of Topic Sameness, which holds that each surah of the Quran is formed around a single topic. The theory of Introduction and Explanation, one of the most important branches of Topic Sameness, proposes that the Almighty states the topic of each surah in its first section, elaborates on it in different parts of the surah in forms such as stories, signs of nature, and predictions of the future, and draws conclusions from the stated content in the final part. In this paper, we study these two theories using NLP techniques for the first time. To this end, the similarity of Quranic roots is computed with three methods: tf-idf, word2vec, and the co-occurrence of roots within verses. Then, the degree of similarity among the concepts within each surah is calculated and compared with a random baseline. The results show that the studied surahs exhibit internal coherence among their concepts, indicating that each is formed around a single topic or a few related topics. In addition, analysis of the similarity between the opening section and the body of each surah suggests that, under the designed methodology, the Introduction and Explanation structure holds for many surahs. Finally, by comparing the similarity between surahs against their distance in the Quran's order and their distance in time of revelation, we found that the Quran as a whole is also relatively organized in terms of surah ordering.
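One way to test whether a surah's concepts cohere more than chance, following the tf-idf route described above, is sketched below. It treats verses as documents and roots as terms, represents each root by its tf-idf weights across verses, scores a set of roots by mean pairwise cosine similarity, and compares that score with randomly sampled root sets of the same size. This formulation, the function names, and the toy data (placeholder root labels, not Quranic text) are assumptions; the word2vec and co-occurrence methods are not shown.

```python
import math
import random
from itertools import combinations
from typing import Dict, List, Sequence


def root_vectors(verses: List[List[str]]) -> Dict[str, List[float]]:
    """Represent each root by its tf-idf weight in every verse (verses are assumed non-empty)."""
    n = len(verses)
    roots = sorted({r for verse in verses for r in verse})
    df = {r: sum(1 for verse in verses if r in verse) for r in roots}
    return {
        r: [verse.count(r) / len(verse) * math.log(n / df[r]) for verse in verses]
        for r in roots
    }


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0  # zero vectors get similarity 0


def coherence(roots: Sequence[str], vecs: Dict[str, List[float]]) -> float:
    """Mean pairwise cosine similarity among a set of roots (needs at least two)."""
    pairs = list(combinations(roots, 2))
    return sum(cosine(vecs[a], vecs[b]) for a, b in pairs) / len(pairs)


def random_baseline(size: int, vecs: Dict[str, List[float]],
                    trials: int = 1000, seed: int = 0) -> float:
    """Average coherence of randomly sampled root sets of the same size."""
    rng = random.Random(seed)
    pool = sorted(vecs)
    return sum(coherence(rng.sample(pool, size), vecs) for _ in range(trials)) / trials


if __name__ == "__main__":
    # Toy "verses" made of placeholder root labels (not Quranic data).
    verses = [["a", "b"], ["a", "b", "c"], ["d", "e"], ["d", "e", "f"]]
    vecs = root_vectors(verses)
    print(coherence(["a", "b"], vecs), random_baseline(2, vecs))
```

Under this sketch, a surah whose roots score well above the random baseline would count as internally coherent.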
5.

Emotion Detection from the Text of the Qur’an Using Advance Roberta Deep Learning Net

Keywords: Emotion detection, Natural Language Processing, Transformers, Parts of speech, Dependency Parsing, Qur’an text mining

As the volume of data keeps growing, a vast amount of textual content, including books, blogs, and papers, is produced and distributed electronically. Analyzing such large amounts of content manually is time-consuming. Automatic detection of feelings and emotions in these texts is crucial, as it helps identify the emotions conveyed by the author, understand the author's writing style, and determine the target audience for these texts. The Qur’an, regarded as the word of God and a divine miracle, serves as a comprehensive guide and a reflection of human life. Detecting emotions and feelings within the content of the Qur’an contributes to a deeper understanding of God's commandments. Recent advances, particularly the application of transformer-based language models in natural language processing, have yielded state-of-the-art results that are difficult to surpass. In this paper, we propose a method to improve the accuracy and generality of these models by incorporating syntactic features such as part-of-speech (POS) and dependency parsing tags. Our approach aims to improve the performance of emotion detection models, making them more robust and applicable across diverse contexts. For model training and evaluation, we used the ISEAR dataset, a well-established and extensive dataset in this field. The results indicate that our proposed model outperforms existing models, achieving an accuracy of 77% on this dataset. Finally, we applied the proposed model to recognize the feelings and emotions conveyed in the Itani English translation of the Qur’an. The results revealed that joy contributes most to the emotional content of the Holy Qur’an.
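The abstract does not specify how the POS and dependency features are combined with RoBERTa. One common option, assumed here purely for illustration, is late fusion: concatenate the transformer's sentence embedding with a normalized histogram of syntactic tags and pass the result to a classifier head. The sketch uses the Universal POS tag set and a short stand-in vector in place of a real RoBERTa embedding; dependency relation labels could be appended the same way. The names `pos_histogram` and `fuse` are hypothetical.

```python
from collections import Counter
from typing import List, Sequence

# Universal POS tags (assumed tag inventory; the paper's tag set is not stated).
UPOS = ["ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
        "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X"]


def pos_histogram(tags: Sequence[str]) -> List[float]:
    """Relative frequency of each UPOS tag; tags outside UPOS count toward the total but get no slot."""
    counts = Counter(tags)
    total = len(tags) or 1  # avoid division by zero for empty input
    return [counts[t] / total for t in UPOS]


def fuse(sentence_embedding: Sequence[float], tags: Sequence[str]) -> List[float]:
    """Late fusion: append POS frequencies to a transformer sentence embedding.

    The fused vector would then feed a classification head (not shown).
    """
    return list(sentence_embedding) + pos_histogram(tags)


if __name__ == "__main__":
    embedding = [0.1, -0.2, 0.3]  # stand-in for a real RoBERTa sentence vector
    tags = ["PRON", "VERB", "ADJ", "PUNCT"]
    print(len(fuse(embedding, tags)))  # 20 (3 embedding dims + 17 POS slots)
```

Concatenation keeps the transformer unchanged and lets the classifier learn how much weight to give the syntactic signal; injecting tags into the token inputs is an alternative design.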