Articles related to the keyword

Text Classification


1.

A Deep Learning Model for Classifying Quality of User Replies (Scientific article, Ministry of Science)

Views: 696 Downloads: 193
Q&A forums are designed to help users find useful information and access high-quality content posted by other users. Automatically identifying high-quality replies to initial posts not only provides users with appropriate content but also saves them time. Existing methods for classifying user replies by quality extract quality features from both the textual content and the metadata of the replies. This feature engineering step is time-consuming and labor-intensive. The current study addresses this problem by proposing a new deep learning model for detecting high-quality user replies using only raw textual content. Specifically, we propose a long short-term memory (LSTM) model that uses Embeddings from Language Models (ELMo) to represent words as contextual numerical vectors. We compared the effectiveness of the proposed model with that of four traditional machine learning models on datasets from the TripAdvisor forum for New York City (NYC) and the Ubuntu Linux distribution forum. Experimental results indicated that the proposed model significantly outperformed the four traditional algorithms on both datasets. Moreover, the proposed model achieved about 16% higher accuracy than the traditional algorithms trained on both textual and quality dimension features.
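The abstract does not give implementation details, so the following is only a minimal sketch of the general idea: an LSTM reads a reply one word vector at a time, and its final hidden state is mapped to a quality score. The function names (`init_params`, `lstm_step`, `reply_quality_score`), the dimensions, and the random untrained weights are all assumptions made for illustration. The input vectors merely stand in for ELMo embeddings, which a real system would obtain from a pretrained language model.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def init_params(input_dim, hidden_dim, seed=0):
    """Hypothetical helper: small random weights for the four LSTM gates."""
    rng = random.Random(seed)

    def mat():
        return [[rng.uniform(-0.1, 0.1) for _ in range(input_dim + hidden_dim)]
                for _ in range(hidden_dim)]

    # i = input gate, f = forget gate, o = output gate, g = candidate cell state
    return {name: (mat(), [0.0] * hidden_dim) for name in "ifog"}


def lstm_step(x, h, c, params):
    """One LSTM time step: returns the new hidden state and cell state."""
    z = x + h  # concatenate the current input with the previous hidden state
    pre = {}
    for name in "ifog":
        weights, bias = params[name]
        pre[name] = [p + b for p, b in zip(matvec(weights, z), bias)]
    i = [sigmoid(v) for v in pre["i"]]
    f = [sigmoid(v) for v in pre["f"]]
    o = [sigmoid(v) for v in pre["o"]]
    g = [math.tanh(v) for v in pre["g"]]
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


def reply_quality_score(word_vectors, params, w_out, b_out=0.0):
    """Run the LSTM over a reply and squash the last hidden state to (0, 1)."""
    hidden_dim = len(w_out)
    h, c = [0.0] * hidden_dim, [0.0] * hidden_dim
    for x in word_vectors:
        h, c = lstm_step(x, h, c, params)
    return sigmoid(sum(w * hk for w, hk in zip(w_out, h)) + b_out)


# Toy usage: a "reply" of three words, each a 3-dimensional placeholder vector.
params = init_params(input_dim=3, hidden_dim=4)
reply = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.1], [0.2, 0.2, 0.2]]
score = reply_quality_score(reply, params, w_out=[0.3, -0.2, 0.1, 0.4])
```

Because the weights are untrained, the score carries no meaning here; in practice the gate weights and the output layer would be learned from replies labeled by quality.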
2.

A Movie Recommender System Based on Topic Modeling using Machine Learning Methods (Scientific article, Ministry of Science)

Views: 205 Downloads: 133
In recent years, film production has increased across a variety of categories and genres. Many of these films contain concepts that are inappropriate for children and adolescents, and parents are concerned that their children may be exposed to them. A smart recommendation system that suggests appropriate movies based on the user's age range could therefore be a useful tool for parents. Existing movie recommender systems rely on quantitative factors and metadata, so they pay little attention to the content of the movies. This research is motivated by the need to extract movie features using information retrieval methods in order to provide effective suggestions. The goal of this study is to propose a movie recommender system based on topic modeling and text-based age ratings. The proposed method uses latent Dirichlet allocation (LDA) modeling to identify hidden associations between words, document topics, and the level of expression of each topic in each document. Machine learning models are then used to recommend age-appropriate movies. The results show that the proposed method can determine the user's age and recommend movies based on the user's age with 93% accuracy.
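To make the LDA step concrete, here is a small sketch of a collapsed Gibbs sampler, one common way to fit LDA. The abstract does not say which inference method or hyperparameters the authors used, so the function name `lda_gibbs`, the values of `alpha` and `beta`, and the toy documents are assumptions. The function returns, for each document, the estimated proportion of each topic, that is, the "level of expression of each topic in each document".

```python
import random


def lda_gibbs(docs, n_topics, n_iters=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling on tokenized documents.

    docs is a list of token lists. Returns theta, where theta[d][k] is the
    estimated share of topic k in document d.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]            # tokens per (doc, topic)
    topic_word = [[0] * vocab_size for _ in range(n_topics)]  # tokens per (topic, word)
    topic_total = [0] * n_topics                          # tokens per topic
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][wid] -= 1
                topic_total[z] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][wid] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][wid] += 1
                topic_total[z] += 1

    return [
        [(doc_topic[d][k] + alpha) / (len(doc) + n_topics * alpha)
         for k in range(n_topics)]
        for d, doc in enumerate(docs)
    ]


# Toy usage: four tiny "plot summaries", already tokenized.
summaries = [
    ["space", "alien", "ship", "space"],
    ["love", "wedding", "love", "kiss"],
    ["alien", "ship", "laser"],
    ["kiss", "wedding", "love"],
]
theta = lda_gibbs(summaries, n_topics=2)
```

Each row of `theta` could then serve as a feature vector for a classifier that predicts an age rating, which is the role the abstract assigns to the machine learning models.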
3.

MultiCGCN: Multi-Label Text Classification using GCNs and Heterogeneous Graphs (Scientific article, Ministry of Science)

Views: 3 Downloads: 2
Multi-label text classification is a critical challenge in natural language processing, where the goal is to assign multiple labels to a given document. Recent advances have primarily focused on deep learning approaches, yet many fail to adequately capture the intricate relationships between documents and labels. In this paper, we propose a novel method called MultiCGCN, which leverages Graph Convolutional Networks (GCNs) for multi-label text classification by modeling text as a heterogeneous graph. This unified graph incorporates document similarities, label relationships, and document-label associations, enabling the model to capture both document and label dependencies. We transform the multi-label classification problem into a link prediction task, using Term Frequency–Inverse Document Frequency (TF-IDF) for document similarity and applying GCNs to predict label assignments. Our empirical evaluations demonstrate that MultiCGCN improves the F1 score by 10% over traditional baseline models. This approach opens new avenues for enhancing the accuracy of multi-label classification in various domains.
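As an illustration of the document-similarity part of such a graph, the sketch below computes TF-IDF vectors and adds an edge between two documents when their cosine similarity reaches a threshold. This is not the authors' implementation: the smoothed IDF formula, the whitespace tokenizer, the `threshold` value, and the function names are assumptions, and the GCN and link-prediction stages are not shown.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            # Term frequency times a smoothed inverse document frequency.
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_edges(docs, threshold=0.2):
    """Return (i, j, similarity) for each document pair at or above threshold."""
    vectors = tfidf_vectors(docs)
    edges = []
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            sim = cosine(vectors[i], vectors[j])
            if sim >= threshold:
                edges.append((i, j, sim))
    return edges


# Toy usage: the first two documents share vocabulary, the third does not.
corpus = [
    "graph neural networks for text",
    "graph convolutional networks for text classification",
    "movie recommendation by age",
]
edges = similarity_edges(corpus)
```

In a heterogeneous graph of the kind the abstract describes, these document-document edges would sit alongside label-label and document-label edges, and the model would be trained to predict the missing document-label links.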