International Journal of Web Research
International Journal of Web Research, Volume 8, Issue 3, 2025 (Scientific Article, Ministry of Science)
Articles
This study introduces a federated learning (FL) architecture designed to support scalable, decentralized anomaly detection in IoT-integrated aquaponics systems. With an emphasis on data privacy, the framework employs PrefixSpan for sequential pattern mining to extract significant temporal behaviors from heterogeneous distributed datasets. IoT sensors deployed across 11 aquaponic ponds collected extensive datasets, each exceeding 170,000 entries, capturing vital indicators such as temperature, pH, turbidity, and fish growth metrics. The proposed FL model showed strong correlations (exceeding 0.9) between water quality conditions and fish development, supporting the system's predictive robustness. Notably, Pond 6 and Pond 10 yielded 1269 and 1339 sequential patterns respectively, indicating that the model scales well. The architecture also achieved a 35% reduction in communication latency compared to conventional centralized systems, enabling responsive and efficient anomaly detection in real time. In parallel, a Top-k mining approach was employed to benchmark pattern interpretability and computational efficiency, revealing trade-offs between sensitivity and frequency-based simplification. A comparison with models from recent aquaponics studies further supported the system's operational advantage in privacy-aware anomaly detection and highlighted its alignment with sustainable smart farming objectives. By addressing the limitations of centralized data handling, this framework offers a resilient, scalable, and privacy-aware approach to intelligent aquaponics management.
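As a rough illustration of the sequential pattern mining step described above, the following sketch implements a simplified PrefixSpan over sequences of single items, plus a Top-k filter by support. It is not the study's implementation: the discretized sensor states (`temp_high`, `ph_low`, `turbidity_high`), the `min_support` value, and all function names are hypothetical, and the federated aggregation across ponds is not shown.

```python
from collections import defaultdict

def prefixspan(sequences, min_support):
    """Return {pattern (tuple): support} for all frequent sequential patterns.

    Simplified PrefixSpan: every sequence element is a single item
    (no itemsets), and support counts sequences, not occurrences.
    """
    results = {}

    def mine(prefix, projected):
        # projected holds (sequence index, start offset) pairs, i.e. the
        # suffixes that remain after matching the current prefix.
        counts = defaultdict(int)
        for seq_idx, start in projected:
            for item in set(sequences[seq_idx][start:]):
                counts[item] += 1
        for item, support in sorted(counts.items()):
            if support < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = support
            # Project on the first occurrence of item in each suffix.
            new_projected = []
            for seq_idx, start in projected:
                try:
                    pos = sequences[seq_idx].index(item, start)
                except ValueError:
                    continue
                new_projected.append((seq_idx, pos + 1))
            mine(pattern, new_projected)

    mine((), [(i, 0) for i in range(len(sequences))])
    return results

def top_k(patterns, k):
    """Keep the k patterns with the highest support (ties broken by pattern)."""
    return sorted(patterns.items(), key=lambda kv: (-kv[1], kv[0]))[:k]

# Hypothetical discretized readings from a single pond (toy data).
pond_sequences = [
    ["temp_high", "ph_low", "turbidity_high"],
    ["temp_high", "turbidity_high"],
    ["ph_low", "temp_high", "turbidity_high"],
    ["temp_high", "ph_low"],
]
patterns = prefixspan(pond_sequences, min_support=3)
best = top_k(patterns, k=1)
```

On this toy input, the only frequent pattern of length two is `("temp_high", "turbidity_high")`. The Top-k filter keeps just the most frequent patterns, which mirrors the trade-off the abstract mentions between sensitivity and frequency-based simplification.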
Dashboard-Driven Machine Learning Analytics and Conceptual LLM Simulations for IIoT Education in Smart Steel Manufacturing (Scientific Article, Ministry of Science)
This study explores how Industrial Internet of Things (IIoT) applications, supported by analytical models such as machine learning (ML) and, conceptually, Large Language Models (LLMs), can transform educational experiences in the context of smart steel production. To mitigate the shortage of authentic industrial datasets for research, we developed an industry-validated IIoT educational dataset drawn from three months of operational records at a steel plant and enriched with domain-specific annotations, most notably distinct operational phases. Building on this foundation, we propose an IIoT framework for intelligent steel manufacturing that merges ML-driven predictive analytics (employing Lasso regression to optimize energy use) with LLM-based contextualization of data streams within IIoT environments. At its core, this architecture delivers real-time process monitoring alongside adaptive learning modules, effectively simulating the dynamics of a smart factory. By promoting human-machine collaboration and mirroring quality-control workflows, the framework bridges the divide between theoretical instruction and hands-on industrial practice. A key feature is an interactive decision-support dashboard that presents ML model outcomes and explains IIoT measurements, such as metallization levels and H2/CO ratios, through dynamic visualizations and scenario-based simulations that invite risk-free exploration of energy-optimization strategies. Such tools help learners grasp the intricate multivariate dependencies that govern steel manufacturing processes. Our implementation of the Lasso regression model resulted in a 9% reduction in energy consumption and stabilization of metallization levels. Overall, these findings underscore how embedding advanced analytics within IIoT education can cultivate a more engaging, practice-oriented learning environment that aligns closely with real-world industrial operations.
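To make the role of Lasso regression more concrete, here is a minimal sketch of Lasso fitted by coordinate descent in plain Python. It only illustrates how the L1 penalty shrinks weak predictors to exactly zero and is not the study's model. The feature meanings, the synthetic data, and the value of `alpha` are assumptions chosen for the example, and an intercept is omitted on the assumption that the inputs are already centered.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def lasso_coordinate_descent(X, y, alpha, n_iters=100):
    """Minimize (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1 (no intercept)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(n_iters):
        for j in range(d):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(d) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, alpha) / (z / n) if z > 0 else 0.0
    return w

# Synthetic, standardized toy data (hypothetical): column 0 stands for a
# process variable that drives energy use, column 1 for a weak nuisance signal.
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [2.05, 1.95, -1.95, -2.05]  # y = 2.0 * x0 + 0.05 * x1

weights = lasso_coordinate_descent(X, y, alpha=0.1)
```

With these inputs the weak second feature receives a weight of exactly zero, while the first is shrunk from 2.0 to about 1.9. This sparsity is what makes Lasso convenient for highlighting the few process variables that matter most on a dashboard.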
Weighted Content Similarity Feature for Software Architecture Anti-Patterns Prediction (Scientific Article, Ministry of Science)
As user needs change frequently over time, software systems must evolve, and the resulting growth in complexity often leads to violations of software engineering principles. These violations are called anti-patterns; they differ from bugs and faults, can occur at various levels of abstraction, and ultimately reduce software quality. Anti-patterns can occur in various kinds of software, including web applications, and predicting them can help prevent their occurrence. The anti-pattern prediction process at different levels of abstraction relies on software features, whose threshold values affect the accuracy of the process. This study presents an improved component-level feature, called weighted content similarity, that detects component dependencies more accurately by minimizing the influence of common words that appear often in comments but carry no information about the relationship between components. To this end, comment words are weighted using TF-IDF values. F-Measure values are calculated to show the greater impact of the proposed weighted feature, compared with structural, topological, and content similarity features, on detecting dependencies between components of an open-source system. Detecting these dependencies in turn makes it possible to predict component anti-patterns such as cyclic and hub-like dependencies. In OpenJPA 2.0.0, the average F-Measure is 0.73 for topological features, 0.76 for content similarity features, and 0.88 for weighted content similarity features. The F-Measure of the weighted content similarity feature is therefore 0.12 higher than that of the unweighted content similarity feature and 0.15 higher than that of the topological feature, making it more effective than these two features in predicting dependencies between components using machine learning algorithms.
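The idea of down-weighting common comment words can be sketched as follows. This is one plausible reading of a TF-IDF weighted content similarity, not the paper's exact formula: it assumes cosine similarity over TF-IDF vectors with `idf = log(N / df)`. The component names and comment tokens are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """docs: {component: list of comment tokens} -> {component: {term: weight}}."""
    n_docs = len(docs)
    df = Counter()  # number of components whose comments contain each term
    for tokens in docs.values():
        df.update(set(tokens))
    vectors = {}
    for name, tokens in docs.items():
        tf = Counter(tokens)
        vectors[name] = {
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical comment tokens for three components.
comments = {
    "QueryCache": ["this", "method", "returns", "query", "cache"],
    "CacheEntry": ["this", "method", "returns", "query", "cache", "entry"],
    "SocketPool": ["this", "method", "returns", "socket", "buffer"],
}

weighted = tfidf_vectors(comments)
# Unweighted baseline: raw term counts.
unweighted = {name: dict(Counter(tokens)) for name, tokens in comments.items()}

plain_unrelated = cosine(unweighted["QueryCache"], unweighted["SocketPool"])
weighted_unrelated = cosine(weighted["QueryCache"], weighted["SocketPool"])
weighted_related = cosine(weighted["QueryCache"], weighted["CacheEntry"])
```

Here `QueryCache` and `SocketPool` share only boilerplate words that occur in every component, so their unweighted similarity is 0.6 but their weighted similarity drops to 0.0, while the weighted similarity between `QueryCache` and `CacheEntry` stays positive.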
Risk-Aware Suicide Detection in Social Media: A Domain-Guided Framework with Explainable LLMs (Scientific Article, Ministry of Science)
The close connection between people's lives and social media means that their psychological and emotional states now surface in social media posts. This digital footprint offers a rich and novel entry point for early detection of suicide risk. Accurate detection of suicidal ideation remains a significant challenge because of high false negative rates and the need for sensitivity to subtle linguistic features. Current AI-based suicide detection systems struggle to capture such linguistic subtleties: they do not consider domain-specific indicators and ignore the dynamic interaction of language, behaviour, and mental health. Lexical and syntactic markers can serve as a powerful lens for diagnosing psychological distress. To address these issues, we propose a new domain-based framework that integrates the specialized frequent-rare suicide vocabulary (FR-SL) into the fine-tuning process of large language models (LLMs). This vocabulary-aware strategy draws the model's attention to both common and rare suicide-related phrases and enhances its ability to detect subtle signs of distress. In addition to improving performance on various metrics, the proposed framework adds interpretability, making the models' decisions more transparent and easier to understand and trust. It also enables the design of a structure that generalizes across the linguistic and mental health domains. The proposed approach offers clear improvements over baseline methods, especially in reducing false negatives and in general interpretability through transparent attribution.
Building Safer Social Spaces: Addressing Body Shaming with LLMs and Explainable AI (Scientific Article, Ministry of Science)
This study tackles body shaming on Reddit using a novel dataset of 8,067 comments from June to November 2024, encompassing external and self-directed harmful discourse. We assess traditional Machine Learning (ML), Deep Learning (DL), and transformer-based Large Language Models (LLMs) for detection, employing accuracy, F1-score, and Area Under the Curve (AUC). Fine-tuned Psycho-Robustly Optimized BERT Pretraining Approach (Psycho-RoBERTa), pre-trained on psychological texts, excels (accuracy: 0.98, F1-score: 0.994, AUC: 0.990), surpassing models like Extreme Gradient Boosting (XG-Boost) (accuracy: 0.972) and Convolutional Neural Network (CNN) (accuracy: 0.979) due to its contextual sensitivity. Local Interpretable Model-agnostic Explanations (LIME) enhance transparency by identifying influential terms like “fat” and “ugly.” A term co-occurrence network graph uncovers semantic links, such as “shame” and “depression,” revealing discourse patterns. Targeting Reddit’s anonymity-driven subreddits, the dataset fills a platform-specific gap. Integrating LLMs, LIME, and graph analysis, we develop scalable tools for real-time moderation to foster inclusive online spaces. Limitations include Reddit-specific data and potential misses of implicit shaming. Future research should explore multi-platform datasets and few-shot learning. These findings advance Natural Language Processing (NLP) for cyberbullying detection, promoting safer social media environments.
Unlocking Book Genre from Covers: A Multimodal Approach to Book Genre Prediction (Scientific Article, Ministry of Science)
In today's visually driven market, book cover design plays a crucial role in conveying a work's narrative and thematic essence. A book cover is a multimodal entity, consisting of various visual and textual elements. Conventional recommendation systems have often overlooked the semantic richness of cover imagery, and prior work that tried to incorporate textual information relied on OCR to extract text from covers. However, these raw tokens capture only a fraction of the cover's meaning and often miss deeper thematic and narrative cues. Recognizing these limitations, we leverage the knowledge encoded in vision-language models (VLMs) to derive a more comprehensive representation: we generate VLM descriptions of each cover and integrate them into the system as a new textual feature. Our enhanced corpus comprises 57,000 book covers across 30 genres (1,900 per genre), each paired with both raw imagery and a VLM-generated narrative summary. We fuse two state-of-the-art vision encoders (ViT and VisionMamba) with a text encoder that processes these VLM descriptions. Experimental results demonstrate a Top-1 accuracy of 63.31% and a Top-3 accuracy of 83.03%, marking a substantial improvement over the state-of-the-art variant and underscoring the value of VLM-derived context in multimodal genre classification.
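For readers unfamiliar with the Top-1 and Top-3 metrics reported above, the sketch below shows how top-k accuracy is typically computed from per-genre scores. The scores, labels, and the four-genre setup are made-up toy values, not data from the study.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scored classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)

# Hypothetical genre scores for four covers over four genres (toy data).
scores = [
    [0.60, 0.20, 0.10, 0.10],  # true genre 0: ranked first
    [0.30, 0.40, 0.20, 0.10],  # true genre 0: ranked second
    [0.10, 0.20, 0.30, 0.40],  # true genre 0: ranked last
    [0.25, 0.05, 0.50, 0.20],  # true genre 2: ranked first
]
labels = [0, 0, 0, 2]

top1 = top_k_accuracy(scores, labels, k=1)
top3 = top_k_accuracy(scores, labels, k=3)
```

In this toy case Top-1 accuracy is 0.5 and Top-3 accuracy is 0.75. Top-3 is always at least as high as Top-1, which is why it is often reported alongside Top-1 when genres overlap.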