Journal of Information Technology Management (مدیریت فناوری اطلاعات)

Journal of Information Technology Management, Volume 11, Issue 4, 2018 (Ministry of Science scientific article)

Articles

1.

The Effect of Transitive Closure on the Calibration of Logistic Regression for Entity Resolution (Ministry of Science scientific article)

Keywords: Entity resolution, Record linking, Machine learning, Logistic regression, Transitive closure

This paper describes a series of experiments using logistic regression as a machine learning method for entity resolution. From these experiments the authors concluded that when a supervised ML algorithm is trained to classify pairs of entity references as linked or not linked, the evaluation of the model's performance should take into account the transitive closure of its pairwise linking decisions, not just the pairwise classifications alone. Part of the problem is that precision and recall as calculated for data mining classification algorithms such as logistic regression differ from the same measures applied to entity resolution (ER) results. As a classifier, logistic regression's precision and recall measure the algorithm's pairwise decision performance. When applied to ER, precision and recall measure how accurately the set of input references was partitioned into subsets (clusters) referencing the same entity. When applied to datasets containing more than two references, ER is a two-step process. Step One classifies pairs of records as linked or not linked. Step Two applies transitive closure to these linked pairs to find the maximally connected subsets (clusters) of equivalent references. The precision and recall of the final ER result will generally differ from the precision and recall of the pairwise classifier used to power the ER process. The experiments described in the paper were performed using a well-tested set of synthetic customer data for which the correct linking is known. The best F-measure of precision and recall for the final ER result was obtained by substantially increasing the threshold of the logistic regression pairwise classifier.
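
The two-step process described in this abstract can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the record ids, the pair scores, and the true clusters are made-up values, and the ER-level precision and recall are assumed to be computed over within-cluster pairs, which is one common convention and may not be the measure used in the paper.

```python
from itertools import combinations


def transitive_closure(n_records, links):
    """Step Two: merge linked pairs into clusters with a union-find structure."""
    parent = list(range(n_records))

    def find(x):
        # Follow parent pointers to the cluster root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = {}
    for record in range(n_records):
        clusters.setdefault(find(record), set()).add(record)
    return list(clusters.values())


def cluster_pairs(clusters):
    """All unordered within-cluster pairs implied by a partition."""
    return {frozenset(p) for c in clusters for p in combinations(sorted(c), 2)}


def precision_recall(predicted, truth):
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 1.0
    recall = true_positives / len(truth) if truth else 1.0
    return precision, recall


# Hypothetical example: five references, two real entities.
N_RECORDS = 5
TRUE_CLUSTERS = [{0, 1, 2}, {3, 4}]
# Hypothetical classifier scores; pairs not listed are treated as not linked.
SCORES = {(0, 1): 0.9, (1, 2): 0.8, (2, 3): 0.6, (3, 4): 0.95}


def evaluate(threshold):
    """Return ((pairwise P, R), (ER P, R)) for a given decision threshold."""
    # Step One: pairwise classification at the chosen threshold.
    links = [pair for pair, score in SCORES.items() if score >= threshold]
    truth = cluster_pairs(TRUE_CLUSTERS)
    pairwise = precision_recall({frozenset(p) for p in links}, truth)
    er_result = precision_recall(
        cluster_pairs(transitive_closure(N_RECORDS, links)), truth
    )
    return pairwise, er_result
```

With these made-up scores, `evaluate(0.5)` gives a pairwise precision of 0.75 but an ER precision of only 0.4, because the single false link between records 2 and 3 merges both entities into one cluster. Raising the threshold to 0.7 removes that link, and `evaluate(0.7)` gives an ER precision and recall of 1.0 even though pairwise recall stays at 0.75.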
2.

Estimating the Parameters for Linking Unstandardized References with the Matrix Comparator (Ministry of Science scientific article)

Keywords: Entity resolution, Record linking, Matrix comparator, Stop words, Token frequency, F-measure

This paper discusses recent research on methods for estimating configuration parameters for the Matrix Comparator, which is used for linking unstandardized or heterogeneously standardized references. The Matrix Comparator computes the aggregate similarity between the tokens (words) in a pair of references. Its two most critical parameters for obtaining the best linking results are the value of the similarity threshold and the list of stop words to exclude from the comparison. Earlier research has shown that the standard deviation of the token frequency distribution is strongly predictive of how useful stop words will be in improving linking performance. The research results presented here demonstrate a method for using statistics from the token frequency distribution to estimate the threshold value and stop word selection likely to give the best linking results. The model was built using linear regression and validated with independent datasets.
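
The abstract does not give the Matrix Comparator's exact formula, so the following Python sketch only illustrates the general idea of a token-level similarity matrix with stop-word removal. The greedy one-to-one matching, the use of `difflib.SequenceMatcher` as the token similarity, and the function names are all assumptions made for illustration, not the published algorithm.

```python
from collections import Counter
from difflib import SequenceMatcher
from statistics import pstdev


def tokenize(reference, stop_words=frozenset()):
    """Lower-case, split on whitespace, and drop stop words."""
    return [t for t in reference.lower().split() if t not in stop_words]


def matrix_similarity(ref_a, ref_b, stop_words=frozenset()):
    """Aggregate token similarity of two references (assumed scoring scheme)."""
    tokens_a = tokenize(ref_a, stop_words)
    tokens_b = tokenize(ref_b, stop_words)
    if not tokens_a or not tokens_b:
        return 0.0
    # Similarity matrix: one row per token of ref_a, one column per token of ref_b.
    matrix = [
        [SequenceMatcher(None, x, y).ratio() for y in tokens_b] for x in tokens_a
    ]
    # Greedy one-to-one matching: take the highest-scoring cells first.
    cells = sorted(
        ((score, i, j) for i, row in enumerate(matrix) for j, score in enumerate(row)),
        reverse=True,
    )
    used_rows, used_cols, total = set(), set(), 0.0
    for score, i, j in cells:
        if i not in used_rows and j not in used_cols:
            used_rows.add(i)
            used_cols.add(j)
            total += score
    # Normalize by the longer reference so unmatched tokens lower the score.
    return total / max(len(tokens_a), len(tokens_b))


def token_frequency_stdev(references):
    """Standard deviation of token counts across a collection of references."""
    counts = Counter(t for r in references for t in tokenize(r))
    return pstdev(counts.values())
```

In this sketch, `matrix_similarity("The Acme Corp", "Acme Corp")` is about 0.67, while passing `stop_words={"the"}` raises it to 1.0. Two references would be linked when the score meets the similarity threshold, which is the other parameter the paper sets out to estimate.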
3.

Framework for Prioritizing Solutions in Overcoming Data Quality Problems Using Analytic Hierarchy Process (AHP) (Ministry of Science scientific article)

Keywords: Data quality, Analytical Hierarchy Process, AHP, Central Statistics Agency, the Republic of Indonesia

The Central Statistics Agency (BPS) is a government institution that has the authority to carry out statistical activities in the form of censuses and surveys. It produces the statistical data that the government, the private sector, and the general public need as a reference in planning, monitoring, and evaluating development results. Providing quality statistical data is therefore critical because it affects the effectiveness of decision making. This paper aims to develop a framework for determining the priority of solutions to data quality problems using the Analytic Hierarchy Process (AHP). The framework is built by conducting interviews and focus group discussions (FGD) with experts to identify the interrelationships between problems and solutions. The resulting model is then tested in a case study, namely the Central Jakarta Central Bureau of Statistics (BPS). The results of the study indicate that the proposed model can be used to formulate solutions to data problems in BPS.
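
As a rough illustration of how AHP turns expert judgments into a priority ranking, the Python sketch below derives weights from a pairwise comparison matrix using the row geometric mean method, a common approximation of the AHP priority vector. The solution names and judgment values are hypothetical and are not taken from the paper, which does not state how its weights were computed.

```python
import math


def ahp_priorities(matrix):
    """Approximate AHP priority weights via normalized row geometric means."""
    n = len(matrix)
    geometric_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geometric_means)
    return [g / total for g in geometric_means]


# Hypothetical candidate solutions to a data quality problem.
solutions = ["staff training", "validation rules", "periodic audit"]

# Hypothetical expert judgments: entry [i][j] says how strongly solution i
# is preferred over solution j, so entry [j][i] is its reciprocal.
judgments = [
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 2.0],
    [1 / 5, 1 / 2, 1.0],
]

weights = ahp_priorities(judgments)
# Sort solutions from highest to lowest priority weight.
ranking = sorted(zip(solutions, weights), key=lambda item: item[1], reverse=True)
```

The weights sum to 1, and with these made-up judgments "staff training" ranks first. A fuller treatment would also check the consistency of the judgments before trusting the ranking.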
4.

Investigating the Role of Code Smells in Preventive Maintenance (Ministry of Science scientific article)

Keywords: Preventive maintenance, Code smells, Machine learning, Random Forest

The quest to improve software quality has given rise to various studies that focus on enhancing quality through various processes. Code smells, which are indicators of software quality, have not been studied extensively to determine their role in predicting software defects. This study investigates the role of code smells in the prediction of non-faulty classes. We examine four versions of the Eclipse software (3.2, 3.3, 3.6, and 3.7) for metrics and smells. Further, different code smells, derived subjectively through iPlasma, are taken in conjunction, and three efficient but subjective models are developed to detect code smells, one for each of the Random Forest, J48, and SVM machine learning algorithms. These models are then used to detect the absence of defects in the four Eclipse versions. The effect of balanced and unbalanced datasets is also examined for these four versions. The results suggest that code smells can be a valuable feature for discriminating the absence of defects in software.
5.

Big Data Quality: From Content to Context (Ministry of Science scientific article)

Keywords: Big Data, Big data quality, Data quality, Text mining

Over the last 20 years, and particularly with the advent of Big Data and analytics, Data and Information Quality (DIQ) has remained a fast-growing research area. There are many views and streams in DIQ research, generally aiming at improving the effectiveness of decision making in organizations. Although much research has aimed at clarifying the role of Big Data quality for organizations, there is no comprehensive literature review that shows the main differences between traditional data quality research and Big Data quality research. This paper analyzes the papers published on Big Data quality and finds that almost no new mainstream has emerged in Big Data quality. It shows that the main concepts of data quality do not change in the Big Data context and that only some new issues have been added to this area.
6.

Perspectives of Big Data Quality in Smart Service Ecosystems (Quality of Design and Quality of Conformance) (Ministry of Science scientific article)

Keywords: Big data quality, Information quality, Smart cities, Service design, Smart services, Data quality model, Smart service ecosystem

Despite the increasing importance of data and information quality, current research related to Big Data quality is still limited. In particular, it is unknown how to apply previous data quality models to Big Data. In this paper we review Big Data quality research from several perspectives and apply a known quality model, with its elements of conformance to specification and design, in the context of Big Data. Furthermore, we extend this model and demonstrate its utility by analyzing the impact of three Big Data characteristics (volume, velocity, and variety) in the context of smart cities. This paper intends to build a foundation for further empirical research to understand Big Data quality and its implications for the design and execution of smart service ecosystems.
