Identifying the relevant set of features in a dataset is an important part of data analytics. Discarding significant variables or keeping irrelevant ones can substantially affect the performance of the learning algorithm during knowledge discovery. In this paper, a feature selection method called Least Loss (L2) is proposed that significantly reduces the dimensionality of data by robustly discarding weakly correlated variables without diminishing the predictive performance of classifiers. The proposed method quantifies the similarity between the observed and expected probabilities and generates a score for each independent variable, which makes it simple and intuitive. The method was evaluated by comparing its performance against the Information Gain (IG) and Chi Square (CHI) feature selection methods on 27 different datasets modeled using a probabilistic classifier. The results reveal that L2 is highly competitive with respect to error rate, precision, and recall while substantially reducing the number of selected variables in the datasets. The study should be of interest to data analysts, scholars, and domain experts who use statistical analysis methods in applications with large numbers of features.
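To make the idea of scoring a variable by comparing observed and expected probabilities concrete, the following is a minimal sketch and not the L2 formula itself, which the abstract does not state. It assumes categorical features, takes the "expected" probability of a feature value and class occurring together to be the product of their marginal probabilities (that is, independence), and sums the squared gaps. The function names `independence_gap_scores` and `select_features`, the squared-gap measure, and the threshold are all illustrative assumptions.

```python
from collections import Counter


def independence_gap_scores(rows, labels):
    """Return one score per feature column.

    ASSUMPTION: the score is the sum over (value, class) pairs of the squared
    gap between the observed joint probability and the probability expected
    if the feature and the class were independent. This is an illustration,
    not the scoring rule defined in the paper.
    """
    n = len(labels)
    class_counts = Counter(labels)
    scores = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        value_counts = Counter(column)
        joint_counts = Counter(zip(column, labels))
        score = 0.0
        for value, value_count in value_counts.items():
            for cls, class_count in class_counts.items():
                observed = joint_counts[(value, cls)] / n
                expected = (value_count / n) * (class_count / n)
                score += (observed - expected) ** 2
        scores.append(score)
    return scores


def select_features(scores, threshold):
    """Keep the indices of features whose score reaches a hypothetical threshold."""
    return [j for j, s in enumerate(scores) if s >= threshold]


# Toy data: feature 0 determines the class, feature 1 is unrelated to it.
rows = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
labels = [1, 1, 0, 0]

scores = independence_gap_scores(rows, labels)
kept = select_features(scores, threshold=0.1)
print(scores, kept)
```

In this toy example the informative feature receives a positive score and the unrelated one scores zero, so only the first column is kept. Any real use would need the scoring rule and cutoff defined in the full paper.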
Date: Thursday, July 16, 2020 Language: English