Type: | Package |
Title: | Generate Simple yet Effective Metric of Feature Importance for Classification Problems |
Version: | 0.1.0 |
Description: | An intuitive and explainable metric of Feature Importance for Classification Problems. The Resolution Index measures the extent to which a Feature clusters different classes when the data is sorted on it. The user provides a DataFrame, the column name of the Class, the sample size and the number of iterations used for the calculation. A Resolution Index is returned for each Feature, which can be used to rank Features and reduce the Dimensionality of Training data. For more details on Feature Selection see Theng and Bhoyar (2023) <doi:10.1007/s10115-023-02010-5>. |
License: | LGPL-2 |
Encoding: | UTF-8 |
NeedsCompilation: | no |
Packaged: | 2025-03-17 08:13:17 UTC; 703254631 |
Author: | Anand Jha [aut, cre] |
Maintainer: | Anand Jha <anandorjha18@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2025-03-18 15:10:15 UTC |
RESOLUTION INDEX: A SIMPLE YET EFFECTIVE METRIC OF FEATURE IMPORTANCE FOR CLASSIFICATION PROBLEMS
Description
Provides an intuitive and explainable metric of Feature Importance for Classification Problems. The Resolution Index measures the extent to which a Feature clusters different classes when the data is sorted on it. The user provides a DataFrame, the column name of the Class, the sample size and the number of iterations used for the calculation. A Resolution Index is returned for each Feature, which can be used to rank Features and reduce the Dimensionality of Training data for enhanced accuracy of Supervised Learning Models.
Usage
ResIndex(df, class, f=0.7, N=5, seed=NULL)
Arguments
df |
R data frame used for Classification |
class |
Column name of the Class |
f |
Sample size as a fraction of the total data size (default 0.7) |
N |
Number of iterations (default 5) |
seed |
Optional random seed (default NULL) |
Details
Resolution Index internally uses a metric called Transition. In a Classification Problem, Transition is the count of instances where the Class changes between consecutive records. When the data is sorted on it, an important Feature should reduce Transition much more than an irrelevant Feature does. This ability of a Feature to segregate different Classes is calculated as a non-dimensional metric and can be used to rank and filter Features in high-dimensional data.
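The Transition idea described above can be illustrated with a minimal sketch in base R. This is not the package's implementation: the helper names `transition_fraction` and `sorted_transition` are hypothetical, and dividing the count by the number of consecutive pairs is an assumption made only so that the value falls between 0 and 1. The exact formula that turns Transition into the Resolution Index is not given here, so the sketch stops at comparing Transition for sorted and shuffled data.

```r
# Hypothetical helper (not part of the package): fraction of consecutive
# record pairs whose class label differs. Normalising by (n - 1) is an
# assumption made for this illustration.
transition_fraction <- function(labels) {
  labels <- as.character(labels)
  n <- length(labels)
  if (n < 2) return(0)
  sum(labels[-1] != labels[-n]) / (n - 1)
}

# Hypothetical helper: Transition of the class column after sorting the
# data on a single feature. Ties are left in the order that order() gives.
sorted_transition <- function(df, feature, class) {
  ord <- order(df[[feature]])
  transition_fraction(df[[class]][ord])
}

set.seed(1)
# Baseline: class labels in random order.
shuffled <- transition_fraction(sample(iris$Species))
# Class labels after sorting on two different features.
by_petal <- sorted_transition(iris, "Petal.Length", "Species")
by_sepal <- sorted_transition(iris, "Sepal.Width", "Species")

print(c(shuffled = shuffled, by_petal = by_petal, by_sepal = by_sepal))
```

A feature that separates the classes well should give a sorted Transition well below the shuffled baseline, while a weaker feature should stay closer to it.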
Value
Maximum Transition (between 0 and 1)
Random Transition (between 0 and 1)
Resolution Index (for each Feature)
Author(s)
Anand Jha, Senior Data Scientist
References
D. Theng and K. Bhoyar (2023) "Feature selection techniques for machine learning: a survey of more than two decades of research" <https://doi.org/10.1007/s10115-023-02010-5>
J. Tang, S. Alelyani and H. Liu (2014) "Feature selection for classification: A review" <https://doi.org/10.1201/b17320>
Examples
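# Default settings: f = 0.7, N = 5, no seed
ResIndex(iris, Species)
# Sample 80% of the data, 10 iterations
ResIndex(iris, Species, 0.8, 10)
# Same settings with a fixed seed
ResIndex(iris, Species, 0.8, 10, seed=123)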