Feature selection, Model selection and Tuning

Machine learning teaches computers to do what comes naturally to humans and animals: learn from experience. Machine learning algorithms use computational methods to “learn” information directly from data without relying on a predetermined equation as a model.

What are the most commonly used methods for dimensionality reduction or feature selection? For collinearity, for example, we use VIF. What needs to be checked during feature selection, and what methods are available for selecting from a large set of features?

Hi,

Dimensionality reduction and feature selection are two different things, so it helps to separate them.

Dimensionality reduction combines multiple features into a smaller set of new ones, aiming to lose as little information as possible. Feature selection techniques, by contrast, eliminate unwanted features and retain the most suitable of the original ones.

For dimensionality reduction, PCA is the usual choice, and t-SNE is good for visualising high-dimensional data in two or three dimensions.

For feature selection, you can try regularisation techniques such as lasso regression, or Boruta, a feature selection technique built on tree models. You can also try variable clustering using varclus.
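
To make the lasso suggestion concrete, here is a minimal sketch, assuming scikit-learn is available. The synthetic dataset and alpha=1.0 are made up for illustration and are not from this thread; in practice alpha would be tuned, for example by cross-validation.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Synthetic data (illustrative only): 20 features, of which 5 carry signal.
X, y = make_regression(
    n_samples=300, n_features=20, n_informative=5, noise=5.0, random_state=0
)

# Lasso penalises coefficient size, so put the features on a common scale first.
X_scaled = StandardScaler().fit_transform(X)

# alpha=1.0 is an arbitrary assumption; a larger alpha drops more features.
selector = SelectFromModel(Lasso(alpha=1.0)).fit(X_scaled, y)

kept = np.flatnonzero(selector.get_support())  # indices of retained features
X_reduced = selector.transform(X_scaled)       # only the retained columns

print("kept feature indices:", kept)
print("shape before/after:", X_scaled.shape, X_reduced.shape)
```

Features whose lasso coefficient is shrunk to (almost) zero are dropped, so the columns that remain are original features, not combinations of them.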

Hope this is clear.

Lol… it’s important to note that machines learn without any human intervention :stuck_out_tongue:

One major way is to check the correlation between each feature and the target variable. A strongly negative or strongly positive value suggests that the feature could be a strong predictor.
Some readily available methods for this are:
1) scikit-learn has a feature selection module, and its SelectKBest scores each feature and keeps the top k for your model. It is handy when you already know how many features you want, since it directly returns the best k.
2) scikit-learn also has Recursive Feature Elimination (RFE). It serves the same purpose as SelectKBest, but instead of giving each feature a score it repeatedly fits a model, drops the weakest features, and returns a ranking.
3) Last but not least, use an ensemble model. Feed your dataset to an ensemble model of your choice (a random forest, for example). After training, read off the importance score of each feature and keep the best ones.
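
The three options above can be sketched roughly as follows, assuming scikit-learn. The synthetic dataset, k=3, the f_classif scoring function, and the choice of logistic regression and random forest are all assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

# Synthetic data (illustrative only): 10 features, 3 of them informative.
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=3, n_redundant=0, random_state=0
)

# 1) SelectKBest: score every feature against the target, keep the top k.
skb = SelectKBest(score_func=f_classif, k=3).fit(X, y)
kbest_idx = np.flatnonzero(skb.get_support())
print("SelectKBest scores:", np.round(skb.scores_, 1))
print("SelectKBest keeps:", kbest_idx)

# 2) RFE: refit a model repeatedly, dropping the weakest feature each round.
#    ranking_ == 1 marks the features that survive.
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=3).fit(X, y)
rfe_idx = np.flatnonzero(rfe.support_)
print("RFE ranking:", rfe.ranking_)
print("RFE keeps:", rfe_idx)

# 3) Ensemble: train a forest and sort features by importance.
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
forest_idx = np.argsort(forest.feature_importances_)[::-1][:3]
print("Forest top features:", forest_idx)
```

The three methods will not always agree on the same subset, so it is worth comparing their picks before settling on one.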
All of the methods above belong to ‘Supervised Learning’, where the decision is made with the target variable in mind. But there is also the different world of ‘Unsupervised Learning’, where we try to cluster the data and no target exists. So how do you plan on reducing the dimensions/features there?
Remember what our model was supposed to do in the first place: reduce bias error and variance error. You could try many approaches by hand (go ahead if you enjoy that), but why waste time when something is ready-made? Use PCA, which projects the data from the higher-dimensional space onto fewer new dimensions, chosen so that they capture as much of the data's variance as possible. Fewer dimensions also tend to mean lower variance error in the model you fit afterwards. PCA gives you the new dimensions along with a ranking for them. Select the top ones with the highest eigenvalues (if you're unfamiliar with these, think of them as scores), and that's your dimensionality reduction. Keep in mind that the new dimensions are combinations of the original features, so strictly speaking this is not feature selection. Hope this helps!!
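
A minimal sketch of picking the top principal components by variance covered, assuming scikit-learn. The iris data and the 95% threshold are arbitrary choices for illustration, and the target column is deliberately ignored because PCA is unsupervised.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Use only the feature matrix; PCA never looks at the target.
X = load_iris().data
X_scaled = StandardScaler().fit_transform(X)

# Fit PCA with all components to inspect how the variance is distributed.
pca = PCA().fit(X_scaled)
eigenvalues = pca.explained_variance_  # the "scores", sorted high to low
cumulative = np.cumsum(pca.explained_variance_ratio_)
print("eigenvalues:", np.round(eigenvalues, 3))
print("cumulative variance covered:", np.round(cumulative, 3))

# Keep the smallest number of components covering 95% of the variance
# (the 95% threshold is an assumption, not a rule).
n_keep = int(np.searchsorted(cumulative, 0.95) + 1)
X_reduced = PCA(n_components=n_keep).fit_transform(X_scaled)
print("components kept:", n_keep, "new shape:", X_reduced.shape)
```

Each column of X_reduced is a weighted mix of all the original features, which is why this counts as dimensionality reduction and not as picking a subset of features.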

So let’s talk about what feature selection and dimensionality reduction actually mean. Feature selection is the process of selecting the prominent/dominant parameters of the dataset that affect the target variable. These features are the dimensions of a model, so when we reduce the features we are reducing the dimensions as well. In that sense, feature selection is one way of enforcing dimensionality reduction.

So why do we do feature selection? When our dataset has too few data points relative to the number of features, we run into a problem called the “curse of dimensionality”. Having enough data is also a basic assumption we make before regression or clustering: the model needs a large enough random sample of the overall population to generalize.

Now, there are two major domains of machine learning, “Supervised” and “Unsupervised”. Supervised learning takes the target variable into consideration and tries to minimize the error so as to predict the target as closely as possible. So there are different methods for feature selection and dimensionality reduction. PCA belongs to the unsupervised domain: it reduces the dimensionality while covering as much variance as possible, and the dimensions it creates are completely different from the original features.

As for Lasso and Ridge, those are primarily regularization techniques. Lasso helps with dimensionality reduction because its L1 penalty term can shrink coefficients exactly to zero, which removes those features from the model, whereas Ridge only shrinks coefficients towards zero without eliminating them. So that’s it. Hope I helped!!
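
To see the Lasso versus Ridge difference on coefficients, here is a small sketch assuming scikit-learn. The synthetic data and alpha=1.0 for both models are made up for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge
from sklearn.preprocessing import StandardScaler

# Synthetic data (illustrative only): 20 features, 5 of them informative.
X, y = make_regression(
    n_samples=300, n_features=20, n_informative=5, noise=5.0, random_state=1
)
X_scaled = StandardScaler().fit_transform(X)

# Same penalty strength for both models; alpha=1.0 is an arbitrary assumption.
lasso = Lasso(alpha=1.0).fit(X_scaled, y)
ridge = Ridge(alpha=1.0).fit(X_scaled, y)

# Count coefficients that are exactly zero under each penalty.
lasso_zeros = int(np.sum(lasso.coef_ == 0))
ridge_zeros = int(np.sum(ridge.coef_ == 0))

print("coefficients set exactly to zero by Lasso:", lasso_zeros)
print("coefficients set exactly to zero by Ridge:", ridge_zeros)
```

Lasso zeroes out some coefficients entirely, while Ridge leaves every coefficient non-zero, only smaller, which is why Lasso alone doubles as a feature selector.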