Let's Understand Lasso and Ridge Regression
Regression models often run into overfitting: they fit the training data closely but perform poorly on new data. To reduce overfitting, we aim for a more generalized model, one that holds up across different business problem statements.
In this blog we look at how to overcome the overfitting problem in regression models. Regularization is one of the methods used to make a model more generalized.
Lasso regression is also called L1 regularization, and ridge regression is called L2 regularization.
Our aim is to make the model more general so that it gives good results on the respective business problems.
Regularization is a method for “constraining” or “regularizing” the size of the coefficients, thus “shrinking” them towards zero.
It reduces model variance and therefore helps reduce overfitting.
How does regularization work?
- For a regularized linear regression model, we minimize the sum of RSS and a “penalty term” that penalizes coefficient size.
- Features should be scaled (for example, standardized) before fitting; otherwise, some features would be penalized more or less simply because of their scale.
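To make the two points above concrete, here is a minimal Python sketch, assuming NumPy is available. The helper names `standardize` and `penalized_objective` are hypothetical, chosen only for illustration, and the intercept is left out by assuming a centred target. The sketch computes RSS plus either an L1 or an L2 penalty, and rescales features to mean 0 and standard deviation 1 before any penalty is applied.

```python
import numpy as np

# Hypothetical helpers for illustration; not part of any library API.

def standardize(X):
    """Rescale each column to mean 0 and standard deviation 1.

    Assumes no column is constant (its standard deviation would be 0).
    """
    return (X - X.mean(axis=0)) / X.std(axis=0)

def penalized_objective(X, y, w, lam, penalty="l1"):
    """Return RSS + lam * penalty(w); no intercept (y assumed centred)."""
    residuals = y - X @ w
    rss = float(residuals @ residuals)
    if penalty == "l1":      # lasso: sum of absolute coefficients
        return rss + lam * float(np.abs(w).sum())
    if penalty == "l2":      # ridge: sum of squared coefficients
        return rss + lam * float((w ** 2).sum())
    raise ValueError("penalty must be 'l1' or 'l2'")

# Two features on very different scales (values are made up).
X = np.array([[1.0, 1000.0],
              [2.0, 2000.0],
              [3.0, 3500.0],
              [4.0, 3900.0]])
Xs = standardize(X)
print(Xs.mean(axis=0), Xs.std(axis=0))   # approximately [0 0] and [1 1]
```

If a feature is measured in millimetres instead of metres, its coefficient becomes 1000 times smaller for the same effect, so the penalty would barely touch it; standardizing removes that dependence on units.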
Lasso Regression - L1 Regularization
· Lasso can shrink coefficients all the way to zero, effectively removing the corresponding features from the model.
· The lasso procedure encourages simple, sparse models (i.e. models with fewer parameters).
· It uses L1 regularization, which adds a penalty equal to the sum of the absolute values of the coefficients.
· This type of regularization can produce sparse models with few nonzero coefficients: some coefficients become exactly zero and are eliminated from the model. This acts as feature selection, since features that do not contribute to model performance can be dropped.
· Larger penalties result in coefficient values closer to zero.
· If a group of predictors is highly correlated, lasso tends to pick only one of them and shrink the others to zero.
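Why does the L1 penalty produce exact zeros while the L2 penalty does not? A minimal sketch, assuming the special case of orthonormal features (X^T X = I) and the objectives as written in this post (RSS + λ·penalty). In that case the lasso coefficient is the ordinary least-squares coefficient moved towards zero by λ/2 and clipped at zero (soft-thresholding), while the ridge coefficient is the least-squares coefficient divided by (1 + λ). The function names and the example coefficients are hypothetical.

```python
def soft_threshold(z, t):
    """Move z towards zero by t; anything inside [-t, t] becomes exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_coef_orthonormal(ols_coef, lam):
    # Minimiser of w^2 - 2*ols_coef*w + lam*|w| (RSS + L1 penalty when X^T X = I)
    return soft_threshold(ols_coef, lam / 2.0)

def ridge_coef_orthonormal(ols_coef, lam):
    # Minimiser of w^2 - 2*ols_coef*w + lam*w^2 (RSS + L2 penalty when X^T X = I)
    return ols_coef / (1.0 + lam)

ols = [3.0, 0.8, -0.4]          # hypothetical least-squares coefficients
lam = 2.0
lasso = [lasso_coef_orthonormal(b, lam) for b in ols]
ridge = [ridge_coef_orthonormal(b, lam) for b in ols]
print(lasso)   # [2.0, 0.0, 0.0]
print(ridge)   # every value is the OLS value divided by 3; none is exactly 0
```

With λ = 2 the lasso threshold is 1, so the two small coefficients become exactly 0, while ridge only scales every coefficient by 1/3.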
Objective = RSS + λ * (sum of absolute values of coefficients)
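The same objective written out in standard notation, assuming n training rows, p features, coefficients w_1 to w_p and an intercept w_0 that is conventionally left unpenalized (these symbols are introduced here for illustration), followed by a small worked example with made-up numbers:

```latex
\text{Objective}(w) = \underbrace{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}_{\text{RSS}}
  + \lambda \sum_{j=1}^{p} \lvert w_j \rvert,
\qquad \hat{y}_i = w_0 + \sum_{j=1}^{p} x_{ij} w_j

% Worked example (hypothetical values): RSS = 10, \lambda = 0.5, w = (2, -1, 0)
\text{Objective} = 10 + 0.5 \times \left( \lvert 2 \rvert + \lvert -1 \rvert + \lvert 0 \rvert \right)
  = 10 + 0.5 \times 3 = 11.5
```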
· RSS refers to ‘Residual Sum of Squares’, the sum of squared errors between the predicted and actual values in the training data set (also known as the sum of squared residuals). It measures how well a regression model fits the data: the smaller the RSS, the closer the fit.
· A tuning parameter, λ controls the strength of the L1 penalty.
· λ controls the amount of shrinkage.
· When λ = 0, no parameters are eliminated.
· The estimate is then equal to the one found with ordinary (least-squares) linear regression.
· As λ increases, more and more coefficients are set to zero and eliminated, i.e. lasso acts as a feature selection technique.
· Theoretically, as λ → ∞ all coefficients are eliminated; in practice they all become zero once λ is large enough.
· As λ increases, bias increases; as λ decreases, variance increases.
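The effect of λ described in these bullets can be seen with a small coordinate-descent lasso solver. This is a minimal NumPy sketch under stated assumptions, not how any particular library implements lasso: the data are synthetic, features are standardized and the target is centred (so no intercept is fitted), and `lasso_cd` is a hypothetical name. Each coordinate update solves its one-dimensional problem exactly by soft-thresholding at λ/2, which matches the objective RSS + λ·Σ|w_j| used above.

```python
import numpy as np

def soft_threshold(z, t):
    """Move z towards zero by t; anything inside [-t, t] becomes exactly 0."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=500):
    """Minimise RSS + lam * sum(|w_j|) by cyclic coordinate descent.

    Sketch only: assumes standardized columns in X and a centred y,
    so no intercept is fitted, and runs a fixed number of sweeps
    instead of checking for convergence.
    """
    _, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)            # X_j^T X_j for each column j
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual: leave out feature j's current contribution.
            r_j = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ r_j
            # Exact 1-D minimiser of col_sq[j]*w_j^2 - 2*rho*w_j + lam*|w_j|.
            w[j] = soft_threshold(rho, lam / 2.0) / col_sq[j]
    return w

# Synthetic data (hypothetical): the third feature has no real effect.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X = (X - X.mean(axis=0)) / X.std(axis=0)      # standardize features
y = X @ np.array([3.0, 0.5, 0.0]) + rng.normal(scale=0.5, size=100)
y = y - y.mean()                              # centre the target

# For lam >= lam_max the all-zero vector is the lasso solution.
lam_max = 2.0 * np.max(np.abs(X.T @ y))
for frac in [0.0, 0.1, 0.5, 1.5]:
    lam = frac * lam_max
    w = lasso_cd(X, y, lam)
    rss = float(((y - X @ w) ** 2).sum())
    print(f"lam={lam:8.2f}  w={np.round(w, 3)}  nonzero={np.count_nonzero(w)}  RSS={rss:.1f}")
```

At λ = 0 the result matches ordinary least squares; at or above `lam_max` = 2·max|X_j^T y| every coefficient is exactly zero; in between, the training RSS rises as λ grows, which is the bias side of the trade-off.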
Ridge Regression - L2 Regularization
Ridge regression shrinks the coefficients towards zero but, unlike lasso, does not shrink them to exactly zero. It does this by adding the sum of the squared coefficients as the penalty term:
Objective = RSS + λ * (sum of squared coefficients)
Increasing λ (lambda) imposes a stronger restriction on the coefficients. A larger λ gives a simpler, more constrained model, while a smaller λ brings the fit closer to ordinary linear regression, so λ is tuned to find the value that generalizes best.
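A minimal NumPy sketch of ridge regression using its closed-form solution w = (X^T X + λI)^(-1) X^T y, which minimizes RSS + λ·Σw_j². Assumptions: synthetic data, standardized features and a centred target (so no intercept), and `ridge_fit` is a hypothetical helper name. It fits the same data for several values of λ so the coefficients can be seen shrinking without becoming exactly zero.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form minimiser of RSS + lam * sum(w_j^2).

    Solves (X^T X + lam * I) w = X^T y. Sketch only: assumes standardized
    columns in X and a centred y, so no intercept is fitted.
    """
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Synthetic data (hypothetical)
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.3, size=50)
y = y - y.mean()

lambdas = [0.0, 10.0, 100.0, 1000.0]
for lam in lambdas:
    w = ridge_fit(X, y, lam)
    print(f"lam={lam:7.1f}  w={np.round(w, 4)}  |w|={np.linalg.norm(w):.4f}")
```

As λ grows, the overall size of the coefficient vector shrinks, but the individual coefficients stay nonzero, in contrast to lasso.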
- Ridge regression mitigates the multicollinearity problem through the shrinkage parameter λ (lambda).
- The magnitude of λ decides how strongly the coefficients are shrunk, and hence the weight given to each feature.
- The penalty term |w|² is the squared magnitude of the coefficients (the sum of squared coefficients), and λ can take any value from 0 to ∞.
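To illustrate the multicollinearity point, here is a minimal NumPy sketch of an extreme, hypothetical case: two identical (perfectly correlated) feature columns. Then X^T X is singular, so ordinary least squares has no unique solution, but X^T X + λI is invertible for any λ > 0 because X^T X is positive semidefinite. The data are made up for illustration.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x = x - x.mean()                      # centre the feature
X = np.column_stack([x, x])           # two perfectly correlated (identical) columns
y = 2.0 * x                           # hypothetical target with true slope 2

gram = X.T @ X
print(np.linalg.det(gram))            # (numerically) zero: X^T X is singular

lam = 1.0
# Adding lam * I makes the system solvable: this is the ridge solution.
w = np.linalg.solve(gram + lam * np.eye(2), X.T @ y)
print(w)                              # both coefficients equal 20/21
```

Here both coefficients come out as 20/21 ≈ 0.952, so ridge splits the weight evenly between the two copies (their sum approaches the true slope 2 as λ approaches 0).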