Logistic regression is used to predict the outcome variable which is categorical. A categorical variable is a variable that can take only specific and limited values like gender male or female, yes or not etc.
We have example of students who has studied for specific hours and basis on that they are marked as pass or fail.
Below is the dataset used for the example:
In the previous post, we have seen how to use linear regression method to solve the problem. Let’s use the same linear regression method for the above dataset and plot it.
As per the graph, we can’t see any relation between the pass and fail with the number of hours studied. But let’s try to plot by using our equation of line as used in the previous post.
As per the above output, the linear regression is predicting all the values starting between 0 and more than 1. But we need our answer either in 0 or 1. The predictions given by linear regression algorithm is not matching what we are looking for. So it means we need a better regression line than this which can help us provide the output either on 0 or 1. Not less than 0 and not more than 1.
So logistic regression seems to be the right choice for this example. Most often we want to predict the outcomes in yes or no. In that case we can apply the logistic regression algorithm and get the desired outcome. Logistic regression outcomes always falls between 0 to 1 and it predicts the outcomes in terms of probability also. The more the probability is the more accurate the outcome would be. This can be achieved by using Logistic Function.
Logistic Function is given by
Where L is the maximum Curve’s value, K is the steepness of the curve and x0 is x value of sigmoid’s midpoint.
A standard logistic function is called sigmoid function and let’s substitute the below values in the logistic functions and see what’s the result would be.
K = 1, x0 = 0, L = 1
If we substitute all the above values in the logistic function, we get the below function which is nothing but a sigmoid.
Let’s draw the sigmoid curve and see how it looks alike. The sigmoid is not only using to classify 0 or 1 but along with this it is also telling the probability of certain event whether it is going to occur or not.
Now let’s solve the above example with Logistic Regression and see how the curve looks like.
Now I am trying to predict if student I studying for 8.1 hours will he be fail or pass and what would be its probabilities.
I am getting the answer that this student will be having passing probability of 80% and fail probability of 20%.
No comments:
Post a Comment