Logistic Regression Algorithm

Avinash K Mishra
Apr 12, 2021

A classification technique (using geometry and probability interpretations).

Objective: to find the plane (hyperplane) that separates the +ve and -ve points.

Approach 1: Geometry

Let x denote the +ve points and circles denote the -ve points.

So, y ∈ {+1, -1}:
+1 => +ve point
-1 => -ve point

Here the points are “almost” linearly separable.

We assume the plane passes through the origin, so there is no constant term in its equation.

In general, there can be many planes that separate the +ve and -ve points, but we have to find the best plane P.

We will compute the distance of each point $x_i$ from the plane.

Assume $b = 0$, which means the plane passes through the origin,
and that $w$ is a unit vector, so $\|w\| = 1$.

https://avinash-k-mishra.medium.com/distance-of-a-point-to-a-plane-d4b3591e7fb3
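As a quick recap of the linked derivation: with $b = 0$ and $w$ a unit vector, the signed distance of a point $x_i$ from the plane reduces to a dot product,

$$d_i = \frac{w^T x_i}{\|w\|} = w^T x_i \quad (\text{since } \|w\| = 1)$$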

Now we know that, in the ideal case, the sign of this distance gives the class:
a. If positive, then the class is y = +1
b. If negative, then the class is y = -1

Based on the plane (decision surface), there are 4 different cases:

Case 1:

If $y_i = +1$ and the classifier correctly classified the point, i.e., $w^T x_i > 0$

Case 2:

If $y_i = -1$ and the classifier correctly classified the point, i.e., $w^T x_i < 0$

Case 3:

Misclassified point

If $y_i = +1$ but the classifier misclassified the point, i.e., $w^T x_i < 0$

Case 4:

Misclassified point

If $y_i = -1$ but the classifier misclassified the point, i.e., $w^T x_i > 0$
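The four cases collapse into one rule: a point is correctly classified exactly when $y_i \, w^T x_i > 0$. A minimal sketch of this check (the data below is made up for illustration):

```python
import numpy as np

def classification_status(w, X, y):
    """Return a boolean mask: True where y_i * (w . x_i) > 0 (correctly classified)."""
    signed = X @ w          # signed distances (w is a unit vector, b = 0)
    return y * signed > 0   # correct iff label and distance agree in sign

# Hypothetical 2-D example
w = np.array([1.0, 0.0])                 # unit normal of the plane x1 = 0
X = np.array([[2.0, 1.0], [-1.5, 0.5]])  # one point on each side
y = np.array([+1, -1])
print(classification_status(w, X, y))    # [ True  True ]
```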

So our objective is to maximize the number of correctly classified points and minimize the number of misclassified points. Each term $y_i \, w^T x_i$ is positive exactly when point $i$ is correctly classified, so we can find $w$ by maximizing the sum of signed distances:

$$w^* = \arg\max_w \sum_{i=1}^{n} y_i \, w^T x_i$$

But the above objective function will fail in some scenarios. Let’s analyse that.

Image 2 gives +1 as the sum of signed distances even though 4 points are misclassified, whereas Image 1 gives -42 with only 1 point misclassified: a single far-away outlier contributes a huge negative distance that dominates the entire sum. Thus our objective function, by maximizing the signed-distance sum, will choose the wrong plane; it is not robust to outliers.
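A tiny numeric sketch of this failure, with made-up points and planes (the numbers are hypothetical, not taken from the images):

```python
import numpy as np

# Hypothetical 1-D data: 5 positives near the origin, 1 extreme positive outlier
X = np.array([1.0, 1.0, 1.0, 1.0, 1.0, -50.0])
y = np.array([+1, +1, +1, +1, +1, +1])

w_good = 1.0    # plane that misclassifies only the outlier
w_bad = -1.0    # plane that misclassifies the 5 normal points

for name, w in [("good plane", w_good), ("bad plane", w_bad)]:
    signed = y * w * X
    print(name, "| sum of signed distances =", signed.sum(),
          "| misclassified =", int((signed <= 0).sum()))
# The bad plane wins on the signed-distance sum despite misclassifying more points.
```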

Squashing the signed-distance formula with the sigmoid function

$$\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad w^* = \arg\max_w \sum_{i=1}^{n} \sigma\!\left(y_i \, w^T x_i\right)$$

Applying the sigmoid adds a tapering behaviour that caps the contribution of any single signed distance: if the signed distance is a large positive value the output approaches 1, if it is a large negative value the output approaches 0, and if the point lies exactly on the plane the output is 0.5.
This also gives our objective function a probabilistic interpretation.
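A minimal sketch of this squashing behaviour (a numerically stable sigmoid; the helper is my own, not from the article):

```python
import numpy as np

def sigmoid(z):
    """Numerically stable sigmoid: 1 / (1 + exp(-z))."""
    out = np.empty_like(z, dtype=float)
    pos = z >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))
    ez = np.exp(z[~pos])            # exp(z) is safe when z is negative
    out[~pos] = ez / (1.0 + ez)
    return out

print(sigmoid(np.array([-50.0, -1.0, 0.0, 1.0, 50.0])))
# ≈ [0.0, 0.269, 0.5, 0.731, 1.0] — extreme distances are tapered toward 0 or 1
```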

To simplify further, note that for any monotonically increasing function $g$,

$$\arg\max_x f(x) = \arg\max_x g(f(x))$$

$\log(z)$ is a monotonically increasing function, so we can rewrite our objective function as

$$w^* = \arg\max_w \sum_{i=1}^{n} \log \frac{1}{1 + e^{-y_i \, w^T x_i}}$$

And since for any function $f$,

$$\arg\max_x f(x) = \arg\min_x \left(-f(x)\right)$$

and we also know $\log(1/x) = -\log(x)$, the objective becomes a minimization of the logistic loss:

$$w^* = \arg\min_w \sum_{i=1}^{n} \log\!\left(1 + e^{-y_i \, w^T x_i}\right)$$
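As a sanity check, a small sketch computing this loss (the function name is mine):

```python
import numpy as np

def logistic_loss(w, X, y):
    """Sum over points of log(1 + exp(-y_i * w.x_i))."""
    margins = y * (X @ w)
    # log(1 + exp(-m)) computed stably as logaddexp(0, -m)
    return np.logaddexp(0.0, -margins).sum()

w = np.array([1.0, 0.0])
X = np.array([[2.0, 1.0], [-1.5, 0.5]])
y = np.array([+1, -1])
print(logistic_loss(w, X, y))  # small value: both points are on the correct side
```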

Regularization
If the training data is linearly separable, this loss can be pushed towards 0 simply by scaling $w$ up without bound, so $w$ tends towards infinity and the model overfits. To prevent this, we add a regularization term to our objective function (either L1 or L2).

L1 regularization ($\lambda$ is the hyper-parameter; $\|w\|_1$ is the L1 norm of $w$):

$$w^* = \arg\min_w \sum_{i=1}^{n} \log\!\left(1 + e^{-y_i \, w^T x_i}\right) + \lambda \|w\|_1$$

L1 creates sparsity: it drives the weights of unimportant features exactly to zero.

L2 regularization ($\lambda$ is the hyper-parameter; $\|w\|_2^2 = w^T w$ is the squared L2 norm of $w$):

$$w^* = \arg\min_w \sum_{i=1}^{n} \log\!\left(1 + e^{-y_i \, w^T x_i}\right) + \lambda \, w^T w$$

  • If $\lambda = 0$, the model overfits; when $\lambda$ is very large, it underfits.
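A quick sketch of fitting both penalties with scikit-learn (note that its C parameter is the inverse of $\lambda$; the toy data below is made up):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] - 2 * X[:, 1] > 0).astype(int)   # only features 0 and 1 matter

l1 = LogisticRegression(penalty="l1", C=0.5, solver="liblinear").fit(X, y)
l2 = LogisticRegression(penalty="l2", C=0.5).fit(X, y)

print("L1 weights:", np.round(l1.coef_, 2))   # irrelevant weights pushed to 0
print("L2 weights:", np.round(l2.coef_, 2))   # small but non-zero everywhere
```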

Approach 2: Probability
Using Naive Bayes with a Bernoulli random variable for the class label, we can deduce the same objective function.
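One way to see this (a sketch, with $p_i = P(y_i = 1 \mid x_i) = \sigma(w^T x_i)$ and labels $y_i \in \{0, 1\}$ for this step): the Bernoulli likelihood of a single point is

$$P(y_i \mid x_i) = p_i^{\,y_i} (1 - p_i)^{1 - y_i}$$

and maximizing the log-likelihood over all $n$ points gives

$$w^* = \arg\max_w \sum_{i=1}^{n} \left[\, y_i \log p_i + (1 - y_i) \log (1 - p_i) \,\right]$$

which, after substituting $p_i = \sigma(w^T x_i)$ and converting back to labels $y_i \in \{-1, +1\}$, reduces to the same logistic-loss minimization we reached geometrically.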

To solve the above optimization problem, we use the gradient descent (GD) approach.
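A minimal sketch of logistic regression trained by gradient descent with L2 regularization (all names and hyper-parameters here are illustrative choices, not from the article):

```python
import numpy as np

def train_logistic_gd(X, y, lam=0.01, lr=0.1, epochs=1000):
    """Minimize sum(log(1 + exp(-y_i * w.x_i))) + lam * w.w by gradient descent.

    Expects labels y in {-1, +1}; the plane passes through the origin (b = 0).
    """
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        margins = y * (X @ w)
        # d/dw of log(1 + exp(-m_i)) is -y_i * x_i * sigmoid(-m_i)
        coef = -y / (1.0 + np.exp(margins))      # = -y_i * sigmoid(-m_i)
        grad = X.T @ coef + 2 * lam * w
        w -= lr * grad / n
    return w

# Hypothetical toy data whose true boundary passes through the origin
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = np.where(X[:, 0] - 2 * X[:, 1] > 0, 1, -1)

w = train_logistic_gd(X, y)
print("learned w:", np.round(w, 3))              # roughly proportional to (1, -2)
print("train accuracy:", np.mean(np.sign(X @ w) == y))  # ≈ 1.0
```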
