# Simple Linear Regression from Scratch!

## This blog explains the complete process of implementing Simple Linear Regression from scratch in Python 3, using Stochastic, Batch, & Mini-Batch Gradient Descent!

The journey into Machine Learning/Artificial Intelligence/Deep Learning/Data Science/Computer Vision/Natural Language Processing can appear overwhelming at first sight, but the key is to understand the depth that most people ignore.

Using already created libraries like `sklearn`, `tensorflow`, `pytorch`, or any other is very easy, but giving yourself the title of Machine Learning Engineer or Deep Learning Engineer just after learning to use these libraries is a complete blunder. In reality, what makes you capable of that title is the amount of knowledge in the field, & here knowledge doesn't mean the ability to use already created libraries; it means understanding the internal working of the algorithms & the ability to create them from scratch. Once a person understands how the algorithms actually work, that person can solve custom problems that appear unsolvable at first sight, or for which no pre-built solution exists.

Let’s start with the most basic algorithm in the Machine Learning world, that is, `Simple Linear Regression`.

It seems very easy to understand & use this algorithm with the help of the `sklearn` library, but the actual fun lies in creating it from scratch, & that too using all of the Gradient Descent techniques that exist.

Now, that being said, let’s start by understanding what Linear Regression is.

## Pre-requisites for the best understanding of this blog:

1. Basic understanding of **Machine Learning & its types** (no in-depth intuition is required).
2. Basic understanding of **Gradient Descent** as a whole; you should be aware of the **weight & bias update process** (no in-depth intuition is required). If you want to understand Gradient Descent as a whole in depth, refer to my blog mentioned below:
3. Basic understanding of cost functions/loss functions, especially **mean squared error (MSE)**, for this blog. If you want to understand everything about MSE with its actual significance in Data Science/Machine Learning, refer to my blog mentioned below:

# Linear Regression

In the Machine Learning world, this is the simplest algorithm to understand, & it falls under the Regression category.

Linear Regression aims to create a Machine Learning model that provides the best-fit line for our linear data.

In this algorithm, the equation of a line is used, i.e., `y = mx + c` or `y = wx + b`. While creating the Machine Learning model, we have with us the “feature” & “target” values, that is, “x” & “y” respectively. Linear Regression needs to find the “weights” & “biases”, that is, “m” & “c” (or “w” & “b”) in the present case.

An image is shown below showcasing a sample best-fit line fitting the data using Linear Regression:

Till now, it appears very simple to just calculate the parameters of the line equation, but here is the point where the actual fun starts.

Using various Gradient Descent approaches, we will calculate & optimize the weights & biases for Linear Regression. The approaches showcased in this blog to calculate the weights & biases are:

1. Stochastic Gradient Descent

2. Batch Gradient Descent

3. Mini-Batch Gradient Descent

Before moving directly to Gradient Descent, let’s first see the implementation of the cost/loss function used by all of the approaches.

# Cost/Loss Function used for all of the approaches

Mean Squared Error is used as the cost/loss function. Its implementation in Python 3 is present below.
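A minimal sketch of such a helper, assuming the feature & target are NumPy arrays (the function name & signature here are illustrative, not taken from the original):

```python
import numpy as np

def mean_squared_error(w, b, x, y):
    # Predictions from the current line: y_hat = w*x + b
    y_hat = w * x + b
    # Average of the squared differences between actual & predicted values
    return np.mean((y - y_hat) ** 2)
```

For example, with `w = 3`, `b = 9`, & data generated from `y = 3x + 9`, this returns `0.0`.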

In the function shown above, the mean squared error between the actual values & the predicted values is calculated. This function is used for calculating the error/loss in all the approaches explained in this blog.

The parameters used in the above shown function are:

- `w` = weights
- `b` = biases
- `x` = feature
- `y` = target/label

With this function implemented, let’s proceed towards the implementation of Linear Regression with all the Gradient Descent Approaches.

# Linear Regression using Stochastic Gradient Descent

Stochastic Gradient Descent is the version of Gradient Descent that works by **updating the weights & biases on every iteration using just one record/observation from the total dataset.** This process of updating the weights & biases is stopped when one of the two conditions mentioned below is satisfied:

- The maximum number of iterations is reached.
- The error calculated is less than the set threshold (a threshold is set, meaning that if the model’s prediction error falls below this number, we are satisfied; we do this because the error between the predictions & the actual values can’t always be 0).

The implementation of the algorithm is mentioned below:
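A minimal sketch of this approach, assuming 1-D NumPy arrays (the function name, the learning rate `lr`, & the stopping defaults are illustrative assumptions, not taken from the original):

```python
import numpy as np

def sgd_linear_regression(x, y, lr=0.1, max_iters=5000, threshold=1e-9, seed=0):
    """Fit y ~ w*x + b, updating with one random observation per iteration."""
    rng = np.random.default_rng(seed)
    w, b = 0.0, 0.0
    for _ in range(max_iters):
        # Pick a single record/observation from the total dataset
        i = rng.integers(len(x))
        error = (w * x[i] + b) - y[i]
        # Gradient of the squared error for this one point
        w -= lr * 2 * error * x[i]
        b -= lr * 2 * error
        # Stop early once the full-data MSE falls below the set threshold
        if np.mean(((w * x + b) - y) ** 2) < threshold:
            break
    return w, b
```

On noiseless linear data, this recovers `w` & `b` very close to the true slope & intercept.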

Now, let’s proceed towards the implementation of Batch Gradient Descent!

# Linear Regression using Batch Gradient Descent

Batch Gradient Descent is the version of Gradient Descent that works by **updating the weights & biases on every iteration using the complete dataset.** This process of updating the weights & biases is stopped when one of the two conditions mentioned below is satisfied:

- The maximum number of iterations is reached.
- The error calculated is less than the set threshold (a threshold is set, meaning that if the model’s prediction error falls below this number, we are satisfied; we do this because the error between the predictions & the actual values can’t always be 0).

The implementation of the algorithm is mentioned below:
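A minimal sketch of this approach, assuming 1-D NumPy arrays (again, the function name & the `lr`, `max_iters`, `threshold` defaults are illustrative assumptions):

```python
import numpy as np

def batch_gd_linear_regression(x, y, lr=0.1, max_iters=1000, threshold=1e-9):
    """Fit y ~ w*x + b, updating with the complete dataset on every iteration."""
    w, b = 0.0, 0.0
    for _ in range(max_iters):
        error = (w * x + b) - y
        # Stop once the error drops below the set threshold
        if np.mean(error ** 2) < threshold:
            break
        # Gradients of the MSE averaged over the complete dataset
        w -= lr * 2 * np.mean(error * x)
        b -= lr * 2 * np.mean(error)
    return w, b
```

Because every update uses the full dataset, the loss decreases smoothly here, unlike the noisier single-observation updates of the stochastic version.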

Now, let’s proceed towards the implementation of Mini-Batch Gradient Descent!

# Linear Regression using Mini-Batch Gradient Descent

Mini-Batch Gradient Descent is the version of Gradient Descent that works by **updating the weights & biases on every iteration using a subset of the complete dataset (known as a mini-batch).** This algorithm is a mixture of Batch & Stochastic Gradient Descent: it takes a subset of the complete data at a given point in time, whereas in batch, the complete data is used at a given point in time, & in stochastic, a single observation is used. This process of updating the weights & biases is stopped when one of the two conditions mentioned below is satisfied:

- The maximum number of iterations is reached.
- The error calculated is less than the set threshold (a threshold is set, meaning that if the model’s prediction error falls below this number, we are satisfied; we do this because the error between the predictions & the actual values can’t always be 0).

The implementation of the algorithm is mentioned below:
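A minimal sketch of this approach, assuming 1-D NumPy arrays (the function name, `batch_size`, & the other defaults are illustrative assumptions):

```python
import numpy as np

def mini_batch_gd_linear_regression(x, y, lr=0.1, batch_size=10,
                                    max_iters=5000, threshold=1e-9, seed=0):
    """Fit y ~ w*x + b, updating with a random mini-batch on every iteration."""
    rng = np.random.default_rng(seed)
    w, b = 0.0, 0.0
    for _ in range(max_iters):
        # Take a random subset (mini-batch) of the complete dataset
        idx = rng.choice(len(x), size=batch_size, replace=False)
        error = (w * x[idx] + b) - y[idx]
        # Gradients of the MSE averaged over the mini-batch only
        w -= lr * 2 * np.mean(error * x[idx])
        b -= lr * 2 * np.mean(error)
        # Stop early once the full-data MSE falls below the set threshold
        if np.mean(((w * x + b) - y) ** 2) < threshold:
            break
    return w, b
```

The `batch_size` parameter controls the trade-off: larger batches give smoother, batch-like updates, while smaller batches behave more like the stochastic version.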

Now, let’s see an example for all of these implementations with a comparison between them!

Very Important Note: Always apply standardization to the data to get the best results & fast processing.

# Comparison of all the Gradient Descent Approaches with an example

Let’s create sample data with NumPy. Just run the code given below to create the sample data for Linear Regression:

```python
import numpy as np

# Importing Standard Scaler
from sklearn.preprocessing import StandardScaler

# Generating the sample data
x = np.arange(50, 150)

# Converting the feature into a 2-D array, so that it can be passed
# into StandardScaler (which expects 2-D input) & into the functions
# above to calculate the weights & biases
x = x.reshape(-1, 1)

# Transforming the data using Standard Scaler
x = StandardScaler().fit_transform(x)

# Generating the target for the linear equation y = 3x + 9
# (y inherits x's 2-D shape, so no extra reshape is needed)
y = 3 * x + 9
```

When the above data is plotted, it will look like this:

Now, apply all the functions created above for the different Gradient Descent approaches on the sample data generated above.

After applying them, all the outputs were collected & a comparison table was generated. Check the table below for the comparison of all the approaches.

Using the weight & bias calculated by any of the algorithms, when you plot the line, you will get a practically perfect model (an example of the line plotted using the bias & weight calculated through Batch Gradient Descent on the same data generated above is shown below).

Anyone can try this on their own; each algorithm will give you almost the right result, though sometimes the results may differ slightly because of their internal working.

A comparison graph for all the algorithms is mentioned below:

In the above graph, the loss/error calculated by each of the different Gradient Descent approaches is shown.

Each vertical line represents the end of the iterations required to calculate the weights & biases for a particular Gradient Descent approach. Each algorithm’s loss/error curve & its iteration-ending vertical line share the same colour.

# Conclusion

This article acts as proof that by having the right knowledge, we can create wonders in life.

Happy Learning & Best of luck for your future!

*I hope my article explains everything related to the topic with all the detailed concepts and explanations. Thank you so much for investing your time in reading my blog & boosting your knowledge. If you like my work, then I request you to applaud this blog & follow me on* **Medium**, **GitHub** *&* **LinkedIn** *for more amazing content on multiple technologies and their integration! Also, subscribe to me on Medium to get updates on all my blogs!*