This blog aims to explain the difference between two of the most frequently encountered distributions in the Data Science world, i.e., the Binomial & Bernoulli distributions, with real-life examples.

Whether it is probability, statistics, Data Science, Machine Learning, Deep Learning, or any similar field, knowing the distribution of the data is crucial, because it helps in dealing with the data.

Since both distributions (Binomial & Bernoulli) are confusing at first, most people do not try to explore & understand them. Another factor is that most sources give no real-life examples to make the explanation more concrete. …


This blog aims to explain the problem associated with Dummy Variables, i.e., the Dummy Variable Trap. Everything related to the Dummy Variable Trap will be covered, from the origin of the problem to its solution.

Image source: Unsplash

Data Science is one of the most in-demand fields today. To become a Data Scientist, it is crucial to know Data Preprocessing, as it is a vital component of Data Science.

Data Preprocessing almost always involves “Feature Engineering”. The “Dummy Variable Trap” is a problem that occurs during Feature Engineering.

With that said, let’s start this blog by talking about the origin of the Dummy Variable Trap.

Origin/Source of Dummy Variable Trap!

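It occurs because of MultiCollinearity. When a categorical feature is encoded with one dummy column per category, those columns always sum to 1, so any one of them can be predicted exactly from the others. This perfect multicollinearity among the dummy columns is the Dummy Variable Trap.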
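To make the link between dummy variables and multicollinearity concrete, here is a minimal sketch in plain Python. The colour data and the helper name `one_hot` are made up for illustration and do not come from the original blog. It shows that a full set of dummy columns always sums to 1 per row, and that dropping one category (a common remedy) removes that exact dependence.

```python
# Hypothetical toy data: one categorical feature with three categories.
colors = ["red", "green", "blue", "green", "red"]
categories = sorted(set(colors))  # ['blue', 'green', 'red']


def one_hot(values, cats):
    """Return one 0/1 row per value, with one column per category in `cats`."""
    return [[1 if v == c else 0 for c in cats] for v in values]


# Full encoding: k columns for k categories.
full = one_hot(colors, categories)

# Every row sums to 1, so each column equals 1 minus the sum of the others.
# This exact linear dependence is the perfect multicollinearity behind the trap.
row_sums = [sum(row) for row in full]

# Common remedy: drop one category (here the first one) and keep k - 1 columns.
reduced = one_hot(colors, categories[1:])
reduced_row_sums = [sum(row) for row in reduced]

print(row_sums)          # [1, 1, 1, 1, 1]
print(reduced_row_sums)  # [1, 1, 0, 1, 1]
```

After dropping one column, the row sums are no longer constant, so the remaining dummy columns cannot be reconstructed from one another or from an intercept term.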

If you are unaware of MultiCollinearity, please go through the blog on MultiCollinearity mentioned below; it covers everything about it. …


This blog aims to explain the concept of MultiCollinearity, which is very important in Data Preprocessing, which is, in turn, a part of Data Science and Machine Learning/Deep Learning.

A huge hype has been created around Data Science in today’s world, but the sad reality is that, due to this hype, most people are not learning the concepts that are actually required. Even those who are learning them are not learning in the right way, that is, they do not learn the actual use-cases for those concepts.

This problem of chasing a buzzword without knowing the concepts behind it makes it very difficult to actually understand the field. In addition, this type of learning leaves a very bad impression on other interested students. Since they observe the learning pattern of students who follow the wrong approach, they tend to do the same thing. …


This blog aims to explain the process of creating a multi-node Hadoop cluster setup using Ansible, a topic that is rarely covered. Hadoop version 1.2.1 is used in this blog; you can choose a different version if you prefer.

Image by Author

In this fast-moving world, automation is a crucial aspect of every business, so it is very important to provide the right education to everyone. But the sad reality is that the way education is delivered is absurd. Most educational sources are very abstract and do not cover the content in depth.

The biggest example of this abstraction can be taken from the tool used in this blog, i.e., Hadoop. Most sources will provide you with a configuration script that works only after a lot of effort, because it is initially challenging to understand that script. …


This blog aims to explain the process of creating an architecture that uses HAProxy as the Load Balancer & Apache Webserver as the Webserver, with everything set up using Ansible!

In this fast-moving world, DevOps is a very important technology. It is one of the most in-demand technologies at present, so understanding it conceptually is very important for anyone today.

I used “understanding” in the above paragraph deliberately: learning alone does not serve a long-term goal, whereas understanding does. So, everyone should focus on understanding the concepts instead of merely learning them.

In this blog, a complete practical will be covered, which includes setting up the load balancer (reverse proxy) using HAProxy, setting up the Webserver using Apache Webserver, & using Ansible to automate everything. In addition, the Web servers will be attached to the load balancer dynamically, instead of being entered manually in the Load Balancer’s configuration file. …


This blog aims to explain the process of launching a webserver using containerization technology (Docker) and a DevOps tool (Ansible) for automation!

Automation has been widely adopted & is in huge demand these days for multiple reasons, & DevOps is the key to achieving it. Some of the reasons for adopting automation are that it enables parallel work to a very large extent, saves time that can be spent on other work, is less prone to errors, & saves money.

Automation has established itself in the world completely; everyone is chasing it & there is a huge demand for it. That is the reason I am writing this blog: to provide a glimpse of the automation world.

Before jumping directly to the practical, let us discuss the Containerization & Automation tools that are used in it. …


This blog aims to explain the difference between Probability & Likelihood. This topic is very important to understand, but the problem is that both terms are very confusing. That is why I am writing this blog: to remove the confusion & explain the topics in as simple a manner as possible.

I am quite confident that you have encountered the terms “Probability” & “Likelihood” in your daily life, & that you found them confusing & almost identical. For anyone trying to understand these terms for the very first time, they can seem to mean the same thing, & it is difficult to spot the difference between them.

No worries, you have come to the right place; this blog will guide you in understanding the difference between Probability & Likelihood.

Important note!

The biggest problem that prevents someone from understanding the concepts of Data Science is the wrong approach towards learning & understanding it. …


This blog aims to explain an effective way to calculate the correlation between the features of a dataset, which in turn will help not only to select specific features for model training (countering the curse of dimensionality) but also to improve the model performance.

In every Data Science project, Feature Engineering is a very important step for building an effective model. In any such project, it is very important to select the minimum set of features that are relevant to the target variable/output.

There are various techniques for Feature Selection; among them, finding correlation is very well known & widely adopted. Finding the correlation between the features of the dataset is a very interesting and important aspect.

I would request all the readers of this blog to please read my blog on Covariance (if you haven’t already). It will build your fundamentals on correlation & will also help you understand the drawbacks of Covariance, which lead us to use Pearson Correlation. …
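As a rough illustration of why Pearson Correlation is often preferred over raw Covariance, the sketch below uses only the Python standard library. The height and weight values and the function names `covariance` and `pearson` are hypothetical, chosen for this example rather than taken from the original blog. It shows one commonly cited drawback of covariance, its dependence on the units of the features, and how dividing by the standard deviations removes it.

```python
import math


def covariance(xs, ys):
    """Sample covariance of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


# Hypothetical feature values, chosen only for illustration.
height_m = [1.50, 1.60, 1.70, 1.80, 1.90]
weight_kg = [52.0, 58.0, 67.0, 74.0, 85.0]
height_cm = [h * 100 for h in height_m]  # same feature, different unit

# Covariance changes with the unit of measurement...
cov_m = covariance(height_m, weight_kg)
cov_cm = covariance(height_cm, weight_kg)

# ...while Pearson correlation stays the same and always lies in [-1, 1].
r_m = pearson(height_m, weight_kg)
r_cm = pearson(height_cm, weight_kg)

print(cov_m, cov_cm)
print(r_m, r_cm)
```

Because the correlation is unit-free, the values for different feature pairs can be compared directly, which is what makes it convenient for feature selection.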


This blog aims to explain Covariance, which is a very important topic in Feature Engineering in Data Science. In addition, this blog will also cover its use-cases, advantages & disadvantages.

Data Science is a very hot topic at present. Many students are choosing this field as their profession; in addition, many working professionals are also shifting towards it after seeing its scope.

Since Data Science is so popular, it is attracting a lot of people, which is an amazing thing. However, most of them, when they start learning in this field, feel the urge to learn it as quickly as possible. …


This blog aims to implement a high-availability webserver architecture on AWS using the AWS CLI. Additional EBS storage will be used to make the architecture persistent/highly available, the S3 service will be used to store the static objects for the Webserver, and CloudFront will be used for CDN services.

For any business, high availability is a core requirement, irrespective of the type of business. If the business has to grow, its products must be available; otherwise, it cannot grow.

In addition to availability, there is one more factor: “how fast can the services of the business be accessed?” This factor also plays a very important role in business growth in today’s fast-moving world.

For example, consider the online e-commerce company “Flipkart”. If the Flipkart site is not available, then Flipkart cannot grow; moreover, it could even vanish from the e-commerce market within a few weeks. Now, coming to the speed of access, consider a case in which the Flipkart site takes too long to load. Even in this case, Flipkart’s customers will stop accessing the website because they will be annoyed by the loading latency. …

About

Harshit Dawar

Big Data Enthusiast with a demonstrated history of delivering large and complex projects. Interested in working in the field of AI and Data Science.
