Big Data Problems & Solutions!

This blog explains what Big Data is, the problems associated with it, and some possible solutions to those problems, along with a few facts about real companies.


Have you ever wondered how much data is generated in the world every day? Whether or not you have, you will definitely learn something new from this blog.

Big Data is a huge problem today. Many people believe that “Big Data” is a technology, but it is actually a problem, and multiple solutions have been proposed to address it, with more still in development.

Before discussing Big Data in detail, have a look at some facts about the data handled by real companies. After that, the problems and solutions of Big Data are discussed.

A Few Facts Related to Big Data!

  • Facebook processes around 500 TB of data in a single day.
  • 294 billion emails are sent every day.
  • 500 million tweets are processed every day.
  • 3.5 billion Google searches are made in a single day.
  • WhatsApp had 450 million daily active users as of a 2018 report, and 65 billion WhatsApp messages are sent every day.
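To get a feel for these daily totals, they can be converted into average per-second rates. This is a minimal sketch that assumes traffic is spread evenly over 24 hours, which real traffic is not, so the results are only rough averages; the names `daily_counts` and `per_second` are illustrative.

```python
# Convert the daily totals quoted above into average per-second rates.
# ASSUMPTION: traffic is spread evenly over the day (real traffic has peaks).
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

daily_counts = {
    "emails sent": 294e9,
    "tweets processed": 500e6,
    "Google searches": 3.5e9,
    "WhatsApp messages": 65e9,
}


def per_second(daily: float) -> float:
    """Average number of events per second for a given daily total."""
    return daily / SECONDS_PER_DAY


if __name__ == "__main__":
    for name, daily in daily_counts.items():
        print(f"{name}: ~{per_second(daily):,.0f} per second")
```

Running it prints roughly 3,402,778 emails, 5,787 tweets, 40,509 searches, and 752,315 WhatsApp messages per second, on average.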

[Figure: Data generated per minute]

Big Data & its Problems/Challenges

Note: Data has existed since the beginning of the Internet era, but the amount generated back then was very small compared with what is generated now. That is why the Big Data problem did not exist at that time.

A few years ago, Big Data was described in terms of three challenges (volume, velocity, and variety); now it is commonly described in terms of five. All five are listed below.

Challenges of Big Data (5 V’s of Big Data)

1. Volume: This challenge refers to the size of the data being generated (some figures are given above). Storing such a huge amount of data is a very big challenge today.

Suppose the biggest single storage device available today holds 100 TB; the data being generated is far too large to fit on it. Even if a company were willing to build a much bigger single device, it would not do so for two reasons: it would be extremely expensive, and it would create an I/O (Input/Output) bottleneck, because a single device can only read and write data so fast, so scanning all of it would take a very long time.
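The I/O point can be made concrete with back-of-the-envelope arithmetic. The figures here are assumptions for illustration only: 100 TB of data (the hypothetical device size above) and a sustained read speed of 200 MB/s per drive. Splitting the same data across many drives that are read in parallel divides the scan time by the number of drives, which is the idea behind scaling out.

```python
# Back-of-the-envelope estimate of the single-device I/O bottleneck.
# ASSUMPTIONS (illustrative, not measured): 100 TB of data and a sustained
# sequential read speed of 200 MB/s per drive, in decimal units.
DATA_BYTES = 100 * 10**12          # 100 TB
READ_BYTES_PER_SEC = 200 * 10**6   # 200 MB/s per drive (assumed)


def read_time_seconds(data_bytes: int, throughput: int, machines: int = 1) -> float:
    """Time to scan the data when it is split evenly across `machines`
    drives that are all read in parallel."""
    return data_bytes / (throughput * machines)


if __name__ == "__main__":
    single = read_time_seconds(DATA_BYTES, READ_BYTES_PER_SEC)
    cluster = read_time_seconds(DATA_BYTES, READ_BYTES_PER_SEC, machines=1000)
    print(f"1 drive:     {single / 86400:.1f} days")
    print(f"1000 drives: {cluster / 60:.1f} minutes")
```

Under these assumptions, one drive needs about 5.8 days to read everything once, while 1000 drives in parallel need about 8.3 minutes.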

2. Velocity: This refers to the speed at which data is generated, which is also a challenge: the faster data is generated, the more storage is required. It is often stated that half of the data present today was generated in the past 2 years, and that the amount of data doubles every 2 years.
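The two claims above fit together under a simple model. This sketch assumes the total amount of stored data grows exponentially and doubles every 2 years (the figure quoted above); under that assumption, half of today's data must have been created in the last 2 years. The function names are illustrative.

```python
def growth_factor(years: float, doubling_years: float = 2.0) -> float:
    """How many times larger the total amount of data becomes after `years`,
    assuming it doubles every `doubling_years` years."""
    return 2 ** (years / doubling_years)


def share_generated_recently(window_years: float, doubling_years: float = 2.0) -> float:
    """Fraction of today's total data that was created within the last
    `window_years`.

    If the total is T now, it was T / growth_factor(window_years) at the
    start of the window, so the share created inside the window is
    1 - 1 / growth_factor(window_years).
    """
    return 1 - 1 / growth_factor(window_years, doubling_years)


if __name__ == "__main__":
    print(growth_factor(10))            # growth over a decade
    print(share_generated_recently(2))  # share created in the last 2 years
```

`growth_factor(10)` returns 32.0, a 32-fold increase in a decade, and `share_generated_recently(2)` returns 0.5, matching the "half in the past 2 years" statement.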

3. Variety: Data today comes in many forms, not just structured data. It can be divided into three categories:

  • Structured
  • Semi-Structured
  • Unstructured

Because of these different categories, multiple tools are required to process each type of data efficiently, which is another big challenge.
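A small illustration of why each category needs different handling: the sketch below parses one hypothetical sample of each kind with a different standard-library tool. CSV stands in for structured data, JSON for semi-structured data, and free text for unstructured data; the samples and function names are assumptions for illustration.

```python
import csv
import io
import json
import re

# Hypothetical sample inputs, one per category.
STRUCTURED = "user_id,country\n1,IN\n2,US\n"                     # fixed columns
SEMI_STRUCTURED = '{"user_id": 3, "tags": ["ml", "spark"], "bio": null}'
UNSTRUCTURED = "Loved the new phone! Battery lasts 2 days."      # free text


def parse_structured(text):
    """Known schema: every row has the same columns."""
    return list(csv.DictReader(io.StringIO(text)))


def parse_semi_structured(text):
    """Field names travel with the data, and fields can vary per record."""
    return json.loads(text)


def parse_unstructured(text):
    """No schema at all; here we only extract lowercase word tokens."""
    return re.findall(r"[a-z]+", text.lower())


if __name__ == "__main__":
    print(parse_structured(STRUCTURED))
    print(parse_semi_structured(SEMI_STRUCTURED))
    print(parse_unstructured(UNSTRUCTURED))
```

The structured parser can rely on column names up front, the JSON parser discovers them per record, and the text parser can only impose structure after the fact.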

4. Veracity: This challenge concerns trust: how can we be sure that the data obtained is correct? The data anyone holds may contain wrong information, so verifying its authenticity is again a very big challenge.
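One basic, partial answer to veracity is rule-based validation before data is trusted. This is a minimal sketch with a hypothetical `Reading` record and an assumed plausibility range; real validation rules are domain-specific, and checks like these catch only implausible records, not every wrong one.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    """A hypothetical sensor record."""
    sensor_id: str
    temperature_c: float


# ASSUMPTION: plausible air temperatures lie in this range (degrees Celsius).
MIN_TEMP_C, MAX_TEMP_C = -90.0, 60.0


def is_plausible(reading: Reading) -> bool:
    """Reject records with a missing ID or an out-of-range value."""
    return bool(reading.sensor_id) and MIN_TEMP_C <= reading.temperature_c <= MAX_TEMP_C


def split_by_trust(readings):
    """Separate readings that pass the checks from those that fail them."""
    trusted = [r for r in readings if is_plausible(r)]
    rejected = [r for r in readings if not is_plausible(r)]
    return trusted, rejected


if __name__ == "__main__":
    data = [Reading("s1", 21.5), Reading("", 19.0), Reading("s2", 999.0)]
    trusted, rejected = split_by_trust(data)
    print(len(trusted), "trusted,", len(rejected), "rejected")
```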

5. Value: Having data is not enough; it has to be processed and analyzed to extract value or insights that can benefit a company or individual.
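Extracting value usually means turning raw events into an answer to a business question. This sketch aggregates hypothetical purchase events into revenue per product and picks the top seller; the event data and function name are made up for illustration.

```python
from collections import Counter

# Hypothetical raw purchase events: (product, amount).
events = [("phone", 300.0), ("laptop", 900.0), ("phone", 320.0), ("case", 15.0)]


def revenue_by_product(events):
    """Sum the purchase amounts for each product."""
    totals = Counter()
    for product, amount in events:
        totals[product] += amount
    return totals


# The "insight": which product earns the most revenue.
top_product, top_revenue = revenue_by_product(events).most_common(1)[0]

if __name__ == "__main__":
    print(top_product, top_revenue)
```

With this sample, the laptop earns the most revenue (900.0), even though the phone is bought more often.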

These are the five challenges of Big Data.

Solutions to the Big Data Problem

At present, a few approaches and tools have been developed to deal with the challenges above.

The core tool that acts as the base for most other tools used to solve Big Data challenges is Hadoop. It has its own ecosystem that integrates multiple tools, and together they give the system a lot of capability to solve the challenges of Big Data.

Hadoop is an open-source tool with its own distributed storage layer, HDFS (Hadoop Distributed File System), which can scale out across many commodity hardware machines; this solves the storage problem of Big Data. For processing the data, Hadoop provides MapReduce, and the wider ecosystem includes other tools such as Apache Kafka for handling streaming data.
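The scale-out storage idea can be sketched in a few lines: a file is cut into fixed-size blocks, and each block is copied to several machines. This sketch uses the HDFS defaults of a 128 MB block size and 3 replicas (configurable through `dfs.blocksize` and `dfs.replication`), but the round-robin placement and the node names are simplifications for illustration; real HDFS uses a rack-aware placement policy decided by the NameNode.

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the HDFS default block size
REPLICATION = 3                 # the HDFS default replication factor


def num_blocks(file_size, block_size=BLOCK_SIZE):
    """Number of fixed-size blocks needed; the last block may be partial."""
    return math.ceil(file_size / block_size)


def place_blocks(file_size, nodes, replication=REPLICATION):
    """Map each block index to `replication` distinct nodes.

    Simplified round-robin placement for illustration only; it is not the
    policy real HDFS uses.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [nodes[(block + i) % len(nodes)] for i in range(replication)]
        for block in range(num_blocks(file_size))
    }


if __name__ == "__main__":
    one_gb = 1024 ** 3
    nodes = ["node1", "node2", "node3", "node4"]  # hypothetical machines
    for block, replicas in place_blocks(one_gb, nodes).items():
        print(block, replicas)
```

A 1 GB file becomes 8 blocks, each stored on 3 different machines, so losing any single machine loses no data and reads can be spread across the cluster.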

Apache Spark is the core tool used for data processing. It can be up to 100 times faster than MapReduce for in-memory workloads, and it also includes libraries for machine learning (MLlib) and graph processing (GraphX), with APIs in four programming languages: R, Python, Scala (Spark's native language), and Java.
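Both MapReduce and Spark express jobs in terms of map and reduce steps. The classic word-count job is sketched below in plain Python, without either framework, to show the model: a map phase emits key/value pairs, a shuffle groups them by key, and a reduce phase combines each group. The sample lines and function names are illustrative.

```python
from collections import defaultdict
from itertools import chain

lines = ["big data is big", "data is everywhere"]  # hypothetical input


def map_phase(line):
    """Emit a (word, 1) pair for every word in the line."""
    return [(word, 1) for word in line.split()]


def shuffle_phase(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Combine each key's values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


word_counts = reduce_phase(shuffle_phase(chain.from_iterable(map(map_phase, lines))))

if __name__ == "__main__":
    print(word_counts)
```

In PySpark, the same job is typically written as `rdd.flatMap(lambda line: line.split()).map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)`. Spark keeps intermediate data in memory where it can instead of writing it to disk between steps, which is the main source of its speed advantage over MapReduce.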

This is a brief overview of the tools used to solve the challenges of Big Data. I will cover these topics in more detail in future blogs.

I hope this article has explained the topic clearly. Thank you so much for investing your time in reading my blog and boosting your knowledge. If you like my work, please give this blog a clap!

Written by

Big Data enthusiast with a demonstrated history of delivering large and complex projects. Interested in working in the field of AI and Data Science.
