BIG DATA: A Human Business as Well as a Problem

Let's start with the question: what is big data?

Big data refers to an amount of data so large that it cannot be stored and processed with a traditional approach within a given time frame.

There is no minimum size at which data becomes "big".

Big data is a relative concept: it depends entirely on the use case.

Example:

  • Can you send a 100 MB video on WhatsApp? Here, 100 MB is big data for the user.
  • Suppose you have 4 TB of data and need to perform some operation on it, say, finding the most used words. If you somehow do this on your personal PC, it may take at least 2 to 3 days, depending on your specs, so it cannot be done within the given time frame.

So both of these count as big data.
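To make the "most used words" operation concrete, here is a minimal sketch in Python. It assumes the data has already been split into chunks of text; the function names and the tiny sample chunks are hypothetical stand-ins for a far larger data set. Each chunk is counted on its own and the partial counts are then merged, which is the same idea that lets the work be spread over many machines.

```python
import re
from collections import Counter


def count_words(chunk):
    """Count the words in one chunk of text (the per-chunk step)."""
    return Counter(re.findall(r"[a-z']+", chunk.lower()))


def most_used_words(chunks, top_n=3):
    """Merge the per-chunk counts and return the top_n most used words."""
    total = Counter()
    for chunk in chunks:
        total.update(count_words(chunk))
    return total.most_common(top_n)


# Hypothetical sample: three small chunks standing in for pieces of 4 TB.
chunks = [
    "big data is big",
    "data is everywhere and data grows",
    "big data needs distributed storage",
]

print(most_used_words(chunks, top_n=2))
```

Because every chunk is counted independently, the chunks could in principle be handled by different machines at the same time, with only the small per-chunk counters sent back to be merged.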

The businesses of banks, General Electric, and social media companies such as Facebook, Instagram, Twitter, Medium, and LinkedIn, along with a never-ending list of others, depend entirely on data manipulation.

But how do they process and manage their data?

If you research big data, you will come across figures like these:

  • 4 petabytes of data are created on Facebook per day
  • 4 terabytes of data are created from each connected car per day
  • 500 million tweets are sent per day
  • 294 billion emails are sent per day
  • 65 billion messages are sent on WhatsApp per day
  • Google gets over 3.5 billion searches per day. Broken down, this means that Google processes over 40,000 search queries every second on average.
  • About 1.7 MB of data was created every second by every person during 2020
  • Google is working on a self-driving car. Using and generating massive amounts of data from sensors, cameras, and tracking devices, and coupling this with on-board and real-time data analysis from Google Maps, Street View, and other sources, allows the Google car to drive safely on the roads without any input from a human driver.
  • Huge amounts of data are recorded from every aircraft and every aspect of ground operations. This data is reported in real time and targeted specifically at recovering from disruption and returning to a regular schedule.
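The per-second figure for Google can be checked with a little arithmetic. The short Python sketch below uses only the numbers quoted in the list above; the variable names are illustrative, and decimal units (1 GB = 1000 MB) are an assumption.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in a day

# Google: 3.5 billion searches per day, converted to searches per second.
searches_per_day = 3.5e9
searches_per_second = searches_per_day / SECONDS_PER_DAY
print(round(searches_per_second))  # roughly 40,500, i.e. "over 40,000"

# 1.7 MB per person per second, converted to GB per person per day.
mb_per_person_per_second = 1.7
gb_per_person_per_day = mb_per_person_per_second * SECONDS_PER_DAY / 1000
print(round(gb_per_person_per_day, 1))  # roughly 147 GB
```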

Big data is further defined by the following properties:

  • Volume: The word "big" in big data itself refers to volume. You’re not really in the big data world unless the volume of data is petabytes, exabytes, or more. Technology giants like Amazon, Shopify, and other e-commerce platforms receive real-time structured and unstructured data, ranging from terabytes to zettabytes, from millions of customers across the globe, especially smartphone users. At present, existing data is measured in petabytes and is expected to grow to zettabytes in the near future. Social networking sites alone produce data on the order of terabytes every day, and this amount of data is difficult to handle with existing traditional systems.
  • Velocity: Velocity deals with the speed at which data arrives from various sources and the speed at which it flows. For example, data from sensor devices moves constantly into the data store, and the amount is not small. Traditional systems are therefore not capable of performing analytics on data that is constantly in motion. Have you ever wondered why Google is so fast? Have you ever wondered how Instagram can show your feed so quickly when you start scrolling?
  • Variety: Variety refers to the different types of data being generated. It includes not only traditional structured data but also semi-structured and unstructured data from sources such as web pages, web log files, social media sites, e-mail, documents, and sensor data from both active and passive devices. This data can be classified as structured, semi-structured, or unstructured.

Big Data Analytics

Big data analytics largely involves collecting data from different sources, munging it so that it can be consumed by analysts, and finally delivering data products that are useful to the organization's business. Converting large amounts of unstructured raw data, retrieved from different sources, into a data product useful for organizations forms the core of big data analytics.

Every problem has a solution, so what is it in this case? Distributed storage.

Distributed Storage
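The basic idea is to split a large piece of data into blocks and spread those blocks over several machines, keeping more than one copy of each block so that losing one machine does not lose data. The Python sketch below illustrates only this idea. The node names, block size, replication factor, and the simple round-robin placement are all assumptions made for illustration; real systems use more sophisticated placement policies.

```python
def split_into_blocks(data, block_size):
    """Cut the data into consecutive blocks of at most block_size bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks, nodes, replication):
    """Assign each block to `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("replication cannot exceed the number of nodes")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            nodes[(block_id + r) % len(nodes)] for r in range(replication)
        ]
    return placement


# Hypothetical cluster of four machines and 1000 bytes of data.
nodes = ["node-a", "node-b", "node-c", "node-d"]
data = b"x" * 1000

blocks = split_into_blocks(data, block_size=256)
placement = place_blocks(len(blocks), nodes, replication=2)

for block_id, holders in placement.items():
    print(block_id, len(blocks[block_id]), holders)
```

With these sample values the data is cut into four blocks and every block lives on two different nodes, so any single node can fail without a block becoming unavailable.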

Some of the tools used for big data are:

  • Apache Hadoop
  • Apache Hive
  • Cassandra
  • MongoDB
  • Kafka
  • Spark
  • Splunk
