Intro to Hadoop and Its Core Components

Pardhu Gundlapalli
1 min read · Feb 17, 2022

One person can’t do every job. To get the work done faster, distribute the tasks among different people.

Before understanding what Hadoop is, ask yourself this question:

“Can your computer handle GBs or TBs of data?”

Business Scenario 1:

Suppose we have 900 MB of data and we want to run data analytics and machine learning on it. Is traditional hardware capable of handling that 900 MB of data?

As a Data Engineer or Data Scientist, how do you handle this situation?

So, how does Hadoop help us handle such huge amounts of data?

Hadoop is open-source software that helps us store and process large amounts of data. It provides a software framework for distributed storage and distributed processing of big data through its core components. In other words, it is a framework that allows large data sets to be processed across clusters of computers using simple programming models.
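Those “simple programming models” usually mean MapReduce jobs. As a rough sketch (my own illustration, not from the original post), here is what a word-count job could look like with Hadoop Streaming, which lets you write the mapper and reducer as plain Python scripts that read from stdin and write to stdout. The file names and paths below are only assumptions.

```python
# mapper.py -- emits "word<TAB>1" for every word it sees
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print(f"{word}\t1")
```

```python
# reducer.py -- Hadoop sorts the mapper output by key before the reduce step,
# so all counts for the same word arrive one after another
import sys

current_word, current_count = None, 0
for line in sys.stdin:
    word, count = line.rstrip("\n").split("\t", 1)
    if word == current_word:
        current_count += int(count)
    else:
        if current_word is not None:
            print(f"{current_word}\t{current_count}")
        current_word, current_count = word, int(count)

if current_word is not None:
    print(f"{current_word}\t{current_count}")
```

You would then submit the job with something along the lines of `hadoop jar hadoop-streaming.jar -mapper mapper.py -reducer reducer.py -input /data/in -output /data/out`, where the jar location and the HDFS paths depend on your installation.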

Wait… it stores data? Does that make it something like a DATABASE?

Is Hadoop a database like MySQL, PostgreSQL, Oracle, MongoDB, etc.?

No. Hadoop is not a database. At its core, Hadoop is a distributed file system that stores and processes huge amounts of data across clusters of computers.
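To make the “file system, not database” point concrete, here is a minimal sketch (again my own illustration, not from the original post) of pushing a local file into HDFS and listing it, using Python to call the standard `hdfs dfs` command-line tool. It assumes a running Hadoop cluster with the `hdfs` CLI on the PATH; the file and directory names are made up.

```python
# Hypothetical sketch: copy a local CSV into HDFS and list the target directory.
# Assumes a running Hadoop cluster and the `hdfs` CLI available on PATH.
import subprocess

local_file = "sales_900mb.csv"   # made-up local file
hdfs_dir = "/user/demo/sales"    # made-up HDFS directory

# Create the directory in HDFS (-p avoids an error if it already exists)
subprocess.run(["hdfs", "dfs", "-mkdir", "-p", hdfs_dir], check=True)

# Upload the file; HDFS splits it into blocks and replicates them across the cluster
subprocess.run(["hdfs", "dfs", "-put", "-f", local_file, hdfs_dir], check=True)

# List the directory -- we are working with files and directories, not tables and rows
subprocess.run(["hdfs", "dfs", "-ls", hdfs_dir], check=True)
```

Notice that everything here looks like working with files, not like querying tables with SQL.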

Core Components of Hadoop

Hadoop Ecosystem:

  1. HDFS (Hadoop Distributed File System)
  2. MapReduce
  3. YARN
  4. HBase
  5. Pig
  6. Hive
  7. Sqoop
  8. Flume
  9. Kafka
  10. Zookeeper

Thanks for reading. In the next blog, we will go through each component in depth, along with business use cases.

