
HADOOP - Hadoop architecture

asked Experts-976 November 16, 2014 02:21 AM  

Hadoop architecture


1 Answer

answered by Experts-976

This section describes the various components of Hadoop: the parts of the MapReduce job process, the handling of the data, and the architecture of the file system.

MapReduce Job Processing

An entire Hadoop execution of a client request is called a job. Users submit job requests to the Hadoop framework, which processes them. Before the framework can process a job, the user must specify the following (a short example follows the list below):

  • The location of the input and output files in the distributed file system
  • The input and output formats
  • The classes containing the map and reduce functions
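
As a rough illustration, the sketch below shows how these items might be specified in a driver class using the org.apache.hadoop.mapreduce API. The class names WordCountDriver, WordCountMapper and WordCountReducer are hypothetical placeholders, and the input and output paths are taken from the command line:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

    public class WordCountDriver {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "word count");
            job.setJarByClass(WordCountDriver.class);

            // Classes containing the map and reduce functions (hypothetical names)
            job.setMapperClass(WordCountMapper.class);
            job.setReducerClass(WordCountReducer.class);

            // Input and output formats
            job.setInputFormatClass(TextInputFormat.class);
            job.setOutputFormatClass(TextOutputFormat.class);

            // Types of the key and value produced by the reduce function
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);

            // Locations of the input and output files in the distributed file system
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }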

Hadoop has four entities involved in the processing of a job:

  • The user, who submits the job and specifies the configuration.
  • The JobTracker, a program which coordinates and manages the jobs. It accepts job submissions from users, provides job monitoring and control, and manages the distribution of tasks in a job to the TaskTracker nodes.[2] There is usually one JobTracker per cluster.
  • The TaskTrackers, which run the individual tasks of a job, such as map tasks and reduce tasks. There can be one or more TaskTracker processes per node in a cluster.
  • The distributed file system, such as HDFS.


The user specifies the job configuration by setting parameters specific to the job, including the number of reduce tasks and the reduce function, as well as the format and locations of the input. The Hadoop framework uses this information to split the input into several pieces. Each input piece is fed to a user-defined map function. The map tasks process the input data and emit intermediate data. The output of the map phase is sorted, and a default or custom partitioning may be applied to the intermediate data. The reduce function then processes the data in each partition, merging the intermediate values or applying a user-specified function. The user is expected to specify the types of the output key and output value of the map and reduce functions. The output of the reduce function is written to output files on disk by the Hadoop framework.
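
To make the flow of key/value types concrete, here is a minimal word-count style sketch of a map function and a reduce function. The class names are hypothetical; in a real project each class would live in its own source file and be registered in the driver shown earlier:

    import java.io.IOException;
    import java.util.StringTokenizer;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    // Map task: consumes one record of an input split at a time and emits
    // intermediate (word, 1) pairs.
    public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);   // emit intermediate key/value pair
            }
        }
    }

    // Reduce task: receives the sorted intermediate values for one key in its
    // partition and merges them into a single output value.
    class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }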

Hadoop Distributed File System (HDFS)

Hadoop can work directly with any mountable distributed file system, but the most common file system used by Hadoop is the Hadoop Distributed File System (HDFS). HDFS is a fault-tolerant distributed file system designed to run on commodity hardware. Its high-throughput access to application data makes it well suited to large data sets.

HDFS has the following features:

  • HDFS is designed to run on clusters of commodity machines.
  • HDFS can handle large data sets.
  • Since HDFS deals with large-scale data, it scales to a large number of machines in a cluster.
  • HDFS provides a write-once-read-many access model (see the sketch after this list).
  • HDFS is built using the Java language, making it portable across various platforms.
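
As a small illustration of the write-once-read-many model, the following sketch uses the HDFS Java API (org.apache.hadoop.fs.FileSystem). The NameNode address and file path are hypothetical, and the fs.defaultFS setting would normally come from core-site.xml:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.nio.charset.StandardCharsets;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Hypothetical NameNode address; usually picked up from core-site.xml
            conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");
            FileSystem fs = FileSystem.get(conf);

            Path file = new Path("/user/demo/sample.txt");

            // Write once: create the file and write its full contents.
            try (FSDataOutputStream out = fs.create(file)) {
                out.write("hello hdfs\n".getBytes(StandardCharsets.UTF_8));
            }

            // Read many: the file can be opened for reading any number of times,
            // but it cannot be updated in place.
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(fs.open(file), StandardCharsets.UTF_8))) {
                System.out.println(reader.readLine());
            }

            fs.close();
        }
    }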

