
HADOOP - Guide to set up Hadoop on a standalone system




asked by Experts-976 on November 19, 2014, 12:54 AM

Guide to set up Hadoop on a standalone system


           

1 Answer



 
answered by Experts-976

This guide helps you set up Hadoop on a single system, also referred to as pseudo-distributed mode.

Prerequisites:

To run Hadoop you need the Sun JDK and OpenSSH installed on your machine.

1. Installing the JDK

a. On Ubuntu, run the following command:

sudo apt-get install sun-java6-jdk

b. On Fedora

Go to the following URL:

http://www.oracle.com/technetwork/java/javase/downloads/jdk-6u26-download-400750.html

and download the jdk-6u26-linux-i586-rpm.bin file.

Then go to the download directory, give the file execute permission, and run the .bin file, following the installer's steps.
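For reference, a minimal sketch of those Fedora steps on the command line (the download location is an assumption; adjust the path to wherever you saved the file):

$ cd ~/Downloads
$ chmod +x jdk-6u26-linux-i586-rpm.bin
$ ./jdk-6u26-linux-i586-rpm.bin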

2. Installing OpenSSH

a. On Ubuntu, run the following command:

sudo apt-get install openssh-server openssh-client

b. On Fedora (as root):

yum install openssh-server openssh-client
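On Fedora the SSH daemon is not always started automatically after installation; a minimal sketch of starting it and enabling it at boot (run as root; service management commands vary between Fedora releases):

$ service sshd start
$ chkconfig sshd on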

Initial setup

Step 1:

Create a user called “hadoop” on every system.

● On Ubuntu

$ sudo adduser hadoop

Give the necessary details for the user.

● On Fedora and CentOS

$ su    # log in as root

$ adduser hadoop

$ passwd hadoop    # to set the password for the user “hadoop”

Step 2:

● Installing Hadoop on every machine for the “hadoop” user.

To log in as the “hadoop” user, issue the following command on all machines and type in the password:

$ su -l hadoop
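The Hadoop start scripts connect to localhost over SSH, so it is convenient to set up passwordless SSH for the “hadoop” user at this point (a minimal sketch, not in the original steps, assuming the default key locations):

$ ssh-keygen -t rsa -P ""
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 600 ~/.ssh/authorized_keys
$ ssh localhost    # should now log in without asking for a password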

1. Download Hadoop from the following link: http://www.apache.org/dyn/closer.cgi/hadoop/core/

2. Create a directory /home/hadoop/hadoop-install on every machine.

3. Place the downloaded tar inside hadoop-install.

4. Extract the .tar file; the directory created will be hadoop-0.21.0, referred to as HADOOP_COMMON_HOME from now on (see the sketch after this list).
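A minimal sketch of steps 2 to 4 on the command line (the tarball name assumes the 0.21.0 release downloaded from the mirror above):

$ mkdir -p /home/hadoop/hadoop-install
$ mv hadoop-0.21.0.tar.gz /home/hadoop/hadoop-install/
$ cd /home/hadoop/hadoop-install
$ tar -xzf hadoop-0.21.0.tar.gz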

Setting up the configurations

1. conf/core-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/hadoop-install/hadoop-datastore/</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
</property>
</configuration>

2. conf/mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
</property>
</configuration>

3. conf/hdfs-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
<description>On a single-node setup only one replica can be stored.</description>
</property>
 
</configuration>

4. conf/masters

The masters file should have a single entry: localhost

This is to identify the master for the Hadoop system.

5. conf/slaves

The slaves file should have a single entry: localhost

This is to identify the slaves for the Hadoop system.

6. conf/hadoop-env.sh

Uncomment the line that sets JAVA_HOME and point it at the Sun JDK, as shown below.

export JAVA_HOME=/usr/lib/jvm/java-6-sun-1.6.0.26

7. Setting the environment variables for the JDK and Hadoop

Add the lines below to ~/.bashrc. First open the file with any editor:

$gedit ~/.bashrc

Append the following lines at the end of the file.

export JAVA_HOME=/usr/lib/jvm/java-6-sun-1.6.0.26 
export HADOOP_COMMON_HOME=/home/hadoop/hadoop-install/hadoop-0.21.0

For the .bashrc changes to take effect in the current shell, run the following command:

$source ~/.bashrc
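Optionally (not part of the original steps), you can also append the Hadoop bin directory to PATH in ~/.bashrc so the Hadoop scripts can be run from any directory:

export PATH=$PATH:$HADOOP_COMMON_HOME/bin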

8. Before starting the Hadoop processes:

From the HADOOP_COMMON_HOME directory, execute the command:

$ bin/hdfs namenode -format

The above command is issued for the following reasons:

1) To create an empty filesystem.

2) To initialize the file system with the settings specified in the configuration files.

9. Starting the Hadoop processes:

To start HDFS, use:

$ bin/start-dfs.sh

Check whether the NameNode has started using the command:

$ jps

The following processes must be running.

NameNode

SecondaryNameNode

DataNode

To start MapReduce, use:

$ bin/start-mapred.sh

If you type the “jps” command again, the following processes must be running.

SecondaryNameNode

NameNode

DataNode

TaskTracker

JobTracker
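When you are done, the daemons can be stopped with the companion stop scripts. You can also verify the installation first by running one of the bundled example jobs; the examples jar name below is an assumption and may differ slightly between releases:

$ bin/hadoop jar hadoop-mapred-examples-0.21.0.jar pi 2 10
$ bin/stop-mapred.sh
$ bin/stop-dfs.sh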
