This guide helps you set up Hadoop on a single system, a configuration also referred to as pseudo-distributed mode.
To run Hadoop, you need the Sun JDK and OpenSSH installed on your machine.
1 Installing JDK
a On Ubuntu, run the following command:
sudo apt-get install sun-java6-jdk
b On Fedora
Go to the following URL:
and download the jdk-6u26-linux-i586-rpm.bin file.
Then go to the download directory and give the file execute permission (chmod +x jdk-6u26-linux-i586-rpm.bin). Double-click the .bin file and follow the steps.
2 Installing OpenSSH
a On Ubuntu, run the following command:
sudo apt-get install openssh-server openssh-client
b On Fedora, run the following command as root (the client package is named openssh-clients there):
yum install openssh-server openssh-clients
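Before moving on, it can help to confirm that both prerequisites are actually on the PATH. The following is a minimal sketch, not part of the original steps; the helper names check_cmd and check_prereqs are hypothetical.

```shell
#!/bin/sh
# Report whether a command is available on PATH.
# Prints "ok: <name>" or "missing: <name>" and returns 0 or 1 accordingly.
check_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "ok: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Check every tool named in the arguments.
# The return status is the number of missing tools (0 means all found).
check_prereqs() {
    missing=0
    for tool in "$@"; do
        check_cmd "$tool" || missing=$((missing + 1))
    done
    return "$missing"
}

# Example: check_prereqs java javac ssh
```

The sshd binary usually lives in /usr/sbin, which may not be on a normal user's PATH, so the example checks only java, javac and ssh.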
Initial setup:
Step 1:
Create a user called hadoop on every system.
● On Ubuntu
$ sudo adduser hadoop
Give the necessary details for the user.
● On Fedora and CentOS
Log in as root, then run:
$ useradd hadoop   # to create the user hadoop
$ passwd hadoop    # to set the password for the user hadoop
Step 2:
● Installing Hadoop on every machine for the hadoop user
To log in as the hadoop user, issue the following command on all machines and type in the password:
$ su -l hadoop
1 Download Hadoop from the following link: http://www.apache.org/dyn/closer.cgi/hadoop/core/
2 Create the directory /home/hadoop/hadoop-install on every machine.
3 Place the downloaded tar file inside hadoop-install.
4 Extract the .tar file. The directory it creates, hadoop-0.21.0, is referred to as HADOOP_COMMON_HOME from now on.
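Steps 2 to 4 above can be sketched as a small shell function. This is only an illustration: the function name install_hadoop is hypothetical, the tarball name in the example is an assumption, and the sketch extracts straight into the install directory rather than copying the tarball there first.

```shell
#!/bin/sh
# Unpack a Hadoop tarball into an install directory, creating it if needed.
# $1: path to the downloaded tarball (file name is an assumption)
# $2: install directory, e.g. /home/hadoop/hadoop-install
install_hadoop() {
    tarball=$1
    install_dir=$2
    [ -f "$tarball" ] || { echo "tarball not found: $tarball" >&2; return 1; }
    mkdir -p "$install_dir" || return 1
    # -x extract, -z gunzip, -f archive file, -C target directory
    tar -xzf "$tarball" -C "$install_dir"
}

# Example (run as the hadoop user):
# install_hadoop ~/hadoop-0.21.0.tar.gz /home/hadoop/hadoop-install
```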
Setting up the configurations
The following files are in the conf directory under HADOOP_COMMON_HOME.
1. conf/core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/hadoop-install/hadoop-datastore/</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
  </property>
</configuration>
2. conf/mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:54311</value>
  </property>
</configuration>
3. conf/hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
Note that a single-node setup has only one DataNode, so blocks cannot reach a replication factor of 2; a value of 1 avoids under-replication warnings.
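A stray space inside a value (for example in the hdfs:// URI) is easy to miss, so reading a property back from the file can be a useful check. The helper below is a rough sketch under a stated assumption: each <name> and <value> element sits on its own line. The name get_prop is hypothetical, and this is not a real XML parser.

```shell
#!/bin/sh
# Print the <value> of the named property in a Hadoop *-site.xml file.
# $1: path to the XML file   $2: property name
# Assumes every <name> and <value> element is on its own line.
get_prop() {
    awk -v want="$2" '
        /<name>/ {
            # Remember the most recent property name, stripped of tags.
            name = $0
            sub(/.*<name>[ \t]*/, "", name)
            sub(/[ \t]*<\/name>.*/, "", name)
        }
        /<value>/ && name == want {
            # Print the value that follows the wanted name, then stop.
            val = $0
            sub(/.*<value>[ \t]*/, "", val)
            sub(/[ \t]*<\/value>.*/, "", val)
            print val
            exit
        }
    ' "$1"
}

# Example: get_prop conf/core-site.xml fs.default.name
```

If the property is not present, the function prints nothing, which makes a missing or misspelled name easy to spot.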
4. conf/masters
The masters file should have the entry:
localhost
This identifies the master for the Hadoop system.
5. conf/slaves
The slaves file should have the entry:
localhost
This identifies the slave for the Hadoop system.
6. conf/hadoop-env.sh
Uncomment the line that sets JAVA_HOME and point it to the Sun JDK installation directory.
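The JAVA_HOME edit can also be done from the command line. The sketch below assumes GNU sed (for -i and -E) and assumes hadoop-env.sh contains a line beginning with "# export JAVA_HOME="; the function name set_java_home is hypothetical, and the JDK path in the example is an assumption that should be replaced with the actual install directory.

```shell
#!/bin/sh
# Uncomment (or overwrite) the JAVA_HOME line in a hadoop-env.sh file.
# $1: path to hadoop-env.sh   $2: JDK directory
set_java_home() {
    env_file=$1
    jdk_dir=$2
    # Matches both "# export JAVA_HOME=..." and "export JAVA_HOME=...".
    # "|" is the sed delimiter, so jdk_dir must not contain "|" or "&".
    sed -i -E "s|^#?[[:space:]]*export JAVA_HOME=.*|export JAVA_HOME=${jdk_dir}|" "$env_file"
    # If the file had no such line at all, append one.
    grep -q '^export JAVA_HOME=' "$env_file" ||
        printf 'export JAVA_HOME=%s\n' "$jdk_dir" >> "$env_file"
}

# Example (path is an assumption):
# set_java_home conf/hadoop-env.sh /usr/lib/jvm/java-6-sun
```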
7. Setting the environment variables for JDK and Hadoop
Open the .bashrc file of the hadoop user in any editor and append the following lines at the end of the file.
export JAVA_HOME=/usr/lib/jvm/java-6-sun-18.104.22.168
export HADOOP_COMMON_HOME=/home/hadoop/hadoop-install/hadoop-0.21.0
To make the changes in .bashrc take effect immediately in the current shell, run the following command:
$ source ~/.bashrc
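If the setup is repeated, .bashrc can end up with duplicate export lines. One way to avoid that is to append a line only when it is not already present. This is a minimal sketch; the helper name append_once is hypothetical.

```shell
#!/bin/sh
# Append a line to a file only if that exact line is not already there.
# $1: file to edit (created if missing)   $2: line to append
append_once() {
    file=$1
    line=$2
    touch "$file"
    # -q quiet, -x match the whole line, -F treat the pattern as a fixed string
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example:
# append_once ~/.bashrc 'export HADOOP_COMMON_HOME=/home/hadoop/hadoop-install/hadoop-0.21.0'
# . ~/.bashrc   # reload the file in the current shell
```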
8. Before starting the Hadoop processes:
From the HADOOP_COMMON_HOME directory, execute the command:
$ bin/hdfs namenode -format
This command is issued for the following reasons:
1) To create an empty filesystem.
2) To initialize the filesystem with the default settings specified in the configuration files.
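Because formatting creates an empty filesystem, running it a second time discards existing HDFS metadata. The sketch below guards against that. It assumes the NameNode keeps its metadata in the default location, dfs/name under hadoop.tmp.dir, and that a formatted directory contains current/VERSION; both helper names are hypothetical.

```shell
#!/bin/sh
# Return 0 if the NameNode directory under the given hadoop.tmp.dir
# already holds a formatted filesystem (assumed default layout).
is_formatted() {
    [ -f "$1/dfs/name/current/VERSION" ]
}

# Run the given command only when the filesystem is not formatted yet.
# $1: hadoop.tmp.dir   remaining arguments: the format command to run
format_if_needed() {
    tmp_dir=$1
    shift
    if is_formatted "$tmp_dir"; then
        echo "already formatted, skipping: $tmp_dir"
    else
        "$@"
    fi
}

# Example (from the HADOOP_COMMON_HOME directory):
# format_if_needed /home/hadoop/hadoop-install/hadoop-datastore bin/hdfs namenode -format
```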
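9. Starting the Hadoop processes:
To start dfs use:
$ bin/start-dfs.sh
Check whether the namenode has started using the command:
$ jps
The following processes must be running:
NameNode
DataNode
SecondaryNameNode
To start mapred use:
$ bin/start-mapred.sh
If we type the command jps again, the following processes must also be running:
JobTracker
TaskTracker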
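Checking the jps listing by eye can be replaced with a small filter. This is a sketch only: the function name missing_daemons is hypothetical, and it assumes jps prints one "<pid> <MainClass>" pair per line.

```shell
#!/bin/sh
# Read jps output on stdin and print each expected daemon that is absent.
# Returns 0 only when every daemon named in the arguments is listed.
missing_daemons() {
    jps_out=$(cat)
    status=0
    for daemon in "$@"; do
        # Compare the second field exactly, so that SecondaryNameNode
        # is not mistaken for NameNode.
        if ! printf '%s\n' "$jps_out" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            echo "not running: $daemon"
            status=1
        fi
    done
    return "$status"
}

# Example:
# jps | missing_daemons NameNode DataNode SecondaryNameNode JobTracker TaskTracker
```

When a daemon is reported as not running, its log file in the logs directory under HADOOP_COMMON_HOME is usually the first place to look.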