Setup a Hadoop Single Node – Version 1.2.1

Hadoop is a cluster framework for building software that processes data using the MapReduce programming model.

 

Recommended settings

  • Ubuntu 14.04 x64 (preferably Xubuntu, with the XFCE GUI)
  • All repositories and packages up to date.
  • No Java/JVM previously installed from the repositories

 

00. Preparing

Update all Ubuntu packages:

# sudo apt-get update && sudo apt-get upgrade -y

 

01. Setup JVM (Java Virtual Machine)

01.1. Download JDK 7u40

Download “Java SE Development Kit 7u40” as the file “jdk-7u40-linux-x64.tar.gz“. A free Oracle account is required.

http://www.oracle.com/technetwork/java/javase/downloads/java-archive-downloads-javase7-521261.html

01.2. Extract to “/usr/lib/jvm”

Extract the contents of “jdk-7u40-linux-x64.tar.gz” to “/usr/lib/jvm/jdk1.7.0“.

After extraction, the “bin/” directory must exist directly inside “/usr/lib/jvm/jdk1.7.0”, i.e. “/usr/lib/jvm/jdk1.7.0/bin”.
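
For reference, a minimal command-line sketch of this step (it assumes the archive was downloaded to “~/Downloads”; the tarball normally extracts to a directory named like “jdk1.7.0_40”, so adjust the rename if yours differs):

# sudo mkdir -p /usr/lib/jvm
# sudo tar -xzf ~/Downloads/jdk-7u40-linux-x64.tar.gz -C /usr/lib/jvm
# sudo mv /usr/lib/jvm/jdk1.7.0_40 /usr/lib/jvm/jdk1.7.0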

01.3. Activating the JDK as installed software

# sudo update-alternatives --install "/usr/bin/java" "java" "/usr/lib/jvm/jdk1.7.0/bin/java" 1

# sudo update-alternatives --install "/usr/bin/javac" "javac" "/usr/lib/jvm/jdk1.7.0/bin/javac" 1

# sudo update-alternatives --install "/usr/bin/javaws" "javaws" "/usr/lib/jvm/jdk1.7.0/bin/javaws" 1
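
As a quick sanity check, the alternatives should now point at the new JDK (the version string should mention 1.7.0_40; exact output may vary):

$ java -version
$ update-alternatives --display java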

 

02. Setup Hadoop

02.1. Download Hadoop 1.2.1

Open the mirror page below and download “hadoop-1.2.1.tar.gz“.

http://www.apache.org/dyn/closer.cgi/hadoop/common
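
If you prefer downloading from the terminal, older releases are normally kept on the Apache archive mirror (the path below is an assumption; verify it against the mirror page above):

$ wget https://archive.apache.org/dist/hadoop/common/hadoop-1.2.1/hadoop-1.2.1.tar.gz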

02.2. Extract to “/opt/hadoop-1.2.1”

Extract “hadoop-1.2.1.tar.gz” to “/opt/hadoop-1.2.1“.

After extraction, the “bin/” directory must exist directly inside “/opt/hadoop-1.2.1”, i.e. “/opt/hadoop-1.2.1/bin”.
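
A minimal sketch of this step, assuming the tarball is in the current directory (the archive’s top-level directory is “hadoop-1.2.1”, so it lands directly under “/opt”); changing the owner to your regular user avoids running Hadoop as root:

# sudo tar -xzf hadoop-1.2.1.tar.gz -C /opt
# sudo chown -R $USER:$USER /opt/hadoop-1.2.1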

02.3. Set “JAVA_HOME” Environment Variable

Add the following to the file “/opt/hadoop-1.2.1/conf/hadoop-env.sh“:

# The java implementation to use.  Required.
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0

 

03. Setup User’s Environment Variables

Edit the user’s “.bashrc” file (“~/.bashrc”) and add the following to the end:

$ nano ~/.bashrc

# JAVA
export JAVA_HOME="/usr/lib/jvm/jdk1.7.0/"
export PATH="$PATH:$JAVA_HOME/bin"

# Hadoop
export HADOOP_PREFIX="/opt/hadoop-1.2.1"
export PATH="$PATH:$HADOOP_PREFIX/bin"

 

04. Setup SSH

04.1. Install SSH

# sudo apt-get install ssh rsync -y

04.2. Allow SSH access without password

$ ssh localhost
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
$ exit
$ ssh localhost

 

05. Setup Hadoop Node

Add the following to the respective configuration files.

# nano /opt/hadoop-1.2.1/conf/core-site.xml

<configuration>
     <property>
         <name>fs.default.name</name>
         <value>hdfs://localhost:9000</value>
     </property>
</configuration>

 

# nano /opt/hadoop-1.2.1/conf/hdfs-site.xml

<configuration>
     <property>
         <name>dfs.replication</name>
         <value>1</value>
     </property>
</configuration>

 

# nano /opt/hadoop-1.2.1/conf/mapred-site.xml

<configuration>
     <property>
         <name>mapred.job.tracker</name>
         <value>localhost:9001</value>
     </property>
</configuration>

 

06. Initialize Hadoop Node

06.1. Format Node

In a terminal (the SSH session from the previous step works), run:

$ hadoop namenode -format

06.2. Start all Hadoop services

$ start-all.sh
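
To confirm that the daemons started, “jps” (shipped with the JDK) should list the five Hadoop 1.x processes on a single node: NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker.

$ jps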

 

07. That’s it!

Access the Hadoop overview (NameNode web interface):
http://localhost:50070/

Access the JobTracker overview:
http://localhost:50030/
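
As an optional smoke test (the “/test” path is only an example), create a directory on HDFS and list the filesystem root:

$ hadoop fs -mkdir /test
$ hadoop fs -ls /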

 

How to create a sample project with Hadoop 1.2.1

If you want to create a simple sample project with Hadoop, see my other tutorial at:

http://blog.thenets.org/2015/11/sample-mapreduce-project-with-hadoop-1-2-1/

 

Credits and References

Hadoop Documentation. Stable 1 version:
https://hadoop.apache.org/docs/stable1/single_node_setup.html

Hadoop Single Node Setup by “Veera Sekhar”:
https://www.youtube.com/user/VeeraSekharPonakala/