Install, Set Up, and Run Hadoop 1

In this section we will set up a Hadoop cluster using Hadoop 1.2.1. The Hadoop 1.x series is typically referred to as Hadoop 1. This series uses MapReduce Version 1 (MRv1), the original implementation before YARN came into the picture.

These instructions appear on many websites; here we will try to keep them short and simple.

We will use the CentOS 6.5 machine we created earlier.

 

Create a New User and Group for Hadoop (all commands below must be run as root)

Alternatively you can use any pre-existing user account.

  • # groupadd hadoop
  • # useradd -g hadoop hadoop
  • # passwd hadoop
    • Changing password for user hadoop.
    • New password:
    • Retype new password:
    • passwd: all authentication tokens updated successfully.
    • #
  • # su - hadoop
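
To double-check the account before switching to it, the following should show the new user with hadoop as its primary group (a quick sanity check, nothing more):

  • # id hadoop    ## should list uid/gid entries for the hadoop user and group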

 

Download and install Java from Oracle’s Website

We are using the below version

  • jdk-8u73-linux-x64.tar.gz

Steps are

  • $ tar -xvf jdk-8u73-linux-x64.tar.gz
  • $ ln -s jdk1.8.0_73 JDK
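
Before touching the PATH, you can confirm the extraction and the JDK symlink worked by invoking the new JDK directly (a minimal check; the version string depends on the build you downloaded):

  • $ ~/JDK/bin/java -version    ## should report 1.8.0_73 for this download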

 

Download and Install Hadoop 1

We are using the below version

  • hadoop-1.2.1.tar.gz

Steps are

  • $ tar -xvf hadoop-1.2.1.tar.gz
  • $ ln -s hadoop-1.2.1 hadoop1
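
If you still need to fetch the tarball, older Hadoop releases are kept on the Apache archive; the URL below is an assumption based on the archive layout at the time of writing, so verify it before use:

  • $ wget https://archive.apache.org/dist/hadoop/core/hadoop-1.2.1/hadoop-1.2.1.tar.gz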

 

Environment Setup

  • $ vi ~/.bash_profile
    • HADOOP_HOME=/home/hadoop/hadoop1
    • JAVA_HOME=/home/hadoop/JDK
    • PATH=$PATH:$HOME/bin:${HADOOP_HOME}:${HADOOP_HOME}/bin:${JAVA_HOME}:${JAVA_HOME}/bin
    • export HADOOP_HOME JAVA_HOME PATH
  • $ . ~/.bash_profile
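
After sourcing the profile, a quick check that the new variables and search paths took effect (expected values assume the locations used above):

  • $ echo $HADOOP_HOME    ## expect /home/hadoop/hadoop1
  • $ which hadoop         ## expect /home/hadoop/hadoop1/bin/hadoop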

 

  • Validate Java
  • $ java -version
    • java version "1.7.0_45"
    • OpenJDK Runtime Environment (rhel-2.4.3.3.el6-x86_64 u45-b15)
    • OpenJDK 64-Bit Server VM (build 24.45-b08, mixed mode)
    • Note: the sample output above is the system OpenJDK; because ~/.bash_profile appends to PATH, a pre-installed OpenJDK can still come first. Hadoop itself will use the JAVA_HOME we set in hadoop-env.sh below, but prepend ${JAVA_HOME}/bin to PATH if you want the Oracle JDK on the command line as well.
  • $ jps
    • ###some_processid## Jps
  • $

 

  • vi ~/hadoop1/conf/hadoop-env.sh
    • export JAVA_HOME=/home/hadoop/JDK

 

  • Validate Hadoop Setup
  • $ hadoop version
    • Hadoop 1.2.1

 

Hostname Setup

  • I had to set up the hostname for this to work with this version of Java; you may not need this step
  • Use "ip a" to find your IP
  • Run the below command as root
  • vi /etc/hosts
    • 192.168.1.9 sample.centosvm.com sample
    • Add the above line (substitute your own IP and hostname)
  • Validate by running the below commands (you should get some output)
  • hostname
    • sample
  • hostname -f
    • sample.centosvm.com
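
On CentOS 6 the short hostname itself usually comes from HOSTNAME= in /etc/sysconfig/network, so set it there as well if the hostname command does not already print it. To confirm the new /etc/hosts entry resolves (substitute your own IP and names):

  • ping -c 1 sample.centosvm.com     ## should answer from 192.168.1.9
  • getent hosts sample.centosvm.com  ## should echo the /etc/hosts line you added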

 

Standalone Operation (no HDFS)

  • https://hadoop.apache.org/docs/r1.2.1/single_node_setup.html
  • mkdir ~/input
  • cp ~/hadoop1/conf/* ~/input
  • hadoop jar ~/hadoop1/hadoop-examples-1.2.1.jar wordcount ~/input ~/output
  • $ find ~/output
    • /home/hadoop/output/part-r-00000
    • /home/hadoop/output/_SUCCESS
  • $ tail -5 /home/hadoop/output/part-r-00000
    • would 7
    • writing, 6
    • written 1
    • xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 1
    • you 8
  • $
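
One caveat before re-running the example: MapReduce will not write into an output directory that already exists, so remove it between runs:

  • $ rm -rf ~/output    ## wordcount fails if ~/output is already there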

Pseudo-Distributed Operation (Single Node Cluster)

  • https://hadoop.apache.org/docs/r1.2.1/single_node_setup.html
  • https://wiki.apache.org/hadoop/GettingStartedWithHadoop
  • vi ~/hadoop1/conf/hadoop-env.sh
    • export JAVA_HOME=/home/hadoop/JDK
  • vi ~/hadoop1/conf/core-site.xml
    • <configuration>
    • <property>
    • <name>fs.default.name</name>
    • <value>hdfs://192.168.1.9:9000</value>
    • </property>
    • <property>
    • <name>hadoop.tmp.dir</name>
    • <value>/home/hadoop/data1/tmp</value>
    • </property>
    • </configuration>
    • I used the IP (you can use localhost)
  • vi ~/hadoop1/conf/hdfs-site.xml
    • <configuration>
    • <property>
    • <name>dfs.replication</name>
    • <value>1</value>
    • </property>
    • <property>
    • <name>dfs.name.dir</name>
    • <value>/home/hadoop/data1/name_node</value>
    • </property>
    • <property>
    • <name>dfs.data.dir</name>
    • <value>/home/hadoop/data1/data_node</value>
    • </property>
    • <property>
    • <name>fs.checkpoint.dir</name>
    • <value>/home/hadoop/data1/sec_name_name</value>
    • </property>
    • </configuration>
  • vi ~/hadoop1/conf/mapred-site.xml
    • <configuration>
    • <property>
    • <name>mapred.job.tracker</name>
    • <value>192.168.1.9:9001</value>
    • </property>
    • </configuration>
    • I used the IP (you can use localhost)
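
Hadoop creates these directories when you format and start the cluster, but making them up front as the hadoop user avoids permission surprises; the paths below simply mirror the values in the XML above:

  • $ mkdir -p /home/hadoop/data1/tmp
  • $ mkdir -p /home/hadoop/data1/name_node /home/hadoop/data1/data_node /home/hadoop/data1/sec_name_name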

Set up passphrase-less SSH (so that you do not have to enter passwords again and again)

  • $ ssh-keygen
    • Generating public/private rsa key pair.
    • Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
    • Enter passphrase (empty for no passphrase):
    • Enter same passphrase again:
    • Your identification has been saved in /home/hadoop/.ssh/id_rsa.
    • Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
    • ::
  • $ ssh-copy-id `hostname -f`
    • hadoop@sample.centosvm.com's password:

 

  • Verify Now:
  • $ ssh `hostname -f`
  • $ ssh `hostname`
  • $ ssh 192.168.1.9 ## using your IP
    • They should all work (answer "yes" the first time when prompted)
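
If any of these still prompts for a password, the usual culprit is file permissions, because sshd ignores keys whose files are too open; tightening them is harmless (assuming the default ~/.ssh layout):

  • $ chmod 700 ~/.ssh
  • $ chmod 600 ~/.ssh/authorized_keys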

Start Hadoop1 Single Node Cluster

  • https://hadoop.apache.org/docs/r1.2.1/single_node_setup.html
  • Format a new distributed filesystem:
  • $ hadoop namenode -format
  • Start the hadoop daemons:
  • $ start-all.sh
  • $ jps
    • 27436 NameNode
    • 27667 SecondaryNameNode
    • 27768 JobTracker
    • 27888 TaskTracker
    • 27545 DataNode
  • $
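
If any daemon is missing from the jps output, its log file under ~/hadoop1/logs usually explains why. With all five running, an optional health check:

  • $ hadoop dfsadmin -report    ## should show one live datanode and the configured capacity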

Run Sample Job to test Hadoop1 Cluster

  • hadoop dfs -mkdir /input
  • hadoop dfs -put /home/hadoop/input/* /input
  • hadoop jar ~/hadoop1/hadoop-examples-1.2.1.jar wordcount /input /output
    • 16/02/13 12:59:36 INFO mapred.JobClient: map 0% reduce 0%
    • 16/02/13 13:00:38 INFO mapred.JobClient: map 100% reduce 100%
  • hadoop dfs -lsr /output
    • -rw-r--r-- 1 hadoop supergroup          0 2016-02-13 13:00 /output/_SUCCESS
    • -rw-r--r-- 1 hadoop supergroup      15992 2016-02-13 13:00 /output/part-r-00000
  • $ hadoop dfs -cat /output/part-r-00000 | tail -5
    • would 7
    • writing, 6
    • written 1
    • xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 1
    • you 8
    • $
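
The same caveat as in standalone mode applies: remove the HDFS output directory before re-running the job. And when you are done experimenting, the whole cluster stops with the companion script:

  • $ hadoop dfs -rmr /output    ## required before re-running wordcount with the same output path
  • $ stop-all.sh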

Disable the Firewall to Access the Web UIs (as root)

  • # service iptables stop
  • # service iptables status
    • iptables: Firewall is not running.
  • # chkconfig iptables --list
  • # chkconfig iptables off
  • # chkconfig iptables --list
    • All off
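
Turning iptables off is fine for a throwaway VM. If you would rather keep the firewall on, an alternative (not what was done here, just a sketch for CentOS 6) is to open only the web UI ports listed below:

  • # iptables -I INPUT -p tcp --dport 50070 -j ACCEPT
  • # iptables -I INPUT -p tcp --dport 50030 -j ACCEPT
  • # service iptables save    ## persists the rules across reboots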

 

Web UIs

  • Please use your own IP (found earlier with "ip a")
  • Name Node: http://192.168.1.9:50070/
  • Secondary Name Node: http://192.168.1.9:50090/
  • Job Tracker: http://192.168.1.9:50030/
  • Data Node: http://192.168.1.9:50075/blockScannerReport
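
If a page does not come up, you can first check from inside the VM that the daemon is listening (curl ships with CentOS; substitute your own IP):

  • $ curl -s -o /dev/null -w "%{http_code}\n" http://192.168.1.9:50070/    ## expect 200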

Author: Pathik Paul
