Install, Set Up, and Run Hadoop 1

In this section we will set up a Hadoop cluster using Hadoop 1.2.1. The Hadoop 1.x series is typically referred to as Hadoop 1. This series uses MapReduce Version 1 (MRv1), the original implementation before YARN came into the picture.

These instructions appear on many websites; here we will try to keep them short and simple.

We will use the CentOS 6.5 machine we created earlier.

 

Create a New User and Group for Hadoop (all commands below must be run as root)

Alternatively you can use any pre-existing user account.

  • # groupadd hadoop
  • # useradd -g hadoop hadoop
  • # passwd hadoop
    • Changing password for user hadoop.
    • New password:
    • Retype new password:
    • passwd: all authentication tokens updated successfully.
    • #
  • # su - hadoop
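
To double-check the account before switching to it, the following should show the new user with hadoop as its primary group (a quick sanity check, nothing more):

  • # id hadoop    ## should list uid/gid entries for the hadoop user and group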

 

Download and install Java from Oracle’s Website

We are using the below version

  • jdk-8u73-linux-x64.tar.gz

Steps are

  • $ tar -xvf jdk-8u73-linux-x64.tar.gz
  • $ ln -s jdk1.8.0_73 JDK
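
Before touching the PATH, you can confirm the extraction and the JDK symlink worked by invoking the new JDK directly (a minimal check; the version string depends on the build you downloaded):

  • $ ~/JDK/bin/java -version    ## should report 1.8.0_73 for this download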

 

Download and Install Hadoop 1

We are using the below version

  • hadoop-1.2.1.tar.gz

Steps are

  • $ tar -xvf hadoop-1.2.1.tar.gz
  • $ ln -s hadoop-1.2.1 hadoop1
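
If you still need to fetch the tarball, older Hadoop releases are kept on the Apache archive; the URL below is an assumption based on the archive layout at the time of writing, so verify it before use:

  • $ wget https://archive.apache.org/dist/hadoop/core/hadoop-1.2.1/hadoop-1.2.1.tar.gz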

 

Environment Setup

  • $ vi ~/.bash_profile
    • HADOOP_HOME=/home/hadoop/hadoop1
    • JAVA_HOME=/home/hadoop/JDK
    • PATH=$PATH:$HOME/bin:${HADOOP_HOME}:${HADOOP_HOME}/bin:${JAVA_HOME}:${JAVA_HOME}/bin
    • export HADOOP_HOME JAVA_HOME PATH
  • $ . ~/.bash_profile
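
After sourcing the profile, a quick check that the new variables and search paths took effect (expected values assume the locations used above):

  • $ echo $HADOOP_HOME    ## expect /home/hadoop/hadoop1
  • $ which hadoop         ## expect /home/hadoop/hadoop1/bin/hadoop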

 

  • Validate Java
  • $ java -version
    • java version "1.7.0_45"
    • OpenJDK Runtime Environment (rhel-2.4.3.3.el6-x86_64 u45-b15)
    • OpenJDK 64-Bit Server VM (build 24.45-b08, mixed mode)
    • Note: the sample output above is the system OpenJDK; because ~/.bash_profile appends to PATH, a pre-installed OpenJDK can still come first. Hadoop itself will use the JAVA_HOME we set in hadoop-env.sh below, but prepend ${JAVA_HOME}/bin to PATH if you want the Oracle JDK on the command line as well.
  • $ jps
    • ###some_processid## Jps
  • $

 

  • vi ~/hadoop1/conf/hadoop-env.sh
    • export JAVA_HOME=/home/hadoop/JDK

 

  • Validate Hadoop Setup
  • $ hadoop version
    • Hadoop 1.2.1

 

Hostname Setup

  • I had to set up the hostname for this to work with this version of Java; you may not need this step
  • Use "ip a" to find your IP
  • Run the below command as root
  • vi /etc/hosts
    • 192.168.1.9 sample.centosvm.com sample
    • Add the above line (substitute your own IP and hostname)
  • Validate by running the below commands (you should get some output)
  • hostname
    • sample
  • hostname -f
    • sample.centosvm.com
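
On CentOS 6 the short hostname itself usually comes from HOSTNAME= in /etc/sysconfig/network, so set it there as well if the hostname command does not already print it. To confirm the new /etc/hosts entry resolves (substitute your own IP and names):

  • ping -c 1 sample.centosvm.com     ## should answer from 192.168.1.9
  • getent hosts sample.centosvm.com  ## should echo the /etc/hosts line you added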

 

Standalone Operation (no HDFS)

  • https://hadoop.apache.org/docs/r1.2.1/single_node_setup.html
  • mkdir ~/input
  • cp ~/hadoop1/conf/* ~/input
  • hadoop jar ~/hadoop1/hadoop-examples-1.2.1.jar wordcount ~/input ~/output
  • $ find ~/output
    • /home/hadoop/output/part-r-00000
    • /home/hadoop/output/_SUCCESS
  • $ tail -5 /home/hadoop/output/part-r-00000
    • would 7
    • writing, 6
    • written 1
    • xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 1
    • you 8
  • $
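
One caveat before re-running the example: MapReduce will not write into an output directory that already exists, so remove it between runs:

  • $ rm -rf ~/output    ## wordcount fails if ~/output is already there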

Pseudo-Distributed Operation (Single Node Cluster)

  • https://hadoop.apache.org/docs/r1.2.1/single_node_setup.html
  • https://wiki.apache.org/hadoop/GettingStartedWithHadoop
  • vi ~/hadoop1/conf/hadoop-env.sh
    • export JAVA_HOME=/home/hadoop/JDK
  • vi ~/hadoop1/conf/core-site.xml
    • <configuration>
    • <property>
    • <name>fs.default.name</name>
    • <value>hdfs://192.168.1.9:9000</value>
    • </property>
    • <property>
    • <name>hadoop.tmp.dir</name>
    • <value>/home/hadoop/data1/tmp</value>
    • </property>
    • </configuration>
    • I used the IP (you can use localhost)
  • vi ~/hadoop1/conf/hdfs-site.xml
    • <configuration>
    • <property>
    • <name>dfs.replication</name>
    • <value>1</value>
    • </property>
    • <property>
    • <name>dfs.name.dir</name>
    • <value>/home/hadoop/data1/name_node</value>
    • </property>
    • <property>
    • <name>dfs.data.dir</name>
    • <value>/home/hadoop/data1/data_node</value>
    • </property>
    • <property>
    • <name>fs.checkpoint.dir</name>
    • <value>/home/hadoop/data1/sec_name_name</value>
    • </property>
    • </configuration>
  • vi ~/hadoop1/conf/mapred-site.xml
    • <configuration>
    • <property>
    • <name>mapred.job.tracker</name>
    • <value>192.168.1.9:9001</value>
    • </property>
    • </configuration>
    • I used the IP (you can use localhost)
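
Hadoop creates these directories when you format and start the cluster, but making them up front as the hadoop user avoids permission surprises; the paths below simply mirror the values in the XML above:

  • $ mkdir -p /home/hadoop/data1/tmp
  • $ mkdir -p /home/hadoop/data1/name_node /home/hadoop/data1/data_node /home/hadoop/data1/sec_name_name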

Set up passphrase-less SSH (so that you do not have to enter passwords again and again)

  • $ ssh-keygen
    • Generating public/private rsa key pair.
    • Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
    • Enter passphrase (empty for no passphrase):
    • Enter same passphrase again:
    • Your identification has been saved in /home/hadoop/.ssh/id_rsa.
    • Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
    • ::
  • $ ssh-copy-id `hostname -f`
    • hadoop@sample.centosvm.com's password:

 

  • Verify Now:
  • $ ssh `hostname -f`
  • $ ssh `hostname`
  • $ ssh 192.168.1.9 ## using your IP
    • They should all work (answer "yes" the first time when prompted)
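
If any of these still prompts for a password, the usual culprit is file permissions, because sshd ignores keys whose files are too open; tightening them is harmless (assuming the default ~/.ssh layout):

  • $ chmod 700 ~/.ssh
  • $ chmod 600 ~/.ssh/authorized_keys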

Start Hadoop1 Single Node Cluster

  • https://hadoop.apache.org/docs/r1.2.1/single_node_setup.html
  • Format a new distributed filesystem:
  • $ hadoop namenode -format
  • Start the hadoop daemons:
  • $ start-all.sh
  • $ jps
    • 27436 NameNode
    • 27667 SecondaryNameNode
    • 27768 JobTracker
    • 27888 TaskTracker
    • 27545 DataNode
  • $
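
If any daemon is missing from the jps output, its log file under ~/hadoop1/logs usually explains why. With all five running, an optional health check:

  • $ hadoop dfsadmin -report    ## should show one live datanode and the configured capacity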

Run Sample Job to test Hadoop1 Cluster

  • hadoop dfs -mkdir /input
  • hadoop dfs -put /home/hadoop/input/* /input
  • hadoop jar ~/hadoop1/hadoop-examples-1.2.1.jar wordcount /input /output
    • 16/02/13 12:59:36 INFO mapred.JobClient: map 0% reduce 0%
    • 16/02/13 13:00:38 INFO mapred.JobClient: map 100% reduce 100%
  • hadoop dfs -lsr /output
    • -rw-r--r-- 1 hadoop supergroup          0 2016-02-13 13:00 /output/_SUCCESS
    • -rw-r--r-- 1 hadoop supergroup      15992 2016-02-13 13:00 /output/part-r-00000
  • $ hadoop dfs -cat /output/part-r-00000 | tail -5
    • would 7
    • writing, 6
    • written 1
    • xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 1
    • you 8
    • $
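
The same caveat as in standalone mode applies: remove the HDFS output directory before re-running the job. And when you are done experimenting, the whole cluster stops with the companion script:

  • $ hadoop dfs -rmr /output    ## required before re-running wordcount with the same output path
  • $ stop-all.sh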

Disable the Firewall to Access the Web UIs (as root)

  • # service iptables stop
  • # service iptables status
    • iptables: Firewall is not running.
  • # chkconfig iptables --list
  • # chkconfig iptables off
  • # chkconfig iptables --list
    • All off
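
Turning iptables off is fine for a throwaway VM. If you would rather keep the firewall on, an alternative (not what was done here, just a sketch for CentOS 6) is to open only the web UI ports listed below:

  • # iptables -I INPUT -p tcp --dport 50070 -j ACCEPT
  • # iptables -I INPUT -p tcp --dport 50030 -j ACCEPT
  • # service iptables save    ## persists the rules across reboots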

 

Web UIs

  • Please use your own IP (found earlier with "ip a")
  • Name Node: http://192.168.1.9:50070/
  • Secondary Name Node: http://192.168.1.9:50090/
  • Job Tracker: http://192.168.1.9:50030/
  • Data Node: http://192.168.1.9:50075/blockScannerReport
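
If a page does not come up, you can first check from inside the VM that the daemon is listening (curl ships with CentOS; substitute your own IP):

  • $ curl -s -o /dev/null -w "%{http_code}\n" http://192.168.1.9:50070/    ## expect 200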

Author: Pathik Paul
