Install, Setup and Run Hadoop 2 (Current Version)


In this section we will set up a Hadoop cluster.

We will use the CentOS 6.5 machine we created earlier.

 

  • We will create a new user "hadoop" in a new group "hadoop" (you will need root access)
  • We will download and install the Oracle JDK (jdk-8u73-linux-x64.tar.gz)
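As shell commands, those two setup bullets look roughly like the sketch below. This is a minimal sketch, not a transcript: it assumes the JDK tarball sits in the hadoop user's home directory, that it extracts to jdk1.8.0_73 (the usual name for this release, but verify), and that we symlink it to /home/hadoop/JDK, the JAVA_HOME path this guide uses later.

```shell
# as root: create the hadoop group and user
groupadd hadoop
useradd -m -g hadoop hadoop
passwd hadoop

# as the hadoop user, from /home/hadoop:
# unpack the Oracle JDK and symlink it to the JAVA_HOME path used later
tar -xzf jdk-8u73-linux-x64.tar.gz
ln -s jdk1.8.0_73 JDK
```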

 

Download and extract Hadoop

  • https://hadoop.apache.org/releases.html
  • hadoop-2.6.3.tar.gz
  • $ tar -xvf hadoop-2.6.3.tar.gz
  • $ ln -s hadoop-2.6.3 hadoop2
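If you want the download step as a command too, the tarball can be fetched directly; the URL below is Apache's long-term archive for 2.6.3 (current releases come from the mirrors linked on the releases page above):

```shell
wget https://archive.apache.org/dist/hadoop/common/hadoop-2.6.3/hadoop-2.6.3.tar.gz
```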

Environment Setup

  • $ vi ~/.bash_profile
    • export JAVA_HOME=/home/hadoop/JDK
    • export HADOOP_HOME=/home/hadoop/hadoop2
    • export HADOOP_MAPRED_HOME=$HADOOP_HOME
    • export HADOOP_COMMON_HOME=$HADOOP_HOME
    • export HADOOP_HDFS_HOME=$HADOOP_HOME
    • export YARN_HOME=$HADOOP_HOME
    • export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
    • export HADOOP_INSTALL=$HADOOP_HOME
    • PATH=$PATH:$HOME/bin:${HADOOP_HOME}:${HADOOP_HOME}/sbin:${HADOOP_HOME}/bin:${JAVA_HOME}:${JAVA_HOME}/bin
  • $ . ~/.bash_profile

 

  • we started with jdk-8u73-linux-x64.tar.gz
  • Validate Java
  • $ java -version
    • java version "1.7.0_45"
    • OpenJDK Runtime Environment (rhel-2.4.3.3.el6-x86_64 u45-b15)
    • OpenJDK 64-Bit Server VM (build 24.45-b08, mixed mode)
    • Note: output like the above is the stock CentOS OpenJDK 1.7, not the Oracle JDK 8 (1.8.0_73) we just installed. If you see it, $JAVA_HOME/bin is not being picked up on the PATH; re-check ~/.bash_profile and source it again.
  • $ jps
    • ###some_processid## Jps
  • $

 

  • vi ~/hadoop2/etc/hadoop/hadoop-env.sh
    • export JAVA_HOME=/home/hadoop/JDK

 

  • Validate Hadoop Setup
  • $ hadoop version
    • Hadoop 2.6.3

Standalone Operation (No HDFS)
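In standalone mode Hadoop runs as a single local Java process against the local filesystem, so you can smoke-test the install with no daemons running and no HDFS. A sketch along the lines of the Apache quick-start (the input/output directory names here are arbitrary choices, and the paths assume the layout above):

```shell
# run a bundled example against plain local files
mkdir ~/standalone_input
cp ~/hadoop2/etc/hadoop/*.xml ~/standalone_input
hadoop jar ~/hadoop2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.3.jar \
    grep ~/standalone_input ~/standalone_output 'dfs[a-z.]+'
cat ~/standalone_output/*
```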

Pseudo-Distributed Operation (Single Node Cluster)

 

  • vi etc/hadoop/hadoop-env.sh
    • export JAVA_HOME=/home/hadoop/JDK
  • vi etc/hadoop/core-site.xml
    • <configuration>
    • <property>
    • <name>fs.defaultFS</name>
    • <value>hdfs://192.168.1.9:9000</value>
    • </property>
    • <property>
    • <name>hadoop.tmp.dir</name>
    • <value>/home/hadoop/data2/tmp</value>
    • </property>
    • </configuration>
    • I used the machine's IP address here (you can use localhost instead)
  • vi etc/hadoop/hdfs-site.xml
    • <configuration>
    • <property>
    • <name>dfs.replication</name>
    • <value>1</value>
    • </property>
    • <property>
    • <name>dfs.namenode.name.dir</name>
    • <value>/home/hadoop/data2/name_node</value>
    • </property>
    • <property>
    • <name>dfs.datanode.data.dir</name>
    • <value>/home/hadoop/data2/data_node</value>
    • </property>
    • <property>
    • <name>dfs.namenode.checkpoint.dir</name>
    • <value>/home/hadoop/data2/sec_name_name</value>
    • </property>
    • </configuration>
  • cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml
  • vi etc/hadoop/mapred-site.xml
    • <configuration>
    • <property>
    • <name>mapreduce.framework.name</name>
    • <value>yarn</value>
    • </property>
    • </configuration>
  • vi etc/hadoop/yarn-site.xml
    • <configuration>
    • <property>
    • <name>yarn.nodemanager.aux-services</name>
    • <value>mapreduce_shuffle</value>
    • </property>
    • </configuration>
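The four config edits above can also be scripted with heredocs, which is handy if you rebuild the machine often. A minimal sketch: the HADOOP_CONF variable is my own convention and defaults to a scratch directory for a dry run; on a real install point it at ~/hadoop2/etc/hadoop, and adjust the IP to your machine.

```shell
# Write the pseudo-distributed configs from this section.
# HADOOP_CONF defaults to a scratch directory for a dry run;
# on a real install point it at $HADOOP_HOME/etc/hadoop.
HADOOP_CONF=${HADOOP_CONF:-./hadoop-conf-demo}
mkdir -p "$HADOOP_CONF"

cat > "$HADOOP_CONF/core-site.xml" <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://192.168.1.9:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/data2/tmp</value>
  </property>
</configuration>
EOF

cat > "$HADOOP_CONF/hdfs-site.xml" <<'EOF'
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hadoop/data2/name_node</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/hadoop/data2/data_node</value>
  </property>
  <property>
    <name>dfs.namenode.checkpoint.dir</name>
    <value>/home/hadoop/data2/sec_name_name</value>
  </property>
</configuration>
EOF

cat > "$HADOOP_CONF/mapred-site.xml" <<'EOF'
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF

cat > "$HADOOP_CONF/yarn-site.xml" <<'EOF'
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
EOF

echo "wrote $(ls "$HADOOP_CONF" | wc -l) config files to $HADOOP_CONF"
```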

Start Hadoop Single Node Cluster
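One prerequisite before the commands below: start-dfs.sh and start-yarn.sh use ssh to localhost to launch the daemons, so the hadoop user needs passwordless SSH to localhost (this is the check the Apache single-node guide recommends; skip it if `ssh localhost` already logs you in without a prompt):

```shell
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
ssh localhost   # should log in without asking for a password
```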

  • $ hdfs namenode -format
  • $ start-dfs.sh
  • $ start-yarn.sh
  • $ mr-jobhistory-daemon.sh start historyserver
  • $ jps
    • 373 NameNode
    • 642 SecondaryNameNode
    • 776 ResourceManager
    • 868 NodeManager
    • 490 DataNode
    • 3667 JobHistoryServer
  • $ hdfs dfs -mkdir /input
  • $ hdfs dfs -put ~/input/* /input
  • $ hadoop jar ~/hadoop2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.3.jar wordcount /input /output
    • 16/02/13 14:56:26 INFO mapreduce.Job: map 0% reduce 0%
    • 16/02/13 15:04:30 INFO mapreduce.Job: map 100% reduce 100%
  • $ hdfs dfs -ls -R /output
    • -rw-r--r-- 1 hadoop supergroup          0 2016-02-13 15:04 /output/_SUCCESS
    • -rw-r--r-- 1 hadoop supergroup      15992 2016-02-13 15:04 /output/part-r-00000
  • $ hdfs dfs -cat /output/part-r-00000 | tail -5
    • would 7
    • writing, 6
    • written 1
    • xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 1
    • you 8
  • $
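When you are finished, the daemons can be stopped with the matching stop scripts, roughly in the reverse of the start order used above:

```shell
mr-jobhistory-daemon.sh stop historyserver
stop-yarn.sh
stop-dfs.sh
jps   # only Jps itself should remain
```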

Web UIs (once security is disabled)

Name Node              http://192.168.1.9:50070/
Secondary Name Node    http://192.168.1.9:50090/
Data Node              http://192.168.1.9:50075/blockScannerReport
Resource Manager       http://192.168.1.9:8088/
Job History Server     http://192.168.1.9:19888/
Node Manager           http://192.168.1.9:8042/

Author: Pathik Paul
