Install, Setup and Run Hadoop 2 (Current Version)
In this section we will set up a Hadoop cluster.
We will use the CentOS 6.5 machine we created earlier.
- We will create a new user "hadoop:hadoop" (you will need root access)
- We will download and install Oracle JDK jdk-8u73-linux-x64.tar.gz
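The two bullets above can be sketched as shell commands; run the first part as root, the rest as the hadoop user. The extracted JDK directory name (jdk1.8.0_73) is an assumption based on the tarball version:

```shell
# as root: create the hadoop user (a matching group is created automatically)
useradd hadoop
passwd hadoop                      # choose a password when prompted

# as hadoop: unpack the Oracle JDK and keep a stable path via a symlink
tar -xzf jdk-8u73-linux-x64.tar.gz
ln -s ~/jdk1.8.0_73 ~/JDK          # "jdk1.8.0_73" is the assumed extracted dir name
```

The symlink is what lets JAVA_HOME stay fixed at /home/hadoop/JDK even if you later upgrade the JDK.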
Download and extract Hadoop
- https://hadoop.apache.org/releases.html
- hadoop-2.6.3.tar.gz
- $ tar -xvf hadoop-2.6.3.tar.gz
- $ ln -s hadoop-2.6.3 hadoop2
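Before extracting, it is worth verifying the tarball against the checksum file published next to the download on the Apache releases page:

```shell
sha256sum hadoop-2.6.3.tar.gz      # compare against the value on the Apache site
```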
Environment Setup
- $ vi ~/.bash_profile
- export JAVA_HOME=/home/hadoop/JDK
- export HADOOP_HOME=/home/hadoop/hadoop2
- export HADOOP_MAPRED_HOME=$HADOOP_HOME
- export HADOOP_COMMON_HOME=$HADOOP_HOME
- export HADOOP_HDFS_HOME=$HADOOP_HOME
- export YARN_HOME=$HADOOP_HOME
- export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
- export HADOOP_INSTALL=$HADOOP_HOME
- export PATH=${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:$HOME/bin:$PATH
- Putting ${JAVA_HOME}/bin first ensures the Oracle JDK shadows the system OpenJDK
- $ . ~/.bash_profile
- we started with jdk-8u73-linux-x64.tar.gz
- Validate Java
- $ java -version
- java version "1.7.0_45"
- OpenJDK Runtime Environment (rhel-2.4.3.3.el6-x86_64 u45-b15)
- OpenJDK 64-Bit Server VM (build 24.45-b08, mixed mode)
- Note: output like the above is the stock CentOS OpenJDK 7, not the JDK 8 we installed; if you see it, ${JAVA_HOME}/bin is not ahead of /usr/bin on your PATH
- $ jps
- ###some_processid## Jps
- $
- vi ~/hadoop2/etc/hadoop/hadoop-env.sh
- export JAVA_HOME=/home/hadoop/JDK
- Validate Hadoop Setup
- $ hadoop version
- Hadoop 2.6.3
Standalone Operation (No HDFS)
- http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html
- $ hadoop jar ~/hadoop2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.3.jar wordcount ~/input ~/output2
- $ find ~/output2
- /home/hadoop/output2/part-r-00000
- /home/hadoop/output2/_SUCCESS
- $ tail -5 /home/hadoop/output2/part-r-00000
- would 7
- writing, 6
- written 1
- xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 1
- you 8
- $
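The wordcount run above assumes ~/input already contains some text files. A common choice, used in the official single-node guide, is Hadoop's own XML config files (which also explains the xmlns:xsl token in the output):

```shell
mkdir -p ~/input
cp ~/hadoop2/etc/hadoop/*.xml ~/input/
```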
Pseudo-Distributed Operation (Single Node Cluster)
- http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html
- http://www.tutorialspoint.com/hadoop/hadoop_enviornment_setup.htm
- vi etc/hadoop/hadoop-env.sh
- export JAVA_HOME=/home/hadoop/JDK
- vi etc/hadoop/core-site.xml
- <configuration>
- <property>
- <name>fs.defaultFS</name>
- <value>hdfs://192.168.1.9:9000</value>
- </property>
- <property>
- <name>hadoop.tmp.dir</name>
- <value>/home/hadoop/data2/tmp</value>
- </property>
- </configuration>
- I used the machine's IP; you can use localhost instead on a single node
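If you prefer scripting the edit over vi, here is a sketch that writes the same core-site.xml. CONF_DIR defaults to a scratch path for safety; point it at ~/hadoop2/etc/hadoop on the real machine, and swap localhost for your IP:

```shell
# write a minimal core-site.xml (values from the section above);
# CONF_DIR is a scratch default here, not the real Hadoop conf dir
CONF_DIR="${CONF_DIR:-/tmp/hadoop-conf-demo}"
mkdir -p "$CONF_DIR"
cat > "$CONF_DIR/core-site.xml" <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/data2/tmp</value>
  </property>
</configuration>
EOF
```

The quoted heredoc ('EOF') keeps the shell from expanding anything inside the XML.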
- vi etc/hadoop/hdfs-site.xml
- <configuration>
- <property>
- <name>dfs.replication</name>
- <value>1</value>
- </property>
- <property>
- <name>dfs.namenode.name.dir</name>
- <value>/home/hadoop/data2/name_node</value>
- </property>
- <property>
- <name>dfs.datanode.data.dir</name>
- <value>/home/hadoop/data2/data_node</value>
- </property>
- <property>
- <name>dfs.namenode.checkpoint.dir</name>
- <value>/home/hadoop/data2/sec_name_name</value>
- </property>
- </configuration>
- cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml
- vi etc/hadoop/mapred-site.xml
- <configuration>
- <property>
- <name>mapreduce.framework.name</name>
- <value>yarn</value>
- </property>
- </configuration>
- vi etc/hadoop/yarn-site.xml
- <configuration>
- <property>
- <name>yarn.nodemanager.aux-services</name>
- <value>mapreduce_shuffle</value>
- </property>
- </configuration>
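One prerequisite the official single-node guide lists before starting the daemons: start-dfs.sh logs into localhost via ssh, so the hadoop user needs passwordless SSH to itself:

```shell
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
ssh localhost exit     # should log in without a password prompt
```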
Start Hadoop Single Node Cluster
- $ hdfs namenode -format
- $ start-dfs.sh
- $ start-yarn.sh
- $ mr-jobhistory-daemon.sh start historyserver
- $ jps
- 373 NameNode
- 642 SecondaryNameNode
- 776 ResourceManager
- 868 NodeManager
- 490 DataNode
- 3667 JobHistoryServer
- $ hdfs dfs -mkdir /input
- $ hdfs dfs -put ~/input/* /input
- $ hadoop jar ~/hadoop2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.3.jar wordcount /input /output
- 16/02/13 14:56:26 INFO mapreduce.Job: map 0% reduce 0%
- 16/02/13 15:04:30 INFO mapreduce.Job: map 100% reduce 100%
- $ hdfs dfs -ls -R /output
- -rw-r--r-- 1 hadoop supergroup 0 2016-02-13 15:04 /output/_SUCCESS
- -rw-r--r-- 1 hadoop supergroup 15992 2016-02-13 15:04 /output/part-r-00000
- $ hdfs dfs -cat /output/part-r-00000 | tail -5
- would 7
- writing, 6
- written 1
- xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 1
- you 8
- $
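When you are done, the daemons can be stopped in roughly the reverse order they were started:

```shell
mr-jobhistory-daemon.sh stop historyserver
stop-yarn.sh
stop-dfs.sh
jps                    # only Jps itself should remain
```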
Web UIs (reachable once the CentOS firewall/iptables allows these ports)
Name Node http://192.168.1.9:50070/
Sec Name Node http://192.168.1.9:50090/
Data Node http://192.168.1.9:50075/blockScannerReport
Resource Mgr http://192.168.1.9:8088/
Job History Tracker http://192.168.1.9:19888/
Node Manager http://192.168.1.9:8042/
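A quick way to confirm a UI is up without a browser (IP assumed from the table above):

```shell
curl -s -o /dev/null -w '%{http_code}\n' http://192.168.1.9:50070/
```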