Hive Install Configure Basic Tutorial

Prerequisites – Java

$ java -version ## from jdk-8u73-linux-x64.gz
java version "1.8.0_73"
Java(TM) SE Runtime Environment (build 1.8.0_73-b02)
Java HotSpot(TM) 64-Bit Server VM (build 25.73-b02, mixed mode)
$

Prerequisites – A working Hadoop installation

$ hadoop version
Hadoop 2.6.3
$

Prerequisites – Test that MapReduce is working

$ hadoop jar ~/hadoop2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.3.jar wordcount /input /output
16/03/26 09:45:18 INFO mapreduce.Job: map 100% reduce 100%
$

Prerequisites – Test that MySQL is working

For more details refer to http://segintechnologies.com/sqoop-install-configure-run-hello-world/
$ mysql --user=root --password=root
mysql>
$ mysql --user=newuser --password=password --host=localhost
mysql> show databases;
## just to see what databases are there (we will see later that Hive has created a new database here)

Download and extract Hive

From https://hive.apache.org/downloads.html
apache-hive-1.2.1-bin.tar.gz
$ tar -xvf apache-hive-1.2.1-bin.tar.gz
$ ln -s apache-hive-1.2.1-bin hive

Download, extract, and copy the MySQL Connector

$ cp /home/hadoop/sqoop/lib/mysql-connector-java-5.1.38-bin.jar /home/hadoop/hive/lib/

Configure Hive

$ cd /home/hadoop/hive/conf
$ cp hive-default.xml.template hive-site.xml
$ vi hive-site.xml

<configuration>
	<property>
		<name>javax.jdo.option.ConnectionURL</name>
		<value>jdbc:mysql://localhost/hive_metastore_db?createDatabaseIfNotExist=true</value>
	</property>
	<property>
		<name>javax.jdo.option.ConnectionDriverName</name>
		<value>com.mysql.jdbc.Driver</value>
	</property>
	<property>
		<name>javax.jdo.option.ConnectionUserName</name>
		<value>newuser</value>
	</property>
	<property>
		<name>javax.jdo.option.ConnectionPassword</name>
		<value>password</value>
	</property>
	<property>
		<name>hive.metastore.warehouse.dir</name>
		<value>hdfs://localhost:8020/user/hadoop/warehouse</value>
	</property>
</configuration>

$ vi ~/.bash_profile
export HIVE_HOME=/home/hadoop/hive
export HIVE_CONF_DIR=/home/hadoop/hive/conf
export HIVE_CLASSPATH=${HIVE_CONF_DIR}
export HADOOP_USER_CLASSPATH_FIRST=true
$ . ~/.bash_profile

Start Hive Services (in 2 new terminal windows)

$ hive/bin/hive --service metastore
Starting Hive Metastore Server
$ hive/bin/hive --service hiveserver2
** does not print any output

$ mysql --user=newuser --password=password --host=localhost
mysql> use hive_metastore_db;
mysql> show tables;
29 rows
We see that Hive has created the metastore tables.

Start Hive

$ hive/bin/hive
hive>

Hive Create Table

hive> create external table hive_emp (col1 string, col2 string) location '/user/hadoop/employee4';
hive> describe hive_emp;
col1 string
col2 string

Hive Describe Formatted

hive> describe formatted hive_emp;
# Detailed Table Information
Owner: hadoop
Location: hdfs://192.168.1.9:9000/user/hadoop/employee4
Table Type: EXTERNAL_TABLE
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
hive>

Hive Runs MapReduce

hive> select count(*) from hive_emp;
2016-03-26 20:56:07,452 Stage-1 map = 0%, reduce = 0%
2016-03-26 20:56:22,500 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 2.99 sec
hive>

Create a text file and load it into Hive

$ cat /home/hadoop/input/1.txt
pathik*paul
ashok*sharma
imon*paul
ella*paul
$
** Note: the field separator is Control-A (ASCII \001), shown above as *

hive> LOAD DATA LOCAL INPATH '/home/hadoop/input/1.txt' OVERWRITE INTO TABLE hive_emp;
Loading data to table default.hive_emp
hive> select * from hive_emp;
pathik paul
ashok sharma
imon paul
ella paul
hive>

Hive Insert

hive> INSERT INTO TABLE hive_emp VALUES ('James','Smith'), ('Jane','Doe');
hive> select * from hive_emp;

Hive: Where does the inserted data go?

$ hdfs dfs -ls -R /user/hadoop/employee4

$ hdfs dfs -cat /user/hadoop/employee4/1.txt | od -c
0000000 p a t h i k 001 p a u l \n a s h o
0000020 k 001 s h a r m a \n i m o n 001 p a
0000040 u l \n e l l a 001 p a u l \n
0000055
$
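The `001` bytes in the `od` dump are Ctrl-A (`\x01`), Hive's default field delimiter for text tables. A small Python sketch (not part of the Hive setup) of how those bytes split back into the two columns:

```python
# The 001 bytes in the od output above are Ctrl-A (\x01), Hive's
# default field delimiter; newlines separate the rows.
data = b"pathik\x01paul\nashok\x01sharma\nimon\x01paul\nella\x01paul\n"
rows = [line.split("\x01") for line in data.decode().splitlines()]
for col1, col2 in rows:
    print(col1, col2)
```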

Hive Partitions

hive> create external table hive_emp1 (fname string, lname string) partitioned by (yearofbirth int) location '/user/hadoop/employee5';
hive> INSERT INTO TABLE hive_emp1 PARTITION (yearofbirth = '2000') VALUES ('James','Smith'), ('Jane','Doe');
hive> INSERT INTO TABLE hive_emp1 PARTITION (yearofbirth = '1900') VALUES ('Pathik','Paul'), ('Imon','Paul');
hive> select * from hive_emp1;
Pathik Paul 1900
Imon Paul 1900
James Smith 2000
Jane Doe 2000
hive>

$ hdfs dfs -ls -R /user/hadoop/employee5
drwxr-xr-x – hadoop supergroup 0 2016-03-26 22:00 /user/hadoop/employee5/yearofbirth=1900
-rwxr-xr-x 1 hadoop supergroup 22 2016-03-26 22:00 /user/hadoop/employee5/yearofbirth=1900/000000_0
drwxr-xr-x – hadoop supergroup 0 2016-03-26 22:00 /user/hadoop/employee5/yearofbirth=2000
-rwxr-xr-x 1 hadoop supergroup 21 2016-03-26 22:00 /user/hadoop/employee5/yearofbirth=2000/000000_0
$
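The listing above shows how Hive lays out partitions on HDFS: one subdirectory per partition, named `column=value`, under the table's location. A minimal sketch of that naming scheme (the `partition_path` helper is illustrative, not a Hive API):

```python
# Hive stores each partition of an external table in a subdirectory
# named <column>=<value> under the table's LOCATION, as the HDFS
# listing above shows. partition_path is a hypothetical helper.
def partition_path(base, column, value):
    return f"{base}/{column}={value}"

print(partition_path("/user/hadoop/employee5", "yearofbirth", 1900))
# -> /user/hadoop/employee5/yearofbirth=1900
```

This is why a `WHERE yearofbirth = 1900` query can skip whole directories: the partition value lives in the path, not in the data files.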

Hive Views

hive> CREATE VIEW hive_emp_view1 AS SELECT fname, yearofbirth from hive_emp1 where yearofbirth >= 1800;
hive> select * from hive_emp_view1;
Pathik 1900
Imon 1900
James 2000
Jane 2000
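A view is just a stored query that is re-run whenever you select from it. As a plain-Python sketch (using the four rows inserted above, not any Hive API), the view amounts to a projection plus a filter:

```python
# The rows inserted into hive_emp1 above: (fname, lname, yearofbirth).
hive_emp1 = [("Pathik", "Paul", 1900), ("Imon", "Paul", 1900),
             ("James", "Smith", 2000), ("Jane", "Doe", 2000)]

# hive_emp_view1: project (fname, yearofbirth), keep yearofbirth >= 1800.
view1 = [(fname, yob) for fname, lname, yob in hive_emp1 if yob >= 1800]
for fname, yob in view1:
    print(fname, yob)
```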

Digression into Unix Patterns

Let us take the /etc/passwd file which has the following fields
Username:Password:UID:GID:UserIdComment:HomeDirectory:DefaultShell

$ head /etc/passwd
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin

Digression into Unix Patterns: Find the Default Shell

$ cat /etc/passwd | sed 's/^.*:\(.*$\)/\1/'
/bin/bash
/sbin/nologin
/sbin/nologin

Digression into Unix Patterns: Find the Home Directory

$ cat /etc/passwd | sed 's/^.*:\(.*\):\(.*\):\(.*$\)/\2/' | head
/root
/bin
/sbin

Digression into Unix Patterns: Why not pick any field

$ cat /etc/passwd | sed 's/\(^.*\):\(.*\):\(.*\):\(.*\):\(.*\):\(.*\):\(.*$\)/\5/' | head
root
bin
daemon
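The same grouped-pattern trick works in any regex engine. A sketch of the seven-group pattern in Python (not part of the Hive setup): because the pattern has exactly six literal colons and each `/etc/passwd` line has exactly six, every group is forced to land on one field, and you pick a field by picking a group.

```python
import re

line = "root:x:0:0:root:/root:/bin/bash"
# Same seven groups as the sed command above; group 5 is UserIdComment.
m = re.match(r"(^.*):(.*):(.*):(.*):(.*):(.*):(.*$)", line)
print(m.group(5))   # -> root  (the comment field)
print(m.group(7))   # -> /bin/bash  (the default shell)
```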

Hive SerDE

CREATE TABLE hive_passwd (
Username string,
Password string,
UID string,
GID string,
UserIdComment string,
HomeDirectory string,
DefaultShell string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
"input.regex" = "(^.*):(.*):(.*):(.*):(.*):(.*):(.*$)"
)
STORED AS TEXTFILE location '/user/hadoop/passwd';

hive> LOAD DATA LOCAL INPATH "/etc/passwd" INTO TABLE hive_passwd;
hive> select * from hive_passwd order by UID desc LIMIT 3;
** MapReduce runs ** (the ORDER BY needs a reduce phase)
nobody x 99 99 Nobody / /sbin/nologin
postfix x 89 89 /var/spool/postfix /sbin/nologin
dbus x 81 81 System message bus / /sbin/nologin
hive> select * from hive_passwd LIMIT 3;
** No MapReduce ** (a plain SELECT with LIMIT is a simple fetch)
root x 0 0 root /root /bin/bash
bin x 1 1 bin /bin /sbin/nologin
daemon x 2 2 daemon /sbin /sbin/nologin
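RegexSerDe maps each match group of `input.regex` to one column of the table, in order. A sketch of that mapping in plain Python (using the same `input.regex` string from the table definition above):

```python
import re

# The input.regex from the hive_passwd table definition: each of the
# seven match groups becomes one column, in declaration order.
input_regex = r"(^.*):(.*):(.*):(.*):(.*):(.*):(.*$)"
columns = ["Username", "Password", "UID", "GID",
           "UserIdComment", "HomeDirectory", "DefaultShell"]

line = "daemon:x:2:2:daemon:/sbin:/sbin/nologin"
row = dict(zip(columns, re.match(input_regex, line).groups()))
print(row["Username"], row["HomeDirectory"], row["DefaultShell"])
# -> daemon /sbin /sbin/nologin
```

Lines that do not match the regex come back as all-NULL rows in Hive, so it is worth checking the pattern against a few sample lines like this before loading.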

Ref:
http://www.grymoire.com/Unix/Sed.html
http://www.grymoire.com/Unix/Regular.html
https://cwiki.apache.org/confluence/display/Hive/GettingStarted
http://www.tutorialspoint.com/hive/hive_quick_guide.htm

Author: Pathik Paul
