Pig Install Configure Run Hello World

In this blog post we will install, configure, and run a basic Pig script.

Versions we used for this exercise
$ hadoop version
Hadoop 2.6.3

$ java -version ## From jdk-8u73-linux-x64.gz
java version "1.7.0_45"
Pig: pig-0.15.0.tar.gz

Prerequisites
We need a running Hadoop cluster.
Let us load some data into HDFS and run a simple MapReduce job to verify that Hadoop is working.
$ jps
2704 NameNode
2989 SecondaryNameNode
2829 DataNode
3209 ResourceManager
3309 NodeManager
4717 JobHistoryServer
$
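
The jps check above can also be scripted. The following is only a sketch: missing_daemons is a hypothetical helper name, and the list of daemons is simply the set shown in the listing above (JobHistoryServer is left out on the assumption that it is optional for this exercise).

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `jps` output on stdin and prints the names of
# any expected Hadoop daemons that are not listed. Prints an empty line if
# every expected daemon is present.
missing_daemons() {
  local jps_output daemon missing=""
  jps_output=$(cat)
  # Assumed daemon list, taken from the jps listing in this post.
  for daemon in NameNode SecondaryNameNode DataNode ResourceManager NodeManager; do
    # A daemon counts as running only if it is the whole second column of a line,
    # so "NameNode" does not accidentally match "SecondaryNameNode".
    if ! printf '%s\n' "$jps_output" |
        awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
      missing="$missing $daemon"
    fi
  done
  printf '%s\n' "${missing# }"
}

# Usage (run on the cluster node):
#   jps | missing_daemons
```

If the function prints nothing but an empty line, all five daemons were found in the jps output.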

$ mkdir ~/input
$ cp ~/hadoop2/etc/hadoop/* ~/input/.
$ hdfs dfs -mkdir /input
$ hdfs dfs -put ~/input/* /input

$ hadoop jar /home/hadoop/hadoop-2.6.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.3.jar grep /input /output 'dfs[a-z.]+'
16/02/28 22:18:19 INFO mapreduce.Job: map 0% reduce 0%
16/02/28 22:19:48 INFO mapreduce.Job: map 100% reduce 100%

$ hdfs dfs -cat /output/part-r-00000 | head -3
1 dfsadmin
1 dfs.server.namenode.
1 dfs.replication
$

Download Pig
https://pig.apache.org/releases.html
http://mirror.cc.columbia.edu/pub/software/apache/pig/pig-0.15.0/
pig-0.15.0.tar.gz

Install and Setup Environment variables

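Installation is simple: unpack the tarball and set a few environment variables.
Please note that JAVA_HOME must be exported, not just set, so that the pig launcher can see it.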
$ tar -xvf soft/pig-0.15.0.tar.gz

$ ln -s pig-0.15.0 pig

$ vi ~/.bash_profile

export JAVA_HOME=/home/hadoop/jdk
export PIG_HOME=/home/hadoop/pig
export PATH=$PATH:$PIG_HOME/bin
$ . ~/.bash_profile
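
Before starting Pig, it can help to confirm that the variables above actually took effect in the current shell. This is a minimal sketch rather than part of the original setup; check_pig_env is a hypothetical helper name, and it only checks what this post configures (JAVA_HOME, PIG_HOME, and an executable at $PIG_HOME/bin/pig).

```shell
#!/usr/bin/env bash
# Hypothetical helper: verifies the environment set up in ~/.bash_profile.
# Prints "ok" and returns 0 on success; prints the first problem found and
# returns 1 otherwise.
check_pig_env() {
  [ -n "${JAVA_HOME:-}" ] || { echo "JAVA_HOME is not set"; return 1; }
  [ -n "${PIG_HOME:-}" ]  || { echo "PIG_HOME is not set"; return 1; }
  [ -x "$PIG_HOME/bin/pig" ] || { echo "no executable at $PIG_HOME/bin/pig"; return 1; }
  echo "ok"
}

# Usage: if the check passes, ask Pig for its version as a final smoke test.
#   check_pig_env && pig -version
```

If the check fails right after editing ~/.bash_profile, the usual cause is that the file has not been sourced in the current shell yet.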

Run Pig in Local Mode

$ pig -x local
2016-02-28 22:22:29,888 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
grunt>
grunt> A = load '/etc/passwd' using PigStorage(':');
grunt> B = foreach A generate $0 as id;
grunt> dump B;
Successfully read 34 records from: "/etc/passwd"
(root)
(bin)
(daemon)
::
(mysql)
grunt>
grunt>
grunt> store B into '/home/hadoop/output/id.out';
Successfully read 34 records from: "/etc/passwd"
Successfully stored 34 records in: "/home/hadoop/output/id.out"
grunt>

$ find /home/hadoop/output/id.out
/home/hadoop/output/id.out/part-m-00000
/home/hadoop/output/id.out/_SUCCESS
$ head /home/hadoop/output/id.out/part-m-00000
root
bin
daemon
$
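
The two Pig statements split each line on ':' and keep the first field, which is what cut does in plain shell. As a sanity check, we can compute the same column without Pig and compare it to the stored file. This is only a sketch: first_field is a hypothetical helper name, and the comparison assumes the single part file keeps the input order, as it appears to in the output above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the first ':'-separated field of every line on
# stdin, mirroring "load ... using PigStorage(':')" followed by "generate $0".
first_field() {
  cut -d: -f1
}

# Usage: compare against what Pig stored in local mode.
#   first_field < /etc/passwd | diff - /home/hadoop/output/id.out/part-m-00000
```

An empty diff means the Pig output matches the shell result line for line.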

Run Pig in Distributed Mode

Setup the HDFS Data
$ hdfs dfs -rm -R /input /output
Deleted /input
Deleted /output
$ hdfs dfs -mkdir /input
$ hdfs dfs -put /etc/passwd /input
$

Start Pig

$ pig
2016-02-28 22:35:57,158 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://192.168.1.10:8020
grunt>
grunt> A = load '/input/passwd' using PigStorage(':');
grunt> B = foreach A generate $0 as id;
grunt> dump B;
2016-02-28 22:41:11,896 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2016-02-28 22:41:27,375 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
2.6.3 0.15.0 hadoop 2016-02-28 22:41:10 2016-02-28 22:41:27 UNKNOWN
Input(s):
Successfully read 34 records (2003 bytes) from: "/input/passwd"

Output(s):
Successfully stored 34 records (390 bytes) in: "hdfs://192.168.1.10:8020/tmp/temp850075013/tmp-822016882"
grunt>
grunt>
grunt> store B into '/output/id.out';
2016-02-28 22:45:31,210 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2016-02-28 22:45:51,588 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
Input(s):
Successfully read 34 records (2003 bytes) from: "/input/passwd"

Output(s):
Successfully stored 34 records (220 bytes) in: "/output/id.out"
grunt>
grunt>
$
$ hdfs dfs -ls -R /output/id.out
-rw-r--r-- 1 hadoop supergroup 0 2016-02-28 22:45 /output/id.out/_SUCCESS
-rw-r--r-- 1 hadoop supergroup 220 2016-02-28 22:45 /output/id.out/part-m-00000
$
$ hdfs dfs -cat /output/id.out/part-m-00000 | head -3
root
bin
daemon
$
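
The same statements can also be saved in a script file and run in batch mode instead of being typed at the grunt prompt. The sketch below is an illustration, not something run for this post: write_id_script and the file name id.pig are hypothetical, and the $input and $output placeholders rely on Pig's parameter substitution (-param name=value on the command line).

```shell
#!/usr/bin/env bash
# Hypothetical helper: writes the "hello world" Pig script to the given path.
# The heredoc delimiter is quoted so the shell leaves $input, $output and $0
# untouched; Pig itself substitutes $input and $output at run time.
write_id_script() {
  cat > "$1" <<'EOF'
A = load '$input' using PigStorage(':');
B = foreach A generate $0 as id;
store B into '$output';
EOF
}

# Usage (distributed mode, using the paths from this post):
#   write_id_script id.pig
#   pig -param input=/input/passwd -param output=/output/id.out id.pig
```

Note that the store step fails if the output directory already exists, so /output/id.out from the interactive run above would need to be removed first, for example with hdfs dfs -rm -R /output/id.out.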

Pig is a full-fledged scripting language, and there is much more to it than what we covered here. This tutorial is only a "hello world" for Pig.

Author: Pathik Paul
