Kafka with Twitter

In the previous post, we set up a single-node and then a multi-node Kafka cluster. We also built and ran custom Producers and Consumers in Scala. In this blog, we will take Kafka one step further: we will read live Twitter feeds and feed them into Kafka. Pre-Requisites: All the machines […]
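
The Kafka side of that pipeline can be sanity-checked with the standard Kafka shell tools. A minimal sketch, assuming a topic named "twitter" and the default ZooKeeper and broker addresses (the topic name and replication settings here are assumptions, not values from the post):

$ bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 2 --partitions 2 --topic twitter   ## "twitter" is an assumed topic name
$ bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic twitter --from-beginning   ## watch the tweets arrive while the Scala producer is running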

Simple Kafka Demo

In this blog, we will walk you through a simple Kafka demo. Create 3 Virtual Machines: we will create a two-node Kafka cluster and use the third node to compile the Java / Scala producers and consumers. Please refer to the Vagrant page if you need more details. We used […]
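
As a quick smoke test on one of the brokers, the console tools that ship with Kafka are enough (a minimal sketch; the topic name "test" and the localhost addresses are assumptions, not the exact values from the demo):

$ bin/zookeeper-server-start.sh config/zookeeper.properties &    ## start ZooKeeper
$ bin/kafka-server-start.sh config/server.properties &           ## start the Kafka broker
$ bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
$ bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test      ## type messages here
$ bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning   ## they should appear here

Once the console round trip works, the custom Java / Scala producers and consumers can point at the same broker list.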

Create Multiple Linux Machines using Vagrant

Any type of demo or learning exercise in the big data area needs multiple machines to be fully configured and set up. The initial setup task of configuring the IPs and connecting all the machines to the internet is fairly laborious and time-consuming. In this post, I will provide simple step […]
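
The basic Vagrant workflow looks roughly like this (a sketch only; the box name and machine name are placeholders, and the multi-machine definitions themselves live in the Vagrantfile the post walks through):

$ vagrant init ubuntu/trusty64   ## generates a Vagrantfile; edit it to define each machine and its private IP
$ vagrant up                     ## boots every machine defined in the Vagrantfile
$ vagrant status                 ## lists the machines and their state
$ vagrant ssh node1              ## "node1" is a hypothetical machine name from the Vagrantfile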

Big Data Meetup : 2016-05-17 : Qubole Pig Oozie Greenplum

Big Data: Qubole, Pig, Oozie, Greenplum. Tuesday 17th May, 7:00pm EST, Minnie B Veal Recreation Center, Edison, NJ. Topics: 7:00 – 7:10 Introduction & Recap; 7:10 – 7:55 Qubole (Speaker: Phil D'Agostino); 7:55 – 8:05 Break; 8:05 – 8:30 Pig and Oozie (Speaker: Ankur Raj); 8:30 – 9:00 Greenplum […]

Hive Install Configure Basic Tutorial

Pre-Requisites – Java:
$ java -version   ## from jdk-8u73-linux-x64.gz
java version "1.8.0_73"
Java(TM) SE Runtime Environment (build 1.8.0_73-b02)
Java HotSpot(TM) 64-Bit Server VM (build 25.73-b02, mixed mode)
Pre-Requisites – Working Hadoop (test that MapReduce is working):
$ hadoop version
Hadoop 2.6.3
Pre-Requisites – Test MapReduce […]
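
Once Java and Hadoop check out, a simple way to verify the Hive install is to run a statement or two from the command line (a sketch; the table and column names are hypothetical):

$ hive -e "SHOW DATABASES;"                             ## should list at least the default database
$ hive -e "CREATE TABLE pokes (foo INT, bar STRING);"   ## hypothetical throwaway table
$ hive -e "SHOW TABLES;"                                ## the new table should appear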

Sqoop Install Configure Run Hello World

Sqoop is a tool designed to transfer data between Hadoop and relational databases or mainframes. You can use Sqoop to import data from a relational database management system (RDBMS) such as MySQL or Oracle, or from a mainframe, into the Hadoop Distributed File System (HDFS), transform the data in Hadoop MapReduce […]
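
A typical first import looks roughly like this (a sketch only; the MySQL host, database, table, and HDFS directory are placeholders, not values from the post):

$ sqoop import --connect jdbc:mysql://dbhost/testdb --username hadoop -P --table customers --target-dir /user/hadoop/customers -m 1
$ hadoop fs -ls /user/hadoop/customers   ## the imported rows land here as part-m-* files

-P prompts for the database password, and -m 1 runs a single map task, which is enough for a small table.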

Pig Install Configure Run Hello World

In this blog, we will install, configure, and run a basic Pig script. Versions we used for this exercise:
$ hadoop version
Hadoop 2.6.3
$ java -version   ## from jdk-8u73-linux-x64.gz
java version "1.8.0_73"
Pig: pig-0.15.0.tar.gz
Pre-Requisites: We should have a running Hadoop cluster. Let us load some data […]
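
A hello-world run can be as small as one LOAD / FILTER / DUMP pipeline (a sketch; the input file and its columns are hypothetical):

$ hadoop fs -put people.txt /user/hadoop/people.txt   ## hypothetical tab-separated file with name and age columns
$ pig -e "A = LOAD '/user/hadoop/people.txt' AS (name:chararray, age:int); B = FILTER A BY age > 30; DUMP B;"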