
Pig - Introduction to Pig?




asked marvit November 23, 2014 07:31 AM  

Introduction to Pig?


           

1 Answer



 
answered by Experts-976

• A scripting language (Pig Latin) for manipulating large datasets on Hadoop.

• A domain-specific dataflow language.

• No control-flow statements (if/then/else).

• Runs on an existing Hadoop installation and needs minimal configuration to set up.

• Supports both interactive and batch execution.
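Because Pig Latin has no if/then/else statements, conditional logic is usually expressed inside the dataflow instead. A minimal sketch, assuming the tab-separated `student` file (name, age, gpa) used in the examples further down: the bincond operator (`condition ? a : b`) picks a value per record, and SPLIT routes records into separate relations. The aliases `B`, `teens`, and `adults` are illustrative, and `OTHERWISE` assumes Pig 0.10 or later.

```pig
-- Assumed input: tab-separated 'student' file with name, age, gpa.
A = LOAD 'student' USING PigStorage() AS (name:chararray, age:int, gpa:float);

-- Per-record "if/else": the bincond operator chooses one of two values.
B = FOREACH A GENERATE name, (gpa >= 3.0 ? 'honors' : 'regular') AS standing;

-- Instead of branching, SPLIT sends each record to a relation by condition.
SPLIT A INTO teens IF age < 20, adults OTHERWISE;

DUMP B;
DUMP teens;
```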

Pig script usage, with examples:

  1. Pig Latin statements: A Pig Latin statement is an operator that takes a relation as input and produces another relation as output (LOAD and STORE are the exceptions: they read data from and write data to the file system). Statements are generally organized as follows:

    1. A LOAD statement reads data from the file system.
    2. A series of "transformation" statements processes the data.
    3. A STORE statement writes the results to the file system, or a DUMP statement displays them on the screen.
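The LOAD / transform / STORE pattern above can be sketched as a short script. It assumes the same tab-separated `student` file (name, age, gpa) used later on this page; the output directory `avg_gpa_by_age` is a hypothetical name and must not already exist, because STORE will not overwrite an existing location.

```pig
-- LOAD: read the input (assumed tab-separated: name, age, gpa).
A = LOAD 'student' USING PigStorage() AS (name:chararray, age:int, gpa:float);

-- Transformations: keep adults, group them by age, average the GPA per age.
adults  = FILTER A BY age >= 18;
by_age  = GROUP adults BY age;
avg_gpa = FOREACH by_age GENERATE group AS age, AVG(adults.gpa) AS avg_gpa;

-- Retrieve results: write comma-separated output (or use DUMP avg_gpa; to print).
STORE avg_gpa INTO 'avg_gpa_by_age' USING PigStorage(',');
```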

    1.1. Running Pig Latin

    Pig Latin statements can be run:

    • Using the Grunt shell or the command line

    • In mapreduce mode or local mode:

      - Local mode: no Hadoop or HDFS installation is required. All files are installed and run from your local host and file system.

      - Mapreduce mode: you need access to a Hadoop cluster and an HDFS installation.

    • Either interactively or in batch

Examples:

● Grunt shell - interactive, mapreduce mode (mapreduce mode is the default, so you do not need to specify it)
$ pig
grunt>

● Grunt shell - batch, local mode (see the exec and run commands)
$ pig -x local
grunt> exec myscript.pig;
or
grunt> run myscript.pig;

● Command line - batch, mapreduce mode
$ pig myscript.pig

● Command line - batch, local mode
$ pig -x local myscript.pig

    1.2. Processing Pig Latin statements

  • Pig validates the syntax and semantics of every statement as it reads it.

  • Pig executes the statements only when it encounters a DUMP or STORE.

Example 1: Pig validates, but does not execute, the LOAD and FOREACH statements.
A = LOAD 'student' USING PigStorage() AS (name:chararray, age:int, gpa:float);
B = FOREACH A GENERATE name;

Example 2: Pig validates and then executes the LOAD, FOREACH, and DUMP statements.
A = LOAD 'student' USING PigStorage() AS (name:chararray, age:int, gpa:float);
B = FOREACH A GENERATE name;
DUMP B;
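To inspect what Pig has built without triggering execution, the diagnostic operators DESCRIBE and EXPLAIN can be used in place of DUMP. A minimal sketch reusing the statements above: DESCRIBE prints the schema of `B`, and EXPLAIN prints the logical, physical, and MapReduce plans Pig would run; neither one launches a MapReduce job.

```pig
A = LOAD 'student' USING PigStorage() AS (name:chararray, age:int, gpa:float);
B = FOREACH A GENERATE name;

-- Print the schema of relation B (no job is launched).
DESCRIBE B;

-- Print the logical, physical and MapReduce plans for B (no job is launched).
EXPLAIN B;
```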

    1.3. Storing intermediate data

Pig stores the intermediate data generated between MapReduce jobs in a temporary location on HDFS. This location must already exist on HDFS before it is used. It can be configured with the pig.temp.dir property, whose default value is "/tmp".
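A minimal sketch of pointing Pig at a different temporary directory for one script. The path `/user/alice/pigtmp` and the output name `student_counts_by_age` are hypothetical, `fs -mkdir -p` assumes a Hadoop 2 or later file system shell, and setting `pig.temp.dir` with SET inside the script is an assumption to verify against your Pig version; the property can also be placed in `pig.properties` or passed on the command line with `-Dpig.temp.dir=...`.

```pig
-- Create the (hypothetical) temp directory first; Pig expects it to exist on HDFS.
fs -mkdir -p /user/alice/pigtmp;

-- Override the default of /tmp for intermediate data written by this script.
SET pig.temp.dir '/user/alice/pigtmp';

-- GROUP followed by ORDER typically compiles to more than one MapReduce job,
-- so intermediate data is written under pig.temp.dir between those jobs.
A = LOAD 'student' USING PigStorage() AS (name:chararray, age:int, gpa:float);
by_age = GROUP A BY age;
counts = FOREACH by_age GENERATE group AS age, COUNT(A) AS n;
sorted = ORDER counts BY n DESC;
STORE sorted INTO 'student_counts_by_age';
```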
