Pig
Launch CLI
    pig
Change execution engine
At CLI launch
    pig -x <mode>
In script
    set exectype=tez;
Execute Pig script
    pig script.pig
Load data from HDFS
    <var> = load 'path/to/file';
Load data from HDFS with schema
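A minimal sketch of a load with an explicit schema. The relation name, field names, and comma delimiter are assumptions for illustration; without a `using` clause, PigStorage expects tab-delimited text.

```pig
-- Hypothetical fields; PigStorage(',') assumes comma-separated input.
users = load 'path/to/file' using PigStorage(',')
        as (id:int, name:chararray, age:int);
```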
Load data from Hive
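One common way to read a Hive table is through HCatalog, sketched below. This assumes Pig was started with `pig -useHCatalog` and that the table `db.table` (a placeholder name) already exists; older Hive releases ship the loader under `org.apache.hcatalog.pig` instead.

```pig
-- 'db.table' is a placeholder; the schema is taken from the Hive metastore.
users = load 'db.table' using org.apache.hive.hcatalog.pig.HCatLoader();
```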
Group data
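A short sketch of grouping followed by an aggregate, assuming a hypothetical `users` relation with an `age` field. After `group`, each tuple holds the key (named `group`) and a bag named after the input relation.

```pig
-- Hypothetical input relation.
users = load 'path/to/file' using PigStorage(',')
        as (id:int, name:chararray, age:int);

by_age = group users by age;
-- 'group' is the grouping key; 'users' is the bag of matching tuples.
counts = foreach by_age generate group as age, COUNT(users) as nb_users;
```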
Transform schema
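Schema changes are usually expressed with `foreach ... generate`. The sketch below, using hypothetical field names, drops a column, renames one, and casts another.

```pig
-- Hypothetical input relation.
users = load 'path/to/file' using PigStorage(',')
        as (id:int, name:chararray, age:int);

-- Keep id, upper-case and rename name, cast age to long.
reshaped = foreach users generate id, UPPER(name) as name_upper, (long)age as age_long;
```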
Filter data
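A filtering sketch on an assumed `users` relation; the condition is illustrative only.

```pig
-- Hypothetical input relation.
users = load 'path/to/file' using PigStorage(',')
        as (id:int, name:chararray, age:int);

-- Conditions can be combined with and / or / not.
adults = filter users by age >= 18 and name is not null;
```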
Order
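Sorting might look like the following, with hypothetical field names. Each sort key takes an optional `asc` or `desc`.

```pig
-- Hypothetical input relation.
users = load 'path/to/file' using PigStorage(',')
        as (id:int, name:chararray, age:int);

sorted = order users by age desc, name asc;
```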
Limit the number of rows
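A sketch of keeping only the first rows of an assumed `users` relation. Without a preceding `order`, which rows are returned is not guaranteed.

```pig
-- Hypothetical input relation.
users = load 'path/to/file' using PigStorage(',')
        as (id:int, name:chararray, age:int);

first10 = limit users 10;
```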
Split
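`split` routes tuples into several relations according to conditions. The relation names and thresholds below are assumptions; a tuple matching no condition is dropped, and one matching several goes to each.

```pig
-- Hypothetical input relation.
users = load 'path/to/file' using PigStorage(',')
        as (id:int, name:chararray, age:int);

split users into minors if age < 18, adults if age >= 18;
```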
Remove duplicates
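Duplicate removal works on whole tuples. To deduplicate on a subset of fields, one option is to project those fields first, as sketched here with hypothetical names.

```pig
-- Hypothetical input relation.
users = load 'path/to/file' using PigStorage(',')
        as (id:int, name:chararray, age:int);

unique_users = distinct users;

-- Distinct values of a single field: project, then distinct.
names        = foreach users generate name;
unique_names = distinct names;
```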
Inner join
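An inner join sketch over two assumed relations, `users` and `orders`. Paths and fields are placeholders.

```pig
-- Hypothetical input relations.
users  = load 'path/to/users'  using PigStorage(',')
         as (id:int, name:chararray, age:int);
orders = load 'path/to/orders' using PigStorage(',')
         as (order_id:int, user_id:int, amount:double);

-- Only tuples with a matching key on both sides are kept.
joined = join users by id, orders by user_id;
```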
Right/Left/Full Outer join
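Outer joins add `left`, `right`, or `full` after the first key; the `outer` keyword itself is optional. The sketch assumes the same hypothetical relations as an inner join would. Note that Pig needs the schemas of the joined relations to be known for outer joins.

```pig
-- Hypothetical input relations.
users  = load 'path/to/users'  using PigStorage(',')
         as (id:int, name:chararray, age:int);
orders = load 'path/to/orders' using PigStorage(',')
         as (order_id:int, user_id:int, amount:double);

left_j  = join users by id left outer,  orders by user_id;
right_j = join users by id right outer, orders by user_id;
full_j  = join users by id full outer,  orders by user_id;
```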
Cross join
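A cross product sketch on two assumed relations. The result has one tuple per pair of inputs, so it is best kept for small relations.

```pig
-- Hypothetical input relations.
users  = load 'path/to/users'  using PigStorage(',')
         as (id:int, name:chararray, age:int);
orders = load 'path/to/orders' using PigStorage(',')
         as (order_id:int, user_id:int, amount:double);

pairs = cross users, orders;
```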
Join options
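Join strategies are selected with a `using` clause. The sketch below shows the `'replicated'` and `'skewed'` options on hypothetical relations; whether either helps depends on the data. With `'replicated'`, the relations listed after the first are held in memory, so they should be small.

```pig
-- Hypothetical input relations: orders is large, users is small.
orders = load 'path/to/orders' using PigStorage(',')
         as (order_id:int, user_id:int, amount:double);
users  = load 'path/to/users'  using PigStorage(',')
         as (id:int, name:chararray, age:int);

-- Map-side join: the last relation (users) is loaded into memory.
rep_j  = join orders by user_id, users by id using 'replicated';

-- Skewed join: intended for keys with very uneven distribution.
skew_j = join orders by user_id, users by id using 'skewed';
```

A `'merge'` option also exists for inputs that are already sorted on the join key.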
Dump data on console
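Printing a relation triggers execution of the statements that produce it. The relation below is hypothetical.

```pig
-- Hypothetical input relation.
users = load 'path/to/file' using PigStorage(',')
        as (id:int, name:chararray, age:int);

dump users;
```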
Store data into HDFS
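A sketch of writing a relation back to HDFS. The output path and delimiter are assumptions, and the output directory must not already exist.

```pig
-- Hypothetical input relation.
users = load 'path/to/file' using PigStorage(',')
        as (id:int, name:chararray, age:int);

-- Writes part files under the given directory.
store users into 'path/to/output' using PigStorage(',');
```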
Store data into Hive
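Writing to Hive can go through HCatalog, as sketched below. This assumes Pig was started with `pig -useHCatalog`, that the target table `db.table` (a placeholder) already exists, and that the relation's field names and types match the table's columns.

```pig
-- Hypothetical input relation; fields must match the Hive table columns.
users = load 'path/to/file' using PigStorage(',')
        as (id:int, name:chararray, age:int);

store users into 'db.table' using org.apache.hive.hcatalog.pig.HCatStorer();
```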
Specify the number of reducers
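The reducer count can be set for the whole script with `default_parallel`, or per operator with a `parallel` clause. The values here are arbitrary examples.

```pig
-- Script-wide default for operators that run a reduce phase.
set default_parallel 10;

-- Hypothetical input relation.
users = load 'path/to/file' using PigStorage(',')
        as (id:int, name:chararray, age:int);

-- Per-operator override.
by_age = group users by age parallel 20;
```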
Debug
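Pig has a few diagnostic operators that are handy in the Grunt shell. The sketch applies them to a hypothetical relation.

```pig
-- Hypothetical input relation.
users = load 'path/to/file' using PigStorage(',')
        as (id:int, name:chararray, age:int);

describe users;    -- print the schema of the relation
explain users;     -- print the logical, physical, and execution plans
illustrate users;  -- run the pipeline step by step on a small data sample
```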
Register a UDF Jar
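Registering a jar makes its classes available to the script. The path below is a placeholder.

```pig
-- Hypothetical jar path (local file system or HDFS).
register 'path/to/udfs.jar';
```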
Use a UDF
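Once its jar is registered, a UDF is called by its fully qualified class name. The jar path and the class `com.example.pig.ToUpper` are hypothetical.

```pig
-- Hypothetical jar and class names.
register 'path/to/udfs.jar';

users = load 'path/to/file' using PigStorage(',')
        as (id:int, name:chararray, age:int);

upper_names = foreach users generate com.example.pig.ToUpper(name);
```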
Create an alias for function
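`define` gives a function a short name, optionally with constructor arguments. The jar and class below are hypothetical.

```pig
-- Hypothetical jar and class names.
register 'path/to/udfs.jar';
define to_upper com.example.pig.ToUpper();

users = load 'path/to/file' using PigStorage(',')
        as (id:int, name:chararray, age:int);

upper_names = foreach users generate to_upper(name);
```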
Import a macro/UDF from another script
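A sketch of importing a macro defined in another file. The file name `macros.pig` and the macro `filter_adults` are assumptions for illustration.

```pig
-- Assumed content of path/to/macros.pig:
--   define filter_adults(rel) returns out {
--       $out = filter $rel by age >= 18;
--   };
import 'path/to/macros.pig';

-- Hypothetical input relation.
users = load 'path/to/file' using PigStorage(',')
        as (id:int, name:chararray, age:int);

adults = filter_adults(users);
```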