Pig
Your general-purpose analytics pig-tool
Launch CLI
Change execution engine
At CLI launch
In script
Available modes: local, mapreduce, tez, tez_local
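A minimal sketch of choosing the engine at launch with the -x flag. The launch lines are shell commands, so they appear as comments; the file name engine_demo.pig and the input path are hypothetical.

```pig
-- engine_demo.pig (hypothetical file name)
-- Launch from the OS shell with one of:
--   pig -x local      engine_demo.pig
--   pig -x mapreduce  engine_demo.pig
--   pig -x tez        engine_demo.pig
--   pig -x tez_local  engine_demo.pig
lines = LOAD 'input.txt' AS (line:chararray);  -- hypothetical input file
DUMP lines;
```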
Execute Pig script
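A script can be run from the OS shell or from inside the Grunt shell. The sketch below assumes a script named myscript.pig (hypothetical) in the current directory.

```pig
-- From the OS shell (not Pig Latin):  pig myscript.pig

-- From the Grunt shell, batch mode: aliases defined by the script
-- are not visible in Grunt afterwards
exec myscript.pig

-- From the Grunt shell, interactive mode: aliases defined by the script
-- stay available in Grunt afterwards
run myscript.pig
```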
Load data from HDFS
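A minimal sketch of a schema-less load, assuming a comma-separated file at the hypothetical path /data/users.csv.

```pig
-- '/data/users.csv' is a hypothetical comma-separated file on HDFS
users = LOAD '/data/users.csv' USING PigStorage(',');

-- Without a schema, fields are addressed by position ($0, $1, ...)
-- and default to the bytearray type
names = FOREACH users GENERATE $1;
```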
Load data from HDFS with schema
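The same kind of load with an AS clause to name and type the fields. The path and the column names are assumptions made for illustration.

```pig
-- Hypothetical path and schema
users = LOAD '/data/users.csv' USING PigStorage(',')
        AS (id:int, name:chararray, age:int, city:chararray);

-- Print the schema to check that it was applied
DESCRIBE users;
```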
Load data from Hive
Start Pig with HCatalog support
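A sketch of reading a Hive table through HCatalog. The table name mydb.users is hypothetical, and the loader's package depends on the Hive version: older releases ship it as org.apache.hcatalog.pig.HCatLoader.

```pig
-- Launch with (shell command, not Pig Latin):  pig -useHCatalog
-- 'mydb.users' is a hypothetical Hive database.table
users = LOAD 'mydb.users' USING org.apache.hive.hcatalog.pig.HCatLoader();

-- Column names and types come from the Hive metastore
DESCRIBE users;
```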
Group data
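Grouping is usually followed by a FOREACH that aggregates each group. A minimal sketch, assuming a hypothetical users file with a city column:

```pig
-- Hypothetical path and schema
users = LOAD '/data/users.csv' USING PigStorage(',')
        AS (id:int, name:chararray, age:int, city:chararray);

-- Each output tuple is (group, {bag of matching users tuples})
by_city = GROUP users BY city;

-- Count the users in each city
counts = FOREACH by_city GENERATE group AS city, COUNT(users) AS n;
```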
Transform schema
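One way to reshape a relation is FOREACH ... GENERATE, which can drop, rename, cast, and compute fields. The input path and field names below are assumptions.

```pig
-- Hypothetical path and schema
users = LOAD '/data/users.csv' USING PigStorage(',')
        AS (id:int, name:chararray, age:int, city:chararray);

-- Keep three fields, upper-case the name, and cast age to long
slim = FOREACH users GENERATE
           id,
           UPPER(name) AS name_upper,
           (long)age   AS age_long;

DESCRIBE slim;
```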
Filter data
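A minimal FILTER sketch on a hypothetical users relation; the threshold and the city value are arbitrary.

```pig
-- Hypothetical path and schema
users = LOAD '/data/users.csv' USING PigStorage(',')
        AS (id:int, name:chararray, age:int, city:chararray);

-- Conditions can be combined with AND, OR, and NOT
adults_in_paris = FILTER users BY age >= 18 AND city == 'Paris';
```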
Order
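Sorting on one or more keys, sketched on a hypothetical users relation:

```pig
-- Hypothetical path and schema
users = LOAD '/data/users.csv' USING PigStorage(',')
        AS (id:int, name:chararray, age:int, city:chararray);

-- Oldest first, ties broken alphabetically by name
sorted = ORDER users BY age DESC, name ASC;
```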
Limit the number of rows
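LIMIT keeps at most the given number of tuples. Unless the relation was ordered just before, which tuples are kept is not guaranteed. The path and schema below are hypothetical.

```pig
-- Hypothetical path and schema
users = LOAD '/data/users.csv' USING PigStorage(',')
        AS (id:int, name:chararray, age:int, city:chararray);

-- Ten oldest users: order first, then limit
sorted = ORDER users BY age DESC;
top10  = LIMIT sorted 10;
```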
Split
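SPLIT routes tuples into several relations in one pass. A sketch with two conditions on a hypothetical age field; a tuple matching no condition (for example a null age) lands in neither output.

```pig
-- Hypothetical path and schema
users = LOAD '/data/users.csv' USING PigStorage(',')
        AS (id:int, name:chararray, age:int, city:chararray);

SPLIT users INTO
    minors IF age < 18,
    adults IF age >= 18;
```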
Remove duplicates
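DISTINCT works on whole tuples, so project the fields of interest first. A minimal sketch with assumed field names:

```pig
-- Hypothetical path and schema
users = LOAD '/data/users.csv' USING PigStorage(',')
        AS (id:int, name:chararray, age:int, city:chararray);

-- Keep only the city column, then drop repeated values
cities        = FOREACH users GENERATE city;
unique_cities = DISTINCT cities;
```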
Inner join
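A sketch of an inner join between two hypothetical relations, users and orders, on the user id:

```pig
-- Hypothetical paths and schemas
users  = LOAD '/data/users.csv'  USING PigStorage(',')
         AS (id:int, name:chararray);
orders = LOAD '/data/orders.csv' USING PigStorage(',')
         AS (order_id:int, user_id:int, amount:double);

-- Only users that have at least one order appear in the result
joined = JOIN users BY id, orders BY user_id;

-- After a join, disambiguate fields with the relation prefix
result = FOREACH joined GENERATE users::name, orders::amount;
```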
Right/Left/Full Outer join
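Outer joins add LEFT, RIGHT, or FULL after the first join key; the OUTER keyword is optional. The relations below are hypothetical, and the side that gets null-padded needs a declared schema.

```pig
-- Hypothetical paths and schemas
users  = LOAD '/data/users.csv'  USING PigStorage(',')
         AS (id:int, name:chararray);
orders = LOAD '/data/orders.csv' USING PigStorage(',')
         AS (order_id:int, user_id:int, amount:double);

-- All users, with nulls where no order matches
left_j  = JOIN users BY id LEFT OUTER,  orders BY user_id;
-- All orders, with nulls where no user matches
right_j = JOIN users BY id RIGHT OUTER, orders BY user_id;
-- Everything from both sides
full_j  = JOIN users BY id FULL OUTER,  orders BY user_id;
```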
Cross join
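CROSS produces every pairing of tuples from its inputs, so it is best kept to small relations. The two inputs here are hypothetical.

```pig
-- Hypothetical paths and schemas
colors = LOAD '/data/colors.txt' AS (color:chararray);
sizes  = LOAD '/data/sizes.txt'  AS (size:chararray);

-- One output tuple per (color, size) combination
variants = CROSS colors, sizes;
```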
Join options
'replicated' doesn't work on Tez; use mapreduce mode.
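The join strategy is chosen with a USING clause. A sketch of a replicated (map-side) join, where the relations listed after the first one must be small enough to fit in memory; paths and schemas are assumptions. 'skewed' and 'merge' are selected the same way.

```pig
-- Hypothetical paths and schemas: orders is large, users is small
orders = LOAD '/data/orders.csv' USING PigStorage(',')
         AS (order_id:int, user_id:int, amount:double);
users  = LOAD '/data/users.csv'  USING PigStorage(',')
         AS (id:int, name:chararray);

-- Large relation first, small relation(s) last
joined = JOIN orders BY user_id, users BY id USING 'replicated';
```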
Dump data on console
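DUMP triggers execution and prints the relation, so it is best paired with LIMIT on large inputs. The input path is hypothetical.

```pig
-- Hypothetical path and schema
users = LOAD '/data/users.csv' USING PigStorage(',')
        AS (id:int, name:chararray, age:int, city:chararray);

sample_rows = LIMIT users 5;
DUMP sample_rows;
```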
Store data into HDFS
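A minimal STORE sketch. Both paths are hypothetical, and the output directory must not already exist.

```pig
-- Hypothetical path and schema
users = LOAD '/data/users.csv' USING PigStorage(',')
        AS (id:int, name:chararray, age:int, city:chararray);

-- Writes tab-separated part files under /output/users
STORE users INTO '/output/users' USING PigStorage('\t');
```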
Store data into Hive
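Writing to Hive goes through HCatalog's storer. This sketch assumes Pig was launched with -useHCatalog, that the hypothetical table mydb.adult_users already exists, and that its columns match the relation's schema.

```pig
-- Launch with (shell command, not Pig Latin):  pig -useHCatalog
-- Hypothetical path and schema
users = LOAD '/data/users.csv' USING PigStorage(',')
        AS (id:int, name:chararray, age:int, city:chararray);

adults = FILTER users BY age >= 18;

-- 'mydb.adult_users' is a hypothetical, pre-existing Hive table
STORE adults INTO 'mydb.adult_users'
      USING org.apache.hive.hcatalog.pig.HCatStorer();
```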
Specify the number of reducers
Add 'PARALLEL <n>' to any operator that triggers a reduce phase: GROUP, DISTINCT, ORDER, JOIN
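A sketch of both ways to set the reducer count: a script-wide default and a per-operator override. The numbers and the input are arbitrary.

```pig
-- Default reducer count for every reduce-side operator in the script
SET default_parallel 20;

-- Hypothetical path and schema
users = LOAD '/data/users.csv' USING PigStorage(',')
        AS (id:int, name:chararray, age:int, city:chararray);

-- Override the default for this operator only
by_city = GROUP users BY city PARALLEL 10;
```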
Debug
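Three built-in operators cover most debugging needs. A sketch on a hypothetical relation:

```pig
-- Hypothetical path and schema
users = LOAD '/data/users.csv' USING PigStorage(',')
        AS (id:int, name:chararray, age:int, city:chararray);
by_city = GROUP users BY city;

DESCRIBE by_city;    -- print the schema of a relation
EXPLAIN by_city;     -- print the logical, physical, and execution plans
ILLUSTRATE by_city;  -- run the pipeline step by step on a small data sample
```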
Register a UDF Jar
Register the JAR (e.g. piggybank.jar).
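A one-line sketch; the JAR location is an assumption and depends on the installation.

```pig
-- Hypothetical path; it can be a local path or an HDFS path
REGISTER /usr/lib/pig/piggybank.jar;
```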
Use a UDF
Call it like any other function; it may require the fully qualified class name.
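A sketch calling a Piggybank function by its full class name. The JAR path and the input are hypothetical.

```pig
-- Hypothetical JAR path
REGISTER /usr/lib/pig/piggybank.jar;

-- Hypothetical input path and schema
users = LOAD '/data/users.csv' USING PigStorage(',')
        AS (id:int, name:chararray);

-- Fully qualified class name used directly as the function name
reversed = FOREACH users GENERATE
               org.apache.pig.piggybank.evaluation.string.Reverse(name);
```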
Create an alias for function
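DEFINE gives a long class name a short alias. A sketch with a Piggybank function; the alias name ReverseStr, the JAR path, and the input are assumptions.

```pig
-- Hypothetical JAR path
REGISTER /usr/lib/pig/piggybank.jar;

-- ReverseStr is an alias chosen for this example
DEFINE ReverseStr org.apache.pig.piggybank.evaluation.string.Reverse();

-- Hypothetical input path and schema
users = LOAD '/data/users.csv' USING PigStorage(',')
        AS (id:int, name:chararray);

reversed = FOREACH users GENERATE ReverseStr(name);
```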
Import a macro/UDF from another script
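A sketch of the macro case only. It assumes a hypothetical file my_macros.pig that defines the macro shown in the comment; the macro name and the input are made up for illustration.

```pig
-- Assumed contents of the hypothetical file 'my_macros.pig':
--   DEFINE adults_only(rel) RETURNS out {
--       $out = FILTER $rel BY age >= 18;
--   };
IMPORT 'my_macros.pig';

-- Hypothetical input path and schema
users = LOAD '/data/users.csv' USING PigStorage(',')
        AS (id:int, name:chararray, age:int, city:chararray);

-- A macro is invoked like a function that returns a relation
adults = adults_only(users);
```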