Sqoop
Ingest data from an RDBMS into HDFS/Hive
Imports
Import into HDFS
sqoop import --driver com.mysql.jdbc.Driver --connect jdbc:mysql://host:port/<database> --username <user> --password <pass> --table <table> [-m 1 / --split-by <col>] [--delete-target-dir]
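A filled-in sketch with hypothetical host, database, table, and credential values, splitting the import across mappers on the primary key and cleaning up any previous target directory:
sqoop import --driver com.mysql.jdbc.Driver --connect jdbc:mysql://dbhost:3306/shop --username etl --password secret --table customers --split-by customer_id --delete-target-dir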
Import into HDFS from query
sqoop import --driver com.mysql.jdbc.Driver --connect jdbc:mysql://host:port/<db> --query 'select * from <table> where <condition> AND $CONDITIONS' --target-dir <hdfs://user/_dir> -m 1
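A hypothetical free-form query example; the literal $CONDITIONS token must appear in the WHERE clause (Sqoop substitutes its split predicates there), and the query should be single-quoted so the shell does not expand it:
sqoop import --driver com.mysql.jdbc.Driver --connect jdbc:mysql://dbhost:3306/shop --username etl --password secret --query 'SELECT order_id, total FROM orders WHERE total > 100 AND $CONDITIONS' --target-dir /user/etl/orders_large -m 1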
Import into Hive with automatic schema creation
sqoop import --driver com.mysql.jdbc.Driver --connect jdbc:mysql://localhost/<db> --table <table> --username <user> --password <pass> --fields-terminated-by ',' --hive-import [--hive-overwrite]
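For instance, with hypothetical names, importing a table into Hive and overwriting any existing Hive table of the same name:
sqoop import --driver com.mysql.jdbc.Driver --connect jdbc:mysql://localhost/shop --table customers --username etl --password secret --fields-terminated-by ',' --hive-import --hive-overwrite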
Create schema in Hive
Runs only the schema-creation step of --hive-import; doesn't load the data
sqoop create-hive-table --driver com.mysql.jdbc.Driver --connect jdbc:mysql://host:port/<database> --username <user> --password <pass> --table <table> --fields-terminated-by ','
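A hypothetical example that also uses --hive-table to name the Hive table differently from the source table:
sqoop create-hive-table --driver com.mysql.jdbc.Driver --connect jdbc:mysql://dbhost:3306/shop --username etl --password secret --table customers --hive-table customers_stg --fields-terminated-by ','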
Exports
Export into MySQL table with insert
The table must exist beforehand. Rows are inserted only; existing rows are not checked for duplicates or updated
sqoop export --driver com.mysql.jdbc.Driver --connect jdbc:mysql://host:port/<db> --username <user> --password <pass> --export-dir <hdfs/dir> --table <table>
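A hypothetical insert-only export from a comma-delimited HDFS directory; --input-fields-terminated-by tells Sqoop how the exported files are delimited:
sqoop export --driver com.mysql.jdbc.Driver --connect jdbc:mysql://dbhost:3306/shop --username etl --password secret --export-dir /user/hive/warehouse/customers --table customers_export --input-fields-terminated-by ','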
Export into MySQL table with update/upsert
The table must exist beforehand. Performs updates on rows matching --update-key (or upserts with allowinsert)
sqoop export --driver com.mysql.jdbc.Driver --connect jdbc:mysql://host:port/<db> --username <user> --password <pass> --export-dir <hdfs/dir> --table <table> --update-key <col_pk> --update-mode <updateonly (default) | allowinsert>
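A hypothetical upsert example keyed on the table's primary key; rows whose key already exists are updated, all others are inserted:
sqoop export --driver com.mysql.jdbc.Driver --connect jdbc:mysql://dbhost:3306/shop --username etl --password secret --export-dir /user/hive/warehouse/customers --table customers_export --update-key customer_id --update-mode allowinsert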