Sqoop

Ingest data from an RDBMS into HDFS/Hive

Imports

Import into HDFS

sqoop import --driver com.mysql.jdbc.Driver --connect jdbc:mysql://host:port/<database> --username <user> --password <pass> --table <table> [-m 1 / --split-by <col>] [--delete-target-dir]
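
For illustration, a filled-in version of the command above. All names are hypothetical: a MySQL database shop on dbhost:3306, a table orders keyed by order_id, and an etl user.

# Import the orders table into HDFS, splitting the work across 4 mappers on order_id
sqoop import \
  --driver com.mysql.jdbc.Driver \
  --connect jdbc:mysql://dbhost:3306/shop \
  --username etl --password secret \
  --table orders \
  --split-by order_id \
  --num-mappers 4 \
  --target-dir /user/etl/orders \
  --delete-target-dir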

Import into HDFS from query

sqoop import --driver com.mysql.jdbc.Driver --connect jdbc:mysql://host:port/<db> --query 'select * from <table> where <condition> AND $CONDITIONS' --target-dir <hdfs://user/_dir> -m 1
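
A sketch of the free-form query import with the same hypothetical shop database. The literal $CONDITIONS token is mandatory, and the single quotes keep the shell from expanding it before Sqoop substitutes its split predicate.

# Import only high-value orders into an explicit HDFS directory, single mapper
sqoop import \
  --driver com.mysql.jdbc.Driver \
  --connect jdbc:mysql://dbhost:3306/shop \
  --username etl --password secret \
  --query 'SELECT * FROM orders WHERE order_total > 100 AND $CONDITIONS' \
  --target-dir /user/etl/orders_high_value \
  -m 1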

Import into Hive with automatic schema creation

sqoop import --driver com.mysql.jdbc.Driver --connect jdbc:mysql://localhost/<db> --table <table> --username <user> --password <pass> --fields-terminated-by ',' --hive-import [--hive-overwrite]
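
A filled-in sketch of the Hive import, again with the hypothetical shop/orders names; --hive-table is optional and shown only to make the destination table explicit.

# Import orders straight into Hive, letting Sqoop create the table schema
sqoop import \
  --driver com.mysql.jdbc.Driver \
  --connect jdbc:mysql://dbhost:3306/shop \
  --table orders \
  --username etl --password secret \
  --fields-terminated-by ',' \
  --hive-import \
  --hive-table orders_raw \
  --hive-overwrite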

Create schema in Hive

Runs only the schema-creation step of --hive-import, without loading any data

sqoop create-hive-table --driver com.mysql.jdbc.Driver --connect jdbc:mysql://host:port/<database> --username <user> --password <pass> --table <table> --fields-terminated-by ','
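
A hypothetical schema-only example, reusing the shop/orders placeholder names from above.

# Create the Hive table orders_raw with columns mapped from MySQL, but load no data
sqoop create-hive-table \
  --driver com.mysql.jdbc.Driver \
  --connect jdbc:mysql://dbhost:3306/shop \
  --username etl --password secret \
  --table orders \
  --hive-table orders_raw \
  --fields-terminated-by ','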

Exports

Export into MySQL table with insert

The target table must already exist. Rows are inserted as-is; no duplicate check or update is performed

sqoop export --driver com.mysql.jdbc.Driver --connect jdbc:mysql://host:port/<db> --username <user> --password <pass> --export-dir <hdfs/dir> --table <table>
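
An illustrative insert-only export, assuming the HDFS directory /user/etl/orders holds comma-delimited files and the MySQL table orders_archive already exists (all names hypothetical).

# Insert-only export of comma-delimited HDFS files into an existing MySQL table
sqoop export \
  --driver com.mysql.jdbc.Driver \
  --connect jdbc:mysql://dbhost:3306/shop \
  --username etl --password secret \
  --export-dir /user/etl/orders \
  --table orders_archive \
  --input-fields-terminated-by ','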

Export into MySQL table with update/upsert

The target table must already exist. Performs updates of existing rows (default --update-mode updateonly), or upserts with --update-mode allowinsert

sqoop export --driver com.mysql.jdbc.Driver --connect jdbc:mysql://host:port/<db> --username <user> --password <pass> --export-dir <hdfs/dir> --table <table> --update-key <col_pk> --update-mode <updateonly|allowinsert>
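
A hedged upsert example with the same placeholder names, assuming order_id is the key of the target table; whether allowinsert is honoured also depends on the JDBC connector in use.

# Upsert: rows whose order_id already exists are updated, new keys are inserted
sqoop export \
  --driver com.mysql.jdbc.Driver \
  --connect jdbc:mysql://dbhost:3306/shop \
  --username etl --password secret \
  --export-dir /user/etl/orders \
  --table orders_archive \
  --update-key order_id \
  --update-mode allowinsert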
