Sqoop

Ingest data from an RDBMS into HDFS/Hive

Imports

Import into HDFS

sqoop import \
  --driver com.mysql.jdbc.Driver \
  --connect jdbc:mysql://host:port/<database> \
  --username <user> --password <pass> \
  --table <table> \
  [-m 1 | --split-by <col>] \
  [--delete-target-dir]
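A filled-in sketch (host, database, credentials, and column names are hypothetical):

```
# Import a hypothetical `orders` table from MySQL into HDFS, split across
# the default 4 mappers on the numeric primary-key column.
sqoop import \
  --driver com.mysql.jdbc.Driver \
  --connect jdbc:mysql://db.example.com:3306/shop \
  --username etl --password secret \
  --table orders \
  --split-by order_id \
  --delete-target-dir
```

--delete-target-dir removes any existing output directory in HDFS first, which makes the job safe to re-run.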

Import into HDFS from query

The literal token $CONDITIONS must appear in the WHERE clause; Sqoop replaces it with per-mapper split predicates. With --query, --target-dir is mandatory, along with either -m 1 or --split-by.

sqoop import \
  --driver com.mysql.jdbc.Driver \
  --connect jdbc:mysql://host:port/<db> \
  --query 'select * from <table> where <condition> AND $CONDITIONS' \
  --target-dir <hdfs://user/_dir> \
  -m 1
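A filled-in sketch of a free-form query import (all identifiers hypothetical); note that $CONDITIONS is kept verbatim inside the single-quoted query:

```
# Import only recent rows of a hypothetical `orders` table via free-form SQL.
sqoop import \
  --driver com.mysql.jdbc.Driver \
  --connect jdbc:mysql://db.example.com:3306/shop \
  --username etl --password secret \
  --query 'select * from orders where order_date >= "2020-01-01" AND $CONDITIONS' \
  --target-dir /user/etl/orders_recent \
  -m 1
```

With -m 1 a single mapper reads all matching rows, so no --split-by is needed.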

Import into Hive with automatic schema creation

sqoop import \
  --driver com.mysql.jdbc.Driver \
  --connect jdbc:mysql://localhost/<db> \
  --table <table> \
  --username <user> --password <pass> \
  --fields-terminated-by ',' \
  --hive-import [--hive-overwrite]
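A filled-in sketch (all identifiers hypothetical):

```
# Import a hypothetical `customers` table straight into Hive, letting
# Sqoop generate the Hive table schema from the MySQL column metadata.
sqoop import \
  --driver com.mysql.jdbc.Driver \
  --connect jdbc:mysql://localhost/shop \
  --table customers \
  --username etl --password secret \
  --fields-terminated-by ',' \
  --hive-import --hive-overwrite
```

--hive-overwrite replaces the Hive table's existing contents instead of appending to them.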

Create schema in Hive

Performs the same Hive schema creation as --hive-import, but does not load any data.

sqoop create-hive-table \
  --driver com.mysql.jdbc.Driver \
  --connect jdbc:mysql://host:port/<database> \
  --username <user> --password <pass> \
  --table <table> \
  --fields-terminated-by ','

Exports

Export into MySQL table with insert

The target table must already exist. Rows are exported as plain INSERT statements; duplicates are not checked, so the job fails on primary-key collisions.

sqoop export \
  --driver com.mysql.jdbc.Driver \
  --connect jdbc:mysql://host:port/<db> \
  --username <user> --password <pass> \
  --export-dir <hdfs/dir> \
  --table <table>

Export into MySQL table with update/upsert

The target table must already exist. Rows matching --update-key are updated; with --update-mode allowinsert, non-matching rows are inserted as well (upsert).

sqoop export \
  --driver com.mysql.jdbc.Driver \
  --connect jdbc:mysql://host:port/<db> \
  --username <user> --password <pass> \
  --export-dir <hdfs/dir> \
  --table <table> \
  --update-key <col_pka> \
  --update-mode <updateonly (default) | allowinsert>
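A filled-in upsert sketch (all identifiers hypothetical):

```
# Upsert HDFS results back into a hypothetical MySQL `order_totals` table:
# rows whose `order_id` already exists are UPDATEd, new ones are INSERTed.
sqoop export \
  --driver com.mysql.jdbc.Driver \
  --connect jdbc:mysql://db.example.com:3306/shop \
  --username etl --password secret \
  --export-dir /user/etl/order_totals \
  --table order_totals \
  --update-key order_id \
  --update-mode allowinsert
```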