# Sqoop

Ingests data from an RDBMS into HDFS/Hive.

## Imports

### Import into HDFS

Use `-m 1` for a single mapper, or `--split-by <col>` to parallelize on a splitting column (required when the table has no primary key and more than one mapper is used).

```
sqoop import --driver com.mysql.jdbc.Driver --connect jdbc:mysql://host:port/<database> --username <user> --password <pass> --table <table> [-m 1 | --split-by <col>] [--delete-target-dir]
```
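As a concrete sketch (host, database, table, and credentials here are all hypothetical), a parallel import of an `orders` table split on its numeric `id` column might look like:

```shell
# Hypothetical example: import mydb.orders into HDFS with 4 mappers,
# splitting the work on the id column.
sqoop import \
  --driver com.mysql.jdbc.Driver \
  --connect jdbc:mysql://dbhost:3306/mydb \
  --username etl --password-file /user/etl/.dbpass \
  --table orders \
  --split-by id -m 4 \
  --delete-target-dir
```

`--password-file` keeps the password off the command line, and `--delete-target-dir` makes the job re-runnable by removing any previous output directory first.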

### Import into HDFS from query

The literal token `$CONDITIONS` must appear in the `WHERE` clause; Sqoop replaces it with each mapper's split predicate. A free-form query import also requires `--target-dir`.

```
sqoop import --driver com.mysql.jdbc.Driver --connect jdbc:mysql://host:port/<db> --query 'select * from <table> where <condition> AND $CONDITIONS' --target-dir <hdfs_dir> -m 1
```
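A hedged example (table and filter invented for illustration) showing `$CONDITIONS` placed verbatim in the query:

```shell
# Hypothetical example: free-form query import; the $CONDITIONS token
# is mandatory even with a single mapper.
sqoop import \
  --driver com.mysql.jdbc.Driver \
  --connect jdbc:mysql://dbhost:3306/mydb \
  --username etl -P \
  --query 'SELECT id, total FROM orders WHERE status = "PAID" AND $CONDITIONS' \
  --target-dir /user/etl/orders_paid \
  -m 1
```

With `-m 1` there is only one split, but `$CONDITIONS` is still required; with more than one mapper, `--split-by` becomes mandatory as well.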

### Import into Hive with automatic schema creation

```
sqoop import --driver com.mysql.jdbc.Driver --connect jdbc:mysql://localhost/<db> --table <table> --username <user> --password <pass> --fields-terminated-by ',' --hive-import [--hive-overwrite]
```
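A sketch with hypothetical names, additionally using `--hive-table` to control where the auto-created table lands:

```shell
# Hypothetical example: import mydb.orders into Hive, letting Sqoop
# create the table schema automatically and overwrite existing data.
sqoop import \
  --driver com.mysql.jdbc.Driver \
  --connect jdbc:mysql://dbhost:3306/mydb \
  --username etl -P \
  --table orders \
  --fields-terminated-by ',' \
  --hive-import \
  --hive-table analytics.orders \
  --hive-overwrite
```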

### Create schema in Hive

Creates the Hive table schema (the same step `--hive-import` performs) but does not load any data.

```
sqoop create-hive-table --driver com.mysql.jdbc.Driver --connect jdbc:mysql://host:port/<database> --username <user> --password <pass> --table <table> --fields-terminated-by ','
```

## Exports

### Export into MySQL table with insert

The target table must exist beforehand. Rows are inserted as-is; Sqoop does not check for duplicates or perform updates.

```
sqoop export --driver com.mysql.jdbc.Driver --connect jdbc:mysql://host:port/<db> --username <user> --password <pass> --export-dir <hdfs/dir> --table <table>
```

### Export into MySQL table with update/upsert

The target table must exist beforehand. Rows matching `--update-key` are updated; with `--update-mode allowinsert`, non-matching rows are inserted as well (upsert). The default mode is `updateonly`.

```
sqoop export --driver com.mysql.jdbc.Driver --connect jdbc:mysql://host:port/<db> --username <user> --password <pass> --export-dir <hdfs/dir> --table <table> --update-key <col_pk> --update-mode <updateonly|allowinsert>
```
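A hedged upsert sketch (database, table, and key column are hypothetical), matching rows on the primary key column `id`:

```shell
# Hypothetical example: upsert HDFS output back into mydb.orders.
# Rows whose id matches an existing row are updated; the rest are inserted.
sqoop export \
  --driver com.mysql.jdbc.Driver \
  --connect jdbc:mysql://dbhost:3306/mydb \
  --username etl -P \
  --export-dir /user/etl/orders_out \
  --table orders \
  --update-key id \
  --update-mode allowinsert
```

With the default `updateonly` mode, rows that do not match the `--update-key` column are silently skipped instead of inserted.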
