Hive

Run script file

$ hive -f script.hql

Run non-interactiv query

$ hive -e "<query>"

Metadata

show tables;

describe extended <table>

describe formatted <table>

show partitions <table>

Create database

create database <db>

Use database

Create Hive managed table

Create externally managed table

  • The data is managed by the filesystem, and not by Hive.

  • Issuing a table drop will not destroy the data

  • Can have multiple table definitions/tools use the same data in parallel

Create table with partitions

  • Partitions are extra physical folders to split the data more logically.

  • While querying the data, they will appear as columns

Create table with buckets

  • Buckets enable more efficient queries + joining

  • Buckets act on columns by hashing the value, similarly to a hashmap

  • Need to set hive.enforce.bucketing=true property, to create the needed buckets at table declaration

Create table with different text delimiters (for TEXTFILE format)

Create table with a different storage format

Stored as to change the storage file type. Default is text file. Options:

  • textfile - default

  • rcfile / orcfile / orc

  • avro

  • sequencefile

  • parquet

Create external table

Data is stored externally to hive, and location folder needs to be specified at creation time (doesn't have to exist) to not copy from HDFS, or it will create a folder in normal Hive space on HDFS

Create table as select

Table will have the schema according to the select query and also be populated in 1 go (CTS create table as select statement)

Create table with the same schema

Create transactional table

Drop table

Drop schema (and data if managed by Hive)

Truncate table

Drop data and keep schema. Works for internal tables

Load data from local FS

Load data from HDFS

Load/Insert into partition dynamicaly

Insert data values

Insert data as select

Insert compressed data

Data saved as gzip or bzip2 is automatically decompressed when inserting into the table

Update

  • Update requires transactional behaviour for a table

    • stored as orc

    • has buckets

    • transactional property

  • Doesn't work to update the bucket col

Delete data

Alter table add column

Alter table rename

Alter table drop column

To drop a column you need to re-write the schema with the columns that you want to keep

Select

Select and sort globally

Select and sort per reducer

Joins

Inner join

Join keyword

SQL way

Outer join

Includes rows from the left table which don't have a match with the right table, and the values are set to null, or the reverse

Cross join

Change execution engine

Optimizations

vectorization

Vectorized query execution streamlines operations by processing a block of 1024 rows at a time.

Debug

Explain

Detail the execution plan

Last updated

Was this helpful?