Hive

Quick Start

Run a single-node Hive session with your Tabular credential and warehouse name:

docker run -it -e TABULAR_CREDENTIAL=<credential> -e WAREHOUSE=sandbox tabulario/hive

Install Hive from Binaries

Note: These steps assume you have no existing installation of Hadoop, Hive, Iceberg, or a metastore. Settings and steps may differ in environments with existing configurations.

# Create a working directory
mkdir tabular-hive
cd tabular-hive

# Verify the Java version and JAVA_HOME
java -version      # must be Java 8 (reported as 1.8); Java 8 does not accept --version
echo $JAVA_HOME    # must point to a Java 8 installation

# Download dependencies
# Download Hadoop
# amd64
curl -O https://dlcdn.apache.org/hadoop/common/hadoop-3.3.5/hadoop-3.3.5.tar.gz
# aarch64
# curl -O https://dlcdn.apache.org/hadoop/common/hadoop-3.3.5/hadoop-3.3.5-aarch64.tar.gz
tar -xzf hadoop-3*.tar.gz
export HADOOP_HOME=$PWD/hadoop-3.3.5

# Download Hive
curl -O https://dlcdn.apache.org/hive/hive-3.1.3/apache-hive-3.1.3-bin.tar.gz
tar -xzf apache-hive-3.1.3-bin.tar.gz
export HIVE_HOME=$PWD/apache-hive-3.1.3-bin

# Download Iceberg dependencies into Hive's lib directory (--output-dir requires curl 7.73.0 or later)
curl --create-dirs -O --output-dir $HIVE_HOME/lib https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-hive-runtime/1.1.0/iceberg-hive-runtime-1.1.0.jar
curl --create-dirs -O --output-dir $HIVE_HOME/lib https://tabular-repository-public.s3.amazonaws.com/releases/io/tabular/tabular-client-runtime/1.4.3/tabular-client-runtime-1.4.3.jar

# Copy the configuration from the Configuration section below,
# then set the warehouse name and credential properties
vim $HIVE_HOME/conf/hive-site.xml

# Initialize the hive metastore
cd $HIVE_HOME
./bin/schematool -initSchema -dbType derby

# Run the metastore in the background
./bin/hive --service metastore &

# Start the Hive CLI, then run the following statement at the hive> prompt
./bin/hive
CREATE EXTERNAL TABLE h1 (i int) 
    STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
    -- Hive requires a location, but this is ignored
    LOCATION 'file:///tmp/hive/warehouse/h1' 
    -- the following identifies that we want to use the Tabular catalog from the configuration
    TBLPROPERTIES ('iceberg.catalog'='tabular');    
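If schematool or the Hive CLI fails at startup, it can help to first rule out environment problems: the Java version and whether both runtime jars actually landed in $HIVE_HOME/lib. The Python sketch below is one hypothetical way to check this. It assumes Java 8 is the target (Hadoop 3.x and Hive 3.x require it) and hard-codes the jar file names from the download step, so adjust them if you use other versions. The helper names java_major_version and missing_jars are illustrative only and are not part of any Hive or Tabular tooling.

```python
import os
import re
import subprocess
from pathlib import Path
from typing import List, Optional

# Jar names from the download step above (hypothetical check list); adjust for other versions.
REQUIRED_JARS = (
    "iceberg-hive-runtime-1.1.0.jar",
    "tabular-client-runtime-1.4.3.jar",
)


def java_major_version(version_text: str) -> Optional[int]:
    """Extract the major version from `java -version` output.

    Java 8 reports itself as "1.8.0_xxx"; Java 9 and later report e.g. "11.0.2".
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_text)
    if match is None:
        return None
    first = int(match.group(1))
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))  # legacy "1.x" scheme: "1.8" means Java 8
    return first


def missing_jars(hive_home: str) -> List[str]:
    """Return the required jars that are not present in <hive_home>/lib."""
    lib = Path(hive_home) / "lib"
    return [jar for jar in REQUIRED_JARS if not (lib / jar).is_file()]


if __name__ == "__main__":
    try:
        # `java -version` prints its banner to stderr, not stdout.
        result = subprocess.run(["java", "-version"], capture_output=True, text=True)
        print("Java major version:", java_major_version(result.stderr), "(expected 8)")
    except FileNotFoundError:
        print("java not found on PATH")
    for var in ("JAVA_HOME", "HADOOP_HOME", "HIVE_HOME"):
        print(f"{var}={os.environ.get(var, '<unset>')}")
    hive_home = os.environ.get("HIVE_HOME")
    if hive_home:
        print("Missing jars:", missing_jars(hive_home) or "none")
```

Run it from the same shell where HADOOP_HOME and HIVE_HOME were exported so the environment variables are visible to the script.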

Requirements

Current Limitations

  • Hive supports only a single global catalog, unlike Tabular and Iceberg, which support multiple catalogs/warehouses.
  • Each table must set the iceberg.catalog table property to the configured catalog name (tabular) to signal to Hive that the table is externally managed (see Creating Tables).
  • Tables created outside of Hive must be linked manually (see Linking Existing Tables).
  • Additional Iceberg compatibility information

Configuration

$HIVE_HOME/conf/hive-site.xml

<configuration>
  <property>
    <name>iceberg.catalog.tabular.catalog-impl</name>
    <value>org.apache.iceberg.rest.RESTCatalog</value>
  </property>
  <property>
    <name>iceberg.catalog.tabular.uri</name>
    <value>https://api.tabular.io/ws</value>
  </property>
  <property>
    <name>iceberg.catalog.tabular.warehouse</name>
    <value><your-warehouse-name></value>
  </property>
  <property>
    <name>iceberg.catalog.tabular.credential</name>
    <value><your-tabular-credential></value>
  </property>
  <property>
    <name>iceberg.engine.hive.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.vectorized.execution.enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>tez.mrreader.config.update.properties</name>
    <value>hive.io.file.readcolumn.names,hive.io.file.readcolumn.ids</value>
  </property>

  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://localhost:9083</value>
    <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
  </property>

  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>file:///tmp/hive/warehouse</value>
    <description>location of default database for the warehouse</description>
  </property>
  <property>
    <name>hive.metastore.warehouse.external.dir</name>
    <value>file:///tmp/hive/external/warehouse</value>
    <description>Default location for external tables created in the warehouse. If not set or null, then the normal warehouse location will be used as the default location.</description>
  </property>
  <property>
    <name>hive.variable.substitute</name>
    <value>true</value>
    <description>This enables substitution using syntax like ${var} ${system:var} and ${env:var}.</description>
  </property>
</configuration>
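The angle-bracket placeholders make the listing above invalid XML until they are replaced, so you may prefer to generate the file instead of editing it by hand. The following Python sketch builds the same properties (without the description elements) using only the standard library. It also shows the naming convention at work: every catalog property is prefixed with iceberg.catalog.<name>., and <name> (tabular here) is the value a table's iceberg.catalog property refers to. Reading the warehouse and credential from the WAREHOUSE and TABULAR_CREDENTIAL environment variables is an assumption borrowed from the Docker quick start; Hive itself does not read them.

```python
import os
import xml.etree.ElementTree as ET
from typing import Dict

CATALOG_NAME = "tabular"  # must match TBLPROPERTIES ('iceberg.catalog'='tabular')


def tabular_catalog_properties(warehouse: str, credential: str,
                               catalog: str = CATALOG_NAME) -> Dict[str, str]:
    """Catalog properties follow the pattern iceberg.catalog.<name>.<key>."""
    prefix = f"iceberg.catalog.{catalog}."
    return {
        prefix + "catalog-impl": "org.apache.iceberg.rest.RESTCatalog",
        prefix + "uri": "https://api.tabular.io/ws",
        prefix + "warehouse": warehouse,
        prefix + "credential": credential,
    }


def render_hive_site(properties: Dict[str, str]) -> str:
    """Serialize a name -> value mapping as a Hadoop-style <configuration> document."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value  # special characters are escaped
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Assumption: warehouse and credential come from the same env vars as the Docker quick start.
    props = tabular_catalog_properties(
        warehouse=os.environ.get("WAREHOUSE", "<your-warehouse-name>"),
        credential=os.environ.get("TABULAR_CREDENTIAL", "<your-tabular-credential>"),
    )
    # Remaining properties, copied from the listing above.
    props.update({
        "iceberg.engine.hive.enabled": "true",
        "hive.vectorized.execution.enabled": "false",
        "tez.mrreader.config.update.properties":
            "hive.io.file.readcolumn.names,hive.io.file.readcolumn.ids",
        "hive.metastore.uris": "thrift://localhost:9083",
        "hive.metastore.warehouse.dir": "file:///tmp/hive/warehouse",
        "hive.metastore.warehouse.external.dir": "file:///tmp/hive/external/warehouse",
        "hive.variable.substitute": "true",
    })
    print(render_hive_site(props))
```

Redirect the printed output into $HIVE_HOME/conf/hive-site.xml. Because the document is built with ElementTree rather than string templates, a credential containing characters such as & or < is escaped correctly.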

Creating Tables

Creating a table as shown below also creates the table in Tabular. The Hive database name maps to the Tabular namespace and the Hive table name to the Tabular table name, and both must match exactly.

CREATE EXTERNAL TABLE default.h1 (i int) 
    STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
    -- Hive requires a location, but this is ignored
    LOCATION 'file:///tmp/hive/warehouse/h1' 
    -- the following identifies that we want to use the Tabular catalog from the configuration
    TBLPROPERTIES ('iceberg.catalog'='tabular');      

Linking Existing Tables

A table created outside of Hive can be linked to the Hive metastore without specifying its schema; Hive still requires a LOCATION clause, but it is ignored. The Hive database and table names must exactly match the table's namespace and name in Tabular.

CREATE EXTERNAL TABLE default.h2 
    STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
    -- Hive requires a location, but this is ignored
    LOCATION 'file:///tmp/hive/warehouse/h2' 
    -- the following identifies that we want to use the Tabular catalog from the configuration
    TBLPROPERTIES ('iceberg.catalog'='tabular');
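The create and link statements differ only in whether a column list is present. As a hypothetical illustration (iceberg_table_ddl is not part of Hive or Tabular), a small Python helper can generate either form, following this guide's conventions: the tabular catalog name and a placeholder LOCATION under file:///tmp/hive/warehouse that Hive requires but ignores.

```python
from typing import Optional, Sequence, Tuple

STORAGE_HANDLER = "org.apache.iceberg.mr.hive.HiveIcebergStorageHandler"


def iceberg_table_ddl(database: str, table: str,
                      columns: Optional[Sequence[Tuple[str, str]]] = None,
                      catalog: str = "tabular",
                      location_root: str = "file:///tmp/hive/warehouse") -> str:
    """Build a CREATE EXTERNAL TABLE statement for a Tabular-backed Iceberg table.

    With columns, the statement creates a new table in Tabular; without columns,
    it links an existing table whose schema comes from the catalog.
    `database` and `table` must exactly match the Tabular namespace and table name.
    """
    name = f"{database}.{table}"
    if columns:
        cols = ", ".join(f"{col} {typ}" for col, typ in columns)
        name += f" ({cols})"
    return (
        f"CREATE EXTERNAL TABLE {name}\n"
        f"    STORED BY '{STORAGE_HANDLER}'\n"
        # Hive requires a LOCATION, but it is ignored for tables in the Tabular catalog.
        f"    LOCATION '{location_root}/{table}'\n"
        f"    TBLPROPERTIES ('iceberg.catalog'='{catalog}');"
    )


if __name__ == "__main__":
    print(iceberg_table_ddl("default", "h1", columns=[("i", "int")]))  # create
    print(iceberg_table_ddl("default", "h2"))                           # link
```

Paste the printed statements at the hive> prompt. Column names and types are inserted verbatim, so this sketch performs no quoting or validation.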