To connect Apache Spark to a Tabular warehouse, add the catalog to your Spark session with the following configuration:

spark.sql.catalog.<your-warehouse>               org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.<your-warehouse>.credential    <your-tabular-credential>
spark.sql.catalog.<your-warehouse>.warehouse     <your-warehouse-name>

Your databases and tables will be addressable at <your-warehouse-name>.<db>.<table>.
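As a rough illustration, the three properties above can be assembled in plain Python before being applied to a Spark session. The helper name `tabular_catalog_conf` and the sample values are hypothetical, not part of the Tabular client:

```python
def tabular_catalog_conf(catalog_name: str, warehouse: str, credential: str) -> dict:
    """Build the three Spark properties for a Tabular catalog.

    Hypothetical helper: `catalog_name` is how tables are addressed
    (<catalog>.<db>.<table>); `warehouse` and `credential` come from Tabular.
    """
    prefix = f"spark.sql.catalog.{catalog_name}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.credential": credential,
        f"{prefix}.warehouse": warehouse,
    }
```

Each key/value pair can then be passed to `SparkSession.builder.config(key, value)` or written to spark-defaults.conf.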

Using Docker

The easiest way to get up and running with Spark is to use our provided Docker image, which bundles Spark, Iceberg, and the Tabular catalog client. The following command creates a Spark environment and a Jupyter notebook server, available at http://localhost:8888, with some example notebooks.

docker run -it -p 8888:8888 -p 8080:8080 -p 18080:18080 tabulario/spark-tabular notebook <your-tabular-credential> <your-warehouse-name>

You can also replace "notebook" in the above command with any of the following values to launch an interactive Spark shell:

  • spark-sql
  • spark-shell
  • pyspark


For example, to start a spark-sql shell:

docker run -it -p 8888:8888 -p 8080:8080 -p 18080:18080 tabulario/spark-tabular spark-sql <your-tabular-credential> <your-warehouse-name>

Running Locally

Tabular works with any Spark runtime environment, including open source Spark running on your local machine.

1. Download the latest versions of the Spark runtime and Tabular client jars from the Download Resources page.

2. Download the latest supported version of Spark (currently 3.4.0).

3. Untar the tgz file:

   tar -xvzf spark-3.4.0-bin-hadoop3.tgz

4. Copy the two jar files you downloaded in Step 1 into the Spark jars directory: ./spark-3.4.0-bin-hadoop3/jars/

5. Change to the Spark home directory:

   cd spark-3.4.0-bin-hadoop3

6. Start a spark-sql shell by running the following:

   ./bin/spark-sql \
   --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
   --conf spark.sql.defaultCatalog=tabular \
   --conf spark.sql.catalog.tabular=org.apache.iceberg.spark.SparkCatalog \
   --conf spark.sql.catalog.tabular.uri= \
   --conf spark.sql.catalog.tabular.credential=<your-tabular-credential> \
   --conf spark.sql.catalog.tabular.warehouse=<your-warehouse-name>
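If you keep these settings in one mapping, they can be rendered into the repeated `--conf` flags that spark-sql expects. A small sketch; the helper name and the sample mapping are illustrative only:

```python
def to_conf_flags(props: dict) -> list:
    """Render Spark properties as repeated --conf key=value arguments."""
    flags = []
    for key, value in props.items():
        flags.extend(["--conf", f"{key}={value}"])
    return flags

# Illustrative subset of the properties shown above.
spark_props = {
    "spark.sql.defaultCatalog": "tabular",
    "spark.sql.catalog.tabular": "org.apache.iceberg.spark.SparkCatalog",
}
```

The resulting list can be appended to a `./bin/spark-sql` or `spark-submit` invocation, e.g. via `subprocess.run`.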

Once the spark-sql shell has launched, you can verify that you have access to your Tabular data warehouse by running SHOW CURRENT NAMESPACE.

spark-sql> show current namespace;
Time taken: 0.217 seconds, Fetched 1 row(s)


Using AWS EMR

You can access your Tabular data warehouse from Spark running on AWS EMR clusters using the same jars and configuration as for any other Spark deployment.

We recommend using EMR 6.11 or later with Spark 3.3 or later.

Example EMR Configuration

[
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
      "spark.sql.catalog.tabular": "org.apache.iceberg.spark.SparkCatalog",
      "spark.sql.catalog.tabular.credential": "<your-tabular-credential>",
      "spark.sql.catalog.tabular.warehouse": "<your-warehouse-name>"
    }
  }
]
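If you provision clusters programmatically, the configuration entry can be generated rather than hand-written. A hedged sketch, assuming the standard EMR "spark-defaults" classification and a catalog named "tabular"; the helper name and sample values are illustrative:

```python
import json

def emr_spark_defaults(credential: str, warehouse: str) -> str:
    """Emit EMR configurations JSON carrying the Tabular catalog properties.

    Sketch only: assumes the standard "spark-defaults" classification
    and a catalog named "tabular".
    """
    configurations = [{
        "Classification": "spark-defaults",
        "Properties": {
            "spark.sql.catalog.tabular": "org.apache.iceberg.spark.SparkCatalog",
            "spark.sql.catalog.tabular.credential": credential,
            "spark.sql.catalog.tabular.warehouse": warehouse,
        },
    }]
    return json.dumps(configurations, indent=2)
```

The returned string can be passed as the Configurations parameter when creating the cluster, for example through the AWS CLI or an infrastructure-as-code tool.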