Athena PySpark

This document describes the steps required to configure Athena PySpark notebooks to authenticate with Tabular and execute queries against Tabular-managed Iceberg tables.

Athena PySpark notebooks are hosted by AWS. But they leverage standard open source Apache Spark and Iceberg to connect to Tabular’s REST catalog as a data source for queries. This integration makes it easy to leverage the full capabilities of Spark, Iceberg, and notebooks without the need to manage any infrastructure.

On this page:


  1. Athena workgroup using the PySpark engine.
  2. Service account or AWS IAM role mapping configured in Tabular for authentication.


Navigate to the Athena Notebook Explorer.

  1. Click on Create notebook.
  2. Enter a Name for notebook.
  3. Open Apache Spark properties and click Custom.
  4. Click the Edit in Table toggle to set it to Edit in JSON.
  5. Copy and paste the spark configuration below, but replace <warehouse> with your Tabular warehouse name.
  6. Click Create notebook.
      "spark.sql.catalog.<warehouse>": "org.apache.iceberg.spark.SparkCatalog",
      "spark.sql.catalog.<warehouse>.catalog-impl": "",
      "spark.sql.catalog.<warehouse>.credential": "<credential>",
      "spark.sql.catalog.<warehouse>.uri": "",
      "spark.sql.catalog.<warehouse>.warehouse": "<warehouse>",
      "spark.sql.defaultCatalog": "<warehouse>",
      "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"

Query your Iceberg tables with Athena PySpark

In addition to using standard pyspark syntax for interacting with your warehouse, in the Athena Pyspark notebook environment you can also use cell magics.

DESCRIBE TABLE examples.nyc_taxi_yellow;
create table if not exists default.taxi_tip_rate_per_passenger_spark as
    count(*) as trip_count,
    avg(tip_amount / fare_amount) as tip_rate
from examples.nyc_taxi_yellow
    pickup_time between '2020-03-01' and '2020-03-31'
  and passenger_count > 0
  and fare_amount > 0
group by passenger_count;
  format_number(tip_rate * 100, 2) as tip_percentage
from default.taxi_tip_rate_per_passenger_spark 
order by passenger_count;