On this page:
Overview
To query Tabular tables from Redshift, you must set up AWS Glue mirroring to sync
the databases and tables in your Tabular warehouse to the Glue catalog. Then you create an
external schema
in Redshift that points to the synced Glue schema.
Terminology: A Tabular database is semantically similar to a Redshift schema.
Current Limitations
- Glue mirroring is a one-way replication; mirrored tables should be considered read-only.
- Redshift does not leverage Tabular’s permission system to access tables. Rather, Redshift’s access to tables is governed by the IAM role used to create the external schema in Redshift.
- Redshift’s preview for reading Apache Iceberg tables does not support compressed metadata files or Apache Parquet files using zstd compression.
Configuration
Step 1: Enable AWS Glue mirror [Tabular]
Enabling the AWS Glue mirror is available on the warehouse Settings page.
Step 2: Create an external schema [Redshift]
External schemas in Redshift are essentially pointers to schemas in AWS Glue along with an IAM role used to access the data files on S3.
Run the following Redshift query to create an external schema that is linked to the corresponding schema in Glue.
CREATE EXTERNAL SCHEMA <schema_name>
FROM data catalog
DATABASE '<schema_name>'
IAM_ROLE '<iam_role_arn>'
<schema_name>: replace with the corresponding schema name in Glue (‘default’ is a reserved word in Redshift; in that case replace the first instance with a different name such as ‘default_rs’.
<iam_role_arn>: replace with an IAM role arn that has access to AWS Glue and read access to the Tabular warehouse S3 location.
Example:
CREATE EXTERNAL SCHEMA default_rs
FROM data catalog
DATABASE 'default'
IAM_ROLE 'arn:aws:iam::111111111111:role/service-role/AmazonRedshift-CommandsAccessRole'
Explore & Query Tables
Once you have successfully created an external schema, all tables in that schema are immediately available for querying just like any other Redshift table. Redshift is directly querying the Tabular managed Iceberg tables stored in S3, so any updates made to the tables from other engines such as AWS Athena or Apache Spark are immediately reflected.
Examples:
-- Assuming we have previously created the taxi_tip_rate_per_passenger table
-- in the default database in our Tabular warehouse.
SELECT
passenger_count,
trip_count,
format('%,.2f', tip_rate * 100) as tip_percentage
FROM default_rs.taxi_tip_rate_per_passenger
ORDER BY passenger_count;