Data Access Flow

The Tabular Data Access Flow

Tabular provides a single platform for connecting the query engine of your choice to your data in Iceberg tables stored in AWS S3. There’s no need to reimplement access controls or rely on pre-configured access policies, which in turn improves performance.

Broadly, Tabular’s data access flow works as follows:

  1. Customers send a SELECT request for data from Tabular, typically through one of several query engines.
  2. Tabular authenticates and validates the requestor. It then checks the requestor’s permissions by verifying:
    • whether the requestor can access the requested table
    • whether any labels assigned to fields change the requestor’s access to those fields
  3. Tabular pre-signs a request to S3 for the requested data. The pre-signed request lets the requestor read the data directly from S3, which is why you don’t need pre-configured access policies on the S3 data or reimplemented access controls on your side.
    • The signing service is low-latency and scalable.
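
To make the idea of pre-signing concrete, here is a minimal sketch in shell. It is a toy scheme, not AWS Signature Version 4 and not Tabular's implementation: a signer that holds a secret key computes an HMAC over the method, path, and expiry time, and whoever holds the resulting URL can present it until it expires without ever seeing the key. The function names, the key, and the URL layout are all hypothetical, and the sketch assumes `openssl` and `awk` are installed.

```shell
# Toy pre-signing sketch. NOT AWS SigV4 and NOT Tabular's implementation;
# every name below is hypothetical and exists only for illustration.

# sign_request KEY METHOD PATH EXPIRES
# Prints a hex HMAC-SHA256 signature over method, path, and expiry.
sign_request() {
  local key="$1" method="$2" path="$3" expires="$4"
  printf '%s\n%s\n%s' "$method" "$path" "$expires" \
    | openssl dgst -sha256 -hmac "$key" \
    | awk '{print $NF}'   # keep only the hex digest, whatever the prefix
}

# presign_url KEY METHOD PATH EXPIRES
# Prints PATH with the expiry and signature attached as query parameters.
presign_url() {
  local key="$1" method="$2" path="$3" expires="$4"
  printf '%s?expires=%s&signature=%s\n' "$path" "$expires" \
    "$(sign_request "$key" "$method" "$path" "$expires")"
}

# verify_url KEY METHOD PATH EXPIRES SIGNATURE NOW
# Succeeds only if the request has not expired and the signature matches.
verify_url() {
  local key="$1" method="$2" path="$3" expires="$4" signature="$5" now="$6"
  [ "$now" -le "$expires" ] || return 1
  [ "$(sign_request "$key" "$method" "$path" "$expires")" = "$signature" ]
}
```

For example, `presign_url "$KEY" GET /bucket/data.parquet 1700000000` prints the path with `expires` and `signature` parameters appended. Changing the path or the expiry invalidates the signature, which is what lets the storage side trust a request from a caller that holds no long-lived credentials.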

More specifically, here are the under-the-hood steps:

  1. Obtain a Tabular user session token. To do this, log in to your AWS account via SSO.

  2. Run a SQL query. To do this, use an engine such as Trino, Spark, or Apache Flink.

  3. The query engine uses your session token to get both table metadata and a table access token.

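  4. The query engine sends each read request, along with the table access token, to the signer service. The signer service uses the token to sign the request and returns the signed request to the query engine.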

  5. The query engine uses the signed request to read a data or metadata file directly from S3.

  6. Steps 4 and 5 repeat until the query completes.

  7. The Tabular AWS account service account issues a user session token via the OAuth2 client credentials flow. Here’s a sample request:

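    # CLIENT_ID and CLIENT_SECRET must be set in the environment.
    # Quoting keeps credentials with special characters in one argument.
    curl --request POST --url 'https://api.tabular.io/ws/v1/oauth/tokens' \
    --header 'content-type: application/x-www-form-urlencoded' \
    --data 'grant_type=client_credentials' \
    --data "client_id=${CLIENT_ID}" \
    --data "client_secret=${CLIENT_SECRET}"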
    
  8. The catalog service loads the table and creates a table access token.

  9. Based on the table access token, the signer service signs requests to read files in the table.
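
The token request in step 7 returns a JSON body. Under the OAuth2 specification (RFC 6749), a successful response carries the token in an `access_token` field, and clients then present it as a bearer token. As a minimal sketch, assuming the response keeps the key and its value on one line and the token contains no escaped quotes, the helpers below pull the token out and build the header. The function names and the sample response are made up for illustration, and a JSON-aware tool such as `jq` is more robust where available.

```shell
# Hypothetical helpers for handling an OAuth2 token response.

# extract_access_token: reads a JSON token response on stdin and prints
# the value of its "access_token" field (first match only).
extract_access_token() {
  sed -n 's/.*"access_token"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# bearer_header TOKEN: prints the Authorization header for that token.
bearer_header() {
  printf 'Authorization: Bearer %s\n' "$1"
}

# Example with a made-up response body (not real Tabular output):
sample='{"access_token":"abc123","token_type":"bearer","expires_in":3600}'
token=$(printf '%s\n' "$sample" | extract_access_token)
bearer_header "$token"
# prints: Authorization: Bearer abc123
```

In a real session you would pipe the `curl` output from step 7 into `extract_access_token` and pass the result to later requests with `--header "$(bearer_header "$token")"`.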

Figure: Query data access flow in Tabular