Tabular Table Optimizer

The Tabular optimizer is a service that runs automatically on tables in a Tabular warehouse and determines the optimal table layout by:

  • capturing various table metrics
  • sampling existing files in your table
  • running experiments with different clustering and compression algorithms

Based on this analysis, the optimizer service determines the ideal sort order, compression codec, and target file size for each table and applies them automatically. From then on, all future writes, including maintenance operations, respect these settings.

There is usually a trade-off between storage size and write cost. Optimized tables strike a balance between storage and compute, favoring reduced storage costs.

The Tabular table optimization process



The process begins when a query engine calls the Iceberg API to plan a table scan. Iceberg captures detailed metrics about the scan, such as planned scan size, planning duration, and fields accessed. The Tabular catalog service captures these scan metrics and persists them for use by the optimizer.
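The kind of scan metadata described above can be pictured with a small sketch; the record fields here are illustrative, not Tabular's actual schema:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ScanMetrics:
    """Illustrative record of what a catalog service might persist per scan."""
    table: str
    planned_scan_bytes: int      # total size of the files the scan planned to read
    planning_duration_ms: float  # time the engine spent planning the scan
    fields_accessed: List[str]   # columns the query actually referenced

# Example: metrics for a query that touched two columns of a table
metrics = ScanMetrics(
    table="db.events",
    planned_scan_bytes=1_500_000_000,
    planning_duration_ms=42.0,
    fields_accessed=["event_time", "user_id"],
)
```

Aggregated over many scans, records like these tell the optimizer which columns are queried most often and are therefore good clustering candidates.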

The optimizer service uses a random sample of the source data to avoid bias. It then runs experiments to determine the ideal sort order, measuring the size of the resulting data and the CPU used for each write.
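The experiment loop can be sketched as follows. This is a toy illustration, not Tabular's implementation: zlib stands in for a real columnar codec such as zstd, and the sample data and candidate orders are invented.

```python
import random
import zlib

def compressed_size(rows):
    # Serialize rows to CSV-like text and compress; zlib stands in
    # here for a real columnar codec such as zstd.
    payload = "\n".join(",".join(str(v) for v in r.values()) for r in rows).encode()
    return len(zlib.compress(payload))

def best_sort_order(rows, candidate_orders):
    # Sort the sample by each candidate column order and keep whichever
    # order produces the smallest compressed output.
    sizes = {
        cols: compressed_size(sorted(rows, key=lambda r: [r[c] for c in cols]))
        for cols in candidate_orders
    }
    return min(sizes, key=sizes.get)

random.seed(0)
sample = [
    {"country": random.choice(["US", "DE"]),
     "city": random.choice(["nyc", "sfo", "ber", "muc"]),
     "value": random.randrange(1_000_000)}
    for _ in range(1000)
]
order = best_sort_order(sample, [("country", "city"), ("value",)])
```

A production optimizer would also track the CPU time of each experimental write, since a sort order that compresses slightly better may not be worth a much more expensive write path.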

Based on the results of the analysis, the optimizer service sets the write order on the table and the optimal compaction strategy.

Optimization occurs weekly on all tables that have been updated within the prior 7 days.

Optimizer properties

You can access optimization properties from the Table > Settings page in Tabular.

You may also be able to set these manually via Trino or Spark.
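For example, with an engine such as Spark SQL connected to the catalog, table properties like these can be set with a standard ALTER TABLE statement (the table name here is hypothetical):

```sql
ALTER TABLE examples.events SET TBLPROPERTIES (
  'optimizer.enabled' = 'true',
  'write.parquet.compression-codec' = 'zstd',
  'write.object-storage.enabled' = 'true'
);
```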

Property | Default | Description
--- | --- | ---
optimizer.enabled | true | Whether recommended optimizations are automatically applied to this table
write.parquet.compression-codec | zstd | Target compression codec to be used when writing files for this table
write.object-storage.enabled | true | The file paths for this table will be prepended with a hash component optimized for object storage
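The idea behind the object-storage path setting can be sketched roughly as follows; the hashing scheme and path layout here are illustrative, not Iceberg's exact implementation:

```python
import hashlib

def object_storage_path(base: str, relative_path: str) -> str:
    # Prepend a short hash of the file path so that files written in
    # sequence spread across object-store key ranges instead of
    # hot-spotting a single lexicographic prefix.
    digest = hashlib.md5(relative_path.encode()).hexdigest()[:8]
    return f"{base}/{digest}/{relative_path}"

path = object_storage_path(
    "s3://bucket/warehouse/db/events/data", "part-00000.parquet"
)
```

Because the hash is derived from the file path, the layout is deterministic, but neighboring data files no longer share a common key prefix.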