File Loader provides a simple way to load data into a Tabular table by automatically loading data files from a designated s3 location.
On this page:
How does File Loader work?
Currently, the File Loader uses S3 notifications configured for a specific path within your bucket to “listen” for new files being created. By using S3 notifications to signal files that need to be loaded, the File Loader enables low latencies for data from newly created files to be available within the Tabular table, typically within 5 minutes.
Initial Setup
In order to use an s3 bucket as a source for File Loader, you must configure the AWS IAM role for a Tabular warehouse with appropriate permissions to the s3 bucket and create an s3 bucket notification event to notify Tabular when new data files are available to load.
If the File Loader location is in the same bucket as the table being loaded, then no additional IAM privileges are required; just the s3 bucket notification.
IAM Policy
Tabular requires the following actions on an s3 bucket as a File Loader source:
- s3:ListBucket
- s3:GetBucketLocation
- s3:GetBucketNotification
- s3:PutBucketNotification
- s3:GetObject
Example IAM policy (substitute <s3-bucket-name>
with your s3 bucket name)
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:ListBucket",
"s3:GetBucketLocation",
"s3:GetBucketNotification",
"s3:PutBucketNotification"
],
"Resource": [
"arn:aws:s3:::<s3-bucket-name>"
]
},
{
"Effect": "Allow",
"Action": [
"s3:GetObject"
],
"Resource": [
"arn:aws:s3:::<s3-bucket-name>/<file-loader-prefix>/*"
]
}
]
}
Event Notification for File Loader
When creating the s3 bucket event notification, use the following settings:
- Prefix (optional):
<file-loader-prefix>/
- Event Types:
All object create events
- Destination:
SQS topic
- SQS topic ARN:
arn:aws:sqs:us-east-1:237881912361:tabular-loader-s3-notifications-queue
The
<file-loader-prefix>
is optional. If provided, only files that are created under that prefix will be eligible for auto-loading. The default prefix istabular/staged/
.
Configuring File Loader for a Table
In the Tabular app, navigate to the table you want to enable File Loader for and
click on the SETTINGS
tab. Scroll down to the File Loader section that should look like
below:

File Loader Settings
Click EDIT
in the upper right-hand corner,
You must select a file format for the input data files. See below for supported file formats.
Choose a write-mode
that fits your use case:
append
- File Loader inserts data from files created into the table.replace
- File Loader fully replaces the existing data in the table with data from files created within the 5-minute window.
By default, the File Loader looks in a predefined path for files to load into the table with the following format:
s3://{warehouse-bucket}/tabular/staged/{warehouse-name}/{database-name}/{table-name}
You can also use the Tabular Python Library to enable and use the File Loader.
Supported File Formats
File Loader currently supports loading data from the following file formats:
- CSV (*)
- JSON
- Parquet
*For CSV file format, you can set the column delimiter using the table
property fileloader.csv.column-delimiter
(e.g. for TSV, this is set to \t
).
This table property is only applicable if the file format is CSV, which the default
expects comma-delimited.
Excluding Files From Loading
By default, the File Loader processes and loads any file into the table that is
created within the configured File Loader path for that table. However, there may
be cases where files are created within the table’s file load path that you do not
want to have loaded into the table (e.g. temporary or intermediate files created by the
process creating the files). You can set the table property
fileloader.file-exclude-glob-filter
to a glob pattern, and the File Loader will
ignore any files it finds that match that pattern.
For example, setting fileloader.file-exclude-glob-filter
to **/_*
will
ignore any file name that starts with an underscore. If you wanted to ignore all
files within a certain “folder” within the table’s file load path (e.g. /tmp/),
you would set that table property to /tmp/*
or something similar.
Loading Hive Partition Values
For use cases that load data files dropped into a Hive directory
structure (e.g. mirroring a frequently updated Hive table), the File Loader allows
loading Hive-like partition columns and values into your Tabular table by setting the
table property fileloader.parse-path-values
to true
(default is not set false
).
If this is set to true
, the File Loader expects these partition columns to be in your
Tabular table’s schema.
For example, for the following file:
<table_file_load_path>/dt=2023-01-21/cat=foo/file1.csv
If fileloader.parse-path-values
is true
, the File Loader sets column dt
to 2023-01-21
and cat
to foo
for all rows when loading this data file.
File Loader FAQ
How does the File Loader handle schema?
For the most part, the File Loader expects files it tries to load in to match the table schema. However, the File Loader will try and make a best effort to coerce the incoming data values to match the expected types for each field (this can be dependent on the file-format of the input files). The File Loader will also ignore extra fields and insert null values for missing fields.