Understanding Storage Profiles

A storage profile is a discrete, reusable group of configuration settings that manage Tabular’s connection with the cloud storage containing your data.

“Storage profile” is a foundational concept in Tabular: it governs the connection between Tabular and your cloud storage so you can access and work with the stored data. A storage profile consists of the name of the object store instance (such as a bucket name) plus several configuration attributes, including an authorization for Tabular to access your data store. The specific attributes vary by cloud provider; for example, they may include a role name. Together, they make up an access policy that enables Tabular to interact with the cloud object store and, ultimately, to grant you access to your data.

When you create a warehouse or add cloud storage, Tabular prompts you for the information it needs to generate a storage profile. For example, when you create a warehouse, the prompts to identify a cloud object store, specify a path, and identify a role that can access that storage are defining the storage profile. The storage profile enables and defines access, and you must complete it before you can finish creating the warehouse.

Storage profiles extend Tabular’s system of role-based access controls to your storage on S3, GCS, or ADLS. Only members of the Tabular role specified in the storage profile can use the storage profile as the source for file loading (whether one-off or continuous). That’s why the object store instance is just one component of a storage profile; you cannot use or configure cloud storage on a standalone basis without the role information.

Note    A role’s ability to use a storage profile to load data has no bearing on the ability to read and write data in warehouse tables. That ability is controlled by privileges on the individual Tabular resource – the warehouse, the database, or the table.

Storage profiles are a necessary precursor for a range of activities, such as:

  • Managing data operations
  • Establishing control over who can access buckets
  • Creating warehouse storage
  • Ingesting data from source object stores via Tabular File Loader
  • Importing existing data directly into a Tabular table

Storage profiles are useful in several ways:

  • Allow unmanaged data to be ingested into managed Tabular warehouses. Multiple individuals in an organization can access data directly from cloud storage without having to create and populate their own warehouse. You can also extend this to automated data loaders (for example, for CDC data); multiple loaders can share a warehouse yet each be restricted to its own “namespace.”
  • Further refine data access. You can separate ingestion from analytics, for example, which can in turn reduce the load on your data platform. You can also enable more granular analytics, for example, discrete hubspot.* and airbyte.* tables.
  • Help security admins reduce inadvertent access by people who don’t need it (or shouldn’t have it in the first place).

Generating a storage profile

You can create a storage profile for AWS, GCS, or ADLS. While the steps are similar in most cases, we’ll break out the differences where appropriate.

To generate a storage profile

  1. Navigate to Connections > Storage > +.
  2. In the window that opens, select your cloud object store. The information you must provide is similar, but not the same, across cloud providers.
    • For AWS, click Use Connection Assistant and follow the connection wizard. The Tabular Connection Assistant leads you through the process and automatically connects you with your cloud account.
    • For ADLS and GCS, click Manually Set Up Bucket. (You can also do this with AWS, if you prefer.)
    • The manual method shows you the information Tabular needs; you must switch back and forth between Tabular and your cloud provider to generate the required cloud assets, which includes copying and pasting code snippets.
  3. When prompted, select a Tabular role; members of the role you select have full access to the storage profile.
  4. Tabular runs a series of checks to validate that the bucket, role access, and permissions are all configured correctly.
  5. Click Create Profile.

When the process completes, the new storage profile appears in the storage profile list, which is sorted alphabetically.

Note    When you create a new warehouse, you must either assign the warehouse to an existing object store or create a new object store. Assigning it to an existing object store essentially connects it to the existing storage profile for that store. Creating a new object store leads you through the procedure for generating a new storage profile.

To create a storage profile from a new warehouse

  1. Navigate to Data > Warehouses > +.
  2. Enter a warehouse name and click the Storage Bucket box.
  3. Scroll to the bottom of the list that displays and click + Add new bucket. The process then picks up with step 2 of the prior procedure (To generate a storage profile).

Once you have created the storage profile, Tabular adds it to your storage profiles and can begin loading and processing the data.

You can define only one storage profile for each unique cloud object store within your organization. However, you can assign a single storage profile to multiple warehouses.
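The one-to-many rule above (one profile per object store, any number of warehouses per profile) can be pictured as a small registry. The sketch below is hypothetical: StorageProfile, ProfileRegistry, and their methods are illustrative names, not part of any Tabular API, and an object store is identified by a plain string such as a bucket name.

```python
from dataclasses import dataclass, field


@dataclass
class StorageProfile:
    # Hypothetical model: the object store this profile connects to
    # (for example, a bucket name) plus the warehouses assigned to it.
    name: str
    object_store: str
    warehouses: set[str] = field(default_factory=set)


class ProfileRegistry:
    """Illustrative registry: one profile per object store, many warehouses per profile."""

    def __init__(self) -> None:
        self._by_store: dict[str, StorageProfile] = {}

    def add_profile(self, name: str, object_store: str) -> StorageProfile:
        # Reject a second profile for an object store that already has one.
        if object_store in self._by_store:
            existing = self._by_store[object_store].name
            raise ValueError(f"{object_store!r} already has profile {existing!r}")
        profile = StorageProfile(name, object_store)
        self._by_store[object_store] = profile
        return profile

    def assign_warehouse(self, warehouse: str, object_store: str) -> StorageProfile:
        # Assigning a warehouse to an existing object store reuses that store's profile.
        profile = self._by_store[object_store]
        profile.warehouses.add(warehouse)
        return profile
```

For example, after add_profile("analytics", "acme-data"), both assign_warehouse("prod", "acme-data") and assign_warehouse("staging", "acme-data") attach to the same profile, while a second add_profile call for "acme-data" is rejected.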

Saving an unfinished profile

Sometimes you may complete the entire process of configuring a new storage profile, only to have one or more validation checks fail. You do not have to keep troubleshooting on the spot or risk losing your work and re-creating the storage profile from scratch.

Instead, you can create an unvalidated profile, which is roughly equivalent to a “save for later” function. If a validation check fails, click Save Unvalidated Profile. The profile is added to the full storage profile list, and you can return to it on your own time, review the error messages, and make the appropriate fixes on the cloud storage side. When you have finished, re-validate the storage profile.

Establishing storage profile access

Access to Tabular resources (tables, databases, warehouses) is governed by one set of access controls, but Tabular also applies a separate set of access controls specific to storage profiles. These apply to any data in the object store that is not subject to warehouse-specific access controls, that is, any data path outside a Tabular warehouse (certain JSON files, for example).

In these cases, there are two levels of storage profile-specific access:

  1. Usage – members of roles with Usage access have read access to any data from any path within the specified object store. They can also use the storage profile to create a warehouse.
  2. Admin – members of roles with Admin access can grant such access to other roles.

These controls are independent; you can have Usage access, Admin access, or both. They also work in exactly the same fashion as the access controls for Tabular resources.
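The independence of the two privileges can be sketched in Python. This is a hypothetical model: the ProfilePrivilege flag and the can_* helpers are illustrative names, not a Tabular API. It treats Usage and Admin as separate bits, so a role can hold either one, both, or neither.

```python
from enum import Flag, auto


class ProfilePrivilege(Flag):
    # Hypothetical encoding of the two storage profile privileges.
    NONE = 0
    USAGE = auto()  # read any path in the object store; create warehouses from the profile
    ADMIN = auto()  # grant storage profile access to other roles


# Neither privilege grants write access outside the bucket and warehouse path.

def can_read_path(granted: ProfilePrivilege) -> bool:
    return ProfilePrivilege.USAGE in granted


def can_create_warehouse(granted: ProfilePrivilege) -> bool:
    return ProfilePrivilege.USAGE in granted


def can_grant(granted: ProfilePrivilege) -> bool:
    return ProfilePrivilege.ADMIN in granted
```

Under this model, a role holding only ADMIN can grant access to others but cannot itself read paths through the profile, and a role holding USAGE | ADMIN can do both.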

Important    In both cases, access is read-only. Tabular does not allow write access outside of the bucket and warehouse path. For example, if you’re setting up Tabular File Loader to create a table manually from existing files, or want to browse your object store bucket, you must have Usage access on the storage profile for those object stores. Make certain that the people who need a storage profile belong to a role with Usage access.

To configure access for a storage profile

  1. Navigate to Connections > Storage and click the profile you want.
  2. When the storage profile details page displays, click Access Control near the top.

This page functions the same way as other access controls in Tabular.

Viewing all warehouses associated with a storage profile

The storage profile overview page also displays the number of warehouses associated with each storage profile. To see which warehouses those are, open the individual storage profile details page.

To view details on a storage profile’s warehouses

  1. Navigate to the details page of the storage profile you want.
  2. Near the top, click Warehouses.

The storage profile warehouse page lists all associated warehouses, including their storage profile status and the size of the data each one holds.

To drill further down into individual warehouse contents, click the warehouse name to display the list of databases in the warehouse. From here you can drill down further, if you wish.

Adding a warehouse directly to a storage profile

When you create a warehouse from the warehouse overview page (via Data > Warehouses > +), you must select the storage profile that goes with it. However, you can also create warehouses directly from the storage profile details page, and every warehouse you create that way is automatically added to that storage profile.

To add a warehouse directly to a storage profile

  1. Navigate to Connections > Storage and select the profile you want.
  2. On the storage profile details page, click Warehouses, and then click the plus (+) icon.
  3. Follow the prompts to create a warehouse, which automatically is associated with this storage profile.

Verifying storage profiles

You can view a list of all storage profiles, each with indicators of its basic state and function, so you can quickly monitor status and drill down if and where needed to make changes. Simply navigate to Connections > Storage. All storage profiles display on this page.

Right from this page you can see at a glance:

  • The region where your data is stored.
  • The status of the warehouses associated with this storage profile (indicates what you can do with this storage profile).
    • Ready means all configurations are valid and you can read and write to the specified warehouses.
    • Not ready means at least one validation check failed. Click the storage profile name to open a details page to help you troubleshoot.
    • Not available does not mean the storage profile is out of commission. Instead, it means it is read-only and therefore unavailable for writing. You can still import data from it.
  • The number of warehouses connected to the storage profile.
    • If you do not see a number, the storage profile has no warehouses connected to it but it is ready to connect to one.
  • Whether the storage profile is ready to attach a warehouse.
  • Whether the storage profile is ready for you to import data.
  • When the storage profile was created, by whom (if you hover over the creation date), and the time of its most recent validation. Tabular re-validates each storage profile every 24 hours (usually end of day Pacific time), but you can re-validate manually at any time. To do this, click the refresh icon.

Troubleshooting storage profiles

When a storage profile is in an error state, you can view specific details about that profile to help you troubleshoot. Typically, that means returning to your cloud provider’s interface and checking the configuration settings from the setup templates.

To troubleshoot a storage profile

From the storage profile overview page, click the name of the storage profile you want. The details page for that profile displays. These details are cloud-specific and vary somewhat between AWS, GCS, and ADLS, but in all cases you can see at a glance:

  • All the validations that ran, and whether the profile is ready to attach a warehouse (that is, to read and write) or ready for data import (read only).
  • Not available – the warehouse is read-only. If you intended to write to it, look for the checks that fail on read-only storage, including deletes and puts. If you do not intend to write to this warehouse, you do not need to do anything; you can leave it in this state.
  • Error messages – use these to help you home in on an issue on your cloud provider side.
  • Basic details about your cloud storage account, some of which you can click and copy as you modify configurations.

Again, the details pages for AWS, GCS, and ADLS storage profiles vary somewhat, including some of the validation checks. But they all display the data you entered via the connection wizard or manual templates when you first set up the profile.

In all cases, an AssumeRole failure means the identity Tabular assumed lacks permission to access the object store, so Tabular cannot provide any additional information or run any further validation checks.
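The status rules on this page (Ready, Not ready, Not available) and the AssumeRole short-circuit can be summarized in a short sketch. It rests on stated assumptions: the check names ("assume_role", "list", "get", "put", "delete") are hypothetical, and treating a failure of only the write checks (puts and deletes) as "Not available" is an interpretation of the descriptions above, not Tabular’s actual validation code.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str  # hypothetical check name, e.g. "assume_role", "list", "get", "put", "delete"
    passed: bool


# Assumption: these checks exercise write access; failing only these means read-only storage.
WRITE_CHECKS = {"put", "delete"}


def run_checks(assume_role_ok: bool, object_checks: dict[str, bool]) -> list[CheckResult]:
    results = [CheckResult("assume_role", assume_role_ok)]
    if not assume_role_ok:
        # Without an identity that can reach the object store, no further checks can run.
        return results
    results += [CheckResult(name, ok) for name, ok in object_checks.items()]
    return results


def profile_status(results: list[CheckResult]) -> str:
    failed = {r.name for r in results if not r.passed}
    if not failed:
        return "Ready"
    if failed <= WRITE_CHECKS:
        # Only write checks failed: data can still be read (imported) but not written.
        return "Not available"
    return "Not ready"
```

Under these assumptions, a failed AssumeRole produces a single result and a "Not ready" status, which matches the point that no further checks can run after that failure.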

Revalidating a storage profile

When you’re done fixing your configurations, re-validate the storage profile to ensure you’ve addressed all issues and to ready the storage profile for use. You can do this either from the storage profile overview page or from an individual storage profile details page.

To re-validate a storage profile from the storage profile overview page, click the refresh icon to the right of the profile you want.

To re-validate a storage profile from its details page, click the refresh icon at the top right.

When all checks pass, Tabular displays a message confirming successful validation, and you can begin using the storage profile.

Deleting a storage profile

Deleting a storage profile is a simple matter, although not everyone is allowed to delete a storage profile (see the Important note below).

Deleting a storage profile does not affect the data.

To delete a storage profile

  1. Navigate to Connections > Storage.
  2. Next to the storage profile you wish to delete, click X, and confirm the delete when prompted.

Important    You cannot delete a storage profile if:

  • You are not a Security Admin for your organization.
  • There is at least one warehouse attached to the storage profile you wish to delete (indicated by a warehouse count on the storage profile overview page). You must remove all warehouses from a storage profile before you can delete it.
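The two deletion conditions above can be expressed as a single pre-delete check. This is an illustrative sketch only: can_delete_profile and its parameters are hypothetical names, not a Tabular API, and the warehouse count stands in for the count shown on the storage profile overview page.

```python
def can_delete_profile(is_security_admin: bool, attached_warehouses: int) -> tuple[bool, str]:
    """Hypothetical pre-delete check mirroring the two conditions above."""
    if not is_security_admin:
        return False, "Only Security Admins can delete a storage profile."
    if attached_warehouses > 0:
        return False, f"Remove all {attached_warehouses} attached warehouse(s) first."
    # Deleting the profile does not affect the data in the object store.
    return True, "OK"
```

Checking the role first means a non-admin is told about the permission problem even when warehouses are also attached.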

Using existing storage profiles

Storage profiles also come into play when you enable Tabular File Loader for a table. When you do this, you provide a path for File Loader from an existing storage profile.

To enable File Loader for a table

  1. Navigate to the table you want.
  2. Click Settings.
  3. Scroll down to the File Loader section and click Enabled.
  4. Click Browse, and then select the bucket you want from the list that displays.

Now you can use File Loader to load data into a table. You can do this from the overview page of either the database or the table.

To use File Loader from the Overview page of a database

  1. Navigate to the database you want.
  2. Click the menu icon.
  3. Click Load Table from Source. Then click Connected Storage.
  4. Tabular prompts you to select the bucket from which you wish to load data. Tabular uses the storage profiles for your buckets to do this.

To use File Loader from the Overview page of the table

  1. Navigate to the table you want.
  2. Click the menu icon.
  3. From the list that displays, click Load Table. The Select storage bucket location window displays. Tabular uses the storage profiles for your buckets to do this.

Viewing storage profile details

To view the storage profile associated with a bucket:

  1. Navigate to Data > Warehouses, then hover over the name of the warehouse you want. The bucket information displays.
  2. Note the bucket name. Then navigate to Connections > Storage. Find the bucket you want and hover over it. Tabular displays the information that makes up the storage profile.

Note    Only Tabular Security Admins can delete a storage profile.

Reference – storage profile components

Storage profiles consist of the following components (the fields shown are for AWS):

  • Profile name/ID
  • account_id – AWS account ID (new; parsed from the initial role ARN)
  • region – AWS bucket region (from the warehouse)
  • bucket – AWS bucket name (new; parsed from the initial S3 URI)
  • role_arn – role needed for operations on the bucket (from the warehouse)
  • external_id – AWS external ID used when assuming the role (from the warehouse)
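The two “new” fields are derived from other inputs: account_id from the role ARN and bucket from the S3 URI. The sketch below shows one way that parsing could work. It is an assumption-based illustration (AwsStorageProfile, from_inputs, and the helper functions are hypothetical names, not Tabular’s code); it relies only on the standard IAM role ARN layout (arn:aws:iam::<account-id>:role/<name>) and the s3://<bucket>/<prefix> URI form.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


def account_id_from_role_arn(role_arn: str) -> str:
    # IAM role ARNs look like arn:aws:iam::<account-id>:role/<role-name>;
    # the account ID is the fifth colon-separated field.
    parts = role_arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "iam":
        raise ValueError(f"not an IAM role ARN: {role_arn!r}")
    return parts[4]


def bucket_from_s3_uri(s3_uri: str) -> str:
    # s3://<bucket>/<optional/prefix> -> <bucket>
    parsed = urlparse(s3_uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {s3_uri!r}")
    return parsed.netloc


@dataclass(frozen=True)
class AwsStorageProfile:
    # Field names follow the reference list above; the class itself is illustrative.
    name: str
    account_id: str
    region: str
    bucket: str
    role_arn: str
    external_id: str

    @classmethod
    def from_inputs(cls, name: str, role_arn: str, s3_uri: str,
                    region: str, external_id: str) -> "AwsStorageProfile":
        return cls(
            name=name,
            account_id=account_id_from_role_arn(role_arn),
            region=region,
            bucket=bucket_from_s3_uri(s3_uri),
            role_arn=role_arn,
            external_id=external_id,
        )
```

For example, a role ARN of arn:aws:iam::123456789012:role/TabularAccess with an S3 URI of s3://acme-lake/warehouse/ yields account_id "123456789012" and bucket "acme-lake".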