Query your Parquet exports directly from S3 using Snowflake external tables.
Prerequisites
- Snowflake account with ACCOUNTADMIN role (for initial setup)
- AWS IAM role configured for Snowflake access
- Database and schema created in Snowflake
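If the database and schema don't exist yet, here is a minimal setup sketch. The names data_warehouse and public match the grant examples later in this guide, but your environment may use different ones:

```sql
-- Run as a role with CREATE DATABASE privileges (e.g., ACCOUNTADMIN)
CREATE DATABASE IF NOT EXISTS data_warehouse;
CREATE SCHEMA IF NOT EXISTS data_warehouse.public;
USE SCHEMA data_warehouse.public;
```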
Quick start
The export automatically generates an init_snowflake_s3.sql script containing all the configuration you need.
Step 1: Download the initialization script
# Download from S3
aws s3 cp s3://<YOUR-BUCKET>/trelica/2025-10-22/init_snowflake_s3.sql .
Step 2: Create the storage integration
If you run the whole script now, before the IAM trust policy is configured, table creation will fail with an error like:
User: arn:aws:iam::829545265689:user/externalstages/... is not authorized to perform: sts:AssumeRole
To avoid this, first create only the storage integration in Snowflake (requires the ACCOUNTADMIN role):
-- Copy ONLY the storage integration part from init_snowflake_s3.sql
CREATE STORAGE INTEGRATION IF NOT EXISTS dw_s3_integration
TYPE = EXTERNAL_STAGE
STORAGE_PROVIDER = 'S3'
ENABLED = TRUE
STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::908027406242:role/TrelicaDWRole'
STORAGE_ALLOWED_LOCATIONS = ('s3://<YOUR-BUCKET>/trelica/');
Step 3: Configure AWS IAM trust policy
Now get Snowflake's AWS identity so you can configure the IAM trust policy.
3a. Get Snowflake's AWS identity
In Snowflake, run:
DESC STORAGE INTEGRATION dw_s3_integration;
Look for these two properties in the output:
- STORAGE_AWS_IAM_USER_ARN - the IAM user ARN that Snowflake will use (e.g., arn:aws:iam::829545265689:user/externalstages/cic8na0000)
- STORAGE_AWS_EXTERNAL_ID - the external ID for additional security (e.g., ABC12345_SFCRole=1_abcdefg)
Copy these values - you'll need them in the next step.
3b. Update the IAM role trust policy in AWS
You need to update the trust policy of the IAM role specified in your storage integration (e.g., TrelicaDWRole).
Option 1: AWS Console
- Navigate to AWS IAM Console > Roles.
- Search for your role, for example TrelicaDWRole.
- Select the role name.
- Open the Trust relationships tab.
- Select Edit trust policy.
- Update the policy to include another Statement for Snowflake's IAM user:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::829545265689:user/externalstages/cic8na0000"
},
"Action": "sts:AssumeRole",
"Condition": {
"StringEquals": {
"sts:ExternalId": "ABC12345_SFCRole=1_abcdefg"
}
}
}
]
}
Replace the AWS value with your STORAGE_AWS_IAM_USER_ARN and the sts:ExternalId value with your STORAGE_AWS_EXTERNAL_ID from step 3a.
Select Update policy.
Option 2: AWS CLI
# First, get your current trust policy
aws iam get-role --role-name TrelicaDWRole --query 'Role.AssumeRolePolicyDocument' > trust-policy.json
# Edit trust-policy.json to add the Snowflake user (see JSON above)
# Update the trust policy
aws iam update-assume-role-policy --role-name TrelicaDWRole --policy-document file://trust-policy.json
3c. Verify the trust policy is configured
The AWS trust policy update should be immediate. You can proceed to step 4.
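If you want an explicit check from the Snowflake side before proceeding, recent Snowflake versions provide SYSTEM$VALIDATE_STORAGE_INTEGRATION. Treat this as an optional sketch; availability and output format can vary by Snowflake version:

```sql
-- Check that Snowflake can assume the role and access the location
SELECT SYSTEM$VALIDATE_STORAGE_INTEGRATION(
'dw_s3_integration',
's3://<YOUR-BUCKET>/trelica/',
'validate.txt',
'all');
```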
Step 4: Complete the Snowflake setup
Now that the IAM trust policy is configured, run the rest of the init_snowflake_s3.sql script in Snowflake:
-- Copy and paste the rest of init_snowflake_s3.sql
-- (file format, stage, and table definitions)
The script will:
- Create a file format for Parquet
- Create a stage pointing to your S3 location (this validates access via the IAM trust policy)
- Create materialized tables by loading data from the Parquet files (fast queries for BI tools)
If you get authorization errors when creating tables:
- Double-check the IAM user ARN matches exactly in the trust policy
- Verify the External ID is correct. External IDs are case-sensitive.
- Make sure the IAM role has S3 read permissions (see troubleshooting section)
You can verify the stage has access and tables were created:
-- List files in stage
LIST @dw_stage;
-- Show created tables
SHOW TABLES;
Step 5: Query your data
-- Query tables normally (fast - data is materialized in Snowflake)
SELECT * FROM person LIMIT 10;
-- Join across tables
SELECT
p.name,
p.email,
t.name as team_name
FROM person p
JOIN person_team pt ON p.person_id = pt.person_id
JOIN team t ON pt.team_id = t.team_id;
Common queries
Schema inspection
-- Show tables
SHOW TABLES;
-- Describe table structure
DESCRIBE TABLE person;
-- Show views
SHOW VIEWS;
Data queries
-- Count records
SELECT COUNT(*) FROM person;
SELECT COUNT(*) FROM app_account;
-- Find people by email domain
SELECT name, email, department
FROM person
WHERE email LIKE '%@example.com';
-- License usage by team
SELECT
t.name as team,
COUNT(DISTINCT p.person_id) as person_count,
COUNT(DISTINCT au.app_account_id) as app_account_count
FROM person p
LEFT JOIN person_team pt ON p.person_id = pt.person_id
LEFT JOIN team t ON pt.team_id = t.team_id
LEFT JOIN app_account au ON p.person_id = au.person_id
GROUP BY t.name
ORDER BY person_count DESC;
Working with multiple export dates
Query across multiple export dates by creating additional stages for other dates:
-- Create stage for different date
CREATE STAGE dw_stage_oct21
URL = 's3://<YOUR-BUCKET>/trelica/2025-10-21/'
STORAGE_INTEGRATION = dw_s3_integration
FILE_FORMAT = parquet_format;
-- Create view for historical data
CREATE VIEW person_oct21 AS
SELECT
$1:person_id::VARCHAR as person_id,
$1:email::VARCHAR as email,
-- ... other columns ...
FROM @dw_stage_oct21/person.parquet;
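With a historical view in place, you can diff exports as well as count them. For example, to find people present in the latest export but not in the Oct 21 export (assuming both person and the person_oct21 view expose person_id):

```sql
-- People added since the Oct 21 export
SELECT person_id FROM person
MINUS
SELECT person_id FROM person_oct21;
```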
-- Compare counts across dates
SELECT
'Oct 21' as export_date,
COUNT(*) as person_count
FROM person_oct21
UNION ALL
SELECT
'Oct 22' as export_date,
COUNT(*) as person_count
FROM person;
Data refresh
When new exports are uploaded to S3, refresh the tables to load the latest data:
Option 1: Re-run the entire script
# Download the latest script from new export date
aws s3 cp s3://<YOUR-BUCKET>/trelica/2025-10-23/init_snowflake_s3.sql .
# Run in Snowflake (only the table creation parts)
# This recreates all tables with fresh data
Option 2: Update stage and refresh individual tables
-- Update stage to point to new date
CREATE OR REPLACE STAGE dw_stage
URL = 's3://<YOUR-BUCKET>/trelica/2025-10-23/' -- New date
STORAGE_INTEGRATION = dw_s3_integration
FILE_FORMAT = parquet_format;
-- Refresh specific table by recreating it
CREATE OR REPLACE TABLE person AS
SELECT
$1:person_id::VARCHAR as person_id,
$1:email::VARCHAR as email,
$1:name::VARCHAR as name,
-- ... all other columns ...
FROM @dw_stage/person.parquet;
-- Check updated data
SELECT COUNT(*) FROM person;
Tip: Save the CREATE TABLE statements from the init script to make refreshes easy.
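As an alternative to CREATE OR REPLACE STAGE, ALTER STAGE can repoint the existing stage at a new export date without touching its other settings (the date below is an example):

```sql
-- Repoint the existing stage instead of recreating it
ALTER STAGE dw_stage SET URL = 's3://<YOUR-BUCKET>/trelica/2025-10-23/';
```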
Performance optimization
Partitioning
For large datasets across multiple dates, use multiple stages or wildcard patterns:
-- Create a stage at the parent prefix so it spans all export dates
CREATE STAGE dw_stage_all
URL = 's3://<YOUR-BUCKET>/trelica/'
STORAGE_INTEGRATION = dw_s3_integration
FILE_FORMAT = parquet_format;
-- Create a view that spans multiple dates
CREATE VIEW person_all_dates AS
SELECT
$1:person_id::VARCHAR as person_id,
$1:email::VARCHAR as email,
-- ... other columns ...
REGEXP_SUBSTR(METADATA$FILENAME, '\\d{4}-\\d{2}-\\d{2}') as export_date
FROM @dw_stage_all (PATTERN => '.*person\\.parquet');
-- Query with date filter
SELECT * FROM person_all_dates
WHERE export_date = '2025-10-22';
Clustering
Tables created by the init script are already materialized in Snowflake, so you can add clustering to them directly for frequently filtered columns:
-- Add clustering
ALTER TABLE person
CLUSTER BY (email, department);
Cost management
Tables are loaded once from S3, then stored in Snowflake:
- One-time S3 costs: Data transfer when tables are created/refreshed
- Ongoing Snowflake costs: Storage and compute for queries
Cost optimization tips:
- Use clustering for frequently filtered columns (email, department, etc.)
- Refresh only when needed - Don't recreate tables unnecessarily
- Select specific columns - Only query columns you need
- Drop unused tables - Remove old tables if no longer needed
-- Good: Efficient filtering
SELECT email FROM person WHERE department = 'Engineering';
-- Bad: Reads entire table then filters
SELECT * FROM person LIMIT 1000000;
Troubleshooting
Access denied errors
Error: "User: arn:aws:iam::829545265689:user/externalstages/... is not authorized to perform: sts:AssumeRole"
This error occurs when creating tables or listing the stage if the IAM role trust policy hasn't been configured.
When does this happen?
- When running CREATE TABLE commands (which read from the stage)
- When running LIST @dw_stage
- When querying stage files
Solution: Configure the IAM role trust policy (Step 3 in Quick Start above)
Quick fix:
- In Snowflake: DESC STORAGE INTEGRATION dw_s3_integration;
- Copy the STORAGE_AWS_IAM_USER_ARN and STORAGE_AWS_EXTERNAL_ID values
- Update your IAM role's trust policy in AWS Console (IAM → Roles → TrelicaDWRole → Trust relationships)
- Add the Snowflake user ARN and external ID to the trust policy
- Retry creating the tables
Other access errors
-- Check storage integration is configured
SHOW INTEGRATIONS LIKE 'dw_s3_integration';
-- Verify stage can access S3
LIST @dw_stage;
If LIST fails after the trust policy is configured:
- Verify IAM role has S3 read permissions on the bucket
- Check the External ID matches exactly (case-sensitive)
- Confirm the IAM user ARN matches exactly
- Ensure the S3 bucket exists and the path is correct
No data in tables
-- Check if files exist in stage
LIST @dw_stage;
-- Verify table exists
SHOW TABLES LIKE 'person';
-- Check table row count
SELECT COUNT(*) FROM person;
-- Verify stage is accessible
SELECT * FROM @dw_stage/person.parquet LIMIT 1;
If tables are empty, recreate them using the CREATE TABLE statements from the init script.
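To spot-check row counts across several tables at once (table names as used in this guide; adjust for your schema):

```sql
-- Row counts for the main tables
SELECT 'person' AS table_name, COUNT(*) AS row_count FROM person
UNION ALL
SELECT 'team', COUNT(*) FROM team
UNION ALL
SELECT 'app_account', COUNT(*) FROM app_account;
```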
Slow query performance
Tables are already materialized, so queries should be fast. If still slow:
-- Check query execution plan
EXPLAIN SELECT * FROM person WHERE email LIKE '%@example.com';
-- Add clustering keys for frequently filtered columns
ALTER TABLE person
CLUSTER BY (email, department);
-- Verify clustering
SELECT SYSTEM$CLUSTERING_INFORMATION('person', '(email, department)');
Security best practices
- Use storage integrations instead of access keys
- Limit STORAGE_ALLOWED_LOCATIONS to specific prefixes
- Grant minimal permissions on tables and views
- Use row-level security for multi-tenant access
- Enable query auditing for compliance
-- Grant read access to analyst role
GRANT USAGE ON DATABASE data_warehouse TO ROLE analyst;
GRANT USAGE ON SCHEMA data_warehouse.public TO ROLE analyst;
GRANT SELECT ON ALL TABLES IN SCHEMA data_warehouse.public TO ROLE analyst;
GRANT SELECT ON ALL VIEWS IN SCHEMA data_warehouse.public TO ROLE analyst;
-- Row-level security example
CREATE ROW ACCESS POLICY person_department_policy
AS (department VARCHAR) RETURNS BOOLEAN ->
CURRENT_ROLE() = 'ACCOUNTADMIN'
OR department = CURRENT_USER();
ALTER TABLE person ADD ROW ACCESS POLICY person_department_policy ON (department);
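Note that the CREATE OR REPLACE TABLE refresh pattern used in this guide drops per-table grants by default. Two ways to keep analyst access across refreshes (a sketch; role and schema names match the grant example above):

```sql
-- Automatically grant SELECT on tables created later
GRANT SELECT ON FUTURE TABLES IN SCHEMA data_warehouse.public TO ROLE analyst;
-- Or preserve existing grants when a table is recreated
CREATE OR REPLACE TABLE person COPY GRANTS AS
SELECT * FROM @dw_stage/person.parquet;
```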