Reporting on data in DuckDB

Note: 1Password SaaS Manager was previously named Trelica. Some commands and integrations still use or refer to Trelica. Follow directions as written to avoid errors.

Query your Parquet exports directly from S3 without downloading files.

Feature currently in closed beta.

Prerequisites

# Install DuckDB (one-time)
# macOS
brew install duckdb

# or Linux:
# curl -L https://github.com/duckdb/duckdb/releases/latest/download/duckdb_cli-linux-amd64.zip -o duckdb.zip && unzip duckdb.zip

Setup AWS credentials

DuckDB reads AWS credentials from standard locations:

# Option 1: AWS CLI (recommended)
aws sso login
eval $(aws configure export-credentials --format env)
aws sts get-caller-identity

# Option 2: Environment variables
export AWS_ACCESS_KEY_ID="AKIA..."
export AWS_SECRET_ACCESS_KEY="..."
export AWS_DEFAULT_REGION="us-east-1"

Recommended setup – automated initialization

The export generates initialization scripts to help you query the data. Choose based on your use case:

Local files

init_duckdb_local.sql - Creates views that query local Parquet files directly (no data duplication)

S3 files

For BI tools: Use init_duckdb_s3.sql (TABLES)

  • Downloads data once from S3 into DuckDB for fast queries
  • Best for: Dashboards, reporting, BI tools in containers (e.g. Metabase)
  • Performance: Fast (data cached in DuckDB)
  • Trade-off: Requires manual refresh to see new data

For ad-hoc queries: Use init_duckdb_s3_views.sql (VIEWS)

  • Queries S3 Parquet files directly on each SELECT
  • Best for: DuckDB CLI exploration, always-fresh data
  • Performance: Slower (reads from S3 on each query)
  • Trade-off: Always shows latest data without refresh

Example: BI tool setup (tables)

# Download the init script (tables version)
aws s3 cp s3://<YOUR-BUCKET>/trelica/2025-10-22/init_duckdb_s3.sql .

# Initialize the database
duckdb trelica_dw.db < init_duckdb_s3.sql

Example: ad-hoc queries (views)

# Download the init script (views version)
aws s3 cp s3://<YOUR-BUCKET>/trelica/2025-10-22/init_duckdb_s3_views.sql .

# Initialize the database
duckdb trelica_dw.db < init_duckdb_s3_views.sql

Now you can query all tables using simple names:

-- Start DuckDB
duckdb trelica_dw.db -ui

-- Query tables directly
SELECT * FROM person WHERE email LIKE '%@example.com';

-- Join across tables
SELECT
    p.name,
    p.email,
    t.name as team_name
FROM person p
JOIN person_team pt ON p.person_id = pt.person_id
JOIN team t ON pt.team_id = t.team_id;

Query multiple dates

Track changes over time by querying multiple export dates:

-- Count persons in each export
SELECT
    regexp_extract(filename, '(\d{4}-\d{2}-\d{2})', 1) as export_date,
    COUNT(*) as person_count
FROM read_parquet(
    's3://<YOUR-BUCKET>/trelica/*/person.parquet',
    filename=true
)
GROUP BY export_date
ORDER BY export_date;

-- Query all data across dates
SELECT * FROM read_parquet('s3://<YOUR-BUCKET>/trelica/*/person.parquet');

-- Include export date in results
SELECT
    *,
    regexp_extract(filename, '(\d{4}-\d{2}-\d{2})', 1) as export_date
FROM read_parquet(
    's3://<YOUR-BUCKET>/trelica/*/person.parquet',
    filename=true
);

Common queries

Using the automated setup, query with simple table names:

-- Count records per table
SELECT COUNT(*) FROM person;
SELECT COUNT(*) FROM app_user;

-- Check schema
DESCRIBE SELECT * FROM person;

-- Find people by email domain
SELECT name, email, department
FROM person
WHERE email LIKE '%@example.com';

-- License usage by team
SELECT
    t.name as team,
    COUNT(DISTINCT p.person_id) as person_count,
    COUNT(DISTINCT au.app_user_id) as app_user_count
FROM person p
LEFT JOIN person_team pt ON p.person_id = pt.person_id
LEFT JOIN team t ON pt.team_id = t.team_id
LEFT JOIN app_user au ON p.person_id = au.person_id
GROUP BY t.name
ORDER BY person_count DESC;

Export query results

-- Export to CSV
COPY (
    SELECT * FROM person
) TO 'person_export.csv' (HEADER, DELIMITER ',');

-- Export filtered results
COPY (
    SELECT * FROM person WHERE department = 'Engineering'
) TO 'engineering_team.csv' (HEADER, DELIMITER ',');

-- Export joined data to Parquet
COPY (
    SELECT
        p.name,
        p.email,
        t.name as team
    FROM person p
    JOIN person_team pt ON p.person_id = pt.person_id
    JOIN team t ON pt.team_id = t.team_id
) TO 'person_teams.parquet' (FORMAT PARQUET);

Refreshing data

Tables (init_duckdb_s3.sql)

When using tables, you need to manually refresh to see new data:

# Re-run the init script to reload data from S3
duckdb trelica_dw.db < init_duckdb_s3.sql

This is ideal for BI tools where you want controlled refreshes (e.g., nightly updates).

Views (init_duckdb_s3_views.sql)

When using views, data is always fresh - no refresh needed! Each query reads directly from S3.

Advanced: IAM role assumption

If using IAM role instead of access keys:

-- Set role ARN in DuckDB
SET s3_role_arn='arn:aws:iam::1234567890:role/TrelicaDWRole';

-- Query as normal - DuckDB will assume the role automatically
SELECT * FROM person LIMIT 10;

Troubleshooting

Access denied

# Verify AWS credentials work
aws s3 ls s3://<YOUR-BUCKET>/trelica/

# If SSO, ensure session is active
aws sso login
-- In DuckDB, check region is set correctly
  SELECT current_setting('s3_region');

  -- Update if needed (must match bucket region)
  SET s3_region='us-east-1';

httpfs extension not found

-- Install and load the extension
  INSTALL httpfs;
  LOAD httpfs;

IO Error: No files found

Check that:

  1. You used an S3 init script (init_duckdb_s3.sql or init_duckdb_s3_views.sql), not the local one (init_duckdb_local.sql)
  2. The export date exists in S3
  3. httpfs is installed and loaded
# Verify the date exists
aws s3 ls s3://<YOUR-BUCKET>/trelica/2025-10-22/

# Verify init scripts are uploaded
aws s3 ls s3://<YOUR-BUCKET>/trelica/2025-10-22/init_duckdb_s3.sql
aws s3 ls s3://<YOUR-BUCKET>/trelica/2025-10-22/init_duckdb_s3_views.sql

Performance tips

1. Filter early - DuckDB pushes filters down to Parquet

-- Good: Filter pushed to Parquet
SELECT * FROM person WHERE email LIKE '%@example.com';

-- Bad: Reads entire file first
SELECT * FROM (SELECT * FROM person) WHERE email LIKE '%@example.com';

2. Select only needed columns - Parquet is columnar

-- Only reads name and email columns from Parquet
SELECT name, email FROM person;

3. Use date partitioning - Read only relevant dates

-- Read only October 2025 exports
SELECT * FROM read_parquet('s3://<YOUR-BUCKET>/trelica/2025-10-*/person.parquet');

Integration with other tools

Once tables or views are created, connect any tool that supports DuckDB:

  • Python: duckdb.connect('trelica_dw.db')
  • Tableau: DuckDB JDBC driver
  • Superset: DuckDB database connection
  • Jupyter: %load_ext sql + %sql duckdb:///trelica_dw.db

Example Python:

import duckdb

con = duckdb.connect('trelica_dw.db')
df = con.execute("""
    SELECT * FROM person
    WHERE email LIKE '%@example.com'
""").df()
print(df)

By accessing or using 1Password Developer Tools, you agree to the API and SDK Terms of Service.

Was this article helpful?

0 out of 0 found this helpful

Comments

0 comments

Please sign in to leave a comment.