Title: Snowflake
Locale: en
URL: https://sensorswave.com/en/docs/data-center/pipeline/destinations/snowflake/
Description: Configure a Snowflake export pipeline to regularly sync event and user data to your Snowflake data warehouse.

The Snowflake connector syncs event and user data from Sensors Wave to your Snowflake data warehouse on a recurring schedule, enabling your data team to run deep SQL analysis and join with other business datasets.

Typical use cases:

- Sync user behavior events to Snowflake to join with orders and CRM data
- Export user profile data for dbt modeling and data mart construction
- Retain historical event data from Sensors Wave to meet compliance or long-term analytics requirements

The Snowflake connector supports two export types:

- **Event export**: Incrementally syncs user behavior events to the `SENSORSWAVE_EVENTS` table (default, customizable at creation)
- **User export**: Incrementally syncs user profile attributes to the `SENSORSWAVE_USERS` table (default, customizable at creation)

Each export type requires a separate pipeline.

## Prerequisites

Before creating a pipeline in Sensors Wave, complete the following setup in your Snowflake account. We recommend creating a dedicated role, user, and database for Sensors Wave to follow the principle of least privilege.

### Step 1: Create a Virtual Warehouse

Skip this step if you already have a warehouse to use.
```sql
CREATE WAREHOUSE SENSORSWAVE_WH
  WITH WAREHOUSE_SIZE = 'XSMALL'
  AUTO_SUSPEND = 600
  AUTO_RESUME = TRUE
  INITIALLY_SUSPENDED = TRUE;
```

### Step 2: Create the Target Database and Schema

```sql
CREATE DATABASE SENSORSWAVE_DB;
CREATE SCHEMA SENSORSWAVE_DB.SENSORSWAVE_SCHEMA;
```

### Step 3: Create a Dedicated Role and Grant Permissions

```sql
CREATE ROLE SENSORSWAVE_ROLE;
GRANT USAGE ON WAREHOUSE SENSORSWAVE_WH TO ROLE SENSORSWAVE_ROLE;
GRANT USAGE ON DATABASE SENSORSWAVE_DB TO ROLE SENSORSWAVE_ROLE;
GRANT USAGE ON SCHEMA SENSORSWAVE_DB.SENSORSWAVE_SCHEMA TO ROLE SENSORSWAVE_ROLE;
GRANT CREATE TABLE ON SCHEMA SENSORSWAVE_DB.SENSORSWAVE_SCHEMA TO ROLE SENSORSWAVE_ROLE;
```

### Step 4: Generate an RSA Key Pair

The Snowflake connector uses Key Pair Authentication and does not support username/password login. Run the following commands in your local terminal to generate a key pair:

```bash
# Generate a 2048-bit RSA private key
openssl genrsa -out sensorswave_rsa_key.p8 2048

# Extract the public key
openssl rsa -in sensorswave_rsa_key.p8 -pubout -out sensorswave_rsa_key.pub
```

Extract the public key content (without PEM header/footer lines and newlines) for the next step:

```bash
grep -v "PUBLIC KEY" sensorswave_rsa_key.pub | tr -d '\n'
```

> **Tip**: To add password protection to the private key, include the `-aes256` flag in the generation command, for example: `openssl genrsa -aes256 -out sensorswave_rsa_key.p8 2048`. If you set a passphrase, you must enter it in the `Private Key Passphrase` field under "Configuration Reference".

### Step 5: Create a User and Assign the Public Key

```sql
CREATE USER SENSORSWAVE_USER
  DEFAULT_ROLE = SENSORSWAVE_ROLE
  DEFAULT_WAREHOUSE = SENSORSWAVE_WH
  RSA_PUBLIC_KEY = 'MIIBIjANBgkqh...'; -- Replace with your public key content (no header/footer)

GRANT ROLE SENSORSWAVE_ROLE TO USER SENSORSWAVE_USER;
```

> **Note**: The private key file (`sensorswave_rsa_key.p8`) is a sensitive credential. Store it securely.
Sensors Wave encrypts private keys at rest and will not display them again after saving.

## Create a Pipeline

After completing the Snowflake prerequisites, create a pipeline in Sensors Wave.

1. Go to **Data Center → Pipeline** and click **New Pipeline**.
2. In the dialog, select **Snowflake** as the destination and click **Next**.
3. Select the **data source type**:
   - **Events**: Sync user behavior events
   - **Users**: Sync user profile attributes
4. Fill in the Snowflake connection settings (see "Configuration Reference" below).
5. Click **Test Connection** and wait for verification to pass.
6. Configure the **execution frequency** (see "Execution Frequency" below).
7. To sync only specific events, configure event filtering in **Advanced Options** (events export only).
8. Click **Save**. The pipeline starts immediately and automatically creates the target table in Snowflake.

## Configuration Reference

### Connection Settings

| Field | Required | Description | Example |
|-------|----------|-------------|---------|
| `Account URL` | Yes | Your Snowflake account URL | `https://abc12345.us-east-1.snowflakecomputing.com` |
| `User` | Yes | Snowflake username | `SENSORSWAVE_USER` |
| `Private Key` | Yes | PEM-format private key content (including header/footer lines) | `-----BEGIN RSA PRIVATE KEY-----\n...` |
| `Private Key Passphrase` | No | Passphrase for an encrypted private key; leave blank if unencrypted | — |
| `Database` | Yes | Target database name | `SENSORSWAVE_DB` |
| `Warehouse` | Yes | Compute warehouse name | `SENSORSWAVE_WH` |
| `Schema` | Yes | Target schema | `SENSORSWAVE_SCHEMA` |
| `Table` | Yes | Target table name, customizable; defaults to `SENSORSWAVE_EVENTS` (events) or `SENSORSWAVE_USERS` (users); created automatically on save | `SENSORSWAVE_EVENTS` |
| `Role` | No | Snowflake role to use; defaults to the user's default role | `SENSORSWAVE_ROLE` |

### Execution Frequency

**Interval Execution**: Triggered at fixed intervals (1–24 hours) starting from when the pipeline is created.

| Option | Recommended Use |
|--------|----------------|
| Every 1 hour (default) | Most analytics use cases |
| Every 2 hours | Reduce frequency for high-volume data |
| Every 6 hours | Regular incremental sync |
| Every 12 hours | Low-frequency updates such as user profiles |
| Every 24 hours | Daily batch export |

**Scheduled Execution**: Triggered at a fixed time in a specified timezone, ideal for off-peak execution windows. Select a timezone and time; the pipeline runs once daily at that moment.

### Timezone Configuration

Scheduled execution requires a timezone. The timezone is resolved using the following priority:

> **User-configured timezone > Project timezone > System default timezone (CN clusters: `Asia/Shanghai`; overseas clusters: `UTC`)**

If no user-level timezone is configured, the project timezone is used; if neither is set, the system default applies.

## Data Model

### Events Table (default `SENSORSWAVE_EVENTS`)

Event data is written to this table in append-only mode. The examples below use the default table name.
| Column | Type | Description |
|--------|------|-------------|
| `time` | `TIMESTAMP_NTZ` | Event timestamp (local time in Sensors Wave's server timezone) |
| `distinct_id` | `VARCHAR(128)` | Client-side user identifier |
| `trace_id` | `VARCHAR(128)` | Request trace ID |
| `ssid` | `BIGINT` | Server-side user identifier (SSID) |
| `anon_id` | `VARCHAR(128)` | Anonymous ID |
| `login_id` | `VARCHAR(128)` | Login ID |
| `event` | `VARCHAR(128)` | Event name, e.g., `$pageview`, `purchase` |
| `properties` | `VARIANT` | Event properties JSON |
| `user_properties` | `VARIANT` | Snapshot of user properties at event time |
| `received_at` | `TIMESTAMP_NTZ` | Server receive time (local time in Sensors Wave's server timezone) |

Example: querying `VARIANT` properties:

```sql
-- Query order_id and total_amount from event properties
SELECT
  time,
  event,
  properties:order_id::STRING AS order_id,
  properties:total_amount::FLOAT AS total_amount
FROM SENSORSWAVE_EVENTS
WHERE event = 'purchase'
  AND time >= '2026-04-01'
LIMIT 100;
```

### Users Table (default `SENSORSWAVE_USERS`)

User data is written to this table using `MERGE` (matched by `ssid`: update if exists, insert if not).

| Column | Type | Description |
|--------|------|-------------|
| `ssid` | `BIGINT` | Server-side user identifier, primary key |
| `login_id` | `VARCHAR(128)` | Login ID |
| `anon_id` | `VARCHAR(128)` | Anonymous ID |
| `properties` | `VARIANT` | User profile attributes JSON |
| `created_at` | `TIMESTAMP_NTZ` | User creation time |
| `updated_at` | `TIMESTAMP_NTZ` | Last updated time for user attributes |

Example: querying `VARIANT` user properties:

```sql
-- Query membership level from user properties
SELECT
  ssid,
  login_id,
  properties:membership_level::STRING AS membership_level,
  updated_at
FROM SENSORSWAVE_USERS
WHERE properties:membership_level IS NOT NULL
LIMIT 100;
```

## Event Filtering

Event filtering applies to **event exports only** and supports two modes:

- **Include events (allowlist)**: Export only the events listed; all other events are skipped
- **Exclude events (blocklist)**: Export all events except those listed

Configure the filter mode and event names (e.g., `$pageview`, `purchase`) in **Advanced Options**.

> **Note**: Filtering changes take effect in the next sync window after the current window completes; already-exported data is not reprocessed.

## Test Connection

After clicking **Test Connection**, the system validates your Snowflake configuration in the following steps:

1. **Establish connection**: Verify that the `Account URL`, username, and private key are valid
2. **Execute SQL**: Verify the user has basic query permissions
3. **Check warehouse**: Verify the specified `Warehouse` exists and the user can use it
4. **Check database**: Verify the specified `Database` exists and the user can access it
5. **Check schema**: Verify the specified `Schema` exists, the user can access it, and the user has `CREATE TABLE` permission on it

Each step's result is shown in real time to help you quickly pinpoint issues.
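If one of the permission steps fails, you can also inspect the grants directly in Snowflake. A quick diagnostic sketch, assuming the role and user names from the setup steps above (substitute your own if they differ):

```sql
-- List every privilege granted to the Sensors Wave role
SHOW GRANTS TO ROLE SENSORSWAVE_ROLE;

-- Confirm the pipeline user actually holds that role
SHOW GRANTS TO USER SENSORSWAVE_USER;
```

The first result set should include the `USAGE` and `CREATE TABLE` grants from Step 3; the second should list `SENSORSWAVE_ROLE`.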
### Common Connection Errors

| Error | Possible Cause | Resolution |
|-------|---------------|------------|
| Connection failed | Incorrect `Account URL` format or mismatched private key | Check the URL format; verify you are using the correct private key file |
| Warehouse not found | Typo in warehouse name or warehouse does not exist | Check the warehouse name (case-sensitive); verify it exists in Snowflake |
| No CREATE TABLE permission | Role lacks `CREATE TABLE` permission | Run `GRANT CREATE TABLE ON SCHEMA ... TO ROLE ...` |
| Invalid private key format | Key content was corrupted during copy | Ensure the private key includes the full header and footer lines (`-----BEGIN RSA PRIVATE KEY-----`) |

## Pipeline Management and Monitoring

### Pipeline Status

A pipeline's lifecycle states are:

```
New → Running → Stopped (irreversible)
```

> **Note**: Stopping a pipeline is irreversible. All scheduling stops and no new sync tasks are created. To resume exporting, you must create a new pipeline. Historical run records remain accessible after stopping.

### Viewing Run History

In the pipeline detail page's **Runs** tab, view execution details for each sync task:

- **Status**: Running / Success / Failed
- **Data range**: The time window covered by this sync
- **Rows exported**: Number of rows successfully written to Snowflake
- **Execution time**: Task start and end timestamps
- **Logs**: Click a row to expand and view detailed execution logs

### Viewing Metrics

The **Metrics** tab shows overall pipeline performance (defaults to the last 30 days):

- Success / failure run count trend charts
- Daily exported rows trend chart
- Custom date range filtering

## Historical Data Backfill

### What is Backfill

After a pipeline is created, incremental sync only processes new data from the pipeline creation time forward. To load data for a period before the pipeline was created, use the backfill feature.

### Steps

1. In the pipeline detail page, switch to the **Backfill** tab.
2. Click **Start Backfill** and select the date range to backfill (start date and end date).
3. Confirm the estimated number of windows and click **Confirm**.

Backfill uses the project timezone to split the range into calendar-day windows and processes them serially.

### Backfill and Incremental Sync Interaction

- While a backfill is running, the pipeline's scheduled incremental sync **automatically pauses**; no data is lost, and sync resumes from where it left off after the backfill completes or is cancelled
- Only one backfill can run at a time per pipeline
- If a backfill is interrupted (e.g., service restart), it automatically resumes from the last completed window; completed windows are not re-processed

### Cancel Backfill

Click **Cancel** while a backfill is in progress to stop it. The cancellation takes effect after the current window finishes; the in-progress window is not interrupted.

## Delivery Semantics and Data Consistency

| Data Type | Write Method | Delivery Semantics |
|-----------|-------------|-------------------|
| Events | `COPY INTO` (append) | at-least-once |
| Users | `MERGE INTO` (matched by `ssid`) | exactly-once |

## Timezone Handling

### Timezone of Time Columns

All time columns in Snowflake (`time`, `received_at`, `created_at`, `updated_at`) use `TIMESTAMP_NTZ` (no timezone information) and store the **local time in Sensors Wave's server timezone**. CN clusters typically use `Asia/Shanghai` (UTC+8); overseas clusters typically use `UTC`.

**Example (CN cluster)**: If the server timezone is `Asia/Shanghai` (UTC+8) and an event occurred at 2026-04-13 15:30:00 Beijing time, Snowflake stores `2026-04-13 15:30:00`.
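Continuing the example above, a minimal sketch of converting the stored local time back to UTC at query time, assuming a CN cluster whose server timezone is `Asia/Shanghai`:

```sql
SELECT
  time AS local_time,  -- stored as Asia/Shanghai local time, e.g., 2026-04-13 15:30:00
  -- Interpret the TIMESTAMP_NTZ value as Asia/Shanghai and convert to UTC
  CONVERT_TIMEZONE('Asia/Shanghai', 'UTC', time) AS utc_time  -- e.g., 2026-04-13 07:30:00
FROM SENSORSWAVE_EVENTS
LIMIT 10;
```

Snowflake's three-argument `CONVERT_TIMEZONE` treats the input as wall-clock time in the source timezone, so no offset is double-applied.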
### What Timezone Affects

| Setting | Timezone Impact |
|---------|----------------|
| Scheduled execution trigger time | Calculated in the configured timezone (e.g., `Asia/Shanghai` 08:00 fires at 8 AM Beijing time) |
| Backfill date range | Split into calendar-day windows (00:00–24:00) using the project timezone |
| Time column values in Snowflake | Fixed to Sensors Wave's server timezone, independent of the user-configured timezone |

## Notes

- **Stopping a pipeline is irreversible**: To resume exporting, you must create a new pipeline; historical run records are retained
- **Private key format**: `Private Key` must be the full PEM format including the `-----BEGIN RSA PRIVATE KEY-----` and `-----END RSA PRIVATE KEY-----` header/footer lines
- **Target table is auto-created**: The target table is created automatically in Snowflake when you save the pipeline; no manual table creation is needed
- **Incremental sync pauses during backfill**: Scheduled sync is paused while backfill runs and resumes automatically afterward; no data is lost
- **Time columns are in local time**: Snowflake time columns store local time in Sensors Wave's server timezone, not UTC; account for this when doing cross-timezone analysis
- **Duplicate event rows**: `SENSORSWAVE_EVENTS` may contain a small number of duplicate rows. **Why this happens**: Incremental sync reads from a real-time data stream, while backfill reads from historical storage. These two sources are independent and cannot be precisely aligned at time boundaries. If a backfill covers a time range that overlaps with data already processed by incremental sync, the same event may be written twice. User data is written via `MERGE` and is not affected. **Scope**: Affects only the events table, and only when the backfill range overlaps with data already synced incrementally.
**Recommendation**: Create a deduplication view in Snowflake and use it as the default entry point for queries:

```sql
CREATE VIEW SENSORSWAVE_EVENTS_DEDUP AS
SELECT * EXCLUDE (rn)
FROM (
  SELECT
    *,
    ROW_NUMBER() OVER (
      PARTITION BY time, event, distinct_id, trace_id
      ORDER BY received_at DESC NULLS LAST
    ) AS rn
  FROM SENSORSWAVE_EVENTS
)
WHERE rn = 1;
```

## FAQ

**Q: Test Connection fails with "No CREATE TABLE permission". What should I do?**

Verify that you ran `GRANT CREATE TABLE ON SCHEMA ... TO ROLE ...`, and confirm the role used by the pipeline matches the role that was granted the permission.

**Q: The pipeline is "Running". Can I modify the connection settings or event filters?**

Yes. Go to the pipeline detail page and click **Edit** to update the configuration. Changes take effect in the next sync window after the current one completes; frequency changes take effect immediately.

**Q: There are duplicate rows between backfill and incremental sync data. How do I handle this?**

Use the deduplication view `SENSORSWAVE_EVENTS_DEDUP` from the "Notes" section above for everyday queries. Alternatively, deduplicate using `ROW_NUMBER()` on the combination of `time`, `event`, `distinct_id`, and `trace_id`, keeping the record with the latest `received_at`.

**Q: How do I query data inside `VARIANT` columns (`properties` and `user_properties`)?**

Use Snowflake's colon (`:`) path syntax to access nested fields, and `::TYPE` for type casting:

```sql
-- Access the page_url field (string type) from properties
SELECT properties:page_url::STRING FROM SENSORSWAVE_EVENTS;

-- Access a numeric field
SELECT properties:item_count::INT FROM SENSORSWAVE_EVENTS;
```

**Q: What timezone is the `time` column in Snowflake?**

The `time` column uses `TIMESTAMP_NTZ` and stores the local time in Sensors Wave's server timezone, with no timezone offset. CN clusters typically use `Asia/Shanghai` (UTC+8); overseas clusters typically use `UTC`. To convert to another timezone, adjust the source timezone based on your actual cluster, for example (CN cluster):

```sql
CONVERT_TIMEZONE('Asia/Shanghai', 'America/New_York', time)
```

---

**Last updated**: April 13, 2026