Title: SQL Query
Locale: en
URL: https://sensorswave.com/en/docs/analytics/sql-query/
Description: Use SQL for flexible custom data queries

SQL Query is the most flexible and powerful data analysis tool in Sensors Wave. When the predefined analysis models cannot meet your needs, you can use SQL to query the underlying data directly for arbitrarily complex custom analysis. SQL Query gives you full control over the analysis logic, so you can handle any complex data analysis requirement.

SQL Query is built on the Apache Doris query engine, supports standard SQL syntax, and provides a user-friendly editor and visualization tools for rapid query development, execution, and management.

## Typical Use Cases

SQL Query can help you solve the following analysis problems:

**Complex cross-table join analysis**:

- Join user and event tables to analyze event performance across different user attribute groups
- Join event tables with custom dimension tables to enrich event data dimensions
- Multi-table JOINs for organizing complex analysis logic

**Advanced data aggregation and statistics**:

- Multi-dimensional pivoting and cross-analysis
- Custom aggregate functions and window functions
- Complex grouping, sorting, and filtering conditions

**Cohort and audience analysis**:

- Define Cohorts based on behavioral characteristics
- Calculate cumulative user metrics and lifetime value
- Implement complex user segmentation logic

**Performance metrics and monitoring**:

- Custom key business metric calculations
- Real-time monitoring and alert data generation
- Period-over-period benchmark analysis

**Data quality and auditing**:

- Detect data anomalies and duplicates
- Query log analysis and performance diagnostics
- User behavior compliance auditing

## Prerequisites

Before using SQL Query, make sure you have:

- Completed [SDK integration](../data-integration/client-sdks/javascript.mdx) and started receiving data
- Understood the core concepts of the [Data Model](../data-integration/data-model.mdx)
- View or analysis permissions for the project; see [Roles and User Permissions](../10-组织与项目/03-角色和用户权限.md)
- Familiarity with basic SQL syntax (optional, but recommended)

## Quick Start

### Open the SQL Query Editor

1. Click **Insights** in the left navigation bar
2. Click the **New Analysis** button in the upper right corner
3. Select the **SQL Query** option
4. The system will open the SQL editor page

### Write and Execute Your First Query

1. In the left-side metadata browser, view available tables and fields
2. Enter your SQL statement in the upper-right editor
3. Click the **Execute** button (or use the shortcut `Ctrl+Enter`) to run the query
4. View the query results in the lower-right results area

**Simple example**: Query the 10 most active users in the past 7 days

```sql
SELECT
  login_id,
  COUNT(*) as event_count
FROM events
WHERE time >= DATE_SUB(NOW(), INTERVAL 7 DAY)
  AND login_id IS NOT NULL
GROUP BY login_id
ORDER BY event_count DESC
LIMIT 10
```

### Save and Reuse Queries

After writing a SQL query, you can:

1. Click the **Save** button to save the query with a name to "My Queries"
2. Click the **Visualize** button to convert results into a chart
3. Click the **Add to Report** button to add the chart to a report
4. Click the **Download** button to export query results as CSV or Excel files

## Interface Features in Detail

### 1. Left Side: Metadata Browser

The left-side metadata browser displays all queryable tables and their column information, helping you quickly understand the data structure.
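If you prefer exploring the data structure from SQL itself, Doris's MySQL-compatible metadata statements offer a quick complement to the browser. This is a sketch, assuming the editor passes these standard statements through to the engine:

```sql
-- List all tables available to the current project
SHOW TABLES;

-- Show every column of the events table with its data type
DESCRIBE events;
```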
#### Table List

The system provides the following four tables by default:

| Table Name | Description | Row Count Example |
|-----|------|--------|
| **events** | Event table, records all user behavior events | ~194 rows |
| **users** | User table, stores user property information | ~94 rows |
| **cohorts** | Cohort table, stores Cohort results | ~1 row |
| **query_log** | Query log table, records all SQL query history | ~31 rows |

> **Tip**: The row counts shown above are illustrative only; actual row counts depend on your data volume.

#### Field Information

Click a table name to expand and view all its fields:

- **Field name**: The column name
- **Field type**: The column data type (e.g., varchar, int, datetime)
- **Description**: The meaning and purpose of the field

You can:

- Click a field name to quickly copy it to the editor
- Search fields to quickly locate the column you need
- Review field data types to choose appropriate SQL functions

### 2. Upper Right: Editor and Execution Area

The upper-right editor and execution area provides complete SQL editing and running capabilities.

#### Feature Buttons

**Editing features**:

- **Undo/Redo**: Undo or redo editing operations
- **Format**: Auto-format SQL statements for better readability
- **Copy**: Copy SQL statements to the clipboard

**Execution features**:

- **Execute**: Run the current SQL query (shortcut: `Ctrl+Enter`)
- **Stop**: Cancel a running query
- **Save**: Save the query to the "My Queries" list

**Visualization and sharing**:

- **Visualize**: Convert query results into charts (Line, Column, etc.)
- **Chart Settings**: Configure chart title, axes, colors, etc.
- **Add to Report**: Add the generated chart to an existing report

#### SQL Editor

The editor supports the following advanced features:

**Code completion**:

- Keyword completion (SELECT, WHERE, GROUP BY, etc.)
- Table name completion (events, users, etc.)
- Field name completion (based on the current table context)

**Syntax highlighting**:

- SQL keywords displayed in blue
- Strings displayed in red
- Numbers displayed in green
- Improves code readability

**Error hints**:

- When SQL syntax is incorrect, the editor shows a red wavy underline on the error line
- Hover over the error to view detailed error information

**Keyboard shortcuts**:

- `Ctrl+Enter`: Execute query
- `Ctrl+/`: Quick comment/uncomment
- `Ctrl+Z`: Undo
- `Ctrl+Y`: Redo

### 3. Lower Right: Query Results Area

The lower-right results area displays the SQL query execution results.

#### Results Table

- **List display**: Displays query results in table format, with each row corresponding to a data record
- **Auto-fit columns**: The system automatically adjusts column widths to fit content
- **Column sorting**: Click column headers to sort ascending or descending
- **Column show/hide**: Right-click column headers to hide or show specific columns

#### Pagination and Download

- **Pagination controls**: Query results are paginated by default (20 rows per page); select page numbers at the bottom to navigate
- **Row count display**: Shows the current page's row range and total row count
- **Download**: Click the **Download** button to export results as:
  - CSV format: Easy to open in Excel or other tools
  - Excel format: Auto-formatted tables
  - Image format: Export as PNG if results have been converted to a chart

#### Execution Information

After query execution, the system displays the following information:

- **Execution time**: Query duration (milliseconds)
- **Returned rows**: Number of rows in the query result
- **Query ID**: System-assigned unique query ID for tracking in the query log

## Queryable Tables in Detail

### events Table (Event Table)

The events table is the core of the data model, recording all user behavior events. For detailed field descriptions, see [Data Model - Events Table](../data-integration/data-model.mdx#事件表-events).
**Core field quick reference**:

| Field Name | Data Type | Description | Example |
|-------|---------|------|------|
| `time` | datetime | Event occurrence time | 2026-01-27 10:30:00 |
| `event` | varchar | Event name | PageView, Purchase |
| `event_id` | int | Event ID | 12345 |
| `distinct_id` | varchar | Currently effective user unique identifier | abc123 |
| `anon_id` | varchar | Anonymous ID (device identifier) | device_xyz |
| `login_id` | varchar | Login ID (user account) | user_12345 |
| `ssid` | bigint | Server-generated unique user ID | 9876543 |
| `trace_id` | varchar | Event deduplication ID | trace_abc123 |
| `received_at` | datetime | Server received time | 2026-01-27 10:30:05 |
| `e_*` | various | Event Properties (prefixed with e_) | e_product_id, e_category |
| `u_*` | various | User Property snapshot (prefixed with u_) | u_age, u_vip_level |

**Common query examples**:

```sql
-- Query trigger count and unique users for each event
SELECT
  event,
  COUNT(*) as event_count,
  COUNT(DISTINCT ssid) as user_count
FROM events
WHERE time >= DATE_SUB(NOW(), INTERVAL 7 DAY)
GROUP BY event
ORDER BY event_count DESC;

-- Query the last 10 actions of a specific user
SELECT time, event, e_product_id, u_vip_level
FROM events
WHERE login_id = 'user_12345'
ORDER BY time DESC
LIMIT 10;

-- Query the distribution of an Event Property value
SELECT
  e_category,
  COUNT(*) as count
FROM events
WHERE event = 'PageView'
  AND time >= DATE_SUB(NOW(), INTERVAL 30 DAY)
GROUP BY e_category
ORDER BY count DESC;
```

### users Table (User Table)

The users table stores basic user information and property data. For detailed field descriptions, see [Data Model - Users Table](../data-integration/data-model.mdx#用户表-users).
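One detail worth noting up front: `anon_id` in this table is an array column (one user can own several devices), so matching a specific device ID needs an array function rather than `=`. A sketch, assuming Doris's standard `array_contains` function is available:

```sql
-- Find the user record that owns a given device ID
-- ('device_xyz' is an illustrative value)
SELECT ssid, login_id
FROM users
WHERE array_contains(anon_id, 'device_xyz');
```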
**Core field quick reference**:

| Field Name | Data Type | Description | Example |
|-------|---------|------|------|
| `ssid` | bigint | Unique user ID (primary key) | 9876543 |
| `login_id` | varchar | Login ID | user_12345 |
| `anon_id` | array | Device ID list | ['device_xyz'] |
| `$first_seen_time` | datetime | First Seen time | 2025-01-01 14:30:00 |
| `$last_seen_time` | datetime | Last active time | 2026-01-27 09:15:00 |
| `created_at` | datetime | User record creation time | 2025-01-01 14:30:05 |
| `updated_at` | datetime | User record last update time | 2026-01-27 09:15:05 |
| `u_*` | various | User Properties (prefixed with u_) | u_age, u_vip_level |

**Common query examples**:

```sql
-- Query active user statistics
SELECT
  COUNT(*) as total_users,
  COUNT(DISTINCT login_id) as active_users,
  ROUND(AVG(DATEDIFF(NOW(), $first_seen_time))) as avg_days_active
FROM users
WHERE $last_seen_time >= DATE_SUB(NOW(), INTERVAL 30 DAY);

-- Query user registration source distribution
SELECT
  u_signup_channel,
  COUNT(*) as user_count,
  ROUND(100.0 * COUNT(*) / (SELECT COUNT(*) FROM users), 2) as percentage
FROM users
GROUP BY u_signup_channel
ORDER BY user_count DESC;

-- Query new vs. existing user comparison
-- (the 7-day boundary is illustrative; adjust it to your definition of "new")
SELECT
  CASE WHEN DATEDIFF(NOW(), $first_seen_time) <= 7
       THEN 'New user' ELSE 'Existing user' END as user_type,
  COUNT(*) as user_count
FROM users
GROUP BY user_type;
```

### query_log Table (Query Log Table)

The query_log table records all SQL query history, which is useful for auditing and performance diagnostics.

**Core field quick reference** (partial):

| Field Name | Data Type | Description |
|-------|---------|------|
| `events` | array | List of event names involved in the query |
| `cohorts` | array | List of Cohort IDs involved in the query |
| `event_properties` | array | List of Event Properties involved in the query |
| `user_properties` | array | List of User Properties involved in the query |
| `hit_sql_cache` | tinyint | Whether SQL function cache was hit |
| `error_code` | int | Doris error code (0 on success) |
| `error_message` | varchar | Doris error message |

**Common query examples**:

```sql
-- Query the 10 slowest queries
SELECT query_id, start_time, cost_time_ms, account, status
FROM query_log
WHERE status = 'SUCCESS'
ORDER BY cost_time_ms DESC
LIMIT 10;

-- Analyze user query patterns
SELECT
  account,
  COUNT(*) as query_count,
  ROUND(AVG(cost_time_ms)) as avg_cost_ms,
  MAX(cost_time_ms) as max_cost_ms,
  SUM(CASE WHEN status = 'SUCCESS' THEN 1 ELSE 0 END) as success_count
FROM query_log
WHERE start_time >= DATE_SUB(NOW(), INTERVAL 7 DAY)
GROUP BY account
ORDER BY query_count DESC;

-- Query high memory consumption queries
SELECT
  query_id,
  start_time,
  query_type,
  cost_time_ms,
  peak_memory_bytes,
  scan_rows,
  return_rows
FROM query_log
WHERE peak_memory_bytes > 1024 * 1024 * 1024 -- Greater than 1GB
  AND status = 'SUCCESS'
ORDER BY peak_memory_bytes DESC
LIMIT 20;

-- Analyze query cache hit rate
SELECT
  DATE(start_time) as query_date,
  COUNT(*) as total_queries,
  SUM(CASE WHEN cache_hit = true THEN 1 ELSE 0 END) as cache_hit_count,
  ROUND(100.0 * SUM(CASE WHEN cache_hit = true THEN 1 ELSE 0 END) / COUNT(*), 2) as cache_hit_rate
FROM query_log
WHERE start_time >= DATE_SUB(NOW(), INTERVAL 30 DAY)
GROUP BY DATE(start_time)
ORDER BY query_date DESC;
```

## Table Join Examples

### Joining Events and Users Tables

Join events and user properties for analysis:

```sql
SELECT
  e.login_id,
  u.u_vip_level,
  u.u_city,
  COUNT(*) as event_count,
  COUNT(DISTINCT e.event) as event_variety
FROM events e
INNER JOIN users u ON e.ssid = u.ssid
WHERE e.time >= DATE_SUB(NOW(), INTERVAL 30 DAY)
GROUP BY e.login_id, u.u_vip_level, u.u_city
ORDER BY event_count DESC
LIMIT 20;
```

### Joining Events and Cohorts Tables

Analyze event performance for a specific Cohort:

```sql
SELECT
  e.event,
  COUNT(*) as event_count,
  COUNT(DISTINCT e.ssid) as user_count
FROM events e
INNER JOIN cohorts c ON e.ssid = c.ssid
WHERE c.cohort_123 = true -- Assuming Cohort ID is 123
  AND e.time >= DATE_SUB(NOW(), INTERVAL 7 DAY)
GROUP BY e.event
ORDER BY event_count DESC;
```

### Multi-table Join

Join events, users, and Cohorts for deep analysis:

```sql
SELECT
  u.u_city,
  c.cohort_name,
  COUNT(*) as event_count,
  ROUND(AVG(e.e_purchase_amount), 2) as avg_amount
FROM events e
INNER JOIN users u ON e.ssid = u.ssid
INNER JOIN cohorts c ON e.ssid = c.ssid
WHERE e.event = 'Purchase'
  AND e.time >= DATE_SUB(NOW(), INTERVAL 30 DAY)
GROUP BY u.u_city, c.cohort_name
ORDER BY event_count DESC;
```

## SQL Best Practices

### 1. Query Performance Optimization

**Use time range filters**:

```sql
-- [Recommended] Good practice: Explicitly specify the time range
SELECT * FROM events
WHERE time >= DATE_SUB(NOW(), INTERVAL 30 DAY)
  AND event = 'Purchase';

-- [Avoid] Avoid: Querying all data
SELECT * FROM events
WHERE event = 'Purchase';
```

**Use indexed fields**:

```sql
-- [Recommended] Prioritize indexed fields (time, event, ssid, etc.) for filtering
SELECT * FROM events
WHERE time >= '2026-01-01'
  AND event = 'PageView'
  AND ssid = 12345;

-- [Avoid] Avoid: Filtering only on high-cardinality fields (e.g., e_product_id)
SELECT * FROM events
WHERE e_product_id = 'SKU123';
```

**Use GROUP BY wisely**:

```sql
-- [Recommended] Good practice: Group by low-cardinality fields
SELECT event, COUNT(*) FROM events GROUP BY event;

-- [Avoid] Avoid: Grouping by high-cardinality fields
SELECT e_product_id, COUNT(*) FROM events
GROUP BY e_product_id; -- May produce millions of groups
```

### 2. Data Accuracy

**Handle null values properly**:

```sql
-- [Recommended] Use IS NOT NULL or IS NULL to explicitly handle null values
SELECT COUNT(DISTINCT ssid) FROM events
WHERE login_id IS NOT NULL
  AND time >= DATE_SUB(NOW(), INTERVAL 7 DAY);

-- [Note] Note: NULL values are ignored by aggregate functions
SELECT COUNT(DISTINCT e_product_id) FROM events; -- Rows where e_product_id is NULL are not counted
```

**Use DISTINCT for deduplication**:

```sql
-- [Recommended] Count distinct users
SELECT COUNT(DISTINCT ssid) FROM events;

-- [Avoid] Avoid: A plain count returns one per event row
SELECT COUNT(ssid) FROM events; -- Counts events, not users
```

**Validate data consistency**:

```sql
-- [Recommended] Periodically check event-user table relationships
SELECT COUNT(*) as orphaned_events
FROM events e
LEFT JOIN users u ON e.ssid = u.ssid
WHERE u.ssid IS NULL; -- Check for orphaned events (no corresponding user)
```

### 3. Code Readability

**Use meaningful column aliases**:

```sql
-- [Recommended] Use meaningful aliases
SELECT
  login_id as user_account,
  COUNT(*) as total_events,
  COUNT(DISTINCT event) as event_types
FROM events
GROUP BY login_id;

-- [Avoid] Avoid: Meaningless or overly long aliases
SELECT login_id as a, COUNT(*) as b
FROM events
GROUP BY login_id;
```

**Add meaningful comments**:

```sql
-- Query high-frequency search users in the past 7 days
SELECT
  login_id,
  COUNT(*) as search_count
FROM events
WHERE event = 'Search' -- Only count search events
  AND time >= DATE_SUB(NOW(), INTERVAL 7 DAY) -- Limit time range
  AND login_id IS NOT NULL -- Exclude anonymous users
GROUP BY login_id
HAVING COUNT(*) >= 10 -- Only show users with 10+ searches
ORDER BY search_count DESC;
```

**Use WHERE and HAVING appropriately**:

```sql
-- [Recommended] WHERE for row-level filtering, HAVING for aggregate filtering
SELECT
  u_city,
  COUNT(*) as user_count
FROM users
WHERE u_vip_level = 'gold' -- Row-level conditions use WHERE
GROUP BY u_city
HAVING COUNT(*) >= 100; -- Aggregate conditions use HAVING
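
-- An additional sketch: the window functions mentioned under Typical Use
-- Cases can rank the qualifying groups in the same statement. The window
-- is evaluated after GROUP BY and HAVING, so the rank covers only cities
-- that pass the filter. Property names (u_city, u_vip_level) are
-- illustrative, as in the example above.
SELECT
  u_city,
  COUNT(*) as user_count,
  RANK() OVER (ORDER BY COUNT(*) DESC) as city_rank
FROM users
WHERE u_vip_level = 'gold'
GROUP BY u_city
HAVING COUNT(*) >= 100;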
```

### 4. Common Pitfalls

**About NULL values in login_id**:

```sql
-- [Note] Note: login_id is NULL for users who haven't logged in
-- If you need to analyze logged-in users, explicitly filter:
SELECT login_id, COUNT(*)
FROM events
WHERE login_id IS NOT NULL -- Important: Exclude non-logged-in users
GROUP BY login_id;
```

**About using distinct_id**:

```sql
-- [Tip] Tip: distinct_id is the currently effective identifier
-- (Login ID takes priority, otherwise Anonymous ID)
-- If you need to distinguish logged-in and non-logged-in users:
SELECT
  CASE WHEN login_id IS NOT NULL THEN login_id ELSE anon_id END as user_identifier,
  COUNT(*)
FROM events
GROUP BY user_identifier;
```

**Avoid duplicate counting**:

```sql
-- [Wrong] Wrong: A JOIN can multiply event rows, inflating COUNT(*) and SUM()
-- (COUNT(DISTINCT ...) stays accurate, but plain counts do not)
SELECT e.event, COUNT(*) as event_count
FROM events e
INNER JOIN cohorts c ON e.ssid = c.ssid -- Duplicates rows when a user matches multiple cohort rows
GROUP BY e.event;

-- [Recommended] Correct approach: Filter with a deduplicated subquery instead of joining
SELECT event, COUNT(*) as event_count
FROM events
WHERE ssid IN (SELECT DISTINCT ssid FROM cohorts)
GROUP BY event;
```

## Analysis Pattern Examples

### User Lifecycle Analysis

```sql
-- Analyze user lifecycle value
-- ($first_seen_time and $last_seen_time live on the users table, so join it)
SELECT
  u.ssid,
  u.login_id,
  DATEDIFF(NOW(), u.$first_seen_time) as days_since_first_seen,
  DATEDIFF(u.$last_seen_time, u.$first_seen_time) as lifespan_days,
  COUNT(DISTINCT DATE(e.time)) as active_days,
  COUNT(*) as total_events,
  COUNT(DISTINCT e.event) as event_types
FROM events e
INNER JOIN users u ON e.ssid = u.ssid
WHERE u.$first_seen_time IS NOT NULL
GROUP BY u.ssid, u.login_id, u.$first_seen_time, u.$last_seen_time
ORDER BY total_events DESC
LIMIT 100;
```

### Conversion Funnel Analysis

```sql
-- Analyze user conversion funnel (View → Add to Cart → Purchase)
WITH funnel_steps AS (
  SELECT
    ssid,
    MAX(CASE WHEN event = 'PageView' THEN 1 ELSE 0 END) as has_view,
    MAX(CASE WHEN event = 'AddCart' THEN 1 ELSE 0 END) as has_addcart,
    MAX(CASE WHEN event = 'Purchase' THEN 1 ELSE 0 END) as
    has_purchase
  FROM events
  WHERE time >= DATE_SUB(NOW(), INTERVAL 30 DAY)
  GROUP BY ssid
)
SELECT
  COUNT(*) as total_users,
  SUM(has_view) as view_users,
  SUM(has_addcart) as addcart_users,
  SUM(has_purchase) as purchase_users,
  ROUND(100.0 * SUM(has_addcart) / SUM(has_view), 2) as addcart_rate,
  ROUND(100.0 * SUM(has_purchase) / SUM(has_view), 2) as purchase_rate
FROM funnel_steps;
```

### User Retention Analysis

```sql
-- Analyze user Day-7 retention rate by first-seen cohort
WITH first_active AS (
  SELECT ssid, DATE($first_seen_time) as cohort_date
  FROM users
)
SELECT
  f.cohort_date,
  COUNT(DISTINCT f.ssid) as cohort_size,
  COUNT(DISTINCT CASE WHEN DATE(e.time) = DATE_ADD(f.cohort_date, INTERVAL 7 DAY)
                      THEN f.ssid END) as day_7_active,
  ROUND(100.0 * COUNT(DISTINCT CASE WHEN DATE(e.time) = DATE_ADD(f.cohort_date, INTERVAL 7 DAY)
                                    THEN f.ssid END)
        / COUNT(DISTINCT f.ssid), 2) as retention_rate
FROM first_active f
LEFT JOIN events e ON f.ssid = e.ssid
GROUP BY f.cohort_date
ORDER BY f.cohort_date DESC
LIMIT 30;
```

## Important Notes

### Query Limits

- **Timeout**: Individual query execution time must not exceed 10 minutes; queries exceeding this are automatically terminated
- **Result rows**: Maximum of 10,000 rows per query (controllable via LIMIT)
- **Concurrent queries**: A single account can run up to 5 queries simultaneously

### Data Security

- **Query logs**: All SQL queries are recorded in the query_log table for auditing
- **Permission control**: Users can only query data from projects they have access to
- **Sensitive data**: Avoid including passwords, ID numbers, and other sensitive information in queries

### Performance Recommendations

- **Avoid full table scans**: Always include time range restrictions in the WHERE clause
- **Limit returned rows**: Use LIMIT for large data volume queries
- **Use caching**: Frequently executed identical queries are automatically cached for faster response
- **Monitor resources**: Monitor CPU, memory, and I/O resource usage to avoid overload

### Data Latency

- **Freshness**: Event data is typically written to the database within 1-5 minutes
- **Cohort data**: Cohort updates may have a 5-10 minute delay
- **Query logs**: Complete query log records may have approximately 1-minute delay

## Related Documentation

**Data model and concepts**:

- [Data Model](../data-integration/data-model.mdx): Learn about the detailed structure of the event and user tables
- [Events and Properties](../data-integration/events-and-properties.mdx): Understand Event Property and User Property definitions

**SQL syntax reference**:

- [Apache Doris SQL Documentation](https://doris.apache.org/zh-CN/docs/3.x/sql-manual/sql-statements/data-query/SELECT): Complete SQL syntax reference and function manual

**Related analysis**:

- [Choosing the Right Analysis Model](choosing-analysis-model.mdx): Understand when to use SQL vs. predefined models
- [Event Analysis](event-analysis.mdx): Analyze event data using the UI
- [Cohorts](../05-受众计算/02-用户分群.md): Create Cohorts

**Visualization and reports**:

- [Chart Management](../04-可视化与报表/01-图表管理.md): Visualize SQL query results
- [Report Management](../04-可视化与报表/02-报表管理.md): Add charts to reports

## Next Steps

After completing SQL queries, you can:

1. **Visualize query results**: Convert query results into Line charts, Column charts, etc. for more intuitive data presentation
2. **Add to reports**: Add important query results and charts to reports for regular monitoring
3. **Save as Cohort**: Create Cohorts based on SQL query results for targeted marketing
4. **Set up alerts**: Set alert rules based on query results to promptly detect data anomalies
5. **View best practices**: See [Best Practices](best-practices.mdx) to improve analysis efficiency
6. **Resolve common issues**: If you encounter problems, check the [FAQ](faq.mdx)

---

**Last updated**: January 19, 2026