SQL Alerting in Cloud Monitoring Observability


Alert with SQL in Cloud Monitoring Observability Analytics | Google Cloud Blog


From query to action: Introducing SQL alerting in Cloud Monitoring Observability Analytics

June 26, 2026

Joy Wang, Senior Product Manager

Mark Stahl, Staff Software Engineer


Traditional alerting systems often force a compromise: you can either alert immediately on simple, noisy log events, or monitor rigid, pre-configured metrics that break down on high-cardinality data such as user sessions or IP addresses. But the most critical system issues, like a 20% spike in error rates for a specific customer or a latency anomaly correlated with database timeouts, are hidden in the aggregates and relationships between these signals.

Recently, we announced that you can now use SQL to query logs and traces in Observability Analytics (formerly Log Analytics). But the story gets better. You can also use SQL to create alerts in Observability Analytics. By bringing SQL directly to your alerting engine, you can write complex analytical queries over logs and traces and turn them into alerts. Whether you need to calculate error percentages, analyze high-cardinality dimensions, or JOIN logs and traces, SQL alerting helps you go from basic threshold monitoring to deep, contextual detection that goes beyond the capabilities of traditional alerting systems. SQL alerting is now in preview.

How SQL-based alerting works

SQL alerting in Observability Analytics is available as part of Cloud Monitoring. An alerting policy runs your SQL query on a schedule you define (for example, every 10 minutes). It automatically applies a "lookback window" to your query, so each run analyzes only the log entries or trace spans received since the previous run.

If the results of your query meet the condition you set, Cloud Monitoring creates an incident and sends a notification to your chosen channels, such as email, Slack, or PagerDuty.

Please note that because SQL-based alerting uses BigQuery to process telemetry data, query executions are billed through BigQuery, under either your standard on-demand pricing or your BigQuery reservations.

Two ways to trigger an alert

You can choose between two types of alert conditions:

Row count threshold: This is the simplest option. The alert fires if your query returns a number of rows that is greater than, equal to, or less than a threshold you set. This is perfect for "alert me if more than 10 users have failed logins" scenarios.

Boolean: This is the most powerful option. The alert fires if your query returns any row where a column you specify has a value of true. This lets you build complex logic, like calculating percentages, directly in your SQL query.

Example 1: Alerting on payment gateway failures (row count)

Scenario: Imagine that you're an e-commerce operator, and you want to be alerted immediately if your payment gateway is experiencing a systemic outage, while ignoring occasional, normal card declines (like an incorrect PIN).

To do this, you can write a query that filters for log entries indicating gateway timeouts, and use a row count threshold so that the alert fires only when the volume of these errors spikes.


```sql
SELECT
  JSON_VALUE(json_payload.transaction_id) AS transaction_id,
  JSON_VALUE(json_payload.error_code) AS error_code
FROM
  `my-project-id.my-dataset.my-log-view`
WHERE
  JSON_VALUE(json_payload.status) = 'FAILED'
  -- Filter for systemic gateway issues, not user-input errors like WRONG_PIN
  AND JSON_VALUE(json_payload.failure_reason) = 'GATEWAY_TIMEOUT'
```
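To see which entries the WHERE clause keeps and which it drops, here is a minimal Python sketch that applies the same two predicates to log entries held in memory as dictionaries. The function name `gateway_timeouts` and the sample entries are hypothetical; the payload field names simply mirror the `json_payload` fields the query reads.

```python
from typing import Any


def gateway_timeouts(entries: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Hypothetical mirror of the Example 1 WHERE clause.

    Keeps entries whose status is FAILED and whose failure_reason is
    GATEWAY_TIMEOUT, and projects the same two columns the query selects.
    """
    rows = []
    for entry in entries:
        payload = entry.get("json_payload", {})
        # Same predicates as the SQL: status = 'FAILED' AND failure_reason = 'GATEWAY_TIMEOUT'
        if payload.get("status") == "FAILED" and payload.get("failure_reason") == "GATEWAY_TIMEOUT":
            rows.append({
                "transaction_id": payload.get("transaction_id"),
                "error_code": payload.get("error_code"),
            })
    return rows


# Made-up sample entries: the WRONG_PIN decline and the success are filtered out.
sample = [
    {"json_payload": {"transaction_id": "t1", "status": "FAILED",
                      "failure_reason": "GATEWAY_TIMEOUT", "error_code": "504"}},
    {"json_payload": {"transaction_id": "t2", "status": "FAILED",
                      "failure_reason": "WRONG_PIN", "error_code": "402"}},
    {"json_payload": {"transaction_id": "t3", "status": "SUCCEEDED"}},
]
print(gateway_timeouts(sample))  # [{'transaction_id': 't1', 'error_code': '504'}]
```

Each surviving row counts toward the row count threshold, so only gateway timeouts can push the policy over its limit.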

Alert configuration:

Condition type: Row count threshold

Trigger condition: Fires when the row count is greater than (>) 10

Evaluation window / lookback: 5 minutes (checks the last 5 minutes of data on your defined schedule)
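The configuration above combines two ideas: a lookback window that limits which rows a run sees, and a comparison of the resulting row count against a threshold. The Python sketch below models that combination under stated assumptions. It is an illustration of the described behavior, not Cloud Monitoring's implementation; `row_count_fires`, the `COMPARATORS` mapping, and the timestamps are all hypothetical.

```python
import operator
from datetime import datetime, timedelta, timezone

# Hypothetical mapping for the three comparisons a row count condition offers.
COMPARATORS = {">": operator.gt, "=": operator.eq, "<": operator.lt}


def row_count_fires(row_times, now, lookback, comparator, threshold):
    """Illustrative model of a row count threshold condition.

    Counts only rows whose timestamp falls inside the lookback window
    (now - lookback, now], then compares that count with the threshold.
    Returns a (fires, count) tuple.
    """
    window_start = now - lookback
    count = sum(1 for t in row_times if window_start < t <= now)
    return COMPARATORS[comparator](count, threshold), count


# Example 1 settings: fire when more than 10 gateway timeouts land in a 5-minute lookback.
now = datetime(2026, 6, 26, 12, 0, tzinfo=timezone.utc)
recent = [now - timedelta(seconds=20 * i) for i in range(12)]  # 12 rows, 0 to 220 s old
older = [now - timedelta(minutes=10 + i) for i in range(3)]    # 3 rows outside the window

fires, count = row_count_fires(recent + older, now, timedelta(minutes=5), ">", 10)
print(fires, count)  # True 12
```

The three older rows are ignored because they fall outside the 5-minute window, which is why a burst of timeouts fires the alert while the same number of errors spread over a long period does not.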

Example 2: Alerting on agent latency (traces)

Scenario: You’re an AI platform engineer, and you want to ensure your multi-step AI agents are responding within acceptable time limits. You want to monitor the 99th percentile (p99) latency of the orchestrator service and get alerted if performance degrades.

To do this, you can write a SQL query against your trace data that calculates the p99 latency of the agent-orchestrator service and returns true if it exceeds 5 seconds (5,000 milliseconds).


```sql
WITH latency_data AS (
  SELECT
    -- Approximate p99 of span durations, converted from nanoseconds to milliseconds
    APPROX_QUANTILES(duration_nano, 100)[OFFSET(99)] / 1000000 AS p99_ms
  FROM
    `my-project-id.us._Trace.Spans._AllSpans`
  WHERE
    -- Examine only spans produced by the agent-orchestrator service
    JSON_VALUE(resource.attributes, '$."service.name"') = 'agent-orchestrator'
)
SELECT
  "agent-orchestrator" AS service_name,
  p99_ms,
  -- Boolean logic: alert if p99 exceeds 5000 ms
  (p99_ms > 5000) AS has_latency_spike
FROM
  latency_data
```
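If you want to sanity-check the 5,000 ms threshold against a handful of span durations before creating the policy, the Python sketch below computes a p99 offline. Note the swap in technique: it uses an exact nearest-rank percentile, whereas `APPROX_QUANTILES(duration_nano, 100)[OFFSET(99)]` in BigQuery is an approximation, so results on real data can differ slightly. The function names and sample durations are hypothetical.

```python
def p99_ms(durations_nano):
    """Exact nearest-rank 99th percentile of span durations, in milliseconds.

    Returns None for an empty input, similar to the SQL yielding NULL
    when no spans match.
    """
    if not durations_nano:
        return None
    ordered = sorted(durations_nano)
    n = len(ordered)
    rank = (99 * n + 99) // 100  # ceil(0.99 * n) in integer arithmetic, 1-based
    return ordered[rank - 1] / 1_000_000


def has_latency_spike(durations_nano, threshold_ms=5000):
    """Mirror of the query's boolean column: True only when p99 exceeds the threshold."""
    p99 = p99_ms(durations_nano)
    return p99 is not None and p99 > threshold_ms


# Made-up durations: 1.2 s spans with either one or two 7 s outliers out of 100.
healthy = [1_200_000_000] * 99 + [7_000_000_000]
degraded = [1_200_000_000] * 98 + [7_000_000_000] * 2
print(p99_ms(healthy), has_latency_spike(healthy))    # 1200.0 False
print(p99_ms(degraded), has_latency_spike(degraded))  # 7000.0 True
```

A single slow span out of 100 does not move the p99, but two do, which is the kind of sustained degradation the alert is meant to catch rather than one-off outliers.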

Alert configuration:

Condition type: Boolean

Target column:...
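Once the query returns, a Boolean condition only has to inspect one column across the result rows. The Python sketch below models that check, using the query's `has_latency_spike` column as an illustrative target. It reflects the behavior described earlier (fire if any row has a value of true), not Cloud Monitoring's actual implementation; `boolean_condition_fires` and the row values are hypothetical.

```python
from typing import Any, Iterable, Mapping


def boolean_condition_fires(rows: Iterable[Mapping[str, Any]], target_column: str) -> bool:
    """Illustrative model of a Boolean condition.

    Fires if ANY returned row has True in target_column. False and None
    (SQL NULL) do not fire, so an empty result set never fires either.
    """
    return any(row.get(target_column) is True for row in rows)


# Made-up rows shaped like the Example 2 query output.
rows = [
    {"service_name": "agent-orchestrator", "p99_ms": 7000.0, "has_latency_spike": True},
]
print(boolean_condition_fires(rows, "has_latency_spike"))  # True
print(boolean_condition_fires([], "has_latency_spike"))    # False
```

Because the threshold logic lives in the SQL, the condition itself stays trivial; changing the 5,000 ms limit or switching to a percentage-based rule only means editing the query.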
