Get query performance insights
This document describes how to use the query execution graph to diagnose query performance issues, and to see query performance insights.
BigQuery offers strong query performance, but it is also a complex distributed system with many internal and external factors that can affect query speed. The declarative nature of the SQL language can also hide the complexity of query execution. This means that when your queries are running slower than anticipated, or slower than prior runs, understanding what happened can be a challenge.
The query execution graph provides an intuitive interface for inspecting query performance details. By using it, you can review the query plan information in graphical format for any query, whether running or completed.
You can also use the query execution graph to get performance insights for queries. Performance insights provide best-effort suggestions to help you improve query performance. Since query performance is multi-faceted, performance insights might only provide a partial picture of the overall query performance.
Required permissions
To use the query execution graph, you must have the following permissions:
bigquery.jobs.get
bigquery.jobs.listAll
These permissions are available through the following BigQuery predefined Identity and Access Management (IAM) roles:
roles/bigquery.admin
roles/bigquery.resourceAdmin
roles/bigquery.resourceEditor
roles/bigquery.resourceViewer
View query performance insights
Console
Follow these steps to see query performance insights:
Open the BigQuery page in the Google Cloud console.
In the Editor, click either Personal History or Project History.
In the list of jobs, identify the query job that interests you. Click
Actions, and choose Open query in editor.Select the Execution Graph tab to see a graphical representation of each stage of the query:
To determine if a query stage has performance insights, look at the icon it displays. Stages that have an
information icon have performance insights. Stages that have a check icon don't.Click a stage to open the stage details pane, where you can see the following information:
- Query plan information for the stage.
- The steps executed in the stage.
- Any applicable performance insights.
Optional: If you are inspecting a running query, click
Sync to update the execution graph so that it reflects the query's current status.Optional: To highlight the top stages by stage duration on the graph, click Highlight top stages by duration.
Optional: To highlight the top stages by slot time used on the graph, click Highlight top stages by processing.
Optional: To include shuffle redistribution stages on the graph, click Show shuffle redistribution stages.
Use this option to show the repartition and coalesce stages that are hidden in the default execution graph.
Repartition and coalesce stages are introduced while the query is running, and are used to improve data distribution across the workers processing the query. Since these stages are not related to your query text, they are hidden to simplify the query plan that is displayed.
For any query that has performance regression issues, performance insights are also displayed on the Job Information tab for the query:
SQL
In the Google Cloud console, go to the BigQuery page.
In the query editor, enter the following statement:
SELECT `bigquery-public-data`.persistent_udfs.job_url( project_id || ':us.' || job_id) AS job_url, query_info.performance_insights FROM `region-REGION_NAME`.INFORMATION_SCHEMA.JOBS_BY_PROJECT WHERE DATE(creation_time) >= CURRENT_DATE - 30 -- scan 30 days of query history AND job_type = 'QUERY' AND state = 'DONE' AND error_result IS NULL AND statement_type != 'SCRIPT' AND EXISTS ( -- Only include queries which had performance insights SELECT 1 FROM UNNEST( query_info.performance_insights.stage_performance_standalone_insights ) WHERE slot_contention OR insufficient_shuffle_quota UNION ALL SELECT 1 FROM UNNEST( query_info.performance_insights.stage_performance_change_insights ) WHERE input_data_change.records_read_diff_percentage IS NOT NULL );
Click
Run.
For more information about how to run queries, see Run an interactive query.
API
You can get query performance insights in a non-graphical format by
calling the jobs.list
API method and inspecting the
JobStatistics2
information that is returned.
Interpret query performance insights
Use this section to learn more about what performance insights mean and how to address them.
Performance insights are intended for two audiences:
Analysts: you run queries in a project. You are interested in finding out why a query you have run before is unexpectedly running slower, and in getting tips on how to improve a query's performance. You have the permissions described in Required permissions.
Data lake or data warehouse administrators: you manage your organization's BigQuery resources and reservations. You have the permissions associated with the BigQuery Admin role.
Each of the following sections provides guidance on what you can do to address a performance insight you receive, based on which of these roles you occupy.
Slot contention
When you run a query, BigQuery attempts to break up the work needed by your query into tasks. A task is a single slice of data that is input into and output from a stage. A single slot picks up a task and executes that slice of data for the stage. Ideally, BigQuery slots execute these tasks in parallel to achieve high performance. Slot contention occurs when your query has many tasks ready to start executing, but BigQuery can't get enough available slots to execute them.
What to do if you're an analyst
Reduce the data you are processing in your query by following the guidance in Reduce data processed in queries.
What to do if you're an administrator
Increase slot availability or decrease slot usage by taking the following actions:
- If you use BigQuery's on-demand pricing, your queries use a shared pool of slots. Consider switching to capacity-based analysis pricing by purchasing reservations instead. Reservations let you reserve dedicated slots for your organization's queries.
If you are using BigQuery reservations, ensure that there are enough slots in the reservation that is assigned to the project that was running the query. The reservation might not have enough slots in these scenarios:
- There are other jobs that are consuming reservation slots. You can use Admin Resource Charts to see how your organization is using the reservation.
- The reservation does not have enough assigned slots to run queries fast enough. You can use the slot estimator to get an estimate of how large your reservations should be to efficiently process your queries' tasks.
To address this, you can try one of the following solutions:
- Add more slots (either baseline slots or max reservation slots) to that reservation.
- Create an additional reservation and assign it to the project running the query.
- Spread out resource-intensive queries, either over time within a reservation or over different reservations.
Ensure that the tables you are querying are clustered. Clustering helps to ensure that BigQuery can quickly read columns with correlated data.
Ensure that the tables you are querying are partitioned. For unpartitioned tables, BigQuery reads the entire table. Partitioning your tables helps ensure that you query only the subset of your tables that you are interested in.
Insufficient shuffle quota
Before running your query, BigQuery breaks up your query's logic into stages. BigQuery slots execute the tasks for each stage. When a slot completes the execution of a stage's tasks, it stores the intermediate results in shuffle. Subsequent stages in your query read data from shuffle to continue your query's execution. Insufficient shuffle quota occurs when you have more data that needs to get written to shuffle than you have shuffle capacity.
What to do if you're an analyst
Similarly to slot contention, reducing the amount of data that your query processes might reduce shuffle usage. To do this, follow the guidance in Reduce data processed in queries.
Certain operations in SQL tend to make more extensive usage of shuffle,
particularly
JOIN
operations
and GROUP BY
clauses.
Where possible, reducing the amount of data in these operations might reduce shuffle usage.
What to do if you're an administrator
Reduce shuffle quota contention by taking the following actions:
- Similarly to slot contention, if you use BigQuery's on-demand pricing, your queries use a shared pool of slots. Consider switching to capacity-based analysis pricing by purchasing reservations instead. Reservations give you dedicated slots and shuffle capacity for your projects' queries.
If you are using BigQuery reservations, slots come with dedicated shuffle capacity. If your reservation is running some queries that make extensive use of shuffle, this might cause other queries running in parallel to not get enough shuffle capacity. You can identify which jobs use shuffle capacity extensively by querying the
period_shuffle_ram_usage_ratio
column in theINFORMATION_SCHEMA.JOBS_TIMELINE
view.To address this, you can try one or more of the following solutions:
- Add more slots to that reservation.
- Create an additional reservation and assign it to the project running the query.
- Spread out shuffle-intensive queries, either over time within a reservation or over different reservations.
Data input scale change
Getting this performance insight indicates that your query is reading at least 50% more data for a given input table than the last time you ran the query. You can use table change history to see if the size of any of the tables used in the query has recently increased.
What to do if you're an analyst
Reduce the data you are processing in your query by following the guidance in Reduce data processed in queries.
High cardinality join
When a query contains a join with non-unique keys on both sides of the join, the size of the output table can be considerably larger than the size of either of the input tables. This insight indicates that the ratio of output rows to input rows is high and offers information about these row counts.
What to do if you're an analyst
Check your join conditions to confirm that the increase in the size of the
output table is expected. Avoid using
cross joins.
If you must use a cross join, try using a GROUP BY
clause to pre-aggregate
results, or use a window function. For more information, see
Reduce data before using a JOIN
.
Partition skew
To provide feedback or request support with this feature, send email to
[email protected]
.
Skewed data distribution can cause queries to run slowly. When a query is executed, BigQuery splits data into small partitions. You can't share partitions between slots. Therefore, if the data is unevenly distributed, some partitions become very large, which crashes the slot that processes the oversized partition.
Skew occurs in JOIN
stages. When you run a JOIN
operation,
BigQuery splits the data on the right side and left side of the
JOIN
operation into partitions. If a partition is too large, the data is
rebalanced by repartition stages. If the skew is too bad and
BigQuery cannot rebalance further, a partition skew insight is
added to the 'JOIN' stage. This process is known as repartition stages. If
BigQuery detects any large partitions that cannot be split
further, a partition skew insight is added to the JOIN
stage.
What to do if you're an analyst
To avoid partition skew, filter your data as early as possible.
Interpret query stage information
In addition to using query performance insights, you can also use the following guidelines when you are reviewing query stage details to help determine if there is an issue with a query:
- If the Wait ms value for one or more stages is high compared to previous
runs of the query:
- See if you have enough slots available to accommodate your workload. If not, load-balance when you run resource-intensive queries so they don't compete with each other.
- If the Wait ms value is higher than it has been for just one stage, look at the stage prior to it to see if a bottleneck has been introduced there. Things like substantial changes to the data or schema of the tables involved in the query might affect the query performance.
- If the Shuffle output bytes value for a stage is high compared to
previous runs of the query, or compared to a previous stage, evaluate the
steps processed in that stage to see if any create unexpectedly large amounts
of data. One common cause for this is when a step processes an
INNER JOIN
where there are duplicate keys on both sides of the join. This can return an unexpectedly large amount of data. - Use the execution graph to look at the top stages by duration and processing. Consider the amount of data they produce and whether it is commensurate with the size of the tables referenced in the query. If it isn't, review the steps in those stages to see if any of them might produce an unexpected amount of interim data.
What's next
- Review the query optimization guidelines for tips on improving query performance.