Why SELECT * is bad for SQL performance (2020)

Reasons why SELECT * is bad for SQL performance | Tanel Poder Blog

Reasons why SELECT * is bad for SQL performance

2020-11-24

Here’s a list of reasons why SELECT * is bad for SQL performance, assuming that your application doesn’t actually need all the columns. When I write production code, I explicitly specify the columns of interest in the select-list (projection), not only for performance reasons, but also for application reliability reasons. For example, will your application’s data processing code suddenly break when a new column has been added or the column order has changed in a table?

I’ll focus only on the SQL performance aspects in this article. I’m using examples based on Oracle, but most of this reasoning applies to other modern relational databases too.

Index

Increased network traffic

Increased CPU usage on client side

Some query plan optimizations not possible

Server-side memory usage

Increased CPU usage on server side

Hard parsing/optimization takes more time

Cached cursors take more memory in shared pool

LOB Fetching

Summary

Increased network traffic

This is the most obvious effect - if you’re returning 800 columns instead of 8 columns from every row, you could end up sending 100x more bytes over the network for every query execution (your mileage may vary depending on the individual column lengths, of course). More network bytes means more network packets sent and depending on your RDBMS implementation, also more app-DB network roundtrips.

Oracle can stream result data of a single fetch call back to client in multiple consecutive SQL*Net packets sent out in a burst, without needing the client application to acknowledge every preceding packet first. The throughput of such bursts depends on TCP send buffer size and of course the network link bandwidth and latency. Read more about the SQL*Net more data to client wait event.

SQL> SET AUTOT TRACE STAT SQL> SELECT * FROM soe_small.customers;

1699260 rows selected.

Elapsed: 00:01:35.82

Statistics 0 recursive calls 0 db block gets 45201 consistent gets 0 physical reads 0 redo size 169926130 bytes sent via SQL*Net to client 187267 bytes received via SQL*Net from client 16994 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1699260 rows processed

It took around 1 min 35 seconds and 169 MB of data was sent from the database back to the client (roughly 100 bytes per row, on average). Interestingly, the rough row length estimate from data dictionary stats shows that an average row size ought to be 119 bytes (116 plus 3 bytes for the row header, lock byte & column count):

SQL> SELECT COUNT(*),SUM(avg_col_len) FROM dba_tab_columns WHERE owner = 'SOE_SMALL' AND table_name = 'CUSTOMERS';

COUNT(*) SUM(AVG_COL_LEN) 16 116

The above table has only 16 columns, now let’s just select 3 columns that my application needs:

SQL> SELECT customer_id, credit_limit, customer_since FROM soe_small.customers;

1699260 rows selected.

Elapsed: 00:00:43.20

Statistics 0 recursive calls 0 db block gets 45201 consistent gets 0 physical reads 0 redo size 31883155 bytes sent via SQL*Net to client 187307 bytes received via SQL*Net from client 16994 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1699260 rows processed

So, selecting only 3 columns out of 16 has given me over 2x better query response time (1m 35sec vs 43 sec). The sqlplus Elapsed metric includes the time it took to execute the query on the DB server and to fetch all its records from to the client side, so the network latency, throughput and TCP send buffer configuration will affect it.

Oracle can deduplicate repetitive field values within a result set of every fetch call, so if you need to fetch a lot of rows & columns and save network bandwidth (say, copying data from New York to Singapore over a database link), you could maximize this “compression” by ordering the query resultset by the most repetitive (least distinct values) columns that are also wide.

SQL> SELECT * FROM soe_small.customers ORDER BY customer_class,nls_territory,nls_language,cust_first_name;

1699260 rows selected.

Elapsed: 00:01:09.23

Statistics 0 recursive calls 0 db block gets 28478 consistent gets 0 physical reads 0 redo size 65960489 bytes sent via SQL*Net to client 187334 bytes received via SQL*Net from client 16994 SQL*Net roundtrips to/from client 1 sorts (memory) 0 sorts (disk) 1699260 rows processed

The test above is a SELECT * again, sorted by a few VARCHAR2 columns that were 10-40 bytes (max) size, with lots of repetitive values. Only about 65 MB were sent by the server after its SQL*Net protocol-level deduplication. Note that the SQL*Net roundtrips to/from client value is the same for all test runs above, this is because my fetch arraysize has been set to 100 in my application. The arraysize controls how many fetch calls you end up sending over the network for data retrieval, every fetch after the 1st one requests...

Why SELECT * is bad for SQL performance (2020)

Related Articles

US Government directive to suspend access to Fable 5 and Mythos 5

Is AI ruining our skills? Early results are in – and they're not good

The Anatomy of an AI-Native Org

Apertus – Open Foundation Model for Sovereign AI

Britain Became as Poor as Mississippi