Debugging Postgres query times

Debugging Postgres query times - EXPLAIN ANALYZE

The Context

Working on a Django + Postgres project that does some memory intensive calculations in a database table. Postgres is hosted on AWS RDS service, with a local development environment hosted in a Docker container.

The Problem

Locally, with around half a million records in my database, my query runs quickly (order of seconds). On our staging platform (AWS), our database has an order of magnitude more records to select from, leading to a query which takes 45 minutes to run. It seems the scaling of our query is inadequate and we wanted to do a quick profiling of the query to check what was happening.

The Solution

Once we had the plaintext query that was being generated by our code, I connected to my database locally via Postico, and used the SQL editor to run the raw query manually on the correct database.

You can prefix a query with EXPLAIN ANALYSE and Postgres will profile the query (e.g. EXPLAIN ANALYSE SELECT * FROM ...). This also works in MySQL.

I can copy this output into an online analyser like this one, which can outline which parts of the query are slow, and which are high-cost:

Additionally, I wanted to check if the first filter condition had an index to improve performance on the initial discriminating select. If there was no index on my biggest discriminator, a quick win would be adding one.

I connected to my docker image:

docker-compose ps → check which containers are running

docker-compose exec db_container bash → run bash on the container with the database

I connect to the Postgres client: psql -U USERNAME -W DATABASE.

I can describe all the tables: \dt.

I can inspect the schema of the one that I want to look at\d table_name.

At the bottom, I see a list of indexes which tells me I have the correct (btree) one available.

The Code

Here’s an example of EXPLAIN ANALYZE:


PREPARE query(int, int) AS SELECT sum(bar) FROM test
    WHERE id > $1 AND id < $2
    GROUP BY foo;

EXPLAIN ANALYZE EXECUTE query(100, 200);

                  ----------------------- OUTPUT -----------------------
 HashAggregate  (cost=9.54..9.54 rows=1 width=8) (actual time=0.156..0.161 rows=11 loops=1)
   Group Key: foo
   ->  Index Scan using test_pkey on test  (cost=0.29..9.29 rows=50 width=8) (actual time=0.039..0.091 rows=99 loops=1)
         Index Cond: ((id > $1) AND (id < $2))
 Planning time: 0.197 ms
 Execution time: 0.225 ms
(6 rows)

Here’s what the schema description looks like after running \d table_name:


Column               |           Type           | Collation | Nullable | Default
---------------------+--------------------------+-----------+----------+--------
 uuid                | uuid                     |           | not null |
 created_at          | timestamp with time zone |           | not null |
 updated_at          | timestamp with time zone |           | not null |

...

Indexes:
    "table_name_pkey" PRIMARY KEY, btree (uuid)
		...
    "auto_generated_c5c578_idx" btree (property_i_want_to_see_an_index_on)