Understanding Why COUNT(*) Can Be Slow in PostgreSQL
A deep dive into how PostgreSQL handles row counting under the hood.
We often need to write queries that count the rows in a table:
SELECT COUNT(*) FROM dummy_data;
While this seems like a harmless little query, today we will understand why this query can be slow in Postgres — and what can be done to speed it up.
Let’s look at the query plan to figure out how Postgres calculates the count.
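The plan below can be reproduced by prefixing the query with EXPLAIN (the table name matches the example above):

```sql
EXPLAIN SELECT COUNT(*) FROM dummy_data;
```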
+----------------------------------------------------------------------------------------------+
| QUERY PLAN |
|----------------------------------------------------------------------------------------------|
| Finalize Aggregate (cost=14542.55..14542.56 rows=1 width=8) |
| -> Gather (cost=14542.33..14542.54 rows=2 width=8) |
| Workers Planned: 2 |
| -> Partial Aggregate (cost=13542.33..13542.34 rows=1 width=8) |
| -> Parallel Seq Scan on dummy_data (cost=0.00..12500.67 rows=416667 width=0) |
+----------------------------------------------------------------------------------------------+
If you are like me, you might wonder: why is the planner using a sequential scan to calculate the row count?
Can’t Postgres just keep the row count in its metadata? Or maybe count the number of leaf nodes in an index? To understand this, we need a quick 101 on MVCC, how Postgres implements it, and how that impacts row counting.
MVCC (Multi-version concurrency control)
Multi-version concurrency control is the mechanism Postgres uses to allow concurrent database access while maintaining consistency. Instead of using traditional locks, which can block readers while writers are modifying data, Postgres maintains multiple versions of each row in the table. It is essentially a form of optimistic concurrency control.
MVCC Implementation
Let’s first understand how MVCC is implemented in Postgres under the hood:
Each transaction in Postgres is assigned a monotonically increasing 32-bit integer ID.
Each row has hidden system columns called xmin and xmax.
xmin denotes the transaction ID that created the row.
xmax, initially set to null, denotes the transaction ID that deleted or updated the row.
When a transaction starts, it can (roughly speaking) see rows where:
xmin < current_transaction_id
xmax is either null or > current_transaction_id
When a row is deleted, it is not removed physically. It is marked as dead by setting xmax to the deleting transaction’s ID. Newer transactions will no longer see that row.
When a row is updated, a new version of the row is created with the updated values. xmax in the old version is set to the current transaction ID; the new version has xmin set to the current transaction ID and xmax set to null.
Dead rows are not removed immediately, since they might still be visible to long-running transactions. Postgres has a process called VACUUM whose job is to scan for dead rows and remove them to reclaim space. This is how Postgres clears out the leftover junk from old row versions and keeps the database fast and lean.
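The xmin and xmax system columns are hidden from SELECT *, but you can query them by name to watch row versions change. A quick sketch, assuming the dummy_data table from earlier:

```sql
-- ctid (physical row location), xmin and xmax are system columns
-- present on every Postgres table:
SELECT ctid, xmin, xmax FROM dummy_data LIMIT 5;

-- txid_current() returns the current transaction's ID, which is
-- what this transaction writes into xmin/xmax when it modifies rows:
SELECT txid_current();
```

If you UPDATE a row inside an open transaction, re-running the first query from another session shows the old version with its xmax now set to the updater’s transaction ID.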
Bottom line: at any moment, multiple versions of a row may exist. Visibility is dictated by xmin and xmax, and dead versions stick around until VACUUM cleans them up.
So what does this mean for COUNT(*)?
Now that we understand how multiple versions of a row can exist, here’s why Postgres can’t just store a row count in metadata:
A row visible in one transaction might not be visible in another. A transaction running COUNT(*) must figure out which rows are visible to it and which are dead, so it can’t blindly count row slots; it has to scan the entire table and check each row’s visibility.
This is why COUNT(*) becomes slow for large tables or tables with many dead rows.
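You can see how much dead-row baggage a table is carrying via the pg_stat_user_tables statistics view (table name assumed from the example above):

```sql
-- n_dead_tup is an estimate of dead row versions not yet vacuumed;
-- a large value here makes COUNT(*) scan far more data than it returns.
SELECT n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'dummy_data';
```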
Optimisations
Here are some ways to optimise row counting in PostgreSQL:
1. Add a Filter on an Indexed Column
The query can be optimised to use an index instead of scanning the whole table. If a filter is added on an indexed column, the planner will use the index structure to get the count. It may still need to visit the actual rows referenced by the index to verify their visibility. In some cases, Postgres can use an index-only scan instead; this depends on the visibility map it maintains. If the visibility map confirms that all rows on a page are visible to every transaction, Postgres doesn’t need to read that page’s rows to check visibility. But for pages containing a mix of live and dead rows, Postgres must still fetch the rows from the table to confirm visibility.
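You can check how much of the table the visibility map already covers. relallvisible counts pages whose rows are known to be visible to all transactions, so a value close to relpages means an index-only scan can mostly skip the heap. A sketch against the example table:

```sql
SELECT relpages,        -- total pages in the table
       relallvisible    -- pages marked all-visible in the visibility map
FROM pg_class
WHERE relname = 'dummy_data';
```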
Let’s look at the query plan generated when an indexed column is used in the WHERE clause. In this query, the filter is applied on the category column, which has a B-tree index.
EXPLAIN SELECT COUNT(*) FROM dummy_data WHERE category = 10;
+------------------------------------------------------------------------------------------------------------+
| QUERY PLAN |
|------------------------------------------------------------------------------------------------------------|
| Aggregate (cost=2309.77..2309.78 rows=1 width=8) |
| -> Index Only Scan using idx_dummy_data_category on dummy_data (cost=0.42..2063.10 rows=98667 width=0) |
| Index Cond: (category = 10) |
+------------------------------------------------------------------------------------------------------------+
As you can see, the planner relies on the index structure rather than fully scanning the table. Whether it chooses an Index Scan or an Index Only Scan depends on the visibility map.
2. Estimate the row count
We don’t always need the exact count. In some cases an approximation is good enough, and we can fetch it from the pg_class catalog table:
SELECT reltuples::BIGINT AS approx_count FROM pg_class WHERE relname = 'dummy_data';
This gives us an approximate row count: Postgres’s estimate from the last ANALYZE (or VACUUM). If there have been lots of inserts, updates, or deletes since then, this value can be stale. But with autovacuum and auto-analyze enabled (as they usually are), it is often close enough.
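If the estimate looks stale, running ANALYZE refreshes the statistics that reltuples is based on (VACUUM also updates it):

```sql
ANALYZE dummy_data;  -- recompute statistics for this table

SELECT reltuples::BIGINT AS approx_count
FROM pg_class
WHERE relname = 'dummy_data';
```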
TL;DR
PostgreSQL uses MVCC, so multiple versions of rows exist.
Without any filtering conditions, Postgres usually performs a sequential scan to evaluate COUNT(*).
Indexes and visibility maps can help optimize specific filtered counts.
For fast approximations, use statistics from pg_class.