Postgres uses a mechanism called MVCC(Multi Version Concurrency Control) to track changes in your database. Due to this reason, some of the rows become “dead”. Dead rows are generated by DELETE and UPDATE operations, as well as transactions that have to be rolled back. Refer this link learn more about MVCC.
These dead rows keep on adding as there are lots of updates and deletes. Periodic clean up of these dead rows is necessary to not only save space but also maintain the performance of your database queries. The space taken by these dead rows is called bloat. You can check the bloat for your tables by running the following commands
To check on Heroku -
sql command to check deleted rows -
SELECT relname, n_dead_tup FROM pg_stat_all_tables WHERE schemaname = 'public';
To clean up these dead rows we need to run Vacuum - in older versions of Postgres you could only run vacuum manually.
Commands to run VACUUM manually -
To run VACUUM on a single table -
With new versions of Postgres database, you can configure Vacuum to run automatically after a certain threshold is reached. The default configuration for AutoVacuum is good enough for small to mid-sized tables. For larger tables on production, autovacuum tends to fall back with the ever-increasing threshold on production and Vacuum never runs automatically.
For example - we found out one of our many large tables in production where auto-vacuum did not run once -
schema | table | last_vacuum | last_autovacuum | rowcount | dead_rowcount | autovacuum_threshold | expect_autovacuum public | transaction_date_fields| 2017-04-29 08:04 | | 18,835,766 | 202,290 | 3,767,203 |
the last_autovacuum field is blank. To understand why the auto-vacuum daemon did not run - we need to check the default configuration - which is
vacuum threshold = autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * number of rows
Default values are autovacuum_vacuum_threshold(50) and autovacuum_vacuum_scale_factor(0.2)
So based on the above config the auto vacuum will never kick in as the value for
autovacuum_threshold will keep moving ahead as the table grows. And now if the threshold is somehow reached and if the vacuum runs it will end up doing more harm than help as it will consume more resources and will slow down queries leading to a false action which is to turn off autovacuum which is dangerous.
Therefore it's necessary to identify and set a correct threshold for the tables to ensure autovacuum runs periodically and helps reduce any bloat or performance impact.
To find the vacuum stats across the table we can use the following commands -
On Heroku -
SELECT relname, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze FROM pg_stat_all_tables WHERE schemaname = 'public';
Query to set vacuum config for a table
ALTER TABLE table_name SET (autovacuum_vacuum_scale_factor = 0.0); ALTER TABLE table_name SET (autovacuum_vacuum_threshold = 5000);
Excellent Read on AutoVacuum and its advantages -
Postgres Autovacuum is Not the Enemy
Hope this article help you to achieve better performance for your database application.