
Postgres uses a mechanism called MVCC (Multi-Version Concurrency Control) to track changes in your database. Under MVCC, an UPDATE or DELETE does not overwrite a row in place; the old version is kept until no transaction can still see it, at which point the row is “dead”. Dead rows are generated by DELETE and UPDATE operations, as well as by transactions that are rolled back. The PostgreSQL documentation covers MVCC in more detail.

These dead rows keep accumulating as more updates and deletes happen. Periodic cleanup of these dead rows is necessary, not only to save space but also to maintain the performance of your database queries. The space taken up by these dead rows is called bloat. You can check the bloat for your tables by running the following commands.

To check on Heroku (pg:bloat comes from the heroku-pg-extras plugin) -

heroku pg:bloat

SQL command to check the number of dead rows per table -

SELECT relname, n_dead_tup  FROM pg_stat_all_tables  
WHERE schemaname = 'public';

To clean up these dead rows we need to run VACUUM. In older versions of Postgres, you could only run VACUUM manually.

Command to run VACUUM manually on every table in the current database -

VACUUM;

To run VACUUM on a single table -

VACUUM users;

With newer versions of Postgres, you can configure VACUUM to run automatically once a certain threshold of dead rows is reached. The default autovacuum configuration is good enough for small to mid-sized tables. For larger tables in production, autovacuum tends to fall behind because the threshold grows along with the table, so it may run very rarely or not at all.

For example, we found that autovacuum had not run even once on one of our many large production tables -

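 schema | table                   | last_vacuum      | last_autovacuum | rowcount   | dead_rowcount | autovacuum_threshold | expect_autovacuum
--------+-------------------------+------------------+-----------------+------------+---------------+----------------------+-------------------
 public | transaction_date_fields | 2017-04-29 08:04 |                 | 18,835,766 | 202,290       | 3,767,203            |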

The last_autovacuum field is blank. To understand why the autovacuum daemon did not run, we need to check how the threshold is calculated -

vacuum threshold = autovacuum_vacuum_threshold +
    autovacuum_vacuum_scale_factor * number of rows

The default values are autovacuum_vacuum_threshold = 50 and autovacuum_vacuum_scale_factor = 0.2.
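As a rough check of the formula, the Python sketch below plugs the defaults into it, using the row counts from the stats output shown earlier. The function name autovacuum_threshold is only illustrative and is not a Postgres API.

```python
def autovacuum_threshold(row_count, base_threshold=50, scale_factor=0.2):
    """Dead-row count that must be exceeded before autovacuum vacuums a table."""
    return base_threshold + scale_factor * row_count


rows = 18_835_766    # rowcount reported for transaction_date_fields
dead_rows = 202_290  # dead_rowcount reported for the same table

threshold = autovacuum_threshold(rows)
print(f"threshold: {threshold:,.0f} dead rows")
print(f"dead rows so far: {dead_rows:,}")
print("autovacuum due:", dead_rows > threshold)
```

The computed threshold is roughly 3.77 million dead rows, which agrees with the autovacuum_threshold column in the stats output, while the table has only about 200 thousand dead rows so far.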

So with the above config, autovacuum only kicks in once roughly 20% of a table's rows are dead, and that threshold keeps moving ahead as the table grows. On a large table it is rarely reached. If the threshold is eventually reached, the vacuum has a huge backlog to clean up in one go, so it consumes more resources and slows down queries. That often leads to the wrong reaction, which is to turn off autovacuum, and that is dangerous.

Therefore it's necessary to identify and set a correct threshold for the tables to ensure autovacuum runs periodically and helps reduce any bloat or performance impact.

To find the vacuum stats across all tables we can use the following commands -

On Heroku (newer versions of the heroku-pg-extras plugin name this command pg:vacuum-stats) -

heroku pg:vacuum_stats

SQL -

SELECT relname, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze  FROM pg_stat_all_tables  
WHERE schemaname = 'public';

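Queries to set the autovacuum config for a single table. With the scale factor at 0, autovacuum triggers after a fixed 5,000 dead rows, regardless of how large the table is -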

ALTER TABLE table_name SET (autovacuum_vacuum_scale_factor = 0.0);

ALTER TABLE table_name SET (autovacuum_vacuum_threshold = 5000);
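To see what these per-table settings change, the sketch below compares the default threshold with the tuned one for a few table sizes. It is a minimal illustration of the threshold formula, not Postgres code; the function name and the two smaller table sizes are made up for the example, and the largest size is the table from our stats output.

```python
def autovacuum_threshold(row_count, base_threshold, scale_factor):
    """Dead-row count that must be exceeded before autovacuum vacuums a table."""
    return base_threshold + scale_factor * row_count


DEFAULT = {"base_threshold": 50, "scale_factor": 0.2}   # Postgres defaults
TUNED = {"base_threshold": 5000, "scale_factor": 0.0}   # per-table settings above

for rows in (10_000, 1_000_000, 18_835_766):
    default = autovacuum_threshold(rows, **DEFAULT)
    tuned = autovacuum_threshold(rows, **TUNED)
    print(f"{rows:>12,} rows | default {default:>12,.0f} | tuned {tuned:>6,.0f}")
```

With the tuned settings the threshold stays at 5,000 dead rows however large the table gets, so autovacuum runs in small, frequent passes instead of waiting for millions of dead rows to pile up.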

An excellent read on autovacuum and its advantages -
Postgres Autovacuum is Not the Enemy

Hope this article helps you achieve better performance for your database application.