PostgreSQL - Autovacuum

Last Updated : 05 Aug, 2024

In the 2000s, PostgreSQL developers had to deal with a significant weakness in their relational database management system: storage space and transaction speed suffered because of the way UPDATE and DELETE operations were handled. An UPDATE was particularly expensive, because it kept the old row version and wrote the new data as a separate row version, so a frequently updated table could grow without bound. Similarly, deleting a row merely marked it as deleted while keeping the actual data on disk, which wasted space and slowed down scans.

This behavior resembles how modern file systems and data recovery software work: deleted data remains on the disk but is hidden from the interface. In a database, however, keeping old row versions for a while is essential for transactional integrity, since transactions that are still running may need to see them. To reclaim the space afterwards, PostgreSQL introduced the VACUUM command, which cleans up deleted rows. Initially this had to be run manually, which was cumbersome and led to the development of the Autovacuum feature.

What is Autovacuum?

Autovacuum is a background daemon that ships with PostgreSQL and regularly cleans up redundant data in the database. The user does not have to issue vacuum commands manually; instead, its behavior is configured in the postgresql.conf file. To access this file, change to the data directory in a terminal (the path below is for a default PostgreSQL 13 installation on Windows) and open the file in a suitable editor.

>> cd C:\Program Files\PostgreSQL\13\data
>> "postgresql.conf"
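
The location of the configuration file and the current autovacuum settings can also be checked from a psql session instead of browsing the file system. The following is a minimal sketch; it assumes a role that is allowed to read these settings (SHOW config_file normally requires a superuser or a member of pg_read_all_settings).

```sql
-- Path of the postgresql.conf file that the running server loaded
SHOW config_file;

-- Whether the autovacuum launcher is enabled
SHOW autovacuum;

-- All autovacuum-related settings with their current values and units
SELECT name, setting, unit
FROM pg_settings
WHERE name LIKE 'autovacuum%'
ORDER BY name;
```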

How Autovacuum Works

The autovacuum utility first removes dead tuples, that is, outdated row versions that were marked as deleted and that no running transaction can still see. It then records the reclaimed space so that later insertions and updates can be placed there. This is in stark contrast to the former behavior, where every new row version was simply added to the table while the space held by dead rows stayed unused.
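
To see this process at work, the statistics view pg_stat_user_tables reports how many live and dead tuples each table holds and when autovacuum last processed it. A short sketch, assuming it is run in the database of interest:

```sql
-- Tables with the most dead tuples, and when autovacuum last cleaned them
SELECT relname,
       n_live_tup,        -- estimated number of live rows
       n_dead_tup,        -- estimated number of dead rows awaiting clean-up
       last_autovacuum,   -- NULL if autovacuum has never processed the table
       autovacuum_count
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
```

A table whose n_dead_tup keeps climbing while last_autovacuum stays old or empty is a candidate for closer inspection.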

Benefits of Autovacuum

The benefits of autovacuuming are quite evident.

Optimized Storage Space: Autovacuum ensures efficient use of storage space.
Improved Free-space Map Visibility: The free-space map (FSM) records where space is available in tables, so that reclaimed space can be reused by later inserts and updates.
Resource Efficiency: Autovacuum runs in the background and is throttled, so it is usually less disruptive than large manual vacuums run at arbitrary times.
Non-blocking: Autovacuum does not take an exclusive lock that blocks reads and writes on the table, unlike VACUUM FULL.
Prevents Table Bloating: Regular clean-up prevents unnecessary data accumulation, maintaining optimal table sizes.

Monitoring Data Sizes

One way of monitoring data sizes before and after transactions is to run the following query in the psql shell after connecting to a specific database:

postgresql=# SELECT pg_size_pretty(pg_relation_size('table_name'));

Consider a table storing the accounts of customers at a toll booth. Its size is obtained by running the query above with the name of that table in place of table_name.

After further transactions, the same query can be run again to show how the size of the table has changed. A table that keeps growing even though the number of live rows stays roughly constant suggests that autovacuum is not keeping up, because the space held by outdated row versions is not being reclaimed for reuse.
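
The before-and-after comparison can be sketched as follows. The table toll_accounts and its columns are hypothetical and only serve as an illustration; the statements are meant to be run one by one in psql, outside an explicit transaction block, since VACUUM cannot run inside one.

```sql
-- Hypothetical table; the name and columns are illustrative only
CREATE TABLE toll_accounts (
    account_id  integer PRIMARY KEY,
    owner_name  text NOT NULL,
    balance     numeric(10, 2) NOT NULL
);

-- Load some sample rows
INSERT INTO toll_accounts
SELECT g, 'customer_' || g, 100.00
FROM generate_series(1, 100000) AS g;

-- Size after the initial load
SELECT pg_size_pretty(pg_relation_size('toll_accounts'));

-- Every updated row leaves an outdated row version behind
UPDATE toll_accounts SET balance = balance - 2.50;

-- Size after the update; typically larger than before
SELECT pg_size_pretty(pg_relation_size('toll_accounts'));

-- Manual clean-up, doing the same job autovacuum does in the background
VACUUM toll_accounts;
```

A plain VACUUM usually does not shrink the file on disk; it marks the space as reusable, so the size should stop growing under a steady workload rather than drop.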

Configuring Autovacuum

Autovacuum is turned on by default. However, its default parameters were chosen conservatively, with the hardware and software versions of the time in mind. Modern applications call for reviewing these parameters and adjusting them to the workload. The key parameters are:

autovacuum: Set to 'on' by default, so it does not need to be enabled explicitly.
autovacuum_naptime: Set to 1min (60s) by default; it is the minimum delay between consecutive autovacuum runs on a given database.
autovacuum_max_workers: The maximum number of autovacuum worker processes that may run at the same time (3 by default).
autovacuum_vacuum_scale_factor: Set to 0.2 by default; this fraction of the table's row count is added to autovacuum_vacuum_threshold, so a clean-up is triggered only once roughly 20% of the table's rows have been updated or deleted.
autovacuum_vacuum_threshold: A precautionary measure; this parameter ensures that autovacuuming happens only after a minimum number of rows have been updated or deleted (50 by default).
autovacuum_analyze_scale_factor: The equivalent of autovacuum_vacuum_scale_factor for the analyze step, which refreshes table statistics. With the default of 0.1, analysis is triggered only once about 10% of the table's rows have been inserted, updated, or deleted.
autovacuum_analyze_threshold: Similar to autovacuum_vacuum_threshold, although here the action performed is analysis; it runs only after a minimum of 50 rows have been changed (the default).
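
The threshold and scale factor work together: a table is vacuumed once its dead tuples exceed autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * (number of rows). The query below is a rough sketch that compares each table's dead-tuple count with that trigger point. It uses the server-wide settings only and ignores any per-table overrides, so the result is an approximation.

```sql
-- Compare dead tuples per table with the approximate autovacuum trigger point
SELECT s.relname,
       s.n_dead_tup,
       current_setting('autovacuum_vacuum_threshold')::integer
         + current_setting('autovacuum_vacuum_scale_factor')::float8
           * GREATEST(c.reltuples, 0) AS vacuum_trigger_point
FROM pg_stat_user_tables AS s
JOIN pg_class AS c ON c.oid = s.relid   -- reltuples is the planner's row estimate
ORDER BY s.n_dead_tup DESC;
```

With the defaults, a table of 10,000 rows reaches its trigger point at 50 + 0.2 * 10,000 = 2,050 dead tuples.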

These parameters are tuned according to how frequently transactions modify the database and how large the database is or is expected to grow. If tables change rapidly, autovacuum_max_workers may be increased so that more tables can be cleaned in parallel, or autovacuum_vacuum_scale_factor may be decreased so that clean-up starts earlier. Additionally, the analysis parameters may be adjusted so that the query planner works with fresher statistics.
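
Such adjustments can be made per table or for the whole server. The sketch below assumes a hypothetical, heavily updated table named toll_accounts, and the values are illustrative rather than recommendations. ALTER SYSTEM requires superuser rights and cannot be run inside a transaction block.

```sql
-- Per-table override: start vacuuming once about 5% of the rows are dead
ALTER TABLE toll_accounts
    SET (autovacuum_vacuum_scale_factor = 0.05,
         autovacuum_vacuum_threshold = 1000);

-- Server-wide change, written to postgresql.auto.conf
ALTER SYSTEM SET autovacuum_naptime = '30s';
SELECT pg_reload_conf();   -- naptime takes effect on a configuration reload

-- In PostgreSQL 13 this setting only takes effect after a server restart
ALTER SYSTEM SET autovacuum_max_workers = 5;
```

Per-table storage parameters are often the gentler choice, since they leave the behavior for all other tables unchanged.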

This brings us to question the idea behind analyzing tables so often.

Which tables require analysis?

Analysis may be performed manually with the ANALYZE command or automatically by keeping autovacuum turned on. It collects statistics about the contents of tables, which the query planner uses to choose efficient execution plans. Essentially, analysis provides the following information:

Identifying Common Values: A list of the most common values in a specific column of a relation/table. In some cases this isn't required, as the column might be a unique identifier, and unique identifiers cannot repeat in a table.
Data Distribution Histograms: A histogram of how the values in a column are distributed, which is used to estimate how many rows a given condition will match.
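
Both kinds of statistics can be inspected through the pg_stats view once a table has been analyzed. A small sketch, assuming a hypothetical table named toll_accounts in the public schema:

```sql
-- Refresh the statistics for the table
ANALYZE toll_accounts;

-- Inspect what was collected for each column
SELECT attname,
       n_distinct,          -- estimated number of distinct values (negative = fraction of rows)
       most_common_vals,    -- the most common values; NULL for unique columns
       most_common_freqs,   -- how often each of those values occurs
       histogram_bounds     -- bucket boundaries of the value distribution
FROM pg_stats
WHERE schemaname = 'public'
  AND tablename = 'toll_accounts';
```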

To answer the question of which tables actually require analysis: with autovacuum enabled, most tables are analyzed automatically.

When to Perform Manual Analysis

Nevertheless, there are cases in which the ANALYZE command is issued explicitly, typically for the following reasons:

Columns Are Not Frequently Updated: Some columns are barely touched by ongoing UPDATE activity, so their statistics rarely change. ANALYZE can then be run on just the columns whose statistics matter instead of relying on automatic analysis of the whole table.
High Update Rates: Tables that change very quickly may need fresh statistics sooner than autovacuum provides them.
Identifying Stable Data Patterns: To understand which aspects of the data are least prone to change, in order to establish a pattern.

Yet another question that may be posed is how to shortlist the tables on which ANALYZE should be run manually.

There is a simple rule of thumb: manual analysis makes sense for tables in which the minimum or maximum values of columns are prone to change. For example, a table recording vehicle speeds measured by a speed gun is bound to see its maximum value change over time, so re-analyzing it keeps the statistics for that column accurate.
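
A manual analysis restricted to such a column could look like the sketch below. The table vehicle_speeds and the column speed_kmh are hypothetical names chosen for this example.

```sql
-- Re-collect statistics for one column only; VERBOSE prints progress messages
ANALYZE VERBOSE vehicle_speeds (speed_kmh);

-- Check when the table was last analyzed manually and by autovacuum
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'vehicle_speeds';
```

Limiting ANALYZE to the columns whose value range actually moves keeps the manual run short on wide tables.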

siddhant_baroth
