In the 2000s, PostgreSQL developers had to address a significant weakness in their relational database management system. Problems with storage space and transaction speed were primarily due to the inefficient handling of UPDATE and DELETE operations. An UPDATE was particularly expensive because it did not overwrite a row in place: it kept the old row version and wrote a new one, so a heavily updated table could keep growing without bound. Similarly, deleting a row merely marked it as deleted while keeping the actual data on disk, which wasted space and reduced efficiency.
This issue resembled how modern file systems and data recovery software function, where deleted data remains on the disk but is hidden from the interface. However, in databases, maintaining old data was essential for transactional integrity. To address this, PostgreSQL introduced the VACUUM feature, which cleaned up deleted rows. Initially, this process was manual and cumbersome, leading to the development of the Autovacuum feature.

What is Autovacuum?
Autovacuum is a background process (daemon) in PostgreSQL that regularly cleans up dead rows in the database. The user does not need to issue VACUUM manually; instead, autovacuum's behavior is configured in the postgresql.conf file. To access this file, go to the following directory in your terminal and open the file in a suitable editor.
>> cd C:\Program Files\PostgreSQL\13\data
>> "postgresql.conf"

How Autovacuum Works
Autovacuum reclaims the space held by dead tuples, that is, row versions that were marked deleted or were superseded by an update and are no longer needed by any transaction. It first removes these dead tuples and then records the freed space so that later insertions and updates can be placed there. This is in stark contrast to the earlier behavior, where every update simply added a new row version with the same identifying elements and updated attributes, and the old versions were never cleaned up.
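The reuse of freed space can be illustrated with a toy model. The sketch below is not how PostgreSQL is implemented: the `ToyTable` class and its methods are hypothetical names, and it ignores transaction visibility entirely (real PostgreSQL only removes a dead tuple once no running transaction can still see it). It only shows why a table grows without vacuuming and stops growing once dead slots are reclaimed.

```python
class ToyTable:
    """Toy heap: each slot holds ("live", row), ("dead", row), or None (free)."""

    def __init__(self):
        self.slots = []
        self.free_slots = []  # stand-in for the free-space map

    def insert(self, row):
        # Reuse a reclaimed slot if one is known, otherwise grow the table.
        if self.free_slots:
            idx = self.free_slots.pop()
            self.slots[idx] = ("live", row)
        else:
            self.slots.append(("live", row))
            idx = len(self.slots) - 1
        return idx

    def delete(self, idx):
        # Deleting only marks the row version as dead; the data stays in place.
        _, row = self.slots[idx]
        self.slots[idx] = ("dead", row)

    def update(self, idx, new_row):
        # An update is modelled as "mark old version dead, insert new version".
        self.delete(idx)
        return self.insert(new_row)

    def vacuum(self):
        # Remove dead row versions and remember the freed slots for reuse.
        reclaimed = 0
        for idx, slot in enumerate(self.slots):
            if slot is not None and slot[0] == "dead":
                self.slots[idx] = None
                self.free_slots.append(idx)
                reclaimed += 1
        return reclaimed


table = ToyTable()
first = table.insert({"id": 1, "balance": 10})
second = table.update(first, {"id": 1, "balance": 20})
third = table.update(second, {"id": 1, "balance": 30})
print("slots before vacuum:", len(table.slots))
print("reclaimed:", table.vacuum())
table.insert({"id": 2, "balance": 5})
print("slots after vacuum and one more insert:", len(table.slots))
```

In this model the two updates leave two dead row versions behind; after `vacuum()` the next insert lands in a reclaimed slot instead of extending the table.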
Benefits of Autovacuum
The benefits of autovacuuming are quite evident.
- Optimized Storage Space: Autovacuum ensures efficient use of storage space.
- Improved Free-space Map Visibility: The free-space map (FSM) indicates available spaces in tables, aiding in better space management.
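- Resource Efficiency: Autovacuum runs in the background and is throttled, so it typically disrupts the workload less than a manually issued vacuum.
- Non-blocking: Unlike VACUUM FULL, autovacuum does not normally hold an exclusive lock on the table, so reads and writes can continue while it runs.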
- Prevents Table Bloating: Regular clean-up prevents unnecessary data accumulation, maintaining optimal table sizes.
Monitoring Data Sizes
One way of monitoring data sizes before and after transactions is simply by executing the following lines of code in the Shell after connecting to a specific database:
postgresql=# SELECT pg_size_pretty(pg_relation_size('table_name'));
For example, for a table storing the accounts of customers at a toll booth, this query returns the table's current on-disk size.

After further transactions, the query can be run again to show how the size of the table has changed. Disproportionate growth suggests that autovacuum is not keeping up, for example when the table keeps growing even though no transaction still needs the outdated row versions.
Configuring Autovacuum
Autovacuum is a background utility and is turned on by default. However, keep in mind that it was developed quite a while back and its default parameters are conservative, chosen for the hardware and software versions of the time. Modern applications call for reviewing these parameters and adjusting them proportionately. We shall have a quick look at the key parameters, which are found in the AUTOVACUUM section of the postgresql.conf file:
- autovacuum: Set to 'on' by default, so it does not need to be enabled explicitly.
- autovacuum_naptime: Set to 1min (60s) by default; this is the delay between consecutive autovacuum runs (wakeups) on a database.
- autovacuum_max_workers: The maximum number of autovacuum worker processes that may run at the same time (3 by default).
- autovacuum_vacuum_scale_factor: Set to 0.2 by default; this is the fraction of the table size that is added to autovacuum_vacuum_threshold when deciding whether to vacuum. With the default, a clean-up is triggered once roughly 20% of the table's rows (plus the threshold) have been updated or deleted.
- autovacuum_vacuum_threshold: A precautionary measure; this is the minimum number of updated or deleted rows needed before a vacuum is triggered on a table (50 by default).
- autovacuum_analyze_scale_factor: Controls the analyze step, which refreshes table statistics. It is the fraction of the table size that is added to autovacuum_analyze_threshold; at the default of 0.1, analysis is triggered once roughly 10% of the table's rows (plus the threshold) have been inserted, updated, or deleted.
- autovacuum_analyze_threshold: Similar to autovacuum_vacuum_threshold, but for analysis. It is the minimum number of inserted, updated, or deleted rows needed before an analysis is triggered (50 by default).
These parameters are tuned according to how frequently transactions modify the database and how large the database is or is expected to grow. If transactions occur at a high rate, 'autovacuum_max_workers' may be increased, or 'autovacuum_vacuum_scale_factor' may be decreased so that dead rows are cleaned up sooner. Additionally, the developer may adjust the analysis parameters so that the query planner works with fresher statistics.
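The interaction between a threshold and its scale factor is easier to see with numbers. The following sketch works through the documented trigger rule (threshold plus scale factor times the number of rows in the table) using the default values; the function names `trigger_point` and `needs_vacuum` are illustrative and not part of PostgreSQL, and the 0.01 scale factor at the end is an arbitrary example value.

```python
def trigger_point(row_count, threshold=50, scale_factor=0.2):
    """Number of changed rows after which autovacuum acts on a table."""
    return threshold + scale_factor * row_count


def needs_vacuum(dead_rows, row_count, threshold=50, scale_factor=0.2):
    """True once the dead rows exceed the trigger point for this table."""
    return dead_rows > trigger_point(row_count, threshold, scale_factor)


for rows in (1_000, 10_000_000):
    vacuum_at = trigger_point(rows)                      # vacuum defaults: 50, 0.2
    analyze_at = trigger_point(rows, scale_factor=0.1)   # analyze defaults: 50, 0.1
    print(f"{rows} rows: vacuum after {vacuum_at:.0f} dead rows, "
          f"analyze after {analyze_at:.0f} changed rows")

# Effect of lowering the scale factor for a large table (0.01 is an example value).
print(f"{trigger_point(10_000_000, scale_factor=0.01):.0f}")
```

With the defaults, a table of ten million rows accumulates about two million dead rows before it is vacuumed, which is why the scale factor is often lowered for large tables.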
This brings us to question the idea behind analyzing tables so often.
Which tables require analysis?
Analysis may be performed manually or simply by keeping autovacuum turned on. Analysis collects statistics about the contents of tables, which the query planner uses to choose efficient execution plans. Essentially, analysis provides the following information:
- Identifying Common Values: A list of the most common values in a specific column of a relation/table. In some cases, this isn't required as the column might be the unique identifier - unique identifiers cannot be expected to repeat in a table.
- Data Distribution Histograms: A histogram of the data distribution in a column, that is, boundary values that split the column's values into groups of roughly equal size.
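To make these two kinds of statistics concrete, here is a minimal sketch that computes them for a small list of values. It is a simplification under stated assumptions: the function names and the sample data are hypothetical, the whole list is used rather than a random sample, and the most common values are not excluded from the histogram as PostgreSQL does.

```python
from collections import Counter


def most_common_values(values, k=3):
    """Return the k most frequent values with their share of all rows."""
    counts = Counter(values)
    total = len(values)
    return [(value, count / total) for value, count in counts.most_common(k)]


def histogram_bounds(values, buckets=4):
    """Return boundary values that split the sorted data into equal-sized groups."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return [ordered[(i * last) // buckets] for i in range(buckets + 1)]


# Hypothetical column of measured vehicle speeds.
speeds = [40, 42, 45, 50, 50, 50, 55, 60, 62, 65, 70, 90, 120]

print(most_common_values(speeds, k=1))
print(histogram_bounds(speeds, buckets=4))
```

The first and last bounds are the column's minimum and maximum, so a new extreme value changes the histogram as soon as the statistics are refreshed.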
Now to answer the pertinent question: which tables actually require analysis? With autovacuum turned on, most tables are analyzed automatically.
When to Perform Manual Analysis
Nevertheless, when the ANALYZE command is issued explicitly, it is usually for one of the following reasons:
- Columns Are Not Frequently Updated: Statistics may be needed for columns that ongoing transactions rarely change. Because few rows are modified, the automatic analysis may not be triggered often enough to be useful.
- High Update Rates: For tables that are updated very frequently, a manual analysis helps keep the statistics in step with the data.
- Identifying Stable Data Patterns: To understand which aspects of the data are least prone to change and so establish a pattern.
Yet another question that may be posed is: how does one shortlist the tables on which analysis should be issued separately?
There exists a simple rule of thumb: analysis on tables makes sense as long as the minimum or maximum values of the columns are prone to changes. For example, a table showing speeds of vehicles measured by a velocity gun is bound to have its maximum values change. Therefore, the analysis will yield something conclusive.