</synopsis>
The resulting dump can be restored with <application>psql</>:
<synopsis>
-psql template1 < <replaceable class="parameter">infile</replaceable>
+psql -f <replaceable class="parameter">infile</replaceable> template1
</synopsis>
(Actually, you can specify any existing database name to start from,
but if you are reloading in an empty cluster then <literal>template1</>
following command dumps a database using the custom dump format:
<programlisting>
-pg_dump -Fc <replaceable class="parameter">dbname</replaceable> > <replaceable class="parameter">filename</replaceable>
+pg_dump -Fc <replaceable class="parameter">dbname</replaceable> > <replaceable class="parameter">filename</replaceable>
</programlisting>
A custom-format dump is not a script for <application>psql</>, but
</para>
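+  <para>
+   For instance, a custom-format dump might be reloaded with
+   <application>pg_restore</> along these lines (an illustrative sketch
+   only; see the <application>pg_restore</> reference page for the full
+   set of options):
+<programlisting>
+pg_restore -d <replaceable class="parameter">dbname</replaceable> <replaceable class="parameter">filename</replaceable>
+</programlisting>
+  </para>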
<para>
- If your database is spread across multiple volumes (for example,
- data files and WAL log on different disks) there may not be any way
- to obtain exactly-simultaneous frozen snapshots of all the volumes.
+ If your database is spread across multiple file systems, there may not
+ be any way to obtain exactly-simultaneous frozen snapshots of all
+ the volumes. For example, if your data files and WAL log are on different
+ disks, or if tablespaces are on different file systems, it might
+ not be possible to use a snapshot backup, because the snapshots must
+ be simultaneous.
Read your file system documentation very carefully before trusting
to the consistent-snapshot technique in such situations. The safest
approach is to shut down the database server for long enough to
establish all the frozen snapshots.
</para>
+ <para>
+ Another option is to use <application>rsync</> to perform a file
+ system backup. This is done by first running <application>rsync</>
+ while the database server is running, then shutting down the database
+ server just long enough to do a second <application>rsync</>. The
+ second <application>rsync</> will be much quicker than the first,
+ because it has relatively little data to transfer, and the end result
+ will be consistent because the server was down. This method
+ allows a file system backup to be performed with minimal downtime.
+ </para>
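+ <para>
+  As a rough sketch of this approach (the paths are purely illustrative,
+  and the <application>rsync</> options suitable for your installation
+  may differ):
+<programlisting>
+rsync -a /usr/local/pgsql/data /backup/pgsql
+pg_ctl stop
+rsync -a /usr/local/pgsql/data /backup/pgsql
+pg_ctl start
+</programlisting>
+  The first <application>rsync</> copies the bulk of the data while the
+  server is running; the second, issued while the server is stopped,
+  transfers only the files that changed in between.
+ </para>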
+
<para>
Note that a file system backup will not necessarily be
smaller than an SQL dump. On the contrary, it will most likely be
<programlisting>
SELECT pg_stop_backup();
</programlisting>
- If this returns successfully, you're done.
+ This should return successfully.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Once the WAL segment files used during the backup are archived as part
+ of normal database activity, you are done.
</para>
</listitem>
</orderedlist>
<para>
To make use of this backup, you will need to keep around all the WAL
- segment files generated at or after the starting time of the backup.
+ segment files generated during and after the file system backup.
To aid you in doing this, the <function>pg_stop_backup</> function
- creates a <firstterm>backup history file</> that is immediately stored
- into the WAL archive area. This file is named after the first WAL
- segment file that you need to have to make use of the backup. For
- example, if the starting WAL file is <literal>0000000100001234000055CD</>
- the backup history file will be named something like
- <literal>0000000100001234000055CD.007C9330.backup</>. (The second part of
- this file name stands for an exact position within the WAL file, and can
- ordinarily be ignored.) Once you have safely archived the backup dump
- file, you can delete all archived WAL segments with names numerically
- preceding this one. The backup history file is just a small text file.
- It contains the label string you gave to <function>pg_start_backup</>, as
- well as the starting and ending times of the backup. If you used the
- label to identify where the associated dump file is kept, then the
- archived history file is enough to tell you which dump file to restore,
- should you need to do so.
+ creates a <firstterm>backup history file</> that is immediately
+ stored into the WAL archive area. This file is named after the first
+ WAL segment file that you need to have to make use of the backup.
+ For example, if the starting WAL file is
+ <literal>0000000100001234000055CD</> the backup history file will be
+ named something like
+ <literal>0000000100001234000055CD.007C9330.backup</>. (The second
+ number in the file name stands for an exact position within the WAL
+ file, and can ordinarily be ignored.) Once you have safely archived
+ the file system backup and the WAL segment files used during the
+ backup (as specified in the backup history file), all archived WAL
+ segments with numerically preceding names are no longer needed to
+ recover the file system backup and may be deleted. However, you should
+ consider keeping several backup sets to be absolutely certain that
+ you can recover your data. Keep in mind that only completed WAL
+ segment files are archived, so there will be a delay between running
+ <function>pg_stop_backup</> and the archiving of all WAL segment
+ files needed to make the file system backup consistent.
+ </para>
+ <para>
+ The backup history file is just a small text file. It contains the
+ label string you gave to <function>pg_start_backup</>, as well as
+ the starting and ending times of the backup. If you used the label
+ to identify where the associated dump file is kept, then the
+ archived history file is enough to tell you which dump file to
+ restore, should you need to do so.
</para>
<para>
such index after completing a recovery operation.
</para>
</listitem>
+
+ <listitem>
+ <para>
+ If a <command>CREATE DATABASE</> command is executed while a base
+ backup is being taken, and then the template database that the
+ <command>CREATE DATABASE</> copied is modified while the base backup
+ is still in progress, it is possible that recovery will cause those
+ modifications to be propagated into the created database as well.
+ This is of course undesirable. To avoid this risk, it is best not to
+ modify any template databases while taking a base backup.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ <command>CREATE TABLESPACE</> commands are WAL-logged with the literal
+ absolute path, and will therefore be replayed as tablespace creations
+ with the same absolute path. This might be undesirable if the log is
+ being replayed on a different machine. It can be dangerous even if
+ the log is being replayed on the same machine, but into a new data
+ directory: the replay will still overwrite the contents of the original
+ tablespace. To avoid potential gotchas of this sort, the best practice
+ is to take a new base backup after creating or dropping tablespaces.
+ </para>
+ </listitem>
</itemizedlist>
</para>
since we may need to fix partially-written disk pages. It is not
necessary to store so many page copies for PITR operations, however.
An area for future development is to compress archived WAL data by
- removing unnecessary page copies.
+ removing unnecessary page copies. In the meantime, administrators
+ may wish to reduce the number of page snapshots included in WAL by
+ increasing the checkpoint interval parameters as much as feasible.
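+ For example, one might raise both <varname>checkpoint_segments</> and
+ <varname>checkpoint_timeout</> in <filename>postgresql.conf</>
+ (the values below are merely illustrative; appropriate settings depend
+ on your workload and the disk space available for WAL):
+<programlisting>
+checkpoint_segments = 30        # default is 3
+checkpoint_timeout = 900        # in seconds; default is 300
+</programlisting>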
</para>
</sect2>
</sect1>
version, start the new server, restore the data. For example:
<programlisting>
-pg_dumpall > backup
+pg_dumpall > backup
pg_ctl stop
mv /usr/local/pgsql /usr/local/pgsql.old
cd ~/postgresql-&version;
gmake install
initdb -D /usr/local/pgsql/data
postmaster -D /usr/local/pgsql/data
-psql template1 < backup
+psql -f backup template1
</programlisting>
See <xref linkend="runtime"> about ways to start and stop the
/* Record the filesystem change in XLOG */
{
xl_dbase_create_rec xlrec;
- XLogRecData rdata[3];
+ XLogRecData rdata[1];
xlrec.db_id = dboid;
+ xlrec.tablespace_id = dsttablespace;
+ xlrec.src_db_id = src_dboid;
+ xlrec.src_tablespace_id = srctablespace;
+
rdata[0].buffer = InvalidBuffer;
rdata[0].data = (char *) &xlrec;
- rdata[0].len = offsetof(xl_dbase_create_rec, src_path);
- rdata[0].next = &(rdata[1]);
-
- rdata[1].buffer = InvalidBuffer;
- rdata[1].data = (char *) srcpath;
- rdata[1].len = strlen(srcpath) + 1;
- rdata[1].next = &(rdata[2]);
-
- rdata[2].buffer = InvalidBuffer;
- rdata[2].data = (char *) dstpath;
- rdata[2].len = strlen(dstpath) + 1;
- rdata[2].next = NULL;
+ rdata[0].len = sizeof(xl_dbase_create_rec);
+ rdata[0].next = NULL;
(void) XLogInsert(RM_DBASE_ID, XLOG_DBASE_CREATE, rdata);
}
/* Close pg_database, but keep exclusive lock till commit */
heap_close(pg_database_rel, NoLock);
+
+ /*
+ * We force a checkpoint before committing. This effectively means
+ * that committed XLOG_DBASE_CREATE operations will never need to be
+ * replayed (at least not in ordinary crash recovery; we still have
+ * to make the XLOG entry for the benefit of PITR operations).
+ * This avoids two nasty scenarios:
+ *
+ * #1: When PITR is off, we don't XLOG the contents of newly created
+ * indexes; therefore the drop-and-recreate-whole-directory behavior
+ * of DBASE_CREATE replay would lose such indexes.
+ *
+ * #2: Since we have to recopy the source database during DBASE_CREATE
+ * replay, we run the risk of copying changes in it that were committed
+ * after the original CREATE DATABASE command but before the system
+ * crash that led to the replay. This is at least unexpected and at
+ * worst could lead to inconsistencies, eg duplicate table names.
+ *
+ * (Both of these were real bugs in releases 8.0 through 8.0.3.)
+ *
+ * In PITR replay, the first of these isn't an issue, and the second
+ * is only a risk if the CREATE DATABASE and subsequent template
+ * database change both occur while a base backup is being taken.
+ * There doesn't seem to be much we can do about that except document
+ * it as a limitation.
+ *
+ * Perhaps if we ever implement CREATE DATABASE in a less cheesy
+ * way, we can avoid this.
+ */
+ RequestCheckpoint(true);
}
aclcheck_error(ACLCHECK_NOT_OWNER, ACL_KIND_DATABASE,
oldname);
- /* must have createdb */
- if (!have_createdb_privilege())
+ /* must have createdb rights */
+ if (!superuser() && !have_createdb_privilege())
ereport(ERROR,
(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
errmsg("permission denied to rename database")));
bool isNull;
HeapTuple newtuple;
- /* changing owner's database for someone else: must be superuser */
- /* note that the someone else need not have any permissions */
+ /* must be superuser to change ownership */
if (!superuser())
ereport(ERROR,
(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
return gottuple;
}
+/* Check if current user has createdb privileges */
static bool
have_createdb_privilege(void)
{
+ bool result = false;
HeapTuple utup;
- bool retval;
utup = SearchSysCache(SHADOWSYSID,
Int32GetDatum(GetUserId()),
0, 0, 0);
-
- if (!HeapTupleIsValid(utup))
- retval = false;
- else
- retval = ((Form_pg_shadow) GETSTRUCT(utup))->usecreatedb;
-
- ReleaseSysCache(utup);
-
- return retval;
+ if (HeapTupleIsValid(utup))
+ {
+ result = ((Form_pg_shadow) GETSTRUCT(utup))->usecreatedb;
+ ReleaseSysCache(utup);
+ }
+ return result;
}
/*
/* Record the filesystem change in XLOG */
{
xl_dbase_drop_rec xlrec;
- XLogRecData rdata[2];
+ XLogRecData rdata[1];
xlrec.db_id = db_id;
+ xlrec.tablespace_id = dsttablespace;
+
rdata[0].buffer = InvalidBuffer;
rdata[0].data = (char *) &xlrec;
- rdata[0].len = offsetof(xl_dbase_drop_rec, dir_path);
- rdata[0].next = &(rdata[1]);
-
- rdata[1].buffer = InvalidBuffer;
- rdata[1].data = (char *) dstpath;
- rdata[1].len = strlen(dstpath) + 1;
- rdata[1].next = NULL;
+ rdata[0].len = sizeof(xl_dbase_drop_rec);
+ rdata[0].next = NULL;
(void) XLogInsert(RM_DBASE_ID, XLOG_DBASE_DROP, rdata);
}
if (info == XLOG_DBASE_CREATE)
{
xl_dbase_create_rec *xlrec = (xl_dbase_create_rec *) XLogRecGetData(record);
+ char *src_path;
+ char *dst_path;
+ struct stat st;
+
+#ifndef WIN32
+ char buf[2 * MAXPGPATH + 100];
+#endif
+
+ src_path = GetDatabasePath(xlrec->src_db_id, xlrec->src_tablespace_id);
+ dst_path = GetDatabasePath(xlrec->db_id, xlrec->tablespace_id);
+
+ /*
+ * Our theory for replaying a CREATE is to forcibly drop the
+ * target subdirectory if present, then re-copy the source data.
+ * This may be more work than needed, but it is simple to
+ * implement.
+ */
+ if (stat(dst_path, &st) == 0 && S_ISDIR(st.st_mode))
+ {
+ if (!rmtree(dst_path, true))
+ ereport(WARNING,
+ (errmsg("could not remove database directory \"%s\"",
+ dst_path)));
+ }
+
+ /*
+ * Force dirty buffers out to disk, to ensure source database is
+ * up-to-date for the copy. (We really only need to flush buffers for
+ * the source database, but bufmgr.c provides no API for that.)
+ */
+ BufferSync(-1, -1);
+
+#ifndef WIN32
+
+ /*
+ * Copy this subdirectory to the new location
+ *
+ * XXX use of cp really makes this code pretty grotty, particularly
+ * with respect to lack of ability to report errors well. Someday
+ * rewrite to do it for ourselves.
+ */
+
+ /* We might need to use cp -R one day for portability */
+ snprintf(buf, sizeof(buf), "cp -r '%s' '%s'",
+ src_path, dst_path);
+ if (system(buf) != 0)
+ ereport(ERROR,
+ (errmsg("could not initialize database directory"),
+ errdetail("Failing system command was: %s", buf),
+ errhint("Look in the postmaster's stderr log for more information.")));
+#else /* WIN32 */
+ if (copydir(src_path, dst_path) != 0)
+ {
+ /* copydir should already have given details of its troubles */
+ ereport(ERROR,
+ (errmsg("could not initialize database directory")));
+ }
+#endif /* WIN32 */
+ }
+ else if (info == XLOG_DBASE_DROP)
+ {
+ xl_dbase_drop_rec *xlrec = (xl_dbase_drop_rec *) XLogRecGetData(record);
+ char *dst_path;
+
+ dst_path = GetDatabasePath(xlrec->db_id, xlrec->tablespace_id);
+
+ /*
+ * Drop pages for this database that are in the shared buffer
+ * cache
+ */
+ DropBuffers(xlrec->db_id);
+
+ if (!rmtree(dst_path, true))
+ ereport(WARNING,
+ (errmsg("could not remove database directory \"%s\"",
+ dst_path)));
+ }
+ else if (info == XLOG_DBASE_CREATE_OLD)
+ {
+ xl_dbase_create_rec_old *xlrec = (xl_dbase_create_rec_old *) XLogRecGetData(record);
char *dst_path = xlrec->src_path + strlen(xlrec->src_path) + 1;
struct stat st;
}
#endif /* WIN32 */
}
- else if (info == XLOG_DBASE_DROP)
+ else if (info == XLOG_DBASE_DROP_OLD)
{
- xl_dbase_drop_rec *xlrec = (xl_dbase_drop_rec *) XLogRecGetData(record);
+ xl_dbase_drop_rec_old *xlrec = (xl_dbase_drop_rec_old *) XLogRecGetData(record);
/*
* Drop pages for this database that are in the shared buffer
if (info == XLOG_DBASE_CREATE)
{
xl_dbase_create_rec *xlrec = (xl_dbase_create_rec *) rec;
+
+ sprintf(buf + strlen(buf), "create db: copy dir %u/%u to %u/%u",
+ xlrec->src_db_id, xlrec->src_tablespace_id,
+ xlrec->db_id, xlrec->tablespace_id);
+ }
+ else if (info == XLOG_DBASE_DROP)
+ {
+ xl_dbase_drop_rec *xlrec = (xl_dbase_drop_rec *) rec;
+
+ sprintf(buf + strlen(buf), "drop db: dir %u/%u",
+ xlrec->db_id, xlrec->tablespace_id);
+ }
+ else if (info == XLOG_DBASE_CREATE_OLD)
+ {
+ xl_dbase_create_rec_old *xlrec = (xl_dbase_create_rec_old *) rec;
char *dst_path = xlrec->src_path + strlen(xlrec->src_path) + 1;
sprintf(buf + strlen(buf), "create db: %u copy \"%s\" to \"%s\"",
xlrec->db_id, xlrec->src_path, dst_path);
}
- else if (info == XLOG_DBASE_DROP)
+ else if (info == XLOG_DBASE_DROP_OLD)
{
- xl_dbase_drop_rec *xlrec = (xl_dbase_drop_rec *) rec;
+ xl_dbase_drop_rec_old *xlrec = (xl_dbase_drop_rec_old *) rec;
sprintf(buf + strlen(buf), "drop db: %u directory: \"%s\"",
xlrec->db_id, xlrec->dir_path);