Log-File-Sync Waits
Oracle waits “log file sync” and “log file parallel write”
on Linux
July 2017
Nikolay Kudinov
LGWR Process Overview
The argument nsems can be 0 (a don’t care) when a semaphore set is not being created. Otherwise nsems must be greater than 0 and less than or
equal to the maximum number of semaphores per semaphore set.
RETURN VALUE
If successful, the return value will be the semaphore set identifier (a nonnegative integer), otherwise -1 is returned, with errno indicating
the error.
Deutsche Bank, Nikolay Kudinov
Deutsche Bank Technology Center PG Day
Example (semget)
#include <sys/sem.h>
#include <stdio.h>
$ ipcs -s -i 524291
$ strace output:
…
semget(0x5d2, 1, IPC_CREAT|0666) = 524291
write(2, " Semaphore 1490 initialized.\n", 29) = 29
…
semop, semtimedop
NAME
semop, semtimedop - semaphore operations
SYNOPSIS
int semop(int semid, struct sembuf *sops, unsigned nsops);
int semtimedop(int semid, struct sembuf *sops, unsigned nsops, struct timespec *timeout);
DESCRIPTION
Each semaphore in a semaphore set has the following associated values:
unsigned short semval; /* semaphore value */
unsigned short semzcnt; /* # waiting for zero */
unsigned short semncnt; /* # waiting for increase */
pid_t sempid; /* process that did last op */
semop() performs operations on selected semaphores in the set indicated by semid. Each of the
nsops elements in the array pointed to by sops specifies an operation to be performed on a
single semaphore. The elements of this structure are of type struct sembuf, containing the
following members:
unsigned short sem_num; /* semaphore number */
short sem_op; /* semaphore operation */
short sem_flg; /* operation flags */
If sem_op is a positive integer, the operation adds this value to the semaphore value (semval).
If sem_op is less than zero, the process must have alter permission on the semaphore set. If semval is greater than or equal to the absolute value
of sem_op, the operation can proceed immediately: the absolute value of sem_op is subtracted from semval.
Example (semop)
Set operations[0].sem_op = -1
Semaphore state before the call:
$ ipcs -s -i 524291
Semaphore Array semid=524291
uid=500 gid=500 cuid=500 cgid=500
mode=0666, access_perms=0666
nsems = 1
otime = Thu ***********************
ctime = Thu ***********************
semnum value ncount zcount pid
0 0 0 0 12181
strace output (the semop call hangs):
semget(0x5d2, 1, 0666) = 524291
getpid() = 12186
………
semop(524291, 0xbfa4c356, 1
Semaphore state while the process is blocked (ncount is now 1):
$ ipcs -s -i 524291
………………..
otime = Thu ***********************
ctime = Thu ***********************
semnum value ncount zcount pid
0 0 1 0 12181
Example (semtimedop)
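#define _GNU_SOURCE /* for semtimedop() */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#define KEY (1490) /* 0x5d2, the key seen in the strace output */
int main(void)
{
    int id;
    struct sembuf operations[1];
    int retval;
    /* Look up the existing one-semaphore set. */
    id = semget(KEY, 1, 0666);
    if (id < 0) {
        fprintf(stderr, "Cannot find semaphore, exiting.\n");
        exit(1);
    }
    printf("Process id is %d\n", getpid());
    /* Try to decrement semaphore 0 by 1; give up after 10 seconds. */
    operations[0].sem_num = 0;
    operations[0].sem_op = -1;
    operations[0].sem_flg = 0;
    struct timespec timeout = { 10, 0 };
    retval = semtimedop(id, operations, 1, &timeout);
    if (retval < 0)
        perror("semtimedop"); /* EAGAIN when the timeout expires */
    return 0;
}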
$ strace -T
semget(0x5d2, 1, 0666) = 524291 <0.000009>
getpid() = 12445 <0.000008>
……………………………..
semtimedop(524291, 0xbfb5f2f6, 1, {10, 0}) = -1 EAGAIN (Resource temporarily unavailable) <10.001245>
semctl
NAME
semctl - semaphore control operations
SYNOPSIS
int semctl(int semid, int semnum, int cmd, ...);
DESCRIPTION
semctl() performs the control operation specified by cmd on the semaphore set identified by
semid, or on the semnum-th semaphore of that set. (The semaphores in a set are numbered
starting at 0.) This function has three or four arguments, depending on cmd. When there are four, the fourth
has the type union semun. The calling program must define this union as follows:
union semun {
int val; /* Value for SETVAL */
struct semid_ds *buf; /* Buffer for IPC_STAT, IPC_SET */
unsigned short *array; /* Array for GETALL, SETALL */
struct seminfo *__buf; /* Buffer for IPC_INFO (Linux specific) */
};
Valid values for cmd are
……………
SETVAL Set the value of semval to arg.val for the semnum-th semaphore of the set, updating also the sem_ctime member of the semid_ds structure
associated with the set. Undo entries are cleared for altered semaphores in all processes. If the changes to semaphore values would permit
blocked semop() calls in other processes to proceed, then those processes are woken up. The calling process must have alter permission
on the semaphore set.
Example (semctl)
#include <stdio.h>
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#define KEY (1490)
int main(void) {
    int id;
    int ret;
    /* The calling program must define union semun itself. */
    union semun { int val; struct semid_ds *buf; unsigned short *array; } argument;
    argument.val = 10;
    id = semget(KEY, 1, 0666 | IPC_CREAT);
    if (id < 0)
    {
        fprintf(stderr, "Unable to obtain semaphore.\n");
        exit(1);
    }
    /* Set semaphore 0 of the set to 10. */
    ret = semctl(id, 0, SETVAL, argument);
    return ret < 0;
}
$ ipcs -s -i 524291
Semaphore Array semid=524291
uid=500 gid=500 cuid=500 cgid=500
mode=0666, access_perms=0666
nsems = 1
otime = Thu ************************
ctime = Thu ************************
semnum value ncount zcount pid
0 10 0 0 13453
$strace output:
…
semget(0x5d2, 1, IPC_CREAT|0666) = 524291
semctl(524291, 0, IPC_64|SETVAL, 0xbfc0e8d8) = 0
…
ipc
NAME
ipc - System V IPC system calls
SYNOPSIS
int ipc(unsigned int call, int first, int second, int third, void *ptr, long fifth);
DESCRIPTION
ipc() is a common kernel entry point for the System V IPC calls for messages, semaphores, and
shared memory. call determines which IPC function to invoke; the other arguments are passed
through to the appropriate call.
User programs should call the appropriate functions by their usual names. Only standard library
implementors and kernel hackers need to know about ipc().
ipc source code
/*
* sys_ipc() is the de-multiplexer for the SysV IPC calls..
*
* This is really horribly ugly.
*/
asmlinkage int sys_ipc (uint call, int first, int second, int third, void __user *ptr, long fifth)
{
int version, ret;
version = call >> 16; /* hack for backward compatibility */
call &= 0xffff;
switch (call) {
case SEMOP:
return sys_semtimedop (first, (struct sembuf __user *)ptr, second, NULL);
case SEMTIMEDOP:
return sys_semtimedop(first, (struct sembuf __user *)ptr, second,
(const struct timespec __user *)fifth);
case SEMGET:
return sys_semget (first, second, third);
case SEMCTL: {
union semun fourth;
if (!ptr)
return -EINVAL;
if (get_user(fourth.__pad, (void __user * __user *) ptr))
return -EFAULT;
return sys_semctl (first, second, third, fourth);
Linux I/O (io_submit)
NAME
io_submit - Submit asynchronous I/O blocks for processing
SYNOPSIS
#include <libaio.h>
long io_submit (aio_context_t ctx_id, long nr, struct iocb **iocbpp);
DESCRIPTION
io_submit() queues nr I/O request blocks for processing in the AIO context ctx_id. iocbpp should
be an array of nr AIO request blocks, which will be submitted to context ctx_id.
RETURN VALUE
On success, io_submit() returns the number of iocbs submitted (which may be 0 if nr is zero).
Linux I/O (io_getevents)
NAME
io_getevents - Read asynchronous I/O events from the completion queue
SYNOPSIS
#include <linux/time.h>
#include <libaio.h>
long io_getevents (aio_context_t ctx_id, long min_nr, long nr, struct io_event *events,
struct timespec *timeout);
DESCRIPTION
io_getevents() attempts to read at least min_nr events and up to nr events from the completion
queue of the AIO context specified by ctx_id. timeout specifies the amount of time to wait for
events, where a NULL timeout waits until at least min_nr events have been seen. Note that
timeout is relative and will be updated if not NULL and the operation blocks.
RETURN VALUE
io_getevents() returns the number of events read: 0 if no events are available or < min_nr if
the timeout has elapsed.
Example
char log_buffer1[2097152];
char log_buffer2[2097152];
…
cb1[0].aio_fildes = fd;
cb1[0].aio_lio_opcode = IO_CMD_PWRITE;
cb1[0].aio_reqprio = 0;
cb1[0].u.c.buf = log_buffer1;
cb1[0].u.c.nbytes = 2097152;
cb1[0].u.c.offset = 0;
cb1[1].aio_fildes = fd;
cb1[1].aio_lio_opcode = IO_CMD_PWRITE;
cb1[1].aio_reqprio = 0;
cb1[1].u.c.buf = log_buffer2;
cb1[1].u.c.nbytes = 2097152;
cb1[1].u.c.offset = 2097152;
iocbs1[0] = &cb1[0];
iocbs1[1] = &cb1[1];
printf("log_buffer1=%p \n",&log_buffer1[0]);
printf("log_buffer2=%p \n",&log_buffer2[0]);
Systemtap output
……………..
kernel.function("sys_io_submit@fs/aio.c:1837").call(+)a.out ctx_id=0x7ffcf1037000 nr=2
addr=0x7fff103b06a0 addr2=140733465691808
fd=3 file=test_file.txt op_code=1 offset= 0 bytes=2097152
addr=0x7fff101b06a0 addr2=140733463594656
fd=3 file=test_file.txt op_code=1 offset= 2097152 bytes=2097152
Kernel Parameter "aio-max-size" does not exist in RHEL4 / EL4 / RHEL5 / EL5 (Doc ID 549075.1)
With the introduction of asynchronous I/O in RHEL 2.1 / 3 (2.4 kernels), a new kernel parameter,
aio-max-size, was added. It can be adjusted to get better I/O throughput in some environments,
such as data stores or data warehouses. However, this parameter was removed in the Linux 2.6 kernels.
kskthbwt, kskthewt
kskthbwt ($rdx – wait event number)
function (STATE = WAITING)
kskthewt ($rdx – wait event number)
kslgetl
kslgetl example
USER process
……………….
……………….
kcrfw_gather_lwn
(gdb) c
Continuing.
Hardware access (read/write) watchpoint 306: *0x60027c38
Value = 17225
0x0000000002d3fa26 in kcrfw_gather_lwn ()
Tools
(gdb) b io_submit
Breakpoint 1 at 0xaef560
(gdb) commands
Type commands for when breakpoint 1 is hit, one per line.
End with a line saying just "end".
>shell sleep 10
>c
>end
(gdb) c
Continuing.
"_use_single_log_writer"='TRUE'
"_in_memory_undo"=FALSE
filesystemio_options=SETALL
disk_asynch_io=TRUE
[Diagram: calls shown are kskthbwt, IPC (SEMTIMEDOP / SEMOP, 3 sec), kskthewt, kslwt_update_stats_int, and kslgetl for the latches messages, lgwr LWN SCN, and redo allocation.]
LGWR (idle) diagram
"_log_parallelism_max"=2
[Diagram: calls shown are kskthbwt, IPC (SEMTIMEDOP / SEMOP, 3 sec), kskthewt, kslwt_update_stats_int, and kslgetl for the latches messages, lgwr LWN SCN, and redo allocation (twice).]
LGWR + 324 bytes of redo
_in_memory_undo=FALSE
2*(1048576+1048576+1028608)/512 = 12210
Warning: log write elapsed time
(gdb) b io_submit
Breakpoint 1 at 0x7f6511c1b690
(gdb) command
Type commands for breakpoint(s) 1, one per line.
End with a line saying just "end".
>shell sleep 10
>c
>end
(gdb) c
Warning: log write elapsed time 10022ms, size 2KB
*** 2016-01-26 04:49:13.798
Warning: log write elapsed time 10018ms, size 5KB
*** 2016-01-26 04:49:38.834
Warning: log write elapsed time 10027ms, size 2KB
*** 2016-01-26 04:49:48.867
Warning: log write elapsed time 10032ms, size 0KB
*** 2016-01-26 04:49:58.906
Warning: log write elapsed time 10037ms, size 10KB
[Diagram: calls shown are kslgetl (redo copy), kslgetl (redo allocation), kslgetl (post/wait queue), semop (twice), and kskthewt (log file sync).]
User process + COMMIT
_use_adaptive_log_file_sync='POLLING_ONLY'
[Diagram: calls shown are kslgetl (redo copy), kslgetl (redo allocation), and kskthewt (log file sync).]
Log file sync switching
P1 = buffer#
P2 = Not used / sync scn
P3 = Not used
buffer#
All changes up to this buffer number (in the log buffer) must be
flushed to disk and the writes confirmed to ensure that the
transaction is committed, and will remain committed upon an
instance crash. Hence the wait is for LGWR to flush up to this
buffer#.
sync scn (11.2 onwards)
Base of the SCN value required to be synced to disk. Hence the wait
is for LGWR to flush up to this SCN base.
oradebug ipc
kernel.function("sys_io_submit@fs/aio.c:1837").call(+)ora_lgwr_cdb1 ctx_id=0x7fd8ba183000 nr=1
addr=0x91cb3400 addr2=2446013440
fd=258 file=redo01.log op_code=1 offset= 8151040 bytes=1024
LOG BUFFER memory DUMP
NO “log file sync” in some cases on commit
The "user commits" statistic is incremented when a session issues a commit (even if LGWR is suspended).
http://blog.tanelpoder.com/
https://fritshoogland.wordpress.com
https://andreynikolaev.wordpress.com
https://jonathanlewis.wordpress.com
https://www.google.ch/patents/US5974425
http://savvinov.com
https://dmitryremizov.wordpress.com
https://orainternals.wordpress.com/2012/02/10/what-is-rdbms-ipc-message-wait-event/
http://www.minek.com/files/unix_examples/semab.html
http://www.cs.cf.ac.uk/Dave/C/node26.html
http://nofxsss2007.narod.ru/spo/47.htm
http://eli.thegreenplace.net/2011/09/06/stack-frame-layout-on-x86-64/
This is not an offer to provide any services. This material is for information and illustrative
purposes only and is not intended, nor should it be distributed, for advertising purposes, nor
is it intended for publication or broadcast. Any third party analysis does not constitute any
endorsement or recommendation. Opinions expressed herein are current opinions as of the
date appearing in this material only and are subject to change without notice. This information
is provided with the understanding that with respect to the material provided herein, that you
will make your own independent decision with respect to any course of action in connection
herewith and as to whether such course of action is appropriate or proper based on your own
judgment, and that you are capable of understanding and assessing the merits of a course of
action. “Deutsche Bank TechCentre” LLC shall not have any liability for any damages of any
kind whatsoever relating to this material.