Tuning Oracle Database Performance via AWR and code changes
- a case study
Introduction
• In this presentation, I share a case study of performance tuning for an Oracle DB based system.
• By combining code changes with some DBA skills, an Oracle-based web system gained a huge performance improvement:
1) 10 times more concurrent users allowed (could be more)
2) 500+ times reduction in page response time!
Background
• 1. Event: University course choice system – a ‘Black Friday’ event
• 2. Concurrency: 3,000+ students hit the system simultaneously.
• 3. Technical Stack:
mod_plsql forwards web requests from Apache to the Oracle DB. Users can call a procedure to render a web page via mod_plsql (an Apache plug-in).
• 4. System issues:
Web page response time – extremely high (hours)
CPU usage – low (around 25%)
Concurrent sessions – high (up to 2,800)
(Screenshot: Oracle Cloud Control manager)
Performance diagnosis & tuning methodology
• 1. AWR report
Top 10 Foreground Events by Total Wait Time
==> Wait Event Histogram
==> Segment Statistics
==> SQL ordered by Elapsed Time
SQL ordered by CPU Time
SQL ordered by User I/O Wait Time
SQL ordered by Gets
SQL ordered by Reads
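The "SQL ordered by Elapsed Time" section can also be approximated straight from the AWR history views. The query below is a minimal sketch, not the report's own logic: it assumes a Diagnostics Pack license, and the bind variables :begin_snap and :end_snap are placeholders for snapshot IDs that you would look up in DBA_HIST_SNAPSHOT first.

```sql
-- Sketch: top 10 statements by elapsed time between two AWR snapshots.
-- :begin_snap / :end_snap are hypothetical binds (declare them with
-- VARIABLE in SQL*Plus). The *_DELTA columns are in microseconds.
SELECT *
FROM (
  SELECT s.sql_id,
         ROUND(SUM(s.elapsed_time_delta) / 1000000, 2)  AS elapsed_s,
         SUM(s.executions_delta)                        AS executions,
         ROUND(SUM(s.elapsed_time_delta) / 1000000
               / NULLIF(SUM(s.executions_delta), 0), 2) AS elapsed_per_exec_s,
         ROUND(SUM(s.cpu_time_delta) / 1000000, 2)      AS cpu_s,
         ROUND(SUM(s.iowait_delta) / 1000000, 2)        AS io_wait_s
  FROM   dba_hist_sqlstat s
  WHERE  s.snap_id >  :begin_snap   -- deltas recorded after the first snapshot
  AND    s.snap_id <= :end_snap     -- up to and including the last snapshot
  GROUP  BY s.sql_id
  ORDER  BY SUM(s.elapsed_time_delta) DESC
)
WHERE ROWNUM <= 10;
```

A statement with a very high elapsed time per execution but low CPU and I/O time is a hint that the time is going to waits rather than to work.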
AWR report
SQL ordered by Elapsed Time
Elapsed Time (s)  Executions  Elapsed Time per Exec (s)  %Total  %CPU  %IO   SQL Id         SQL Module
2,472,551.14      4,648       531.96                     80.45   0.26  0.00  cgja42az8smg5  httpd.worker@uos-app00446-vs (TNS V1-V3)

cgja42az8smg5
declare
  rc__ number; simple_list__ owa_util.vc_arr; complex_list__ owa_util.vc_arr;
begin
  owa.init_cgi_env(:n__, :nm__, :v__); htp.HTBUF_LEN := 63; null; null;
  simple_list__(1) := 'sys.%'; simple_list__(2) := 'dbms\_%'; simple_list__(3) := 'utl\_%';
  simple_list__(4) := 'owa\_%'; simple_list__(5) := 'owa.%'; simple_list__(6) := 'htp.%';
  simple_list__(7) := 'htf.%'; simple_list__(8) := 'wpg_docload.%'; simple_list__(9) := 'ctxsys.%';
  simple_list__(10) := 'mdsys.%';
  if ((owa_match.match_pattern(p_string => 'bwkkspgr.showpage' /* */,
      p_simple_pattern => simple_list__, p_complex_pattern => complex_list__,
      p_use_special_chars => false))) then
    rc__ := 2;
  else
    twbklist.p_main; null; bwkkspgr.showpage(page=>:page);
    if (wpg_docload.is_file_download) then
      rc__ := 1; wpg_docload.get_download_file(:doc_info); null; null; null; commit;
    else
      rc__ := 0; null; null; null; commit; owa.get_page(:data__, :ndata__);
    end if;
  end if;
  :rc__ := rc__;
end;
1st – Waits: latch: row cache objects
• 1. Find the suspicious procedure: bwkkspgr.showpage
Average execution time: 531.96 seconds
%Total Elapsed Time: 80.45%
• 2. Locate the problematic code:
1) INSTR and SUBSTR are applied to the CLOB variable LV_PAGEDEF inside a loop, and they are executed 3,000+ times per loop. The only purpose of the loop is to render one HTML page!!!
2) 99% of the showpage procedure's execution time is spent in the loop.
I could NOT find any documentation of this CLOB issue on Google or MOS, so I decided to make the change based on evidence gathered from testing. CLOB is a usual suspect for performance issues!!!
• 3. Solution – replace the CLOB with a VARCHAR2 variable when the CLOB size is below 32,767
LV_PAGEDEF CLOB;
LV_PAGEDEF_VC VARCHAR2(32767); --Added to solve performance issue
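A minimal sketch of this idea follows. It is not the original showpage code: the page content, the '<p>' marker and the counting loop are made up for illustration, and only the two variable names come from the slide. The CLOB is copied once into a VARCHAR2 buffer when it is small enough, and the loop then searches the VARCHAR2 copy instead of the CLOB.

```sql
-- Sketch only: run in SQL*Plus with SET SERVEROUTPUT ON.
DECLARE
  lv_pagedef    CLOB;
  lv_pagedef_vc VARCHAR2(32767);     -- VARCHAR2 copy used inside the loop
  lv_pos        PLS_INTEGER := 0;
  lv_hits       PLS_INTEGER := 0;
BEGIN
  lv_pagedef := '<p>#A#</p><p>#B#</p><p>#C#</p>';   -- made-up page definition

  IF DBMS_LOB.GETLENGTH(lv_pagedef) <= 32767 THEN
    -- Copy once. Note the DBMS_LOB.SUBSTR argument order: (lob, amount, offset).
    -- Caveat: GETLENGTH counts characters while the buffer limit is in bytes,
    -- so a multibyte character set may need a lower threshold.
    lv_pagedef_vc := DBMS_LOB.SUBSTR(lv_pagedef, 32767, 1);
    LOOP
      lv_pos := INSTR(lv_pagedef_vc, '<p>', lv_pos + 1);   -- plain string search
      EXIT WHEN lv_pos = 0;
      lv_hits := lv_hits + 1;
    END LOOP;
  ELSE
    -- Fallback for large pages: keep working on the CLOB itself.
    LOOP
      lv_pos := DBMS_LOB.INSTR(lv_pagedef, '<p>', lv_pos + 1);
      EXIT WHEN NVL(lv_pos, 0) = 0;
      lv_hits := lv_hits + 1;
    END LOOP;
  END IF;

  DBMS_OUTPUT.PUT_LINE('markers found: ' || lv_hits);
END;
/
```

Keeping the CLOB branch as a fallback means pages larger than the VARCHAR2 limit still render, just without the speed-up.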
• 4. Result
1) Elapsed time per execution for cgja42az8smg5 dropped from 531.96 seconds to 0.95 seconds
2) Improvement ratio: 55,900% (about 560 times faster)!!!
3) Latch: row cache objects waits dropped from 12M to 285; 99.9% of row cache object wait events are gone!!!
2nd – Waits: library cache
• 1. AWR report – no PL/SQL coding issue could be found.
• 2. The DB looks good, so how about Apache?
Error messages found in the Apache mod_plsql log files
……
<1729057634 ms>StrArrPosBind pos 23 Charset Id : 46
<1729184884 ms>Execute: ORA-06550: line 35, column 3:
PLS-00306: wrong number or types of arguments in call to 'SHOWPAGE'
ORA-06550: line 35, column 3:
PL/SQL: Statement ignored
<1729184884 ms>(wppr.c,638) Execute:declare
……
Execution time: 1729184884 ms - 1729057634 ms = 127,250 ms = 127.25 s
-- About 127 seconds are spent on each call that fails with PLS-00306 (bind logged at 1729057634 ms, error logged at 1729184884 ms)
• 3. Error
1) MOD_PLSQL overhead issue:
……This works for most cases but fails if there is an attempt to pass a single value for an array parameter or pass multiple values for a scalar. In such cases, the first attempt to execute the PL/SQL procedure fails. mod_plsql issues a Describe call ……
2) If you define an array-type parameter for a procedure but send only one value to it, Oracle mod_plsql will trigger a Describe (name resolution) call.
3) What Oracle doesn't say is that, in this system, this process (the failed call plus the Describe call) costs 127 seconds per call!!!
• 4. Dirty hard-coding can improve performance – a lot!!!
Neat and graceful baseline code with extremely poor performance:
SUBTYPE ST_TEXT IS VARCHAR2(4000);
TYPE T_VARCHARS IS TABLE OF ST_TEXT
INDEX BY BINARY_INTEGER;
PROCEDURE SHOWPAGE(
PAGE VARCHAR2
……
,C01 T_VARCHARS
,C02 T_VARCHARS
……
,C38 T_VARCHARS
,C39 T_VARCHARS
,C40 T_VARCHARS
,ml_text_id T_VARCHARS
,ml_text T_VARCHARS
)
Hard-coded SHOWPAGE3, designed to render only one web page (UOS_PMDM_OOC_5), 127 s reduction per call:
PROCEDURE SHOWPAGE3(
PAGE VARCHAR2
,PFROMPAGE VARCHAR2
,PSUBMIT VARCHAR2
,PFORM VARCHAR2
,PQRYCHKSUM VARCHAR2
,PDATAITEMS T_QNAMES
,C01 ST_TEXT
,C02 ST_TEXT
,C03 ST_TEXT
)
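The contrast between the two signatures can be shown with a much smaller, hypothetical package. The names demo_page, show_generic and show_single are invented for this sketch and are not part of the real system; only the two type declarations mirror the slide. The scalar entry point accepts the single value the page actually sends, so the call signature matches on the first attempt, and it then delegates to the generic array-based procedure.

```sql
-- Hypothetical sketch; assumes the PL/SQL Web Toolkit (htp) is installed.
CREATE OR REPLACE PACKAGE demo_page AS
  SUBTYPE st_text IS VARCHAR2(4000);
  TYPE t_varchars IS TABLE OF st_text INDEX BY BINARY_INTEGER;

  -- Generic entry point: expects an array, even if only one value is sent.
  PROCEDURE show_generic(page VARCHAR2, c01 t_varchars);

  -- Hard-coded entry point for a page known to send exactly one value.
  PROCEDURE show_single(page VARCHAR2, c01 st_text);
END demo_page;
/
CREATE OR REPLACE PACKAGE BODY demo_page AS
  PROCEDURE show_generic(page VARCHAR2, c01 t_varchars) IS
  BEGIN
    -- Assumes the array is densely indexed from 1.
    FOR i IN 1 .. c01.COUNT LOOP
      htp.p(page || ': ' || c01(i));
    END LOOP;
  END show_generic;

  PROCEDURE show_single(page VARCHAR2, c01 st_text) IS
    l_vals t_varchars;
  BEGIN
    l_vals(1) := c01;              -- wrap the scalar in a one-element array
    show_generic(page, l_vals);    -- reuse the generic rendering logic
  END show_single;
END demo_page;
/
```

The trade-off is the one the slide names: the scalar variant is tied to one page's parameter list, so it is fast but has to be maintained by hand.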
3rd: High water mark – DBA time!!!
• 1. Classical dilemma – read or write?
A big table, GKRPWRK, is heavily used in the procedure to temporarily store the web page parameters for each call
– inserts, updates and selects occur simultaneously on the same table!!
It holds 2 million historical records. For each event, 300,000 new rows are generated.
• 2. First reaction – remove the historical records, then query, update and insert. However, I hit the high water mark issue below:
Event                         Waits   Total Wait Time (sec)  Wait Avg(ms)  % DB time  Wait Class
enq: HW - contention          1,637   8012.9                 4895          55.7       Configuration
latch: cache buffers chains   4,304   2773.5                 644           19.3       Concurrency
resmgr:cpu quantum            7,990   2323.1                 291           16.2       Scheduler
DB CPU                                567                                  3.9
buffer busy waits             6,119   384.3                  63            2.7        Concurrency
log file sync                 350     307.9                  880           2.1        Commit
latch: In memory undo latch   160     304.4                  1903          2.1        Concurrency
library cache: mutex X        449     26.3                   59            .2         Concurrency
latch free                    111     17.7                   160           .1         Other
direct path sync              61      11                     180           .1         User I/O
• 3. Solution:
Change the freelist for the LOB segment – Rebuilding LOB freepools (Doc ID 365156.1)
Move table GKRPWRK to a separate, manually managed tablespace APP_MAN, and set the segment's freelist value to 96 (maximum allowed).
Pro: the high water mark wait event is gone and performance improved
Con: the tablespace will not shrink automatically and keeps growing ……
It’s DBA time now: the DBA will manually move table GKRPWRK to the ASM tablespace after the course option event ends (next weekend).
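The move described above could look roughly like the statements below. This is a sketch under assumptions, not the commands actually run: the datafile path and size are invented, and it presumes the table can be taken offline for the move. FREELISTS only takes effect in a tablespace with manual segment space management, which is why the tablespace is created that way.

```sql
-- Hypothetical datafile path and size; adjust to the real storage layout.
CREATE TABLESPACE app_man
  DATAFILE '/u01/oradata/demo/app_man01.dbf' SIZE 1G AUTOEXTEND ON
  EXTENT MANAGEMENT LOCAL
  SEGMENT SPACE MANAGEMENT MANUAL;   -- manual management enables freelists

-- Rebuild the table in the new tablespace with multiple freelists.
-- Any LOB segments stay where they are unless moved explicitly.
ALTER TABLE gkrpwrk MOVE TABLESPACE app_man STORAGE (FREELISTS 96);

-- A table move leaves its indexes UNUSABLE; generate the rebuild statements.
SELECT 'ALTER INDEX ' || owner || '.' || index_name || ' REBUILD;' AS rebuild_stmt
FROM   dba_indexes
WHERE  table_name = 'GKRPWRK'
AND    status = 'UNUSABLE';
```

Multiple freelists spread concurrent inserts across separate lists of free blocks, which is what takes pressure off the HW enqueue during the event.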
Final Result
• 1. Before tuning
Page response time (all 5 pages): hours
(from numerous student complaints collected by the ServiceNow team)
Concurrent users in queue: 20-40 (using a script to limit user access)
CPU usage: 25%
• 2. After tuning – can support many more users, but……
No abnormal wait events, none!!!
Page response time (each page): 1-2 seconds (Google Analytics)
Concurrent users in queue: 200 (using an online queue to manage users)
CPU usage: below 10%