Working with Latches
What Are Latches?
Latches are serialization mechanisms that protect areas of Oracle’s shared memory (the
SGA). In simple terms, latches prevent two processes from simultaneously updating — and
possibly corrupting — the same area of the SGA.
Oracle sessions need to update or read from the SGA for almost all database operations. For
instance:
• When a session reads a block from disk, it must modify a free block in the buffer
cache and adjust the buffer cache LRU chain¹.
• When a session reads a block from the SGA, it will modify the LRU chain.
• When a new SQL statement is parsed, it will be added to the library cache within the
SGA.
• As modifications are made to blocks, entries are placed in the redo buffer.
• The database writer periodically writes buffers from the cache to disk (and must
update their status from “dirty” to “clean”).
• The redo log writer writes entries from the redo buffer to the redo logs.
Latches prevent any of these operations from colliding and possibly corrupting the SGA.
How Latches Work
Because the duration of operations against memory is very small (typically on the order of
nanoseconds) and the frequency of latch requests very high, the latching mechanism needs to
be very lightweight. On most systems, a single machine instruction known as “test and set” is
used to check whether the latch is taken (by examining a specific memory address) and, if it is
not, to acquire it (by changing the value at that address).
If the latch is already in use, Oracle assumes that it will not be in use for long. So, rather
than entering a passive wait (that is, relinquishing the CPU and going to sleep), Oracle will
retry the operation a number of times before giving up and entering a passive wait. This
algorithm is called acquiring a spinlock, and the number of “spins” before sleeping is
controlled by the Oracle initialization parameter “_spin_count”.
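If you want to check the current spin count, note that hidden parameters such as
_spin_count do not appear in v$parameter. A common (though undocumented and
unsupported) sketch, run as SYS, reads it from the x$ksppi and x$ksppcv fixed tables:

-- Hidden parameters live in the x$ fixed tables; run as SYS.
-- x$ layouts are undocumented and may change between releases.
SELECT i.ksppinm  parameter_name,
       v.ksppstvl current_value
  FROM x$ksppi i, x$ksppcv v
 WHERE i.indx = v.indx
   AND i.ksppinm = '_spin_count';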
The first time a session fails to acquire the latch by spinning, it goes to sleep, attempting to
acquire the latch again after 10 milliseconds. Subsequent sleeps increase in duration and in
extreme circumstances may exceed one second. In a system suffering from intense latch
contention, these sleeps will have a severe impact on response time and throughput.
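Sessions currently sleeping on a latch can be observed through the wait interface. A sketch
(for the Oracle8i/9i “latch free” wait event, where P2 holds the latch number, which can be
decoded via v$latchname):

-- Show sessions currently waiting on a latch and which latch it is
SELECT w.sid, n.name latch_name, w.seconds_in_wait
  FROM v$session_wait w, v$latchname n
 WHERE w.event = 'latch free'
   AND w.p2 = n.latch#;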
¹ The LRU (Least Recently Used) chain records how recently individual blocks have been accessed. If a
block is not accessed, it moves towards the LRU end of the list and is eventually flushed from the cache.
Causes of Contention for Specific Latches
The latches that most frequently affect performance are those protecting the buffer cache,
areas of the shared pool and the redo buffer.
• Library cache latches: These latches protect the library cache in which sharable
SQL is stored. In a well-designed application there should be little or no contention for
these latches, but in an application that uses literals instead of bind variables (for
instance “WHERE surname=’HARRISON’” rather than “WHERE
surname=:surname”), library cache latch contention is common.
• Redo copy/redo allocation latches: These latches protect the redo log buffer,
which buffers entries made to the redo log. Recent improvements (from Oracle 7.3
onwards) have reduced the frequency and severity of contention for these latches.
• Shared pool latches: These latches are held when allocations or de-allocations of
memory occur in the shared pool. Prior to Oracle 8.1.7, the most common cause of
shared pool latch contention was an overly large shared pool and/or failure to make
use of the reserved area of the shared pool².
• Cache buffers chains latches: These latches are held when sessions read or write
buffers in the buffer cache. In Oracle8i there are typically a very large number of
these latches, each of which protects only a handful of blocks. Contention on these
latches is typically caused by concurrent access to a very “hot” block, and the most
common type of hot block is an index root or branch block (since any index-based
query must access the root block); see the sketch after this list for locating hot child latches.
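Because these latches have many child latches, the aggregate figures in v$latch can hide a
single hot child. A sketch for locating the busiest cache buffers chains children (each ADDR
can then be matched to the buffers it protects via the undocumented x$bh fixed table):

-- Rank cache buffers chains child latches by sleeps
SELECT addr, child#, gets, sleeps
  FROM v$latch_children
 WHERE name = 'cache buffers chains'
 ORDER BY sleeps DESC;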
Measuring Latch Contention
Ratio-based Techniques
Conventional wisdom in the mid-90s was to focus on the latch “miss” rate to determine the
degree of latch contention.
For each latch, the v$latch view records the number of gets (requests for the latch), misses
(times the first request failed) and sleeps (times a session failed to obtain the latch by
spinning). In the past, queries such as the following were often used to determine
latch health:
SQL> select name, gets, misses, misses*100/gets misspct from v$latch where gets>0;
NAME                                            GETS      MISSES MISSPCT
---------------------------------------- ----------- ----------- -------
latch wait list                                   32           1    3.13
process allocation                                28           0     .00
session allocation                         1,223,068          84     .01
session switching                              8,009           0     .00
process group creation                            40           0     .00
session idle bit                           2,426,940           1     .00
shared java pool                               1,188           0     .00
event group latch                                 28           0     .00
messages                                   2,128,851         461     .02
enqueues                                   3,168,279           7     .00
enqueue hash chains                        1,747,312           2     .00
channel handle pool latch                         40           0     .00
channel operations parent latch                  158           3    1.90
message pool operations parent latch               3           0     .00
file number translation table                    946           0     .00
mostly latch-free SCN                              1           0     .00
cache buffers lru chain                      343,827           0     .00
active checkpoint queue latch                174,293           0     .00
checkpoint queue latch                     1,459,929          26     .00
cache buffers chains                      13,851,179      16,254     .12
cache buffer handles                              61           0     .00
This approach was flawed on a number of levels:
• It is actually the number of sleeps that most accurately reflects the impact of latch
contention on response time.
• A high miss rate is expected for certain latches.
• A latch with a high miss rate (or sleep rate) that is not frequently accessed is probably
not impacting performance.
• Even if a latch is experiencing a high sleep rate, we can’t determine the impact on
performance without taking into account waits for other resources. If sessions spend
90% of their time waiting for IO, 8% on CPU and 2% waiting for latches, then halving
the latch sleep time yields only a 1% improvement in response time, which is probably
not noticeable.
In the above example the “latch wait list” latch has the highest miss rate. However, this is
largely irrelevant since that latch was requested only 32 times, while the “cache buffers
chains” latch appears to have only a moderate miss rate, but has been requested almost
14 million times and, as we shall see, is the latch most affecting performance.
Wait Interface-based Techniques
A better approach to estimating the impact of latch contention is to consider the relative
amount of time being spent waiting for latches. The following query gives us some indication
of this:
SELECT event, time_waited,
       round(time_waited*100/SUM(time_waited) OVER (), 2) wait_pct
  FROM (SELECT event, time_waited
          FROM v$system_event
         WHERE event NOT IN
               ('Null event',
                'client message',
                'rdbms ipc reply',
                'smon timer',
                'rdbms ipc message',
                'PX Idle Wait',
                'PL/SQL lock timer',
                'file open',
                'pmon timer',
                'WMON goes to sleep',
                'virtual circuit status',
                'dispatcher timer',
                'SQL*Net message from client',
                'parallel query dequeue wait',
                'pipe get')
        UNION
        SELECT NAME, VALUE
          FROM v$sysstat
         WHERE NAME LIKE 'CPU used when call started')
 ORDER BY 2 DESC;
EVENT                          TIME_WAITED   WAIT_PCT
------------------------------ ----------- ----------
latch free                           40144      31.67
CPU used when call started           30341      23.94
control file sequential read         12341       9.74
direct path read                     11933       9.41
control file parallel write           6487       5.12
file identify                         5666       4.47
log file sync                         3492       2.75
log file parallel write              3213       2.53
instance state change                3064       2.42
log file switch completion           3049       2.41
db file sequential read              2290       1.81
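If needed, the same wait time can also be attributed to individual sessions via
v$session_event, the per-session analog of v$system_event; a sketch:

-- Which sessions are accumulating "latch free" wait time?
SELECT s.sid, s.username, e.time_waited
  FROM v$session_event e, v$session s
 WHERE s.sid = e.sid
   AND e.event = 'latch free'
 ORDER BY e.time_waited DESC;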
Now we can look at the sleeps in v$latch to determine which latches are likely to be
contributing most to this problem:
SQL> select name, gets, sleeps,
            sleeps*100/sum(sleeps) over() sleep_pct, sleeps*100/gets sleep_rate
       from v$latch where gets>0
      order by sleeps desc;
NAME                                        GETS       SLEEPS SLEEP_PCT SLEEP_RATE
------------------------------------ ----------- ------------ --------- ----------
cache buffers chains                  13,863,552       38,071     99.48      .2746
session allocation                     1,223,982          110       .29      .0090
checkpoint queue latch                 1,461,039           39       .10      .0027
library cache                          9,239,751           22       .06      .0002
shared pool                              869,652           16       .04      .0018
messages                               2,130,515            6       .02      .0003
redo writing                           1,330,987            6       .02      .0005
latch wait list                               33            0       .00      .0000
session switching                          8,014            0       .00      .0000
session idle bit                       2,428,851            0       .00      .0000
enqueues                               3,171,018            0       .00      .0000
channel handle pool latch                     40            0       .00      .0000
message pool operations parent latch           3            0       .00      .0000
mostly latch-free SCN                          1            0       .00      .0000
Now we are in a position to draw some reasonable conclusions:
• Latch sleeps contribute about 30% of database response time, which is clearly
excessive, and
• it is the cache buffers chains latches that contribute the vast majority of these
waits.
Note that if we had used the conventional “ratio based” analysis outlined in the previous
section, we would have discounted the cache buffers chains latches as a problem because the
miss rate was “only” 0.12%.
Tuning the Application to Avoid Latch Contention
There are several things we can do in our application design to reduce contention for
latches.
Using Bind Variables
As noted earlier, failure to use bind variables within an application is the major cause of
library cache latch contention. All Oracle applications should make use of bind variables
whenever possible.
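For example, in SQL*Plus (the table and column names here are purely illustrative):

-- Literal version: each distinct surname produces a new statement that
-- must be parsed and stored separately in the library cache
SELECT * FROM customers WHERE surname = 'HARRISON';

-- Bind-variable version: all surnames share a single cached cursor
VARIABLE surname VARCHAR2(30)
EXEC :surname := 'HARRISON'

SELECT * FROM customers WHERE surname = :surname;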
However, all is not lost if you are unable to modify your application code. From 8.1.6
onwards you can use the “CURSOR_SHARING” parameter to make Oracle rewrite SQL
on the fly to use bind variables. A setting of FORCE causes all literals to be converted to bind
variables. A setting of SIMILAR causes a statement to be rewritten only if doing so would not
change the statement’s execution plan (which can happen if there are histogram statistics
defined on a column referenced in the WHERE clause).
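A sketch of setting the parameter (when Oracle rewrites a statement, system-generated bind
names such as :"SYS_B_0" appear in v$sql):

-- Rewrite literals for the current session only
ALTER SESSION SET cursor_sharing = FORCE;

-- Or instance-wide (SIMILAR requires Oracle9i or later)
ALTER SYSTEM SET cursor_sharing = SIMILAR;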
Avoiding Hot Blocks
Cache buffers chains latch contention is one of the most intractable forms of latch contention.
There are a couple of things you can do at the application level to reduce its severity.
Firstly, identify the blocks that are “hot.” Metalink note 163424.1, “How to Identify a Hot
Block Within The Database,” describes how to do this. Having identified the hot block, you
will most likely find that it is an index root or branch block. If so, there are two application
design changes that may help (sketched in code after the list below).
1) Consider partitioning the table and using local indexes. This might allow you to
spread the heat amongst multiple indexes (you will probably want to use a hash
partition to ensure an even spread of load amongst the partitions).
2) Consider converting the table to a hash cluster keyed on the columns of the index.
This allows the index to be bypassed completely and may also result in some other
performance improvements.
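The following sketch illustrates both options; the table, column and object names are
hypothetical:

-- Option 1: hash-partition the table so that each LOCAL index
-- partition has its own root block, spreading the hot-block load
CREATE TABLE customers (
   customer_id NUMBER,
   surname     VARCHAR2(30))
 PARTITION BY HASH (customer_id) PARTITIONS 8;

CREATE INDEX customers_id_ix ON customers (customer_id) LOCAL;

-- Option 2: a hash cluster keyed on the lookup column, so rows are
-- located by hashing the key rather than by traversing an index
CREATE CLUSTER customers_cluster (customer_id NUMBER)
   SIZE 512 HASHKEYS 1000;

CREATE TABLE customers_hashed (
   customer_id NUMBER,
   surname     VARCHAR2(30))
 CLUSTER customers_cluster (customer_id);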