What I love about writing SQL Tuning articles is that I very rarely end up
publishing the findings I set out to achieve. With this one, I set out to
demonstrate the advantages of PARALLEL DML, didn't find what I thought I would,
and ended up testing 8 different techniques to find out how they differed. And
guess what? I still didn't get the results I expected. Hey, at least I learned
something.
As an ETL designer, I hate updates. They are just plain nasty. I spend an
inordinate proportion of design time of an ETL system worrying about the
relative proportion of rows inserted vs updated. I worry about how ETL tools
apply updates (did you know DataStage applies updates singly, but batches inserts
in arrays?), how I might cluster rows together that are subject to updates, and
what I might do if I just get too many updates to handle.
It would be fair to say I obsess about them. A little bit.
The two most common forms of Bulk Updates are:
- Update (almost) every row in the table. This is common when applying data
patches and adding new columns.
- Updating a small proportion of rows in a very large table.
Case 1 is uninteresting. The fastest way to update every row in the table is
to rebuild the table from scratch. All of these methods below will perform
worse.
Case 2 is common in Data Warehouses and overnight batch jobs. We have a table
containing years worth of data, most of which is static; we are updating
selected rows that were recently inserted and are still volatile. This case is
the subject of our test. For the purposes of the test, we will assume that the
target table of the update is arbitrarily large, and we want to avoid things
like full-scans and index rebuilds.
The methods covered include both PL/SQL and SQL approaches. I want to test on
a level playing field and remove special factors that unfairly favour one
method, so there are some rules:
- Accumulating data for the update can be arbitrarily complex. SQL updates can
have joins with grouping and sub-queries and what-not; PL/SQL can have cursor
loops with nested calls to other procedures. I'm not testing the relative merits
of how to accumulate the data, so each test will use pre-prepared update data
residing in a Global Temporary Table.
- Some methods - such as MERGE - allow the data source to be joined to the
update target using SQL. Other methods don't have this capability and must use
Primary Key lookups on the update target. To make these methods comparable, the
"joinable" techniques will use a Nested Loops join to most closely mimic the
Primary Key lookup of the other methods. Even though a Hash join may be faster
than Nested Loops for some distributions of data, that is not always the case
and - once again - we're assuming an arbitrarily large target table, so a full
scan is not necessarily feasible.
- Although we're not comparing factors outside of the UPDATE itself,
some of the methods do have differences unrelated to the UPDATE. I have included
these deliberately because they are reasonably common and have different
performance profiles; I wouldn't want anyone to think that because their
statements were *similar* to those shown here that they have the same
performance profile.
The 8 methods I am benchmarking here are as follows (in rough order of
complexity):
- Explicit Cursor Loop
- Implicit Cursor Loop
- UPDATE with nested SET subquery
- BULK COLLECT / FORALL UPDATE
- Updateable Join View
- MERGE
- Parallel DML MERGE
- Parallel PL/SQL
For all of the tests, the following table structures will be used:
TEST{n} (Update Source) - 100K rows TEST (Update target) - 10M rows
Name Type Name Type
------------------------------ ------------ ------------------------------ ------------
PK NUMBER PK NUMBER
FK NUMBER FK NUMBER
FILL VARCHAR2(40) FILL VARCHAR2(40)
The data has the following characteristics:
- TEST.PK will contain values 0 .. 9999999, but not in that order.
- TEST.PK is poorly clustered. It is generated by reversing the digits in
LPAD(ROWNUM, 7, '0'). PK values of 1, 2, and 3 are adjacent in the primary key
index but one-million rows apart in the table.
- TEST.FK will contain values 0 .. 99, unevenly distributed to favour lower
numbers.
- For the first round of testing, the column FK will be indexed with a simple
b-tree index.
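For reference, the test tables can be built along these lines. This is a sketch, not the article's actual setup script: the FK skew and FILL contents are assumptions, and it uses Oracle's undocumented REVERSE function to flip the PK digits.

```sql
-- Sketch of the test data setup (assumed DDL; the article does not show its own).
-- Reversing the zero-padded ROWNUM scatters key-adjacent PK values roughly a
-- million rows apart in the table, giving the poor clustering described above.
CREATE TABLE test AS
SELECT TO_NUMBER(REVERSE(LPAD(TO_CHAR(ROWNUM - 1), 7, '0'))) AS pk
     , MOD(ROWNUM, 100)                                      AS fk   -- real data skews toward low values
     , RPAD('x', 40, 'x')                                    AS fill
FROM   dual
CONNECT BY LEVEL <= 10000000;

ALTER TABLE test ADD CONSTRAINT test_pk PRIMARY KEY (pk);
CREATE INDEX test_fk_ix ON test (fk);  -- the simple b-tree index used in round one
```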
Not many people code this way, but there are some Pro*C programmers out there
who are used to Explicit Cursor Loops (OPEN, FETCH and CLOSE commands) and
translate these techniques directly to PL/SQL. The UPDATE portion of the code
works in an identical fashion to the Implicit Cursor Loop, so this is not really
a separate "UPDATE" method as such. The interesting thing about this method is
that it performs a context-switch between PL/SQL and SQL for every FETCH, which
makes it less efficient than the bulk-fetching alternatives. I include it here
because it allows us to compare the cost of context-switches to the cost of
updates.
DECLARE
CURSOR c1 IS
SELECT *
FROM test6;
rec_cur c1%rowtype;
BEGIN
OPEN c1;
LOOP
FETCH c1 INTO rec_cur;
EXIT WHEN c1%notfound;
UPDATE test
SET fk = rec_cur.fk
, fill = rec_cur.fill
WHERE pk = rec_cur.pk;
END LOOP;
    CLOSE c1;
END;
/
This is the simplest PL/SQL method and very common in hand-coded PL/SQL
applications. Update-wise, it looks as though it should perform the same as the
Explicit Cursor Loop. The difference is that the Implicit Cursor internally
performs bulk fetches, which should be faster than the Explicit Cursor because
of the reduced context switches.
BEGIN
FOR rec_cur IN (
SELECT *
FROM test3
) LOOP
UPDATE test
SET fk = rec_cur.fk
, fill = rec_cur.fill
WHERE pk = rec_cur.pk;
END LOOP;
END;
/
This method is pretty common. I generally recommend against it for
high-volume updates because the SET sub-query is nested, meaning it is performed
once for each row updated. To support this method, I needed to create an index
on TEST8.PK.
UPDATE test
SET (fk, fill) = (
SELECT test8.fk, test8.fill
FROM test8
    WHERE test8.pk = test.pk
)
WHERE pk IN (
SELECT pk
FROM test8
);
This one is gaining in popularity. Using BULK COLLECT and FORALL statements
is the new de-facto standard for PL/SQL programmers concerned about performance
because it reduces context switching overheads between the PL/SQL and SQL
engines.
The biggest drawback to this method is readability. Since Oracle does not yet
provide support for record collections in FORALL, we need to use scalar
collections, making for long declarations, INTO clauses, and SET clauses.
DECLARE
CURSOR rec_cur IS
SELECT *
FROM test4;
TYPE num_tab_t IS TABLE OF NUMBER(38);
TYPE vc2_tab_t IS TABLE OF VARCHAR2(4000);
pk_tab NUM_TAB_T;
fk_tab NUM_TAB_T;
fill_tab VC2_TAB_T;
BEGIN
OPEN rec_cur;
LOOP
FETCH rec_cur BULK COLLECT INTO pk_tab, fk_tab, fill_tab LIMIT 1000;
EXIT WHEN pk_tab.COUNT() = 0;
FORALL i IN pk_tab.FIRST .. pk_tab.LAST
UPDATE test
SET fk = fk_tab(i)
, fill = fill_tab(i)
WHERE pk = pk_tab(i);
END LOOP;
CLOSE rec_cur;
END;
/
This is really a deprecated pre-9i method; the modern equivalent is the MERGE
statement. This needs a unique index on TEST1.PK in order to enforce key
preservation.
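The key-preservation requirement can be met with something like this (an assumed index name; the article doesn't show the DDL):

```sql
-- A unique index (or primary key constraint) on TEST1.PK makes TEST
-- key-preserved in the join view, so Oracle permits the UPDATE through it.
CREATE UNIQUE INDEX test1_pk ON test1 (pk);
```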
UPDATE (
SELECT /*+ ordered use_nl(old)*/ new.pk as new_pk
, new.fk as new_fk
, new.fill as new_fill
, old.*
FROM test1 new
JOIN test old ON (old.pk = new.pk)
)
SET fk = new_fk
, fill = new_fill
/
The modern equivalent of the Updateable Join View. Gaining in popularity due
to its combination of brevity and performance, it is primarily used to INSERT
and UPDATE in a single statement. We are using the update-only version here.
Note that I have included a FIRST_ROWS hint to force an indexed nested loops
plan. This is to keep the playing field level when comparing to the other
methods, which also perform primary key lookups on the target table. A Hash join
may or may not be faster, that's not the point - I could increase the size of
the target TEST table to 500M rows and Hash would be slower for sure.
MERGE /*+ FIRST_ROWS*/ INTO test
USING test2 new ON (test.pk = new.pk)
WHEN MATCHED THEN UPDATE SET
fk = new.fk
, fill = new.fill
/
Here is the Explain Plan:
-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
-------------------------------------------------------------------------------
| 0 | MERGE STATEMENT | | 130K| 9921K| 258K (1)|
| 1 | MERGE | TEST | | | |
| 2 | VIEW | | | | |
| 3 | NESTED LOOPS | | 130K| 11M| 258K (1)|
| 4 | TABLE ACCESS FULL | TEST2 | 128K| 6032K| 172 (5)|
| 5 | TABLE ACCESS BY INDEX ROWID| TEST | 1 | 48 | 2 (0)|
| 6 | INDEX UNIQUE SCAN | TEST_PK | 1 | | 1 (0)|
-------------------------------------------------------------------------------
Now we're getting clever... This is the MERGE example on steroids. It uses
Oracle's Parallel DML capability to spread the load over multiple slave
threads.
ALTER SESSION ENABLE PARALLEL DML;
MERGE /*+ first_rows parallel(test) parallel(new) */ INTO test
USING test5 new ON (test.pk = new.pk)
WHEN MATCHED THEN UPDATE SET
fk = new.fk
, fill = new.fill
/
Note the differences in the Explain Plan.
--------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| TQ |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------------------------
| 0 | MERGE STATEMENT | | 109K| 8325K| 1880 (1)| | | |
| 1 | PX COORDINATOR | | | | | | | |
| 2 | PX SEND QC (RANDOM) | :TQ10002 | 109K| 10M| 1880 (1)| Q1,02 | P->S | QC (RAND) |
| 3 | INDEX MAINTENANCE | TEST | | | | Q1,02 | PCWP | |
| 4 | PX RECEIVE | | 109K| 10M| 1880 (1)| Q1,02 | PCWP | |
| 5 | PX SEND RANGE | :TQ10001 | 109K| 10M| 1880 (1)| Q1,01 | P->P | RANGE |
| 6 | MERGE | TEST | | | | Q1,01 | PCWP | |
| 7 | PX RECEIVE | | 109K| 10M| 1880 (1)| Q1,01 | PCWP | |
| 8 | PX SEND HYBRID (ROWID PKEY) | :TQ10000 | 109K| 10M| 1880 (1)| Q1,00 | P->P | HYBRID (ROW|
| 9 | VIEW | | | | | Q1,00 | PCWP | |
| 10 | NESTED LOOPS | | 109K| 10M| 1880 (1)| Q1,00 | PCWP | |
| 11 | PX BLOCK ITERATOR | | | | | Q1,00 | PCWC | |
| 12 | TABLE ACCESS FULL | TEST5 | 107K| 5062K| 2 (0)| Q1,00 | PCWP | |
| 13 | TABLE ACCESS BY INDEX ROWID| TEST | 1 | 48 | 0 (0)| Q1,00 | PCWP | |
| 14 | INDEX UNIQUE SCAN | TEST_PK | 1 | | 0 (0)| Q1,00 | PCWP | |
--------------------------------------------------------------------------------------------------------------------
This is much easier to do with DataStage than with native PL/SQL. The goal is
to have several separate sessions applying UPDATE statements at once, rather
than using the sometimes restrictive PARALLEL DML alternative. It's a bit of a
kludge, but we can do this in PL/SQL using a Parallel Enable Table Function.
Here's the function. It returns test_num_arr, a collection type that must be
created first with something like CREATE OR REPLACE TYPE test_num_arr AS TABLE
OF NUMBER (the article doesn't show this DDL):
CREATE OR REPLACE FUNCTION test_parallel_update (
test_cur IN SYS_REFCURSOR
)
RETURN test_num_arr
PARALLEL_ENABLE (PARTITION test_cur BY ANY)
PIPELINED
IS
PRAGMA AUTONOMOUS_TRANSACTION;
TYPE num_tab_t IS TABLE OF NUMBER(38);
TYPE vc2_tab_t IS TABLE OF VARCHAR2(4000);
pk_tab NUM_TAB_T;
fk_tab NUM_TAB_T;
fill_tab VC2_TAB_T;
cnt INTEGER := 0;
BEGIN
LOOP
FETCH test_cur BULK COLLECT INTO pk_tab, fk_tab, fill_tab LIMIT 1000;
EXIT WHEN pk_tab.COUNT() = 0;
FORALL i IN pk_tab.FIRST .. pk_tab.LAST
UPDATE test
SET fk = fk_tab(i)
, fill = fill_tab(i)
WHERE pk = pk_tab(i);
cnt := cnt + pk_tab.COUNT;
END LOOP;
CLOSE test_cur;
COMMIT;
PIPE ROW(cnt);
RETURN;
END;
/
Note that it receives its data via a Ref Cursor parameter. This is a feature
of Oracle's parallel-enabled functions; they will apportion the rows of a single
Ref Cursor amongst many parallel slaves, with each slave running over a
different subset of the input data set.
Here is the statement that calls the Parallel Enabled Table Function:
SELECT sum(column_value)
FROM TABLE(test_parallel_update(CURSOR(SELECT * FROM test7)));
Note that we are using a SELECT statement to call a function that performs an
UPDATE. Yeah, I know, it's nasty. You need to make the function an AUTONOMOUS
TRANSACTION to stop it from raising ORA-14551 (cannot perform a DML operation
inside a query). But just bear with me, it is the
closest PL/SQL equivalent I can make to a third-party ETL Tool such as DataStage
with native parallelism.
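If you want to confirm the work really was spread across slaves, the session's parallel-query statistics are one place to look (a sketch; the statistic names come from v$pq_sesstat):

```sql
-- After running the parallel table function, check whether the last query
-- in this session was parallelized and how many slave processes serviced it.
SELECT statistic, last_query
FROM   v$pq_sesstat
WHERE  statistic IN ('Queries Parallelized', 'Server Threads');
```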