Newsletter

How to copy LOBs

The Issue

Assume we must copy a table that contains a LOB. There can be different reasons for this. For example, if the table is too fragmented or if there is no more space in the current tablespace.

The straightforward approach with Insert / * + append parallel (n) enable_parallel_dml * / simply to copy the data fails, however. It takes a long time at first. You can see that the time is spent inserting and not reading. How you copy the LOBS is crucial. The rest of the time is comparatively negligible.

The Insert .. Select approach has another problem: There are no partial results. Either you succeed or you have to start all over again.

Searching the Internet I found a blog by Marek Läll that deals with the subject of copying LOBs [1].

The core of the matter is that the LOB locator (LOB address) has not yet been determined at the time of the insert. A new LOB must be created, which means you also need a new locator. As a result, the LOB to be copied is first written to the TEMP tablespace. Then the new locator is determined and then the LOB is copied again from the TEMP tablespace to the actual target. Läll now suggests another way: First an empty LOB is written, its locator is then read via SELECT .. FOR UPDATE. The source LOB can be copied there in one go. This saves you from writing and reading the LOB, which is important. As already mentioned, the most important thing for efficiency is how the LOBs are treated.

I implemented the code in PL / SQL and optimized it a bit, rewritten it to BULK and avoided the SELECT .. FOR UPDATE via the returning clause.

The code is very efficient compared to other approaches I tested. In my real application, I can easily get to 1.5 million rows / hour with a parallel 24

Copy Part

To illustrate my code let us assume this simple table:

 
 
CREATE table doc_table( doc_id Number,
document BLOB);
 

The table has a BLOB column called document.

 
DECLARE
 TYPE t_BLOB_tab IS TABLE OF BLOB; 
 v_blob t_BLOB_tab;
 v_blob_length NUMBER;
 CURSOR c1 is
 SELECT /*+ noparallel */ doc_id , document -- 1: noparallel hint
 FROM doc_table 
 WHERE ROWID BETWEEN :start_id AND :end_id; -- 2: Start and End Rowid
 TYPE c1_tab IS TABLE OF c1%rowtype;
 v_c1_tab c1_tab; 
 c_limit PLS_INTEGER := 10000; 
BEGIN 
OPEN c1;
LOOP
 FETCH c1 bulk collect INTO v_c1_tab LIMIT c_limit;
 EXIT WHEN v_c1_tab.COUNT=0;
 FORALL i IN 1 .. v_c1_tab.COUNT
 INSERT INTO doc_table_new (doc_id , document P) -- 3: Conventional Insert
 VALUES (v_c1_tab(i).doc_id, empty_blob())
 RETURNING document BULK COLLECT INTO v_blob; -- 4: Loblocator of new LOB
 FOR i IN 1 .. v_c1_tab.COUNT 
LOOP
 v_blob_length := DBMS_LOB.GETLENGTH(v_c1_tab(i).document);
 IF nvl(v_blob_length,0) > 0 THEN -- 5: DBMS_LOB.COPY will throw an exception
 DBMS_LOB.COPY(v_blob(i), -- for empty LOBS
 v_c1_tab(i).document,
 v_blob_length);
 END IF;
 END LOOP;
 COMMIT; 
 END LOOP;
 COMMIT;
END;
/

Annotations:

1. The anonymous block is parallelized in the next step via DBMS_PARALLEL_EXECUTE. It would be illogical to parallelize again within a parallel process.

2. The start and end id must be used in the central select. They must not be declared, they are set from the DBMS_PARALLEL_EXECUTE framework.

3. A direct path write would be possible here using the APPEND_VALUES hint. I refrained from it so as not to provoke an exclusive table lock. I have not tested whether this actually would be the case. I am so satisfied with the performance of the solution described here that I consider a test to be unnecessary.

4. The return saves a SELECT .. FOR UPDATE.

5. DBMS_LOB.COPY is the fastest way to copy and seems to use a direct path.

 

Parallelize

I could have called a stored function via parallel SQL in order to parallelize the approach. The decision to use DBMS_PARALLEL_EXECUTE was rather by instinct. There are some objective points for PL/SQL parallelization, however. E.g. a stored function would have resulted in many context switches. DBMS_PARALLEL_EXECUTE allows you to stay in PL / SQL. In addition, the code below will also work with Standard Edition.

I also did some tests with parallel SQL and functions, but never got them to scale. I would not exclude that there is a better approach than the one I present here. However, compared to the alternatives I have seen I rather like the approach presented below.

Here is the code for parallelization, I highlighted the anonymous block that we discussed in the previous paragraph.

Screenshot 2023 07 28 at 13 33 26 How to copy LOBs DBConcepts
Screenshot 2023 07 28 at 13 39 22 How to copy LOBs DBConcepts

Annotations:

There are several ways to divide the work. The chunking query is the most flexible. The search conditions here must also be found again in cursor c1.
Ntile (10) means that the result of the query will hopefully be divided into 10 equal sections.

 

References:

Marek Läll, binary/textual data insert into BLOB/CLOB columns – use 50% less database (IO) resources, Marek Läll / Technical blog about Oracle database: binary/textual data insert into BLOB/CLOB columns – use 50% less database (IO) resources (mareklall.blogspot.com)

DBConcepts

Weitere Beiträge

DBConcepts Adventpunsch

Am Dienstag, den 03.12 findet heuer unser allseits beliebter Adventpunsch statt. Dieses Jahr treffen wir uns wieder im Weihnachtsdorf im Alten AKH (Universitätscampus).Wir haben einen eigens für uns

DBConcepts

Newsletter abonnieren

Wir freuen uns, dass wir Ihr Interesse für den Newsletter geweckt haben! Mit dem Versand dieser Zustimmung erhalten Sie regelmäßig alle aktuellen Informationen!

Vielen Dank für Ihr Interesse an unserem Unternehmen. Derzeit suchen wir niemanden für diese Stelle. Aber wir sind immer an talentierten Menschen interessiert und freuen uns von Ihnen zu hören! Schicken Sie uns einfach Ihren Lebenslauf und eine kurze Nachricht und schreiben Sie an welcher Stelle Sie interessiert sind: recruitment@dbconcepts.com. Wir freuen usn von Ihnen zu hören!