Testing the NO SAVEPOINT feature in InterBase 7.5.1

Vlad Horsun (hvlad@users.souceforge.net), Alexey Kovyazin (ak@ib-aid.com)

In issue 1 we published Dmitri Yemanov’s article about the internals of savepoints. While that article was still on the desk, Borland announced the release of InterBase 7.5.1, introducing, amongst other things, a NO SAVEPOINT option for transaction management. Is this an important improvement for InterBase? We decided to give this implementation a close look and test it some, to discover what it is all about.

Testing NO SAVEPOINT

In order to analyze the problem that the new transaction option was intended to address, and to assess its real value, we performed several very simple SQL tests. The tests are all 100% reproducible, so you will be able to verify our results easily.

Database for testing

The test database file was created in InterBase 7.5.1, page size = 4096, character encoding is NONE. It contains two tables, one stored procedure and three generators.

For the test we will use only one table, with the following structure:

CREATE TABLE TEST (
    ID           NUMERIC(18,2),
    NAME         VARCHAR(120),
    DESCRIPTION  VARCHAR(250),
    CNT          INTEGER,
    QRT          DOUBLE PRECISION,
    TS_CHANGE    TIMESTAMP,
    TS_CREATE    TIMESTAMP,
    NOTES        BLOB
);

This table contains 100,000 records, which will be updated during the test. The stored procedure and generators are used to fill the table with test data. You can increase the quantity of records in the test table by calling the stored procedure to insert them:

SELECT * FROM INSERTRECS(1000);

The second table, TEST2DROP, has the same structure as the first and is filled with the same records as TEST

INSERT INTO TEST2DROP SELECT FROM TEST;

As you will see, the second table will be dropped immediately after connect. We are just using it as a way to increase database size cheaply: the pages occupied by the TEST2DROP table will be released for reuse after we drop the table. With this trick we avoid the impact of database file growth on the test results.

Setting test environment

All that is needed to perform this test is the trial installation package of InterBase 7.5.1, the test database and an SQL script.

Download the InterBase 7.5.1 trial version from www.borland.com. The installation process is obvious and well-described in the InterBase documentation.

You can download a backup of the test database ready to use from http://www.ibdeveloper.com/issue2/testspbackup.zip (~4 Mb) or, alternatively, an SQL script for creating it from http://www.ibdeveloper.com/issue2/testspdatabasescript.zip (~1 Kb).

If you download the database backup, the test tables are already populated with records and you can proceed straight to the section “Preparing to test”, below.

If you choose instead to use the SQL script, you will create database yourself. Make sure you insert 100,000 records into table TEST using the INSERTRECS stored procedure and then copy all of them to TEST2DROP three or four times. After that, perform a backup of this database and you will be on the same position as if you had downloaded “ready for use” backup.

Hardware is not a material issue for these tests, since we are only comparing performance with and without the NO SAVEPOINT option. Our test platform was a modest computer with Pentium-4, 2 GHz, with 512 RAM and an 80GB Samsung HDD.

Preparing to test

A separate copy of the test database is used for each test case, in order to eliminate any interference between statements. We create four fresh copies of the database for this purpose. Supposing all files are in a directory called C:\TEST, simply create the four test databases from your test backup file:

gbak –c –user SYSDBA –pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp1.ib
gbak –c –user SYSDBA –pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp2.ib
gbak –c –user SYSDBA –pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp3.ib
gbak –c –user SYSDBA –pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp4.ib

SQL test scripts

The first script tests a regular, one-pass update without the NO SAVEPOINT option. For convenience, the important commands are clarified with comments:

connect \"C:\TEST\testsp1.ib\" USER \"SYSDBA\" Password \"masterkey\";  //Connect 
drop table TEST2DROP;       // Drop table TEST2DROP to free database pages
commit;					
select count(*) from test;	  // Walk down all records in TEST to place them into cache
commit;					
set time on;			  //enable time statistics for performed statements		
set stat on;			 // enables writes/fetches/memory statistics
commit;			 
// perform bulk update of all records in TEST table 	
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;  

The second script tests performance for the same UPDATE with the NO SAVEPOINT option:

connect \"C:\testsp2.ib\" USER \"SYSDBA\" Password \"masterkey\";
drop table TEST2DROP; 
commit;
select count(*) from test;
commit;
set time on;
set stat on;
commit;
SET TRANSACTION NO SAVEPOINT;	// enable NO SAVEPOINT
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;

Except for the inclusion of the SET TRANSACTION NO SAVEPOINT statement in the second script, both scripts are the same, simply testing the behavior of engine in case of the single bulk UPDATE.

To test sequential UPDATEs, we added several UPDATE statements–we recommend using five. The script for testing without NO SAVEPOINT would be:

connect \"E:\testsp3.ib\" USER \"SYSDBA\" Password \"masterkey\";
drop table TEST2DROP; 
commit;
select count(*) from test;
commit;
set time on;
set stat on;
commit;
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;

You can download all the scripts and the raw results of their execution from this location: http://www.ibdeveloper.com/issue2/testresults.zip

How to perform the test

The easiest way to perform the test is to use isql’s INPUT command. Suppose you have the scripts located in c:\test\scripts:

>isql
Use CONNECT or CREATE DATABASE to specify a database
SQL>input c:\test\scripts\script01.sql;

Test results

The single bulk UPDATE

First, let’s perform the test where the single-pass bulk UPDATE is performed. This is an excerpt from the one-pass script with default transaction settings.

SQL> update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT
_TIMESTAMP;
Current memory = 10509940
Delta memory = 214016
Max memory = 10509940
Elapsed time= 4.63 sec
Buffers = 2048
Reads = 2111
Writes 375
Fetches = 974980
	
SQL> commit;
Current memory = 10340752
Delta memory = -169188
Max memory = 10509940
Elapsed time= 0.03 sec
Buffers = 2048
Reads = 1
Writes 942
Fetches = 3

This is an excerpt from the one-pass script with NO SAVEPOINT enabled

SQL> SET TRANSACTION NO SAVEPOINT;
SQL> update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT
_TIMESTAMP;
Current memory = 10352244
Delta memory = 55296
Max memory = 10352244
Elapsed time= 4.72 sec
Buffers = 2048
Reads = 2102
Writes 350
Fetches = 967154
	
SQL> commit;
Current memory = 10344052
Delta memory = -8192
Max memory = 10352244
Elapsed time= 0.15 sec
Buffers = 2048
Reads = 1
Writes 938
Fetches = 3

Performance appears to be almost the same, whether the NO SAVEPOINT option is enabled or not.

Sequential bulk UPDATE statements

With the mutlti-pass script (sequential UPDATEs) the raw test results are rather large. The full results are here:http://www.ibdeveloper.com/issue2/testresults.zipq1). For convenience, the results are tabulated below

IB 7.5.1 update 100k with default setting 5 times
Max mem, Kb Time, ms Writes Reads Delta mem, bytes
1 10271.926 4670 375 2111 214016
2 60427.926 478560 5941 2239 50305100
3 60427.926 159950 11831 2299 0
4 60427.926 156190 14688 2311 0
5 60427.926 160660 19317 2341 0
IB 7.5.1 update 100k 5 times with NO SAVEPOINT
Max mem, Kb Time, ms Writes Reads Delta mem, bytes
1 10130.621 4950 354 2099 55296
2 10134.621 7210 5513 2220 0
3 10134.621 6230 11240 2298 0
4 10134.621 6380 14689 2312 0
5 10134.621 6560 18192 2341 0

Table 1 Test results for 5 sequental UPDATEs

The corresponding graphs are below:

Figure 1 Time to perform UPDATEs with and without NO SAVEPOINT

The first UPDATE statement has almost the same execution time with and without the NO SAVEPOINT option. However, memory consumption is reduced fivefold when we use NO SAVEPOINT.

In the second UPDATE we start to see the difference. With default transaction settings this UPDATE takes a very long time - 47 seconds - compared to only 7210 ms with NO SAVEPOINT enabled. With default transaction settings we can see that memory usage is significant, wherease with NO SAVEPOINT no additional memory is used.

The third and all following UPDATE statements with default settings show equal time and memory usage values and the growth of writes parameters. With NO SAVEPOINT usage we observe that time/memory values and writes growth are all small and virtually equal for each pass.

Figure 2. Memory usage while performing UPDATEs with and without NO SAVEPOINT

Inside the UNDO log

So what happened during the execution of the test scripts? What is the secret behind this magic that NO SAVEPOINT does? Is there any cost attached to these gains?

A few words about versions

You probably know already that InterBase is a multi-record-version database engine, meaning that each time a record is changed, a new version of that record is produced. The old version does not disappear immediately but is retained as a backversion.

In fact, the first time a backversion is written to disk, it is as a delta version, which saves disk space and memory usage by writing out only the differences between the old and the new versions. The engine can rebuild the full old version from the new version and chains of delta versions. It is only if the same record is updated more than once within the same transaction that the full backversion is written to disk

The UNDO log concept

You may recall from Dmitri’s article how each transaction is implicitly enclosed in a “frame” of savepoints, each having its own undo log. This log stores a record of each change in sequence, ready for the possibility that a rollback will be requested.

A backversion materializes whenever an UPDATE or DELETE statement is performed. The engine has to maintain all these backversions in the undo log for the relevant savepoint.

So, the Undo Log is a mechanism to manage backversions for savepoints in order to enable the associated changes to be rolled back. The process of Undo logging is quite complex and maintaining it can consume a lot of resources.

The NO SAVEPOINT option

The NO SAVEPOINT option in InterBase 7.5.1 is a workaround for the problem of performance loss during bulk updates that do multiple passes of a table. The theory is: if using the implicit savepoint management causes problems then let’s kill the savepoints. No savepoints – no problem :-)

Besides ISQL, it has been surfaced as a transaction parameter in both DSQL and ESQL. At the API level, a new transaction parameter block (TPB) option isc_tpb_no_savepoint can be passed in the isc_start_transaction() function call to disable savepoints management. Syntax details for the latter flavors and for the new tpb option can be found in the 7.5.1 release notes.

The effect of specifying the NO SAVEPOINT transaction parameter is that no undo log will be created. However, along with the performance gain for sequential bulk updates, it brings some costs for transaction management.

First and most obvious is that, with NO SAVEPOINT enabled, any error handling that relies on savepoints is unavailable. Any error during a NO SAVEPOINT transaction precludes all subsequent execution and leads to rollback (see “Release Notes” for InterBase 7.5. SP1, “New in InterBase 7.5.1”, page 2-2).

Secondly, when a NO SAVEPOINT transaction is rolled back, it is marked as rolled back in the transaction inventory page. Record version garbage thereby gets stuck in the “interesting” category and prevents the OIT from advancing. Sweep is needed to advance the OIT and back out dead record versions.

Fuller details of the NO SAVEPOINT option are provided in the InterBase 7.5.1. Release

Initial situation

Consider the implementation details of the undo-log.

Figure 3 shows the initial situation:

Figure 3. Initial situation before any UPDATE - only the one record version exists, Undo log is empty

Recall that we perform this test on freshly-restored database, so it is guaranteed that only one version exists for any record.

The first UPDATE

The first UPDATE statement creates delta backversions on disk (see figure 4). Since deltas store only the differences between the old and new versions, they are quite small. This operation is fast and it is easy work for the memory manager.

It is simple to visualize the undo log when we perform the first UPDATE/DELETE statement inside the transaction – the engine just records the numbers of all affected records into the bitmap structure. If it needs to roll back the changes associated with this savepoint, it can read the stored numbers of the affected records, then walk down to the version of each and restore it from the updated version and the backversion stored on disk

Figure 4. UPDATE1 create small delta version on disc and put record number into UNDO

This approach is very fast and economical on memory usage. The engine does not waste too many resources to handle this undo log – in fact it reuses the existing multi-versioning mechanism. Resource consumption is merely the memory used to store the bitmap structure with the backversion numbers. We don’t see any significant difference here between a transaction with the default settings and one with the NO SAVEPOINT option enabled.

The second UPDATE

When the second UPDATE statement is performed on the same set of records, we have a different situation.

Here is a good place to note that the example we are considering is the most simple situation, where only the one global (transaction level) savepoint exists. We will also look at the difference in the Undo log when an explicit (or enclosed BEGIN… END) savepoint is used.

To preserve on-disk integrity (remember the ‘careful write’ principle ?) the engine must compute a new delta between the old version (by transaction1) and new version (by transaction2, update2), store it somewhere on disk, fetch the current full version (by transaction2, update1), put it into the in-memory undo-log, replace it with the new full version (with backpointers set to the newly created delta) and erase the old, now superseded delta. As you can see – there is much more work to do, both on disk and in memory.

The engine could write all intermediate versions to disk but there is no reason to do so. These versions are visible only to the modifying transaction and would not be used unless a rollback was required.

Figure 5. The second UPDATE creates a new delta backversion for transaction 1, erases from disk the delta version created by the first UPDATE, and copies the version from UPDATE1 into the Undo log

This all makes hard work for the memory manager and the CPU, as you can see from the growth of the “max mem”\”delta mem” parameters values in the test that uses the default transaction parameters.

When NO SAVEPOINT is enabled we avoid the expense of maintaining the Undo log. As a result, we see execution time, reads/writes and memory usage as low for subsequent updates as for the first.

The third UPDATE

The third and all subsequent UPDATEs are similar to the second UPDATE, with one exception – memory usage does not grow any further.

Original design of IB implement second UPDATE another way but sometime after IB6 Borland changed original behavior and we see what we see now.But this theme is for another article ;)

Why is the delta of memory usage zero? The reason is that, beyond the second UPDATE, no record version is created. From here on, the update just replaces record data on disk with the newest one and shifts the superseded version into the Undo log.

A more interesting question is why we see an increase in disk reads and writes during the test. We would have expected that the third and following UPDATEs would do essentially equal numbers of read and writes to write the newest versions and move the previous ones to the undo log. However, we are actually seeing a growing count of writes. We have no answer for it, but we would be pleased to know.

The following figure helps to illustrate the situation in the Undo log during the sequential updates.

Figure 6. The third UPDATE overwrites the UPDATE1 version in the Undo log with the UPDATE2 version and its own version is written to disk as the latest one

When NO SAVEPOINT is enabled, the only pieces we need to perform are replacing the version on disk and updating the original backversion. It is fast as the first UPDATE.

Explicit SAVEPOINT

When an UPDATE statement is going to be performed within its own explicit or implicit BEGIN… END savepoint framework, the engine has to store a backversion for each associated record version in the Undo log.

For example, if we used an explicit savepoint, e.g. SAVEPOINT Savepoint1, upon performing UPDATE2, we would have the situation illustrated in figure 7:

Figure 7. If we have an explicit SAVEPOINT, each new record version associated with it will have a corresponding backversion in the Undo log of that savepoint

In this case the memory consumption would be expected to increase each time an UPDATE occurs within the explicit savepoint’s scope.

Summary

The new transaction option NO SAVEPOINT can solve the problem of excessive resource usage growth that can occur with sequential bulk updates. It should be beneficial when applied appropriately. Because the option can create its own problems by inhibiting the advance of the OIT, it should be used with caution, of course. The developer will need to take extra care about database housekeeping, particularly with respect to timely sweeping.

Leave a Reply

You must be logged in to post a comment.