Inside BLOBs

This is an excerpt from the book “1000 InterBase & Firebird Tips & Tricks” by Alexey Kovyazin and Dmitri Kouzmenko, which will be published in 2006.

How the server works with BLOBs

The BLOB data type is intended for storing data of variable size. BLOB fields can store data that cannot be placed in fields of other types, for example pictures, audio files and video fragments.

From the point of view of the database application developer, using BLOB fields is as transparent as using other field types (see chapter “Data types” for details). However, the internal implementation of BLOBs differs significantly from that of other data.

The database engine works with BLOB fields through a special mechanism of its own. At the application level this mechanism is transparently integrated with the rest of record handling, yet it has its own way of organizing pages. Let’s consider in detail how the BLOB-handling mechanism works.

The main record on the data page contains, for each non-null BLOB field, a reference to a “BLOB record”, i.e. a record-like structure, or quasi-record, that holds the BLOB data or leads to it. Depending on the size of the BLOB, this BLOB record is of one of three types.

The first type is the simplest. If the BLOB data is smaller than the free space on the data page, it is placed on that data page as a separate record of “BLOB” type.

The second type is used when the BLOB is larger than the free space on the page. In this case, the quasi-record stores references to the pages that contain the actual BLOB data. The result is a two-level structure.

If the BLOB is very large, a three-level structure is used. The quasi-record stores references to BLOB pointer pages, which in turn contain references to the pages holding the actual BLOB data.

Apart from the quasi-record, the whole BLOB storage structure is built from a single page type, the BLOB page. BLOB data pages and BLOB pointer pages are distinguished by a flag (value 0 or 1) that tells the server how to interpret the page.
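The choice between the three kinds of BLOB record can be sketched in C. This is an illustrative model, not the engine's code. The enum and function names are hypothetical, page numbers are assumed to take 4 bytes, and `free_space` and `page_data` stand in for the free space on the data page and the payload of one BLOB page. A real engine would also account for record headers and other overhead.

```c
#include <stdint.h>

/* Hypothetical names for the three kinds of BLOB record described above. */
enum blob_kind {
    BLOB_IN_RECORD,   /* data kept in the quasi-record on the data page */
    BLOB_TWO_LEVEL,   /* quasi-record lists the BLOB data pages */
    BLOB_THREE_LEVEL  /* quasi-record lists BLOB pointer pages */
};

/* Assumption: a page number occupies 4 bytes. */
#define PAGE_NUMBER_SIZE 4u

/* blob_size:  length of the BLOB in bytes
 * free_space: free bytes on the data page for the quasi-record
 * page_data:  bytes of BLOB data one BLOB page can hold (must be > 0) */
enum blob_kind choose_blob_kind(uint64_t blob_size, uint64_t free_space,
                                uint64_t page_data)
{
    /* First type: the whole BLOB fits on the data page. */
    if (blob_size < free_space)
        return BLOB_IN_RECORD;

    /* Number of BLOB data pages needed, rounded up. */
    uint64_t data_pages = (blob_size + page_data - 1) / page_data;

    /* Second type: the list of data-page numbers fits on the data page. */
    if (data_pages * PAGE_NUMBER_SIZE < free_space)
        return BLOB_TWO_LEVEL;

    /* Third type: otherwise, reach the data pages through pointer pages. */
    return BLOB_THREE_LEVEL;
}
```

For example, with 4000 bytes free and 4068 data bytes per BLOB page, a 100,000-byte BLOB needs 25 data pages. Their 100 bytes of page numbers fit in the quasi-record, so the two-level structure is chosen.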

BLOB Page

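A BLOB page consists of the standard page header, a special BLOB header, and the data area. On a BLOB pointer page, the data area holds the numbers of the pages that contain the BLOB data.

The special header contains the following information: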

  • The number of the first page of this BLOB. It is used to check that pages belong to the same BLOB.
  • A sequence number. This is important for checking the integrity of a BLOB. For a BLOB pointer page it is equal to zero.
  • The length of the data on the page. A page is not necessarily filled completely, so the header records the length of the actual data.
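The header fields listed above can be modeled as a C struct, together with the pointer-page flag mentioned earlier. The struct layout, field names and flag value here are hypothetical simplifications, not the engine's on-disk definition. The checks show how the first-page number, the sequence number and the length can be used to validate a page.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag value marking a BLOB pointer page (0 = data page). */
#define BLOB_PAGE_POINTERS 0x01u

/* Hypothetical, simplified view of a BLOB page header. */
struct blob_page_header {
    uint8_t  page_type;   /* from the generic page header */
    uint8_t  page_flags;  /* 0 = data page, BLOB_PAGE_POINTERS = pointer page */
    uint32_t lead_page;   /* number of the first page of this BLOB */
    uint32_t sequence;    /* position within the BLOB; 0 on a pointer page */
    uint16_t length;      /* bytes of actual data on this page */
};

/* Tells the caller how to interpret the rest of the page. */
bool is_pointer_page(const struct blob_page_header *h)
{
    return (h->page_flags & BLOB_PAGE_POINTERS) != 0;
}

/* Checks that a page belongs to the expected BLOB, sits at the expected
 * position, and does not claim more data than the page can hold. */
bool blob_page_is_consistent(const struct blob_page_header *h,
                             uint32_t expected_lead_page,
                             uint32_t expected_sequence,
                             uint16_t max_payload)
{
    if (h->lead_page != expected_lead_page)
        return false;   /* page belongs to another BLOB */
    if (h->sequence != expected_sequence)
        return false;   /* page is out of order or one is missing */
    if (h->length > max_payload)
        return false;   /* length exceeds the page capacity */
    return true;
}
```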

Maximum BLOB size

The internal structure for storing BLOB data can have only three levels of organization, and the size of a data page is also limited. It is therefore possible to calculate the maximum size of a BLOB.

However, this is only a theoretical limit (you can calculate it if you wish), and in practice the limit will be much lower. The length of BLOB-field data is held in a variable of ULONG type (an unsigned 32-bit integer), so the maximum size is 4 gigabytes.

The practical limit is lower still if a UDF is used to process BLOBs, because the internal UDF implementation assumes a maximum BLOB size of 2 gigabytes. If you plan to store very large BLOBs in your database, experiment with data of that size beforehand.
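The calculation can be sketched in C. The sizes are assumptions chosen for illustration, not values taken from the engine:

  • a 32-byte BLOB page header;
  • 4-byte page numbers;
  • a quasi-record that holds roughly as many page numbers as a pointer page.

The caps model the ULONG length (unsigned 32-bit) and the 2 GB UDF assumption. The UDF cap is taken here, as an assumption, to be the signed 32-bit maximum.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative assumptions, not engine constants. */
#define BLOB_PAGE_HEADER 32u  /* bytes lost to the header on each BLOB page */
#define PAGE_NUMBER_SIZE 4u   /* bytes per page number */

/* Three-level capacity: page numbers in the quasi-record, each leading to a
 * pointer page, each of whose page numbers leads to a data page. */
uint64_t theoretical_max_blob(uint32_t page_size)
{
    uint64_t payload  = page_size - BLOB_PAGE_HEADER;  /* data bytes per page */
    uint64_t pointers = payload / PAGE_NUMBER_SIZE;    /* page numbers per page */
    return pointers * pointers * payload;
}

/* Length held in an unsigned 32-bit (ULONG) variable: about 4 GB. */
#define ULONG_LENGTH_LIMIT ((uint64_t)UINT32_MAX)
/* Assumed signed 32-bit length in the UDF layer: about 2 GB. */
#define UDF_LENGTH_LIMIT ((uint64_t)INT32_MAX)

/* The effective limit is the smaller of the structural capacity and the
 * cap imposed by the length variable. */
uint64_t practical_max_blob(uint32_t page_size, bool via_udf)
{
    uint64_t capacity = theoretical_max_blob(page_size);
    uint64_t cap = via_udf ? UDF_LENGTH_LIMIT : ULONG_LENGTH_LIMIT;
    return capacity < cap ? capacity : cap;
}
```

Under these assumptions, `theoretical_max_blob(1024)` is about 61 MB and `theoretical_max_blob(4096)` about 4.2 GB. With 8 KB pages the three-level capacity (about 34 GB) far exceeds the ULONG cap, so `practical_max_blob(8192, false)` returns 4294967295, and with a UDF it drops to 2147483647.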

The segment size mystery

Developers of database applications often ask what the Segment Size parameter in a BLOB definition is, why it is needed, and whether they should set it when creating BLOB fields.

In fact, there is no need to set this parameter. It is something of a relic, used by the GPRE utility when pre-processing Embedded SQL: when working with BLOBs, GPRE declares a buffer whose size is based on the segment size. The segment size has no influence on how segments are allocated or sized when the BLOB is stored on disk, and it has no influence on performance. The segment size can therefore safely be set to any value. The default is 80 bytes.

For those who want to know everything: the number 80 was chosen because alphanumeric terminals displayed 80 characters per line.
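The point that the segment size only shapes a client-side buffer can be illustrated with a small C model. It is a sketch under stated assumptions. `struct stored_blob`, `read_segment` and `read_whole_blob` are hypothetical stand-ins for a client reading a BLOB; a real InterBase/Firebird client would call `isc_get_segment` instead. Reading the same BLOB through an 80-byte buffer or a 7-byte buffer changes only the number of reads, never the bytes obtained.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a stored BLOB: just a byte array. */
struct stored_blob {
    const unsigned char *data;
    size_t length;
    size_t position;  /* next byte to hand out */
};

/* Copies up to buf_size bytes into buf, like reading one segment.
 * Returns the number of bytes copied (0 at the end of the BLOB). */
size_t read_segment(struct stored_blob *b, unsigned char *buf, size_t buf_size)
{
    size_t left = b->length - b->position;
    size_t n = left < buf_size ? left : buf_size;
    memcpy(buf, b->data + b->position, n);
    b->position += n;
    return n;
}

/* Reads the rest of the BLOB into out (which must be large enough) through
 * a client buffer of seg_size bytes; returns the number of reads made. */
size_t read_whole_blob(struct stored_blob *b, unsigned char *out,
                       size_t seg_size)
{
    unsigned char buf[256];
    size_t reads = 0, total = 0, n;
    if (seg_size > sizeof buf)
        seg_size = sizeof buf;  /* keep within the local buffer */
    while ((n = read_segment(b, buf, seg_size)) > 0) {
        memcpy(out + total, buf, n);
        total += n;
        reads++;
    }
    return reads;
}
```

With a 42-byte BLOB, an 80-byte buffer needs one read and a 7-byte buffer needs six. The reassembled data is identical in both cases.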
