Comments on query 3 and 20

Dmitri Kouzmenko Dmitri Kouzmenko, our optimization expert and lead developer of IBAnalyst software, reviews the results of the most slowest queries in the recent TPC-R based test results. He have investigated queries 3 and 20, which were the most slowest at InterBase and found some interesting facts.

Since queries 3 and 20 are not passed in InterBase 7.5.1, I think I need to explain, why.

InterBase 7.5 (also as all previous Interbase versions) generates the following plan for query 3:

PLAN SORT (
       SORT (
         JOIN (ORDERS INDEX (ORDERS_ORDERDATE),
               LINEITEM INDEX (LINEITEM_PK,LINEITEM_SHIPDATE),
               CUSTOMER INDEX (CUSTOMER_PK))))

everything looks fine, except when you start to execute this query on 1.5 gb TPC-R database. This will run for 6 hours and more. The problem is in LINEITEM_SHIPDATE index.

Firebird 1.5 optimizer is not using this index, because it leaves only primary key (or unique) index if any additional indices were found (since there is an equality join by l_orderkey = o_orderkey)

But, since InterBase is using this index, let’s look what is wrong with it.
LINEITEM table have 6 million rows, and index LINEITEM_SHIPDATE have only 2526 (thanks to IBAnalyst) unique (different) keys. Longest duplicate chain is 2706 keys, but the most bad thing that condition

l_shipdate > date '1995-03-15'

covers ~3.5 million keys, which is slightly greater than half of the whole index. So, bitmap AND of LINEITEM_PK and LINEITEM_SHIPDATE indices will be made in memory for long, long, long time.

Can it be fixed? Yes, hopefully.
Just change in WHERE condition

 and l_shipdate > date '1995-03-15'
to
	
 and l_shipdate+0 > date '1995-03-15'

this will cause plan to be the same with Firebird 1.5.2

 PLAN SORT (
        SORT (
          JOIN (ORDERS INDEX (ORDERS_ORDERDATE),
                CUSTOMER INDEX (RDB$PRIMARY4),
                LINEITEM INDEX (RDB$PRIMARY6))))

because LINEITEM_SHIPDATE index now can’t be used by optimizer.

Also, InterBase 7.5.1 will execute this tuned query 2 times faster than Firebird 1.5.2, with the equal (by default) settings in ibconfig and firebird.conf.

And, why InterBase 7.5.1 with corrected "plan" executes this query 2 times faster? We don’t know the details, but know where speed is - InterBase 7.1 SP2 and InterBase 7.5.1 does very fast grouping with the temporary sort file.

For example, if you have query like

select ....
   ...
   group by field1, field2...
   ...

and PLAN SORT(join…

here it is. Query (join) result will be stored and sorted in temporary file, and grouped also. Right now InterBase can do this faster than Firebird.

There is same situation with query 20, as with query 3 - index LINEITEM_SHIPDATE corrupting performance again.

To avoid usage of this index by optimizer, change lines

 and l_shipdate >= date '1994-01-01'
 and l_shipdate < date '1995-01-01'

to

 and l_shipdate+0 >= date '1994-01-01'
 and l_shipdate+0 < date '1995-01-01'

and query performance will be fixed.

Anyway, since query 20 does not have grouping and sorting, as query 3, Interbase doesn’t win here, query execution time is the same as Firebird 1.5.2.

But, you do not need to think about LINEITEM_SHIPDATE as a some kind of evil. If you will look to other query plans, you will see that this index is used in lot of them, and used by both optimizers, InterBase 7.5.1 and Firebird 1.5.2.

We hope that InterBase R&D will fix this optimizer behaviour in new versions.

One Response to “Comments on query 3 and 20”

  1. kdv Says:

    Need to say sorry because information in this topic is only partially correct. This test was done with 512Mb RAM, but when 1Gb RAM is present, Firebird 1.5 executes Query N3 with the same speed (4 seconds faster) than InterBase 7.5.1. Thus, I can say now that assumptions in this article are only correct about how InterBase 7.5 works with database cache - and there is less conflict with system file cache when there is not much RAM on the computer…

Leave a Reply

You must be logged in to post a comment.