Why we use B+ tree for clustered index rather than hashing?

In MySQL InnoDB or lots of other database engines, the primary key is implemented with clustered index. However after searching with secondary index, the engine must look up into clustered index with primary keys provided in secondary index(if there is no covering index).

InnoDB uses B+ tree for its clustered index, it is a structure with O(log n) complexity in searching, so we can summerize the procedure like the following:

  1. Using clusterd index: One pass, Cost O(n).
  2. Using secondary index: Two passes. The first pass cost O(log n) an result in m records. Then the second pass cost O(log n) for each of the m records, so the time complexity will be m*O(log n).

I know when using hasing, the time complexity in seaching can be reduced to O(1), so I am wondering why these database engines prefer using B+ tree rather than hasing techniques(e.g. build a KV store)? Is it because of records are stored on disk rather than in memory?

Meanwhile, I have another question, some other databases, like RocksDB, use KV storage rather than B+ tree. Why they use that?


I want to make the question more clearly. I find many tables are designed with auto increment PK, rather than using something with actual meaning, like phone number or IP. So B+ tree’s advantage is not fully exploited. For example, B+ tree is good at searching data in range, but I searching a auto increment PK in range is rare in practice.


An important characteristic of B-tree indexes is the so-called range scan. Hash indexes don’t have that characteristic. The name of the older MySQL table engine, MyISAM, holds a clue. It stands for Indexed Sequential Access Method. The inherent ordering of BTREE indexes is a major feature.

If we have a table credit with, for example, columns credit_id, user_id, datestamp, and amount, we might use this query.

SELECT SUM(amount) amount FROM credit 
 WHERE datestamp >= CURDATE() - INTERVAL 7 DAY
   AND datestamp <  CURDATE();

With a two-column BTREE index on (datestamp, amount) MySQL can random-access the index O(log n) to the first eligible datestamp, and then sequentially access it O(1) for each successive eligible datestamp. And, because amount is in the index, MySQL can completely satisfy the query from the index. (It’s called a covering index). InnoDB indexes implicitly include the primary key column, that is, the key to the clustered index holding all the table’s data.

Most large production tables have several covering indexes defined for them, chosen to accelerate queries that take the most time in the particular application.

I’m not claiming that HASH indexes are useless. Far from it. But it’s pretty clear that MySQL would work far less efficiently without BTREE indexes.

(The InnoDB engine’s code has many optimizations for inserting rows with autoincremented primary keys. If an application uses something else — like a randomized guid — for a primary key it can defeat those optimization.)

Leave a Reply

Your email address will not be published. Required fields are marked *