Why is Hibernate's PersistentSet.contains() so slow compared to java.util.HashSet?

I’m developing a use case where I’m given a collection of IDs (named group) and need to determine which of these IDs are in another collection (named projectDevicesIds) and which are not. Note that this second collection is a PersistentSet obtained from a database. The code is very simple:

Collection<String> inside = new HashSet<>();
Collection<String> notInside = new HashSet<>();
group.forEach(id -> {
    if (projectDevicesIds.contains(id)) {
        inside.add(id);
    } else {
        notInside.add(id);
    }
});

So far so good. The problem is that when projectDevicesIds (a Hibernate PersistentSet) contains 100,000 elements and group contains 1,000 IDs, this code takes an average of 200 ms to run. When I run the same test with a HashSet instead of a PersistentSet, it takes only 1 ms! Even if the test is not a rigorous benchmark, the difference is huge and hurts the performance of my use case. The official Hibernate docs say that PersistentSet uses a HashSet internally, so I was expecting about the same performance.

Can someone explain why PersistentSet.contains() takes so long compared to HashSet.contains(), and suggest a way to improve the performance of this use case?

Answer

A PersistentSet represents an association in the database. This means that when you call contains(), Hibernate ORM may first need to flush pending operations that could affect the association and possibly reload it from the database. Or it may simply need to load it, if the association is lazily loaded and has not been initialized yet.

The difference in performance shouldn’t be that large once the collection has been loaded for the first time, but it really depends on how you obtain projectDevicesIds.
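One way to separate the cost of loading from the cost of the lookups is to copy the collection into a plain java.util.HashSet once and run every contains() call against that copy. The following is only a minimal sketch under that assumption: the class name IdPartitioner, its nested Partition holder and the method partition are hypothetical names made up for illustration, and the copy is only valid as long as the underlying association does not change while you use it.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of Hibernate: partitions IDs against a one-time snapshot.
final class IdPartitioner {

    // Simple holder for the two result sets.
    static final class Partition {
        final Set<String> inside;
        final Set<String> notInside;

        Partition(Set<String> inside, Set<String> notInside) {
            this.inside = inside;
            this.notInside = notInside;
        }
    }

    static Partition partition(Collection<String> group, Collection<String> projectDevicesIds) {
        // Copy once. If projectDevicesIds is a lazy persistent collection, iterating it
        // here triggers its loading a single time; later lookups touch only the copy.
        Set<String> snapshot = new HashSet<>(projectDevicesIds);

        Set<String> inside = new HashSet<>();
        Set<String> notInside = new HashSet<>();
        for (String id : group) {
            if (snapshot.contains(id)) {
                inside.add(id);
            } else {
                notInside.add(id);
            }
        }
        return new Partition(inside, notInside);
    }
}
```

Calling IdPartitioner.partition(group, projectDevicesIds) returns both sets in one object. If the timing of the lookups drops to roughly what you measured with a plain HashSet, the remaining cost is in loading the collection rather than in contains() itself.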

If you enable SQL logging, you can see whether Hibernate ORM runs additional queries when you call the contains() method.
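As a rough sketch of how SQL logging could be switched on, the snippet below collects the standard Hibernate settings hibernate.show_sql, hibernate.format_sql and hibernate.generate_statistics in a java.util.Properties object. The class and method names are hypothetical, and how the properties reach Hibernate depends on how your application is bootstrapped (for example persistence.xml, a framework configuration file, or a properties map passed when the EntityManagerFactory is created), so treat this as an illustration rather than the required setup.

```java
import java.util.Properties;

// Hypothetical helper: gathers settings that make Hibernate print the SQL it executes.
final class HibernateSqlLogging {

    static Properties sqlLoggingProperties() {
        Properties props = new Properties();
        // Print every executed SQL statement to standard output.
        props.setProperty("hibernate.show_sql", "true");
        // Pretty-print the statements so they are easier to read.
        props.setProperty("hibernate.format_sql", "true");
        // Collect statistics such as the number of statements executed.
        props.setProperty("hibernate.generate_statistics", "true");
        return props;
    }
}
```

With these settings active, any SELECT printed while the loop runs indicates that contains() is reaching the database instead of answering from memory.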
