List implementation that is a view over multiple sublists?

I’m working on a piece of software that very frequently needs to return a single list that consists of the first (up to) N elements of a number of other lists. The return is not modified by its clients — it’s read-only.

Currently, I am doing something along the lines of (code simplified for readability):

List<String> ret = new ArrayList<>();
for (List<String> aList : lists) {
    // add the first N elements, if they exist, without exceeding the overall cap
    int n = Math.min(Math.min(aList.size(), MAXMATCHESPERLIST), MAXMATCHESTOTAL - ret.size());
    ret.addAll(aList.subList(0, n));
    if (ret.size() >= MAXMATCHESTOTAL) {
        break;
    }
}
return ret;

I’d like to avoid the creation of the new list and the use of addAll() as I don’t need to be returning a new list, and I’m dealing with thousands of elements per second. This method is a major bottleneck for my application.

What I’m looking for is an implementation of List that simply consists of the subList() results (those are cheap views, not actual copies) of each of the contained lists.

I’ve looked through the usual suspects including java.util, Commons Collections, Commons Lang, etc., but can’t for the life of me find any such implementation. I’m pretty sure it has to have been implemented at some point though and hopefully I’ve missed something obvious.

So I’m turning to you, Stack Overflow — is anyone aware of such an implementation? I can write one myself, but I hate re-inventing the wheel if the wheel is out there.

Suggestions for alternative more efficient approaches are very welcome!

Optional background detail (probably not all that relevant to my question, but just in case it helps you understand what I’m trying to do): this is for a program to fill crossword-style grids with words that revolve around a theme. Each theme may have any number of candidate word lists, ordered in decreasing order of theme relevancy. For instance, the “film” theme may start with a list of movie titles, then a list of actors, then a generic list of places that may or may not be film-relevant, then a generic list of English words. The lists are each stored in a wildcarded trie structure to allow fast lookups that meet the grid constraints (e.g. “CAT” would be stored in trie’d lists against the keys “CAT”, “CA?”, “C??”, “?AT”, … “???” etc.) Lists vary from a few words to several tens of thousands of words.

For any given query, e.g. “C??”, I want to return a list that contains up to N (say 50) matching words, ordered in the same order as the source lists. So if list 1 contains 3 matches for “C??”, list 2 contains 7, and list 3 contains 100, I need a return list that contains first the 3 matches from list 1, then the 7 matches from list 2, then 40 of the matches from list 3. And I want that returned “conjoined list view” operation to be more efficient than having to continuously call addAll(), in a similar manner to the implementation of subList().

Caching the returned lists is not an option due to memory constraints — my trie is already consuming the vast majority of my (32 bit) max-sized heap.

PS this isn’t homework, it’s for a real project. Any help much appreciated!

Answer

Do you need random access to the resulting list, or does your client code only iterate over the result?

If you only need to iterate over the result, create a custom list implementation that holds the list of the original lists 🙂 as an instance field. Have it return a custom iterator that takes items from each underlying list in turn and stops when there are no more items in any of the underlying lists or when it has already returned MAXMATCHESTOTAL items.
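A minimal sketch of the iterate-only idea, in case it helps. The class name CappedConcatView and its constructor parameters are made up for illustration, and it implements Iterable rather than the full List interface to keep things short. It assumes the source lists support cheap get(int) (e.g. ArrayList) and are not modified while the view is in use.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical read-only view over the first perList items of each list, capped at total items. */
final class CappedConcatView<E> implements Iterable<E> {
    private final List<? extends List<? extends E>> lists;
    private final int perList; // plays the role of MAXMATCHESPERLIST
    private final int total;   // plays the role of MAXMATCHESTOTAL

    CappedConcatView(List<? extends List<? extends E>> lists, int perList, int total) {
        this.lists = lists;
        this.perList = perList;
        this.total = total;
    }

    @Override
    public Iterator<E> iterator() {
        return new Iterator<E>() {
            private int listIdx = 0;  // which source list we are currently in
            private int itemIdx = 0;  // position within that list
            private int returned = 0; // number of items handed out so far

            @Override
            public boolean hasNext() {
                if (returned >= total) {
                    return false;
                }
                // Skip lists that are exhausted or have reached the per-list cap.
                while (listIdx < lists.size()
                        && itemIdx >= Math.min(perList, lists.get(listIdx).size())) {
                    listIdx++;
                    itemIdx = 0;
                }
                return listIdx < lists.size();
            }

            @Override
            public E next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                returned++;
                return lists.get(listIdx).get(itemIdx++);
            }
        };
    }
}
```

Creating the view copies nothing; it only stores a reference to the list of lists, and each element is read from its source list at the moment the client iterates over it.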

With some thought, you can do the same for random access.
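One possible way to get random access, sketched under the same assumptions (CappedConcatList is a hypothetical name, the source lists support cheap get(int) and do not change while the view exists): extend java.util.AbstractList, which only requires get(int) and size() for a read-only list and throws UnsupportedOperationException on modification by default.

```java
import java.util.AbstractList;
import java.util.List;

/** Hypothetical read-only List view: first perList items of each list, capped at total items. */
final class CappedConcatList<E> extends AbstractList<E> {
    private final List<? extends List<? extends E>> lists;
    private final int perList; // plays the role of MAXMATCHESPERLIST
    private final int size;    // visible size, never more than the total cap

    CappedConcatList(List<? extends List<? extends E>> lists, int perList, int total) {
        this.lists = lists;
        this.perList = perList;
        int n = 0;
        // Sum the visible part of each list; stop as soon as the overall cap is reached.
        for (List<? extends E> l : lists) {
            n += Math.min(perList, l.size());
            if (n >= total) {
                n = total;
                break;
            }
        }
        this.size = n;
    }

    @Override
    public int size() {
        return size;
    }

    @Override
    public E get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("Index: " + index + ", size: " + size);
        }
        int remaining = index;
        // Walk the source lists, subtracting each list's visible length until the index fits.
        for (List<? extends E> l : lists) {
            int visible = Math.min(perList, l.size());
            if (remaining < visible) {
                return l.get(remaining);
            }
            remaining -= visible;
        }
        throw new AssertionError("unreachable: index < size");
    }
}
```

Construction costs one pass over the list of lists rather than over the elements, and get(int) is linear in the number of source lists, which should be fine for a handful of lists per theme. If there were many source lists, precomputed offsets with a binary search would be an option.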
