How does a HashMap identify which locations in the internal array contain elements?

I am trying to build a simple implementation of the HashMap class in Java, for learning purposes. I know how rehashing works (Rehashing process in hashmap or hashtable).

When rehashing, all the elements present in the internal array are identified and where they go in the new array can be determined by recomputing their hashes based on the new hash function. However, how are all the elements present in the array identified?

Is there some kind of mechanism which keeps track of all keys, or there is a mechanism which keeps track of the indexes in the internal array which contain elements?

The alternative (which I used in my implementation) would be to scan the entire array for elements. This might be inefficient however as a lot of time would be wasted scanning empty buckets. Is there a better way?

Here is my implementation. The focus here is the rehash(int) function.

public class HashMap<T, U> {
    private static final int MIN_CAPACITY = 16; 
    private static final double LOAD_FACTOR = 0.75; 
    private int mCount = 0; 
    private HashMapItem<T, U>[] mArray = (HashMapItem<T, U>[]) new HashMapItem[MIN_CAPACITY]; 

    public HashMap() {
    }

    private void rehash(int newCapacity) {
        HashMapItem<T, U>[] newArray = (HashMapItem<T, U>[]) new HashMapItem[newCapacity]; 
        for (HashMapItem<T, U> hashMapItem : mArray) {
            if (hashMapItem != null) {
                HashMapItem<T, U> currentNode = hashMapItem; 
                while (currentNode != null) {
                    putInArray(currentNode.key, currentNode.value, newArray); 
                    currentNode = currentNode.next; 
                }
            }
        }
        mArray = newArray; 
    }

    private int hashFunction(T key, int arrayCapacity) {
        return Math.abs(key.hashCode()) % arrayCapacity; 
    }

    private boolean putInArray(T key, U value, HashMapItem<T, U>[] array) {
        boolean duplicateKey = false; 
        int index = hashFunction(key, array.length); 
        HashMapItem<T, U> hashMapItem = array[index]; 
        if (hashMapItem == null) array[index] = new HashMapItem<T, U>(key, value); 
        else {
            HashMapItem<T, U> currentNode = hashMapItem; 
            while (true) {
                if (currentNode.key.equals(key)) {
                    currentNode.value = value; 
                    duplicateKey = true; 
                    break; 
                }
                else if (currentNode.next != null) currentNode = currentNode.next; 
                else break; 
            }
            if (!duplicateKey) currentNode.next = new HashMapItem<T, U>(key, value); 
        }
        return duplicateKey; 
    }

    public void put(T key, U value) {
        if (mCount >= mArray.length * LOAD_FACTOR) rehash(mArray.length << 1); 
        boolean duplicateKey = putInArray(key, value, mArray); 
        if (!duplicateKey) mCount++; 
    }

    public U get(T key) {
        int index = hashFunction(key, mArray.length); 
        HashMapItem<T, U> hashMapItem = mArray[index]; 
        if (hashMapItem != null) {
            HashMapItem<T, U> currentNode = hashMapItem; 
            while (currentNode != null) {
                if (currentNode.key.equals(key)) return currentNode.value; 
                currentNode = currentNode.next; 
            }
        }
        return null; 
    }

    public U remove(T key) {
        U removedItem = null; 
        int index = hashFunction(key, mArray.length); 
        HashMapItem<T, U> hashMapItem = mArray[index]; 
        if (hashMapItem != null) {
            HashMapItem<T, U> currentNode = hashMapItem; 
            HashMapItem<T, U> previousNode = null; 
            while (currentNode != null) {
                if (currentNode.key.equals(key)) {
                    removedItem = currentNode.value; 
                    if (previousNode == null) mArray[index] = currentNode.next; 
                    else previousNode.next = currentNode.next; 
                    break; 
                }
                previousNode = currentNode; 
                currentNode = currentNode.next; 
            }
        }
        if (removedItem != null) mCount--; 
        return removedItem; 
    }

    public int count() {
        return mCount; 
    }

    private class HashMapItem<T, U> {
        T key; 
        U value; 
        HashMapItem<T, U> next; 

        public HashMapItem(T key, U value) {
            this.key = key; 
            this.value = value; 
        }
    }
}

Answer

There are two approaches to this problem:

  • Maintain a linked list-like structure of non-empty buckets – this could be done reasonably efficiently. It could also give you predictability on iteration, similar to LinkedHashMap, or
  • Scan all locations on re-hashing – this is exactly what you are doing.

Effectively, the choice boils down to paying with memory for reduced use of CPU. If you must iterate the hash map often, the first solution is better. If you do it only when you rehash, the second solution is better, because rehashing happens only when your map is relatively full. In other words, the majority of checks during the scan are going to succeed.

Leave a Reply

Your email address will not be published. Required fields are marked *