Hash Table implementation in Python

Posted on 2023-04-16 Edited on 2023-10-22 In Tech , Programming Views:

Intro

Hash Table is a data structure which maps keys to values for highly efficient data search. It stores data in an array by using a hash function to generate a slot to insert the data value.

The following are the general steps to insert a key/value pair to the Hash Table.

Compute the key’s hash code with a hash function
Find the available slot by mapping the hash code to an index of array
Insert the value to the available slot

Initial implementation

Create an empty hash table of size 10:

hash_table = [None] * 10

To create a hash function, we simply return the modulus of the array length. The modulo operator(%) yields the remainder from the division of the key by the length of hash table. This ensures the hash key always falls into the range of [0,len(hash_table)-1]. So a slot in the array can be always retrieved.

def hash_func(key):
    return key % len(hash_table)

To insert key/value pair to the hash table, we firstly get the hash key and then stores the value to the retrieved slot.

def insert(key, value):
    hash_key = hash_func(key)
    hash_table[hash_key] = value

To test the code, we insert two key/value pairs (5,6) and (5,8). Both values are stored in the slot 5 of the array. However, the value 8 replaces the existing value 6 at the same slot(with the same key 5).

hash_table = [None] * 10
print(hash_table)
insert(5,6)
print(hash_table)

insert(5,8)
print(hash_table)

# Output:
# [None, None, None, None, None, None, None, None, None, None]
# [None, None, None, None, None, 6, None, None, None, None]
# [None, None, None, None, None, 8, None, None, None, None]

Solve the collision

A hash key collision would occur when the multiple keys hit the same slot(index) in the hash table(array).

There are generally two techniques to resolve a collision:

Linear probing(open addressing)
Separate chaining(open hashing)

Linear probing

In linear probing(aka open addressing), all the entries are stored in the array itself instead of linked list. If the slot to insert entry is already occupied, the next slot will be searched sequentially.

hash_key = hash_func(key)
while hash_table[hash_key]:
    hash_key = (hash_key + 1) % len(hash_table)

Separate chaining

Separate chaining is a very commonly used technique for collision resolution. It is usually implemented with linked list. All the entries will be inserted into a specific linked list.

In python, we create a hash table as nested list below. Each hash table bucket contains a list to store data entries.

hash_table = [[] for _ in range(10)] # init with empty list

The same hash function can be used.

def hash_func(key):
    return key % len(hash_table)

To insert an entry to the hash table, we simply append it to the list positioned by the hash key.

def insert(key, value):
    hash_key = hash_func(key)
    hash_table[hash_key].append(value)

To test the code, we still insert two key/value pairs (5,6) and (5,8). Now, the two values 6 and 8 are stored in the slot 5 of the same list.

hash_table = [[] for _ in range(10)] # init with empty list
print(hash_table)
insert(5,6)
print(hash_table)
insert(5,8)
print(hash_table)

# Output:
# [[], [], [], [], [], [], [], [], [], []]
# [[], [], [], [], [], [6], [], [], [], []]
# [[], [], [], [], [], [6, 8], [], [], [], []]

Final implementation

Now that we know about the hash function and how to resolve hash collision, we can implement the hash table with insert, delete and search functions. We use Python built-in function hash() to generate hash code from an generic object. This allows the hash table to support generic types like integer, string and so on.

def hash_func(key):
    hash_key = hash(key)
    print("Hash key is {}".format(hash_key))
    return hash_key % len(hash_table)


def insert(key, value):
    hash_key = hash_func(key)
    bucket = hash_table[hash_key]

    # check if the key already exists
    existed = False
    for idx, pair in enumerate(bucket):
        k, v = pair
        if key == k:
            existed = True
            break

    if existed:
        # update the existing value
        bucket[idx] = (key, value)
    else:
        # add the new key/value pair
        bucket.append((key, value))


def delete(key):
    hash_key = hash_func(key)
    bucket = hash_table[hash_key]

    # check if the key already exists
    existed = False
    for idx, pair in enumerate(bucket):
        k, v = pair
        if key == k:
            existed = True
            break

    if existed:
        # delete the key/value pair
        del bucket[idx]
        print("key {} deleted".format(key))
    else:
        # key does not exist
        print("key {} not found".format(key))


def search(key):
    hash_key = hash_func(key)
    bucket = hash_table[hash_key]

    # check if the key already exists
    for idx, pair in enumerate(bucket):
        k, v = pair
        if key == k:
            return v


hash_table = [[] for _ in range(10)]  # init with empty list
print(hash_table)
insert(5, 6)
print(hash_table)
insert(5, 8)
print(hash_table)
insert(3, 4)
print(hash_table)
insert(9, 999)
print(hash_table)
insert(11, 111)
print(hash_table)
insert(20, 20)
print(hash_table)
insert(1000, 1000)
print(hash_table)
print(search(11))
delete(11)
insert("11", 222)
print(hash_table)
insert("aa", "val")
print(hash_table)

# Output:
# [[], [], [], [], [], [], [], [], [], []]
# Hash key is 5
# [[], [], [], [], [], [(5, 6)], [], [], [], []]
# Hash key is 5
# [[], [], [], [], [], [(5, 8)], [], [], [], []]
# Hash key is 3
# [[], [], [], [(3, 4)], [], [(5, 8)], [], [], [], []]
# Hash key is 9
# [[], [], [], [(3, 4)], [], [(5, 8)], [], [], [], [(9, 999)]]
# Hash key is 11
# [[], [(11, 111)], [], [(3, 4)], [], [(5, 8)], [], [], [], [(9, 999)]]
# Hash key is 20
#[[(20, 20)], [(11, 111)], [], [(3, 4)], [], [(5, 8)], [], [], [], [(9, 999)]]
# Hash key is 1000
# [[(20, 20), (1000, 1000)], [(11, 111)], [], [(3, 4)], [], [(5, 8)], [], [], [], [(9, 999)]]
# Hash key is 11
# 111
# Hash key is 11
# key 11 deleted
# Hash key is 3453933796215988004
# [[(20, 20), (1000, 1000)], [], [], [(3, 4)], [('11', 222)], [(5, 8)], [], [], [], [(9, 999)]]
# Hash key is -185426594550641729
# [[(20, 20), (1000, 1000)], [('aa', 'val')], [], [(3, 4)], [('11', 222)], [(5, 8)], [], [], [], [(9, 999)]]