Skip to main content

Hash Tables : Scratching the Surface

Have you ever worked with key-value data stores where you can attach one value to a key and store it in some form to perform easy lookups ? If yes, then i guess you have already seen hash tables in action. Hash Tables are so common these days that you don't even realise when you start using them but not a lot of people are familiar with the internals of hash tables and how these can be used in real world problems like cryptography. This a series of articles all about hashing. So sit tight and enjoy the journey.

What Problems Does Hashing Solve ?

Hash Tables are the most popular data structure in computer science. We forget to worry about time complexities of finding our data in hash table as its basically constant. But it doesn't show how hashing solves real world problems. Following are some problems which are being solved by hashing only.

  • Computer Cache is internally a hash table with keys being the tag bits (refer to cache post)
  • Python Dictionaries are implemented using hashing scheme only.
  • Associative Arrays in any programming language such as php are maintained using hashing.
  • Cryptographic Hashing uses hash functions to digest a variable length message to a fixed length string.
Apart from the above scenarios, you can use hash tables anywhere you may find appropriate. As hash tables store data in unordered fashion, we can't use it in sorting applications.

Simple Implementation of Hash Table

We can consider a hash table as an array where each array entry holds the original key and value pair. If the key itself is a value then there is no need to store value separately. For any hash table to work, you need some way to map keys to array indices. This mapping is done by a hash function which can be anything you want it to be. Desirable properties of a hash function are:

  • Uniform Distribution of keys over hash table. We'll see shortly why is that needed.
  • Mapping of one key to one hash table entry must not change over time.
Considering the above properties in mind, we can build our hash function using many simple arithmetic techniques. One of them is modulo operator. Assume we are managing integer keys and there is a value associated to each key. Thus we need our hash table to perform constant time lookup of integers. This can be achieved by using an integer array. Our hash function which maps a key to one hash entry location performs the modulo operation on the key and returns the index in the array where the key should be placed. Following is the implementation of such a hash table in c++.
#include <iostream>
class HashTable{
public:
    static const int CAPACITY = 10;
private:
    int* entries;
    bool* valid_map;
    int size;
public:
    Table(){
        entries = new int[CAPACITY];
        valid_map = new bool[CAPACITY];
        size = 0;
    }
    void insert(int key, value){
        int index = hash(key);
        entries[index] = value;
        valid_map[index] = true;
        size++;
    }
    void delete(int key){
        int index = hash(key);
        valid_map[index] = false;
        size--;
    }
    bool exists(int key){
        int index = hash(key);
        return valid_map[index];
    }
    int find(int key){
        int index = hash(key);
        if(valid_map[index] == true){
            return entries[index];
        }
        return -1;
    }
private:
    int hash(int key){
        return key%CAPACITY;
    }
};
    
As you see above, our hash table is capable of storing 10 keys. But there is one major problem with the above approach. There is no collision handling. If two keys map to same location then second key will overwrite the first one. We'll see how can we tackle Collisions in the next post and will look at how python handles hashing of objects.

Comments

Popular posts from this blog

Cordova viewport problem solved

Include the viewport settings in Cordova If you are facing the auto zooming problem of cordova then go read on the full article. Cordova actually ignores the viewport meta tag which causes the pixel density problem. So we need to tell cordova that viewport tag is equally important as other tags. To do this, we need to add some code to a file which is specify in the article. Corodva messes with pixels If you are using the latest cordova version or creating the cordova app for latest android versions then you may have faced the zoom malfunctioning.I also faced it when creating an app. Many of you may have already searched the web and found the answer of changing the meta tag attributes to get it working. But adding target-densitydpi=medium-dpi does not solve the problem for latest android versions. It may work for gingerbread but not for kitkat and others. So the final solution which i found was one of the stackexchange answer but rarely found. So i am gonna two things here, i ...

Understanding Python Decorators

If you have ever wondered what those @something mean above a python function or method then you are going to have your answers now. This @something line of code is actually called a decorator. I have red from various articles about them but some of them were not able to clarify the concept of a decorator and what we can achieve with them. So in this post we'll learn a lot about python decorators. Here is a list of topics we'll be covering. What is python decorator Understanding the concept Multiple decorators on same function class method decorator Where can we use decorators What is python decorator A python decorator is nothing but a function which accepts your given function as a parameter and returns a replacement function. So its like something this def decorator(your_func): def replacement(your_func_args): #do some other work return replacement @decorator your_func(your_func_args): #your_func code Now when your_func gets called then...

Image Search Engine Using Python

Images provide a lot more information than audio or text. Image processing is the prime field of research for robotics as well as search engines. In this article we will explore the concept of finding similarity between digital images using python. Then we will use our program to find top 10 search results inside a dataset of images for a given picture. It won't be as good as google's search engine because of the technique we will be using to find similarity between images. But what we are going to make will be pretty cool. So lets start. Setting up the Environment Our Algorithm How the code looks Lets build the GUI Additional Techniques Setting up the Environment The code we are going to write requires a few tools which we need to install first. I will try to be as precise as i can and if you get stuck into installing some tool then you can drop a comment below and i will help you sort out the problem. So here are the tools and the steps to install ...