Skip to main content

Hash Tables : Handling Collision

We saw in the previous post how a simple hash table can be constructed without much effort. But, and that is a big But, that table was prone to collisions and could result in corrupted data. Of course we don't want a data structure which can corrupt our data even if it is faster. So to avoid data corruption we will study some collision resolution techniques and will see how to evenly distribute a key over the table to minimize collision. Lets continue the journey.

Collision Resolution

First of all, what is collision ? Collision is a phenomenon which happens when 2 or more distinct keys are mapped to the same hash table entry. But we don't want that so using some technique, we relocate that key to some other empty location. The technique used to do that is called collision resolution technique. There are 2 types of techniques in use. One is called Chaining and Other is Open Addressing. Open Addressing has further more types but the concept is basically the same. Lets start with chaining first and see how it can be useful.

Chaining

As the name suggests, In chaining technique , we create a list of colliding keys for every hash entry location. So our hash table is basically an array of pointers which are pointing to the corresponding list. Whenever 2 distinct keys are mapped to same location, we append them to that location's list. The below picture will make it more clear.

Now the question comes about searching for a key in chained table. As you can see in the above picture, we cannot find a key in one go. We'll first have to find the entry location in hash table and then will have to traverse the whole list associated with that location. Each list node has the original key and thus we can compare our key with node's key to find the match.

Chaining may seem to solve the problem of collision and in reality it does but there is an overhead of memory. Each new list node must have a pointer to the next node. Also you may have observed in the above picture that a bad hashing technique may try to place all the keys in one location, thus making search a linear operation. Thus we can get amortized O(1) time for get, set and search operations if our hashing technique evenly distributes the keys

Open Addressing

After seeing how the chaining works, its time to explore another collision resolution technique which utilizes the hash table to its full potential. As we saw in chaining, a lot of keys can be placed in one location and then many other locations remain empty and not fully utilized. Open addressing solves this problem. There are many types of open addressing techniques but in this post we will look at Linear Open Addressing

In linear open addressing, we first find a key location inside hash table where it should sit using hash function. Then we check whether that location is empty or not. If its empty, place the key there otherwise, find the next empty location linearly in circular motion, i.e., go to the start to the table if there is no empty location till the end of the table. After finding an empty entry, place the key there. The following picture will help you understand this better.

Now lets to come to search. Search operation is somewhat similar to insert operation. When we want to search a key in table, we first find its location using hash function. After that, we linearly go through all the keys and match with our key until we find it. If we encounter an empty entry during our search or if we are back to where we started, it implies that the key is not present. As you can imagine, linear probing may result in clustures if our hash function is not uniformly distributive enough. The other type of open addressing which tries to avoid clustures is Quadratic Probing. In this we don't go in linear fashion instead we go in steps of power of 2. Like we first go 1 step to find next empty bucket, 2 steps if not found, 4 steps if no empty location found, 8 steps and so on.

As we saw in this post Open addressing solves memory problem but chaining is faster for high load factors [load factor = (total number of keys)/(number of entries in table)]. In real world cases, people prefer open addressing as Open addressing is usually faster than chained hashing when the load factor is low because you don't have to follow pointers between list nodes. If the load factor approaches 1, open addressing becomes very very slower. But usually, the table size is kept double the number of total keys in the table and thus open addressing achieves good performance. This was it for this post, in the next post, we'll see how python manages its dictionaries and how we ensure that table size is always double the number of entries. Stay tuned. Also share this article to your friends if this helped you in any way.

Comments

Popular posts from this blog

Cordova viewport problem solved

Include the viewport settings in Cordova If you are facing the auto zooming problem of cordova then go read on the full article. Cordova actually ignores the viewport meta tag which causes the pixel density problem. So we need to tell cordova that viewport tag is equally important as other tags. To do this, we need to add some code to a file which is specify in the article. Corodva messes with pixels If you are using the latest cordova version or creating the cordova app for latest android versions then you may have faced the zoom malfunctioning.I also faced it when creating an app. Many of you may have already searched the web and found the answer of changing the meta tag attributes to get it working. But adding target-densitydpi=medium-dpi does not solve the problem for latest android versions. It may work for gingerbread but not for kitkat and others. So the final solution which i found was one of the stackexchange answer but rarely found. So i am gonna two things here, i ...

Image Search Engine Using Python

Images provide a lot more information than audio or text. Image processing is the prime field of research for robotics as well as search engines. In this article we will explore the concept of finding similarity between digital images using python. Then we will use our program to find top 10 search results inside a dataset of images for a given picture. It won't be as good as google's search engine because of the technique we will be using to find similarity between images. But what we are going to make will be pretty cool. So lets start. Setting up the Environment Our Algorithm How the code looks Lets build the GUI Additional Techniques Setting up the Environment The code we are going to write requires a few tools which we need to install first. I will try to be as precise as i can and if you get stuck into installing some tool then you can drop a comment below and i will help you sort out the problem. So here are the tools and the steps to install ...

Understanding Python Decorators

If you have ever wondered what those @something mean above a python function or method then you are going to have your answers now. This @something line of code is actually called a decorator. I have red from various articles about them but some of them were not able to clarify the concept of a decorator and what we can achieve with them. So in this post we'll learn a lot about python decorators. Here is a list of topics we'll be covering. What is python decorator Understanding the concept Multiple decorators on same function class method decorator Where can we use decorators What is python decorator A python decorator is nothing but a function which accepts your given function as a parameter and returns a replacement function. So its like something this def decorator(your_func): def replacement(your_func_args): #do some other work return replacement @decorator your_func(your_func_args): #your_func code Now when your_func gets called then...