Skip to main content

Hash Tables : Designing Hash Functions

In the last post, we saw how collision in handled in a hash table. Now its time to explore some real world scenarios because come on, what is the use of learning all this if we can't apply it in real world softwares. As we learned that open addressing and chaining were both good in different scenarios only if we met a condition that the keys are uniformaly distributed. But its not the application job to pass distributed keys to hash tables, its our job to map any key to some random location. We do this using a hash function (or prehash function, whatever term you prefer). A hash function should map the keys as uniform as possible. So lets discuss some decent hash functions to some high end hash functions.

Desining Hash Functions

Hash Functions do the job of mapping a key to some location in hash table. This is achieved by doing some fixed operations on the key and generating a number between 0 to the capacity of table. Hash functions highly determine the overall complexity of hash table for insert and search operations because as we saw in open addressing if a lot of keys are mapped to same location then it will shorty form a clusture there and then inserting a new key or searching an old key will not be a O(1) operation anymore. So we want our hash function to have some properties. Lets see what they are.

Properties of Hash Functions

Desirable properties of a hash function are:

  • Uniform Distribution: Each key should be equally likely hashed to any slot of the table independent of where other keys hash. This means that hashing of a key should not be related to hash table size or placement of other keys.
  • Time Independent: Hash values of keys should be changed over time. This implies to the fact that if i hash my key now and if i hash the same key one year later, the resulting hash values in both the cases should be same.

Basic Hash Functions

As we saw what we want in our hash functions, we are ready to write some. We are making one assumption that the key is always integer. Later we'll see how to convert any object to an integer atleast in c/c++. Some of the hash functions which are used commonly are described below.

  • Division Method: In this method, we take modulo of key with table size. This will generate a number between 0 and size of table. This may create a problem if keys and size of table have common factors, then only some of table space will be utilized.
  • Multiplication Method: Opposite of the division method is multiplication method. In this method we multiply the key with some FIXED number and take out a bunch of bits.
  • Universal Hashing: In this method, we compute a value by multiplying the key with prime number and then taking the mod with size of table. The worst case possibility of colliding two keys is 1/(size of table) which is ideal situation.

Cryptographic Hash Functions

Hash Functions are not only used to compute the location of a key in a table but also to convert any given key or should we say any data to a fixed length digest. This digest is unique for each key and thus it is made sure that no 2 keys have same digest. Such type of hash functions are used in cryptography to encrypt a message or a password to transfer it safely from one place to another. Cryptographic hash functions must have a property other than the above two and that is they should be one way. This means that it must not be possible to recover the key from its digest. There are many widely used hash functions which are either used for encryption or just to compute checksum. Two most commonly used and famous hash functions are MD and SHA. We won't go in detail how these work because there are libraries to use these hash functions. Lets see what are those.

  • MD (Message Digest): The MD family comprises of hash functions MD2, MD4, MD5 and MD6. It was adopted as Internet Standard RFC 1321. It is a 128-bit hash function. MD5 was the most popular hash function for some years until in 2004, collisions were found in MD5. Since then it is not recommended to use but still people use it for less sophisticated applications where collisions are allowed.
  • SHA (Secure Hash Function): Family of SHA comprise of four SHA algorithms; SHA-0, SHA-1, SHA-2, and SHA-3. SHA-1 is most widely used hash function of this family. It is also employed in SSL (Secure Socket Layer).

This was it for this post. We'll see in the final post of this series how python manages it dictionaries and lists. Also share this article to your nerd friends to make them aware of where the world is heading. Keep Hashing

Comments

Popular posts from this blog

Cordova viewport problem solved

Include the viewport settings in Cordova If you are facing the auto zooming problem of cordova then go read on the full article. Cordova actually ignores the viewport meta tag which causes the pixel density problem. So we need to tell cordova that viewport tag is equally important as other tags. To do this, we need to add some code to a file which is specify in the article. Corodva messes with pixels If you are using the latest cordova version or creating the cordova app for latest android versions then you may have faced the zoom malfunctioning.I also faced it when creating an app. Many of you may have already searched the web and found the answer of changing the meta tag attributes to get it working. But adding target-densitydpi=medium-dpi does not solve the problem for latest android versions. It may work for gingerbread but not for kitkat and others. So the final solution which i found was one of the stackexchange answer but rarely found. So i am gonna two things here, i ...

Understanding Python Decorators

If you have ever wondered what those @something mean above a python function or method then you are going to have your answers now. This @something line of code is actually called a decorator. I have red from various articles about them but some of them were not able to clarify the concept of a decorator and what we can achieve with them. So in this post we'll learn a lot about python decorators. Here is a list of topics we'll be covering. What is python decorator Understanding the concept Multiple decorators on same function class method decorator Where can we use decorators What is python decorator A python decorator is nothing but a function which accepts your given function as a parameter and returns a replacement function. So its like something this def decorator(your_func): def replacement(your_func_args): #do some other work return replacement @decorator your_func(your_func_args): #your_func code Now when your_func gets called then...

Image Search Engine Using Python

Images provide a lot more information than audio or text. Image processing is the prime field of research for robotics as well as search engines. In this article we will explore the concept of finding similarity between digital images using python. Then we will use our program to find top 10 search results inside a dataset of images for a given picture. It won't be as good as google's search engine because of the technique we will be using to find similarity between images. But what we are going to make will be pretty cool. So lets start. Setting up the Environment Our Algorithm How the code looks Lets build the GUI Additional Techniques Setting up the Environment The code we are going to write requires a few tools which we need to install first. I will try to be as precise as i can and if you get stuck into installing some tool then you can drop a comment below and i will help you sort out the problem. So here are the tools and the steps to install ...