Hashing Techniques and Collision Handling: All You Need to Know

Share this

Hashing in Java is a technique for storing data in the form of key-value pairs. A hash function is used to convert the key into a unique identifier, called a hash code. The hash code is then used to store the value in a hash table.

Let us see an example of the usage of hashing i.e. the HashMap

import java.util.HashMap;
import java.util.Map;

public class HashingExample {

    public static void main(String[] args) {

        // Create a hash map
        Map<String, Integer> hashMap = new HashMap<>();

        // Add some key-value pairs to the hash map
        hashMap.put("John", 25);
        hashMap.put("Mary", 30);
        hashMap.put("Peter", 35);

        // Get the value for the key "John"
        int johnsAge = hashMap.get("John");

        // Get the value for the key "Mary"
        int marysAge = hashMap.get("Mary");

        // Get the value for the key "Peter"
        int petersAge = hashMap.get("Peter");

        // Print the values
        System.out.println("John's age is " + johnsAge);
        System.out.println("Mary's age is " + marysAge);
        System.out.println("Peter's age is " + petersAge);
    }
}

This code will print the following output:

John's age is 25
Mary's age is 30
Peter's age is 35

Hash tables are very efficient for storing and retrieving data because the hash code can be used to quickly find the correct value.

Internal Working of Hashing Technique

hashing technique in java

The internal working of hashing in Java is as follows:

  1. A hash function is used to generate an index from the key. The hash function is a mathematical function that takes the key as input and produces an integer as output.
  2. The index is used to access the corresponding entry in the array. If the entry is empty, the data is stored at that location. If the entry is not empty, the data is overwritten.
  3. When data is retrieved, the hash function is used to generate an index. The index is used to access the corresponding entry in the array. The data is then returned.

Hashing Collision

However, hash tables can also suffer from collisions, which occur when two different keys have the same hash code. When this happens, the values for the two keys are stored in a linked list at the hash table location.

There are two main types of collisions that can occur in hashing:

  • Primary collision: This occurs when two different keys produce the same hash index.
  • Secondary collision: This occurs when two different keys produce the same hash index, and the array entry at that index is already occupied.

Primary Collisions

Primary collisions are handled by chaining. This means that each array entry contains a linked list of all the data that has the same hash index. When data is stored, it is added to the end of the linked list at the corresponding index. When data is retrieved, the linked list is searched for the data with the matching key.

Secondary Collisions

Secondary collisions are handled by rehashing. This means that the hash function is used to generate a new index for the data. The new index is then used to access the corresponding entry in the array.

Chaining Collision Handling

Chaining is a collision-handling technique in hashing. It is used to resolve collisions when two or more keys are hashed to the same index in a hash table. In chaining, each hash table index is linked to a linked list of all the elements that have been hashed to that index. This allows for fast lookups, even when there are collisions.

Rehashing Collision handling

Rehashing is another collision-handling technique. In rehashing, when a collision occurs, the hash function is used to generate a new index for the data. The new index is then used to access the corresponding entry in the hash table. This can be more efficient than chaining when the number of collisions is small, but it can be less efficient when the number of collisions is large.

Which Collision handling is better?

In terms of performance, chaining is generally better than rehashing. This is because chaining does not require the hash function to be recomputed when a collision occurs. Rehash, on the other hand, requires the hash function to be recomputed, which can be time-consuming.

However, chaining can also be more space-consuming than rehashing. This is because chaining requires each hash table index to be linked to a linked list of elements. Rehash, on the other hand, only requires each hash table index to store a single element.

The best collision handling technique to use depends on the specific application. If performance is the most important factor, then chaining should be used. If space is the most important factor, then rehashing should be used.

Summary

Here is a table that summarizes the advantages and disadvantages of chaining and rehashing:

TechniqueAdvantagesDisadvantages
Chaining* Fast lookups * Simple to implement * Scalable* Can be space-consuming * Can lead to long linked lists if there are many collisions
Rehashing* Efficient when there are few collisions * Space-efficient* Can be slow when there are many collisions * More complex to implement

Benefits of Hashing

Here are some of the benefits of using hashing in Java:

  • Fast lookups: Hashing allows for fast lookups of data since the index can be calculated directly from the key.
  • Efficient storage: Hashing can be used to store large data sets efficiently.
  • Flexibility: Hashing can be used to store a variety of data types, including strings, integers, and objects.
  • Scalability: Hash tables can be scaled to handle large amounts of data.

Drawbacks of Hashing

Here are some of the drawbacks of using hashing in Java:

  • Collisions: Collisions can occur when two different keys have the same hash code. This can slow down the performance of hash tables.
  • Space complexity: Hashing can use more space than other data structures, such as arrays.
  • Complexity: Hashing can be more complex to implement than other data structures.
  • Security: Hash tables can be vulnerable to security attacks, such as hash collisions and rainbow tables.

Conclusion

Overall, hashing is a powerful tool that can be used to store and retrieve data in Java. However, it is important to be aware of the challenges of using hashing, such as collisions and security vulnerabilities.

Further Readings

Feel free to share your thoughts on this topic in the comments section below 👇 We would be happy to hear and discuss the same 🙂

Share this

Leave a comment

Your email address will not be published. Required fields are marked *