Understanding Hashes – For Beginners

MD5, CRC32… many have heard of them, but do you know what they actually do and what their purpose is? And why is it impossible to simply extract the password from a ZIP or RAR file?

Hashes are everywhere: they were involved when I installed this blog on the web server, they are involved right now as this data is sent in packets to your computer, and they are involved again when you log in to my blog.

To clarify things, let's look at a simple example:

You have a set of numbers like:

6, 8, 2, 8, 1, 9

Now you want to make sure that no one tampers with those numbers. You can use a hash for this purpose. There are different types of hashes, but let's create our own: we just add all the numbers together to get our hash:

6 + 8 + 2 + 8 + 1 + 9 = 34

So our hash is 34 for the input {6, 8, 2, 8, 1, 9}. So, what is the purpose of this? Let's pretend some data got lost on the way… or you want to remember those numbers but are no longer sure they are right. After a year you try to remember and come up with these numbers:

6 + 8 + 2 + 8 + 4 + 9

If we compute the hash again, we see that it is now 37 instead of 34, so one of the numbers must have changed! This is one purpose of hashes: to detect a change in a set of data. Of course, our hash mechanism is not very good, for two reasons:

  1. The hashes of {6, 8, 2, 7, 2, 9} or even {5, 9, 1, 9, 0, 10} are the same (34), even though the data is totally different. MD5 and CRC32 use far more complex algorithms, which make it much less likely that two different inputs end up with the same hash. The longer the resulting hash, the less likely such a clash is (as long as the algorithm is well designed).
  2. With larger numbers {1923, 3216213, 321098, 32, 1151, 519…} our hash gets bigger and bigger. The beauty of a real hash algorithm is that no matter how small or big the input, the computed hash always has the same length (128 bits for MD5, 32 bits for CRC32)!
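The toy hash from the example above can be sketched in a few lines of Python. The function name sum_hash is made up for this illustration; it is only the "add everything together" idea, not a real hash algorithm.

```python
def sum_hash(numbers):
    """Toy hash from the example: add all numbers together."""
    return sum(numbers)

original = [6, 8, 2, 8, 1, 9]
remembered = [6, 8, 2, 8, 4, 9]

print(sum_hash(original))    # 34
print(sum_hash(remembered))  # 37, so something changed

# Weakness 1: completely different inputs give the same hash
print(sum_hash([6, 8, 2, 7, 2, 9]) == sum_hash(original))   # True
print(sum_hash([5, 9, 1, 9, 0, 10]) == sum_hash(original))  # True

# Weakness 2: the hash grows with the input instead of keeping a fixed length
print(sum_hash([1923, 3216213, 321098, 32, 1151, 519]))
```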

Try it yourself with an MD5 hash generator!
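If you would rather try it locally, here is a minimal sketch using Python's standard hashlib and zlib modules. The helper name fingerprints is an assumption for this example. It shows that the MD5 and CRC32 results keep the same length whether the input is one byte or a million bytes.

```python
import hashlib
import zlib

def fingerprints(data):
    """Return (MD5, CRC32) of some bytes, both as hex strings."""
    md5_hex = hashlib.md5(data).hexdigest()       # always 32 hex characters
    crc_hex = format(zlib.crc32(data), "08x")     # always 8 hex characters
    return md5_hex, crc_hex

for data in (b"6", b"6, 8, 2, 8, 1, 9", b"x" * 1_000_000):
    md5_hex, crc_hex = fingerprints(data)
    print(len(data), "bytes ->", md5_hex, crc_hex)
```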

Another purpose of a hash is the following: let's say you have a website and want to implement a login system. You want to keep the site secure for its users, and if a hacker gets into your system, you do not want to leak your users' passwords. A simple yet very effective way is to store the passwords as hashes instead!

So let's say you register for a website with the username “MorsPrincipiumEst” and the password “Metal4Ever”. The website takes that password, generates a hash from it (MD5 or a modified version of it was long a common choice, although it is now considered too weak for this job), and stores it in its database like this:

Username: MorsPrincipiumEst

Password: b9626a9649a4c3b8bce5c16ac09298fe

The hacker will now have a tough time retrieving the password from that hash. He has no way of telling what the password is unless he computes the hash for every possible combination of numbers and letters until one matches (this is called brute-forcing).

The next time the user logs in, the website computes a hash from the entered password and compares it to the one stored in the database. A very effective trick, but it has two possible downsides:

  1. If you lose your password, there is no way of retrieving it (other than brute force).
  2. Hashes are not truly unique: there is always a chance that two inputs result in the same hash. Because there are unlimited possible inputs but only a fixed number of hash values, other passwords exist that produce the same hash as yours (even though the chance of hitting one is very slim)!
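The register-and-login flow described above could look roughly like this. It is a minimal sketch: the users dictionary stands in for the website database, and the names hash_password, register and login are made up for this example. MD5 is used only because the example above uses it.

```python
import hashlib
import hmac

users = {}  # hypothetical stand-in for the website database: username -> hash

def hash_password(password):
    # MD5 mirrors the example above; it is not a recommendation
    return hashlib.md5(password.encode("utf-8")).hexdigest()

def register(username, password):
    # Only the hash is stored, never the password itself
    users[username] = hash_password(password)

def login(username, password):
    stored = users.get(username)
    if stored is None:
        return False
    # Hash the entered password again and compare it with the stored hash
    return hmac.compare_digest(stored, hash_password(password))

register("MorsPrincipiumEst", "Metal4Ever")
print(login("MorsPrincipiumEst", "Metal4Ever"))  # True
print(login("MorsPrincipiumEst", "metal4ever"))  # False
```

Note that the database only ever holds the 32-character hash, which is why a lost password cannot simply be looked up.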

My third and last example is the use of hashes inside archives which are encrypted with a password.

Before a program like WinZIP or WinRAR packs a file to reduce its size, it generates a hash of the original data. When packing, the data is encrypted with the password you entered, so when unpacking you need the correct password to decrypt it again. How does WinRAR know that the password, and therefore the decrypted data, is correct? When packing, it stores the hash computed from the original data alongside the data inside the archive. After unpacking, it recalculates the hash of the unpacked data and checks whether it matches the stored one. If the hashes are the same, the data was unpacked successfully and the password is correct (mathematically this is not guaranteed, but a wrong password producing a matching hash is very unlikely).
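The check described above can be imitated with a small sketch. Everything here is an assumption made for illustration: pack, unpack and _keystream_xor are made-up names, the XOR "cipher" is a toy stand-in and not what WinZIP or WinRAR actually use, and compression is left out entirely. The only point is the idea of storing a hash of the original data and comparing it after decryption.

```python
import hashlib
import zlib

def _keystream_xor(data, password):
    # Toy cipher for illustration only: XOR the data with bytes derived
    # from the password. Applying it twice with the same password undoes it.
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        seed = password.encode("utf-8") + counter.to_bytes(8, "big")
        stream.extend(hashlib.sha256(seed).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

def pack(data, password):
    # Store the hash of the ORIGINAL data next to the encrypted payload
    return {"crc32": zlib.crc32(data), "payload": _keystream_xor(data, password)}

def unpack(archive, password):
    data = _keystream_xor(archive["payload"], password)
    # A wrong password yields garbage whose hash will almost certainly not match
    if zlib.crc32(data) != archive["crc32"]:
        return None
    return data

archive = pack(b"6, 8, 2, 8, 1, 9", "Metal4Ever")
print(unpack(archive, "Metal4Ever"))  # b'6, 8, 2, 8, 1, 9'
print(unpack(archive, "metal4ever"))  # wrong password, hash mismatch
```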

That’s why, unless you try every possible password, there is no way of extracting the password from a RAR or ZIP file. Password recovery applications do exactly that: they just try every possible combination, which makes them useless unless the password is very short (about 4 characters at most). It would take years to brute-force a password with 8 or more characters (which is also why the minimum password length is usually around 6).
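To get a feeling for these numbers, here is a rough back-of-the-envelope calculation. The alphabet of 62 characters (letters and digits) and the rate of one million guesses per second are assumptions chosen purely for illustration; real tools and real hardware vary widely.

```python
ALPHABET_SIZE = 62              # assumed: a-z, A-Z, 0-9
GUESSES_PER_SECOND = 1_000_000  # assumed rate, purely illustrative
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

def combinations(length, alphabet_size=ALPHABET_SIZE):
    """Number of possible passwords of exactly this length."""
    return alphabet_size ** length

def worst_case_seconds(length, rate=GUESSES_PER_SECOND):
    """Time needed to try every password of this length at the assumed rate."""
    return combinations(length) / rate

for length in (4, 6, 8):
    seconds = worst_case_seconds(length)
    print(length, "characters:", combinations(length), "combinations,",
          f"{seconds / SECONDS_PER_YEAR:.4f} years")
```

Under these assumptions, 4 characters are exhausted in under 15 seconds, while 8 characters take roughly 7 years, and every extra character multiplies the effort by 62.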

This is my little insight on Hashes. Hope you enjoyed it!
