Salt, pepper and rainbows: storing passwords properly

ℹ️ This article was written many years ago. Its technical content may be outdated.

On any site that uses passwords for authentication, be it Google, Facebook or this one, the site needs to have some way of telling whether the password given by the user matches the correct password. This alone is a simple concept — but at the same time we need to make sure that we protect users’ passwords if the database containing usernames, passwords, etc is compromised. Many people use the same password for different web services, so if an obscure website’s database was hacked and the emails and passwords of their users were compromised, it’s likely that more than a few used the same email and password combination for their PayPal account, Google account, online banking accounts… you get the idea.

Let’s first consider the example where we simply store users’ passwords in plaintext (that is, if a user’s password is abc123 we store abc123 in the database). Consider the following set of users.

UserPasswordPassword data stored in database

Obviously anyone who gains access to this database will have all the login details for all users immediately. The site owners always need to be able to tell whether a user knows the correct password, so all we can do is obscure the process of calculating a user’s password from their database data.

(Aside: encrypting the database with a secret key could be seen as a solution; however, the majority of web attacks are done through the likes of SQL injection vulnerabilities, which execute at a level at which the database can be read in a decrypted form.)

A good first step is to apply a public, well-tested hash function to the passwords in the database, and store the password hashes rather than the passwords themselves. A hash function is a one-way function; that is, is it easy to perform one way (i.e. turning the password into the hash), but hard or impossible to do the reverse (i.e. turning the hash into the password). Since hashing algorithms map to a fixed-length string, there are guaranteed to be hash collisions and so it is often impossible to know the original text given the hash. For this example I am going to use the SHA1 algorithm, which is conveniently built into many web programming languages including PHP. Our database would now look as follows.

UserPasswordPassword data stored in database

To the human eye this would probably be good enough. Also, note that although User2 and User3 have very similar (but not identical) passwords, their hashes are completely different. (As a rule of thumb, changing one bit of data in the plaintext should flip approximately 50% of the bits of the resulting hash.) However, there are still a couple of caveats:

  • Anyone looking at the database would know that User1 and User2 have the same password.
  • Databases using popular hashing algorithms are still vulnerable to rainbow tables. These are huge tables holding the result of hashing every possible string, often up to a certain number of characters. For example, the rainbow table available here for SHA1 hashes of lowercase letters, numbers and spaces up to 9 characters is already 52GB! Anyone with a hash could run it through such a rainbow table and come up with possible sources for that hash.

Rainbow tables are only useful when someone has gone to the bother of generating them, which means that obscure or self-coded hashing algorithms will likely not have a rainbow table against which they can be looked up. However, such algorithms are not likely to be mathematically rigorous enough to be considered a strong hash function, which is bad news if you’re a big target like Google.

A way to protect against rainbow table attacks is to ensure that the number of characters in the plaintexts are not included in the rainbow table’s domain. For example, in the example above, the 52GB rainbow table only worked when the passwords were 9 characters or less — so anything above 9 characters would be invulnerable to this current rainbow table. And since it’s extremely low overhead to hash something as short as a string of text, we might as well add, say, 10 characters to each password. (In practice you would want to do more than this, but I want to fit everything on the page!) A common string of characters that we concatenate with every password is called a salt. Let’s take our salt as abcdefghij. Then, for example, the first entry in the database will be the result of applying the SHA1 function to abcabcdefghij.

UserPasswordSaltPassword data stored in database

Note that User1 and User2 still have the same stored password. While we assume that the salt will also be compromised if the database is, all existing rainbow tables will be worthless. However, if the target was high-value, a new rainbow table could be constructed to work explicitly for this database. Since the overhead of each SHA1 operation is very low, we might as well slow down any would-be crackers by performing it recursively many times, say 4096. (This is the number of rounds of hashing performed by the WPA algorithm, which takes the SSID of the access point as its salt.) This will represent no slowdown to us, but to an attacker it will slow down the creation of the rainbow table 4096-fold. We now have the following data.

UserPasswordSaltPassword data stored in database

We still have the problem that User1 and User2 can be identified as having the same password. So, as well as having the database-wide salt to obscure our data from rainbow tables, we add a pepper to obscure each entry from each other. (In practice, an attacker would have to create a separate rainbow table for every entry they wanted to crack!) With the peppers listed below, we now have the following final data.

UserPasswordSaltPepperPassword data stored in database

And now we finally have our solution! The same plaintexts now hash to different values, as their individual peppers are different.

There are some additional measures we could take to increase the security of the system as a whole.

  • Hash the user’s password client-side using JavaScript or similar so that the server never “sees” the original password. This will prevent a passive man in the middle attack. (While we’re at it, we may as well salt the hash.)
  • Use a session-encryption technology like https. Such a technique is required to prevent active MITM attacks, as additional layers of JavaScript will only provide obscurity rather than security if an attacker can change a page’s or script’s content in transit to the user.

If the above wall of text proved difficult to follow, the process is pictured below.

The site owner would take the data provided by the user, run it through this process, and compare the result with the data stored in the database. If they match, the password the user originally entered must have been correct. All this can be done by the owner with absolutely minimal computational overhead. Are you paying attention, Sony?