Could hashing prevent SQL injection?

Question

On a whim I've recently decided to throw up the first proper website I created onto my local web server I use for development. I thought it'd be a great environment to throw some SQL at for injection as I know there are flaws and the site was only really meant for my own personal development.

Anyway, to get to the point, after a few attempts the furthest I could get was to have the page return an error with the statement. I was trying to get into a specific test account I set up, (if the result returns more than one account an error's thrown, so I didn't expect selecting every username where 1=1 to work), but every time I got a response as if I had entered a normal, incorrect password. I took a look at the PHP code and turns out I was hashing the password before the query so the attack was being hashed before it could do any harm.

Being new to web security as a whole, and having an interest in web development, I was wondering whether there are any vulnerabilities with this method of SQL injection prevention as I expect to have not thought something through. Just to clarify, this isn't meant to be a "look guys I've found something new" as there are plenty more brighter sparks in information security than myself, who would have likely figured this out already, but I'd like to know why this likely isn't suitable as a security mechanism.

An answer to a distantly related question that I was reading earlier today, especially "The act of hashing a password is the act of making the password safe to store in your database. The hash function doesn't give special meaning to any bytes, so no cleansing of its input is required for security reasons" — Honinbo Shusaku, Dec 19 '16 at 19:01
Any information that is hashed can't be displayed. You end up hashing only for some and not for other. If you are thinking of using other cipher then you end up with the same problem as just replacing ' with '', you hope that this solution works, which often time doesn't because of special cases that you didn't think of. Better have a consistent method that has proven to work which is parameterize query. — the_lotus, Dec 20 '16 at 13:52

score 36 · Accepted Answer · answered Dec 19 '16 at 16:08

36

So, hashing the user password before entering it into the query is a coincidental security feature to prevent SQL injection, but you can't necessarily do that with all user input. If I'm looking up a Customer by their Name and I have a query like

Select * from Customer Where Name like '%userInput%'

If I set userInput as a hashed version of what was typed in, it wouldn't work correctly even if Name was also hashed in the database, it would only work for exact search query. So if I had a Customer with Name "Sue" and I typed in a search for "sue", it wouldn't work. I also wouldn't know the name of the customers unless there was an exact match in my search, which isn't practical.

The way you want to prevent SQL injection is to not make your queries like above, you'll want to process and parameterize inputs into a query, stripping out things like =s and other symbols that don't make sense in context of the input. A good guide for preventing SQL injection can be found here.

answered Dec 19 '16 at 16:08

Ryan Kelso

1,230
9
14

21
Nowadays, any reasonably maintained query engine does parameterization out of the box, meaning you don't need to do any sanitizing yourself. You just learn the parameterization syntax, write your query that way, and pass the values in separately from the string containing the query. So I think saying you "process" these inputs is misleading. 2. Sanitizing input doesn't "strip" symbols; it escapes them.

jpmc26

Dec 20 '16 at 00:09

"So if I had a Customer with Name "Sue" and I typed in a search for "sue", it wouldn't work" — this actually can work if the name is always converted to lowercase before storing it to DB. (but I'm not saying that it's a good practice of course)

– Display Name Dec 20 '16 at 08:06

@Sarge Sure, that's called normalization. But it still wouldn't match on a query for "Borsch" - because it's hashed, you're matching all or nothing. – Riking Dec 20 '16 at 08:45

1

Thanks for the response, and the guide also. Though I'm still curious as to why methods of obfuscation aren't used in protecting against SQL Injection more, as it seems to do the job quite well, and even something such as a caesars cipher could provide a quite effective prevention of SQL injection. Of course I'm not suggesting this solely be used as a single point of defence as that seems like a terrible idea with any method of security, but rather wonder why it's not such a popular prevention method used alongside others. – lewis Dec 20 '16 at 08:51

@Riking from a development and not security point of view, I'd say converting to upper / lower case before hashing should solve that, for usernames where case sensitivity shouldn't be an issue. – lewis Dec 20 '16 at 08:55

@lewis The reason is that obfuscation techniques are much, much more complicated then just using whatever method the API you are using offers by default. All user input requires escaping, whether we output it to SQL, CSV, HTML or whatever. The rules of how escaping should be performed differ per output format and are generally pretty complex to code from scratch... which is why we put it into some common libraries. Just use something like

Query q = new Query("SELECT * FROM customer WHERE name like :name"); q.setParameter("name", "%" + name + "%"); q.execute();

or whatever your API does. – Stijn de Witt Dec 20 '16 at 10:57

@jpmc26 You are entirely correct of course, query engines are the way to go with working with database interactions as they provide a bunch of security out of the box and can overall reduce code complexity with the database interaction, on your second point again correct and I should correct my answer as escaping is the more standard way of accepting the input. – Ryan Kelso Dec 21 '16 at 13:53

@lewis So the reason we don't really care about obfuscation is that it's not super simple to obfuscate both the input and the data in the back end and match those and return a plaintext response (as in the case of searching for customer of name Sue), while if we always escape the characters that can be used to create a SQL query in code (which you should be using query engines as jpmc26 pointed out and Stijn de Witt as they make this fool proof and you don't reinvent the wheel) that's fast and no worries about data in the backend being reconverted for the front when it's returned. – Ryan Kelso Dec 21 '16 at 14:00

score 7 · Answer 2 · edited Mar 17 '17 at 13:14

7

Basically yes, if you hash input (represented in Hex or Base64 format) before passing it to SQL, it can no longer be an effective SQLi attack vector. The same goes if you parseInt the input. Those simply do not support the characters needed for a useful SQLi. (namely to break out of the quoted string)

This technique has only limited utility in practice. i.e. it would not be compatible with some of the famous SQL features such as LIKE, <, >,
or indexed equality searches (= and <> using index)

As a side note, I would point out that password hashes really should be salted. If that criteria were met in your example, then I think you would no longer be including a hash in the SQL WHERE clause.

edited Mar 17 '17 at 13:14

Community

1

answered Dec 19 '16 at 16:16

700 Software

13,997
3
55
82

7

That said... once you hash you can never go back. Hashing user input before storage would defeat injection even with weak code, surely, but you'd never be able to use this approach in the general case since, other than passwords, you don't really want to destroy the data you're having the user input in the first place. – J... Dec 19 '16 at 19:26
1

"I would point out that password hashes really should be salted." Such a good point!! If you don't use salts properly, you might end up creating the greatest crossword puzzle in the history of the world just like Adobe did... – Stijn de Witt Dec 20 '16 at 11:02

score 1 · Answer 3 · answered Dec 19 '16 at 16:19

What you are doing (inadvertently) is part of what is known as "sanitizing your input", which are steps every application should be taking to reduce defects (and therefore vulnerabilities.)

An app should first ensure the input can't overflow the app's ability to accept data. That might mean length-checking the input, or otherwise rejecting oversized data.

Next, the app should be checking input against a whitelist, if possible, to make sure that no unwanted input can enter the logic. In some cases this might mean ensuring the input only contains valid alphanumeric characters. But sometimes a whitelist doesn't make sense for that particular input field, yet the developer still needs to prevent injection, so they incorporate a blacklist. The blacklist validates the input doesn't contain symbols known to make injection possible. The drawback to blacklists is that they only protect against what the developer knows to block. An example might be a blacklist that stops a quote mark to prevent SQL injection; but the developer might not recognize when input passes through URL encoded that a %27 will become a quote mark.

Internally, the application should be written defensively to guard against various forms of injection. Using parameterized SQL (instead of constructing a SQL query using concatenation) helps prevent SQL injection. But many other places where user input lands can be vulnerable to different types of injection. Directory paths can be impacted if an attacker inputs something like "..\..\..\..\..\..\windows\cmd.exe" XPath queries can be injected to tamper with unauthorized data. And JavaScript can pass all the way through the database to tamper with the victim's browser. Other forms of defense include not revealing internal diagnostic error messages in the output, and logging to assist in identifying potential attackers.

What your app has done is to first pass the input to a hash routine, which has the dual effect of sanitizing the data and limiting its length before it arrives at the SQL step. Just be mindful that the input to the hash routine may still be vulnerable to a buffer overflow. What happens when an attacker enters a password of "aaaa[...repeated 1000 times...]aaaa_Evil_Injection_Here"? Your app still needs to handle that.

I disagree with your concentration on sanitization. That's the sort of thing that leads to Mr O'Doherty being unable to enter his name into a web-form (and no: Odoherty and ODoherty are both wrong). In my view the right approach is to make sure that you don't trust untrusted data - throughout the life of the data (so the javascript should not be included in the output webpage in a way that it is executed — Martin Bonner supports Monica, Dec 19 '16 at 20:50
@MartinBonner, I agree that whitelisting and blacklisting are not nearly as good as writing high-quality code that can accept the input without risking an injection attack. However, with the complexity of many development shops (contractor team A writes the GUI, different contractor team B writes the app server, contractor team C writes the rules engine, etc.) the quality of the underlying layers varies. Sometimes defensive coding at a higher layer prevents exploiting a vulnerability that exists at a lower layer. — John Deters, Dec 19 '16 at 21:37
I agree with Martin... check my name... guess how many times I have seen messages like 'Last name contains illegal characters'? The fact is that performing the sanitation yourself is very difficult, very error-prone and usually wrong. You don't actually want to prevent a user from typing a space or quote... You just want to make sure those symbols don't mess with whatever rules the output format has. The best way to deal with it imho is indeed to know that user data (which includes all data stored in the DB) is not safe and use whatever escaping mechanism is in place for that format. — Stijn de Witt, Dec 20 '16 at 11:08
I disagree with "Sanitizing your input" - although the data is an input to the system, it is OUTPUT from the application tier (where SQL injection occurs). — symcbean, Dec 20 '16 at 12:27

Your Common Sense · Answer 4 · 2016-12-20T11:40:47.150

Like many other learners, you are confusing the storage and the transport. And in order to protect the latter, you are deliberately spoiling the former.

You have to understand that SQL injection is not called "Database injection" for a reason. You don't have to protect the database, the storage. All you need to protect is the transport - the SQL query. While your storage should be able to store the data as is, or it will be of no value at all. If you still need any proof for that statement,

there could be different clients, who have no idea of your home-brewed "protection"
there is not only strict comparison, and not only comparison: there are helluvalot of statements in which your data could be involved: arithmetics, substring search, various conversions, string manipulations and calculations of all sorts
a database is not only used to store your data but also to manipulate it. Try to order your hashed usernames
after all, sometimes you need to print the data back from a database.

So you can tell that whatever protection measure that does change the stored data in any way is simply wrong. And your idea is simply not well thought.

score 0 · Answer 5 · answered Dec 20 '16 at 02:26

Hashing protects SQL injection of the password field by virtue of the step that encodes its binary output back to a string.

This protection is normally useless but suffices if the following assumptions are true:

Your application doesn't handle strings anywhere but username and password.
You strip username of ' and \ characters.

If either of these are not true or may ever be in the future, hashing is not a useful defense. This eliminates 99.9% of web applications.

Could hashing prevent SQL injection?

5 Answers5