I am planning to develop a website that requires users to register a username and a password. When I let the user choose a password, which characters should I allow in it? Are there any I shouldn't allow because of security issues with the HTTP protocol or the implementation language?

I haven't decided on an implementation language yet, but I will use Linux.

From a security/implementation perspective, there shouldn't be any need to disallow characters apart from '\0' (which is hard to type anyway). The more characters you bar, the smaller the space of possible passwords, and therefore the quicker it is to brute-force them. Of course, most password guessing actually uses dictionary words rather than systematic searches of the input domain...
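To make the "smaller space" point concrete: a uniformly random password of length L drawn from an alphabet of N characters has N^L possibilities, i.e. L·log2(N) bits. The minimal Python sketch below compares two illustrative alphabets (95 printable ASCII characters versus 62 letters and digits); the sizes are assumptions for the example, and the arithmetic ignores dictionary attacks entirely.

```python
import math


def keyspace(alphabet_size: int, length: int) -> int:
    """Number of distinct passwords of exactly `length` characters."""
    return alphabet_size ** length


def bits(alphabet_size: int, length: int) -> float:
    """The same quantity expressed in bits (base-2 logarithm)."""
    return length * math.log2(alphabet_size)


if __name__ == "__main__":
    # Illustrative alphabets: printable ASCII (codes 32..126) vs letters+digits
    for name, size in [("printable ASCII", 95), ("letters+digits", 62)]:
        print(f"{name:16} 8 chars: {keyspace(size, 8):.3e} "
              f"({bits(size, 8):.1f} bits)")
```

Each character removed from the alphabet shrinks N, and the effect compounds over every position of the password.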

From a usability perspective, however, some characters are not typed the same way on different machines. For example, I have two computers here where Shift-3 produces # on one and £ on the other. When I type a password, every character appears as '*', so I can't tell whether I got it right. Some people think this could confuse users enough to justify disallowing those characters. I don't think it's worth doing: most real people access real services from one or maybe two computers, and don't tend to put many extended characters in their passwords.

There can be issues with non-ASCII characters. A password is a sequence of glyphs, but the password processing (hashing) will need a sequence of bits, so there must be a deterministic way to transform glyphs into bits. This is the whole murky swamp of code pages. Even if you stick to Unicode, there is trouble afoot:

  • A single character can have several decompositions as code points. For instance, the "é" character (which is very frequent in French) can be encoded as either a single code point U+00E9, or as the sequence U+0065 U+0301; both sequences are meant to be equivalent. Whether you get one or the other depends on the conventions used by the input device.

  • A Unicode string is a sequence of code points (integers in the range 0 to 1114111, i.e. U+0000 to U+10FFFF). There are several standard encodings for converting such a sequence into bytes; the most common are UTF-8, UTF-16 (big-endian), UTF-16 (little-endian), UTF-32 (big-endian) and UTF-32 (little-endian). Any of these may or may not start with a BOM.
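The first point can be demonstrated with Python's standard `unicodedata` module. Normalizing to a single form before hashing is one common way to make canonically equivalent inputs compare equal; choosing NFC (rather than NFD or the compatibility forms) in this sketch is an assumption, not something the answer prescribes, and `canonical` is a hypothetical helper name.

```python
import unicodedata

composed = "\u00e9"      # "é" as a single code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT


def canonical(password: str) -> str:
    """Map canonically equivalent inputs to one code-point sequence (NFC)."""
    return unicodedata.normalize("NFC", password)


if __name__ == "__main__":
    print(composed == decomposed)                        # False: different code points
    print(canonical(composed) == canonical(decomposed))  # True after NFC
```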

Therefore a single "é" can be meaningfully encoded into bytes in at least twenty distinct ways (two code-point sequences, times five encodings, times with or without a BOM), and that's when sticking to "mainstream Unicode". Latin-1, or its Microsoft counterpart Windows-1252, is also widespread, so make that 21. Which encoding a given piece of software uses may depend on many factors, including the locale. It is frustrating when a user can no longer log on to his computer because he switched the configuration from "Canadian - English" to "Canadian - French".
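The count can be checked mechanically. This Python sketch enumerates both code-point sequences for "é" through the five Unicode encodings, with and without a byte-order mark (using the BOM constants from the standard `codecs` module), and adds Latin-1, which can represent only the precomposed form. The helper name `byte_variants` is hypothetical.

```python
import codecs

# Two code-point sequences for the same visible "é"
FORMS = ["\u00e9", "e\u0301"]

# (codec name, byte-order mark) for the mainstream Unicode encodings
ENCODINGS = [
    ("utf-8", codecs.BOM_UTF8),
    ("utf-16-be", codecs.BOM_UTF16_BE),
    ("utf-16-le", codecs.BOM_UTF16_LE),
    ("utf-32-be", codecs.BOM_UTF32_BE),
    ("utf-32-le", codecs.BOM_UTF32_LE),
]


def byte_variants(forms=FORMS):
    """Collect every distinct byte sequence that could represent the forms."""
    variants = set()
    for form in forms:
        for codec, bom in ENCODINGS:
            raw = form.encode(codec)
            variants.add(raw)        # without BOM
            variants.add(bom + raw)  # with BOM
        try:
            variants.add(form.encode("latin-1"))
        except UnicodeEncodeError:
            pass  # U+0301 has no Latin-1 representation
    return variants


if __name__ == "__main__":
    for v in sorted(byte_variants()):
        print(v.hex(" "))
    print(len(byte_variants()), "distinct byte sequences")
```

A hash function sees each of these as an unrelated input, so each one would yield a different stored hash for what the user considers the same password.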

Experimentally, most problems of that kind are avoided by restricting passwords to the range of printable ASCII characters (codes 32 to 126; personally I would avoid space, so make that 33 to 126) and enforcing a single-byte encoding (no BOM, one character becomes one byte). Since passwords are meant to be typed on various keyboards with no visual feedback, the list of characters should be restricted even further for optimal usability. I battle daily with Canadian layouts where what is printed on the keyboard does not necessarily match what the machine thinks it is, especially when going through one or two nested RDP connections; the '<', '>' and '\' characters move around most often. With just letters (uppercase and lowercase) and digits, you will be fine.
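Both restrictions described above are easy to enforce at registration time. A minimal Python sketch, assuming the server receives the password as a `str`: the default mode accepts codes 33 to 126, and `strict=True` (a hypothetical flag) allows only letters and digits. Encoding as ASCII then guarantees one byte per character.

```python
import string

# Printable ASCII without space: codes 33..126
PRINTABLE_NO_SPACE = frozenset(chr(c) for c in range(33, 127))
# Stricter, keyboard-layout-friendly set: letters and digits only
ALNUM = frozenset(string.ascii_letters + string.digits)


def is_allowed(password: str, strict: bool = False) -> bool:
    """Return True if the password is non-empty and every character is allowed."""
    allowed = ALNUM if strict else PRINTABLE_NO_SPACE
    return bool(password) and all(ch in allowed for ch in password)


def to_bytes(password: str) -> bytes:
    """Single-byte encoding: one character becomes exactly one byte, no BOM."""
    if not is_allowed(password):
        raise ValueError("password contains disallowed characters")
    return password.encode("ascii")
```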

You could say that the user is responsible: he is free to use any characters he wishes, as long as he deals with the problem of typing them. But that's not ultimately tenable: when users have trouble, they call your helpdesk, and you end up bearing part of the cost of their mistakes.

If you are generating random passwords, it's a good idea to avoid characters that can be confused with others. For example (ignoring symbols):

  • Lowercase: l, o
  • Uppercase: I, O
  • Numbers: 1, 0
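A generator following this advice can simply drop the confusable characters from its alphabet. This Python sketch uses the standard `secrets` module, which is intended for security-sensitive randomness; the default length of 12 is an assumption, not a recommendation from the answer.

```python
import secrets
import string

# The easily confused characters listed above
CONFUSABLE = set("loIO10")
ALPHABET = "".join(
    ch for ch in string.ascii_letters + string.digits if ch not in CONFUSABLE
)


def random_password(length: int = 12) -> str:
    """Draw `length` characters uniformly at random from the unambiguous alphabet."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


if __name__ == "__main__":
    print(random_password())
```

Removing six characters leaves 56 symbols, so each character carries slightly less entropy (about 5.8 bits instead of 5.95); adding a character or two to the length compensates.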

In addition to allowing all characters, consider setting a very generous maximum length on the password field to support people who take the passphrase approach to passwords.

The phrase "my password is all in lowercase" is actually a reasonably strong passphrase because of its length.

There are a couple of characters that may cause issues:

*, ? and %: as these are often used as wildcards (e.g. * and ? in shell globs, % in SQL LIKE patterns), they may confuse the underlying programming language.

Tab, Return, Newline, Vertical Tab, Escape: such special characters can provoke odd behavior from your programming language OR from the customer's browser. (If the customer uses several different browsers, it is quite possible that one will allow these characters to be entered and another will not, effectively locking the customer out of his account in that browser.)

\ is often treated as an escape character that gives special meaning to the character that follows it: "\n" is a newline in many contexts, and "\t" is a tab. If your programming language (or the customer's browser) does this, you are back to the possibility of receiving the characters I mentioned above. So it is probably best to disallow \ altogether, just to be safe.
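If you follow this answer's advice, the check can report exactly which characters were rejected, so the registration form can tell the user what to change. A minimal Python sketch; the banned set mirrors the characters listed above, and the helper name `rejected_chars` is hypothetical.

```python
# Characters this answer suggests rejecting
WILDCARDS = set("*?%")
BACKSLASH = {"\\"}
# Tab, return, newline, vertical tab, escape
CONTROL = {"\t", "\r", "\n", "\v", "\x1b"}

BANNED = WILDCARDS | BACKSLASH | CONTROL


def rejected_chars(password: str) -> set:
    """Return the set of characters in `password` that the policy rejects."""
    return {ch for ch in password if ch in BANNED}


def is_acceptable(password: str) -> bool:
    """True if the password contains none of the banned characters."""
    return not rejected_chars(password)
```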

I think that unless a 'virtual keyboard' or a similar tool is available to produce characters in a uniform way, you should stick to alphanumeric characters only. The location of all the other characters can differ between keyboards. If a user accesses the service from another location, that could effectively lock them out of the service.

I would suggest using a virtual keyboard as a way to send exactly the same character representations (see the Unicode discussion above) regardless of the system or keyboard being used. Then there is no need to exclude any character that could be typed on any keyboard.


You should not disallow any characters. You may wish to reject passwords shorter than 6 characters. Then you should use bcrypt to hash the password.
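In Python, bcrypt is usually reached through the third-party `bcrypt` package (`bcrypt.hashpw` and `bcrypt.checkpw`). To keep this sketch dependency-free, it swaps in a different but also deliberately slow, salted hash from the standard library, PBKDF2 (`hashlib.pbkdf2_hmac`). The iteration count, the 16-byte salt and the UTF-8 encoding step are assumptions for illustration; only the 6-character minimum comes from the answer.

```python
import hashlib
import hmac
import os
from typing import Tuple

MIN_LENGTH = 6         # minimum length suggested above
ITERATIONS = 600_000   # assumed work factor; tune for your hardware


def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Check the length, then return (salt, digest) to store for this user."""
    if len(password) < MIN_LENGTH:
        raise ValueError(f"password must be at least {MIN_LENGTH} characters")
    salt = os.urandom(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)
```

Because no character is filtered out, the encoding step is where the Unicode caveats discussed earlier in this thread come into play.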

If you allow uppercase and lowercase alphanumerics and set the minimum password length to eight characters, you should be OK. Allowing other characters raises issues with different keyboards.

