Are user-provided translation strings an attack vector

Question

Twitter and various other web companies allow users to help translate the user interface into their language.

Crowdsourcing translations isn’t new for us. Since October, 2009, we’ve counted on Twitter users to volunteer as translators and help us localize Twitter.

An HTML template then presumably replaces delimited strings in the primary language with their counterparts in the output language. Since the output-language strings come from an untrusted source, they could carry a payload that exploits an XSS vulnerability or, if the results show up in feeds, an XML entity attack or the like.
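To make the concern concrete, here is a minimal sketch of the kind of substitution I have in mind. The `{{...}}` delimiter, the template, and the `render_naive` helper are all hypothetical; I don't know how Twitter or anyone else actually implements this.

```python
import re

# Hypothetical template: primary-language strings are delimited with {{ }}.
TEMPLATE = "<button>{{Sign in}}</button>"

# Hypothetical translation table filled in by volunteers (untrusted input).
TRANSLATIONS = {
    "Sign in": '<img src=x onerror="alert(1)">',
}


def render_naive(template, translations):
    """Replace each {{key}} with its translation, without any encoding."""
    def substitute(match):
        key = match.group(1)
        # Fall back to the primary-language string if no translation exists.
        return translations.get(key, key)

    return re.sub(r"\{\{(.+?)\}\}", substitute, template)


page = render_naive(TEMPLATE, TRANSLATIONS)
print(page)
```

Because the translation is pasted in verbatim, the markup it contains becomes part of the page.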

Does anyone know whether such attacks have shown up in the wild?

score 1 · Answer 1 · answered Jan 14 '12 at 23:44

They obviously need to be sanitized or encoded. But I don't see how they are more vulnerable to XSS and the like than any other user-supplied data.
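As a rough sketch of the encoding step, assuming plain-text translations and a hypothetical `{{...}}` placeholder syntax (the `render_escaped` helper is made up for illustration, not taken from any real site), the standard library's `html.escape` is enough:

```python
import html
import re


def render_escaped(template, translations):
    """Replace each {{key}} with its HTML-escaped translation."""
    def substitute(match):
        key = match.group(1)
        # Treat the translation as plain text: escape &, <, > and quotes.
        return html.escape(translations.get(key, key), quote=True)

    return re.sub(r"\{\{(.+?)\}\}", substitute, template)


# Hypothetical untrusted translation carrying markup.
translations = {"Sign in": '<img src=x onerror="alert(1)">'}
page = render_escaped("<button>{{Sign in}}</button>", translations)
print(page)
```

This only covers strings that end up in HTML text or quoted attribute values. Strings placed inside scripts, URLs, or XML feeds need encoding that matches that context.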

I'd worry more about messages that mean something different in that other language being substituted. That could be used for social engineering, or to lower the reputation of your website by inserting inappropriate or insulting content.

CodesInChaos


Encoding works if all translation strings are plain text. – Mike Samuel Jan 15 '12 at 02:45
If not, you can use BBCode, Markdown, or an HTML sanitizer, like every website that allows users to enter formatted text. – CodesInChaos Jan 15 '12 at 10:23
My question is not what I can do, but whether people who have not done so have had problems. – Mike Samuel Jan 19 '12 at 01:00
