é is not é: The same glyph can have different Unicode representations

created 2012-Jan-25

Did you know that é is not the same as é? No, seriously.

The first is a Unicode "Latin small letter e with acute" character, 0xC3 0xA9 in UTF-8.

The second is a "Latin small letter e" character (0x65 in ASCII and UTF-8) followed by a "Combining Acute Accent" character (0xCC 0x81 in UTF-8). The combining accent is zero-width and is drawn over the letter that precedes it.
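Here is a minimal Python sketch of the two representations, using the standard `unicodedata` module. The variable names are made up for illustration and are not from anything above.

```python
import unicodedata

# Precomposed: a single code point, LATIN SMALL LETTER E WITH ACUTE
precomposed = "\u00e9"
# Decomposed: LATIN SMALL LETTER E followed by COMBINING ACUTE ACCENT
decomposed = "e\u0301"

print(precomposed == decomposed)          # False
print(precomposed.encode("utf-8"))        # b'\xc3\xa9'
print(decomposed.encode("utf-8"))         # b'e\xcc\x81'
print(len(precomposed), len(decomposed))  # 1 2

# Normalizing both strings to the same form makes them compare equal.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
```

NFC composes characters where it can and NFD decomposes them. Which one you pick matters less than applying the same one to both sides before comparing.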

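Why does this matter? Well, OS X stores file names in the decomposed form, while typing é on Windows typically produces the precomposed character. So if you use OS X to name and upload a file to your web server and then later try to navigate to that file by typing its address in Windows, the request will fail: the bytes you typed do not match the bytes in the file name.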

Making matters worse, when you then browse the directory listing on the web server and click the link to the file, you get a file name that looks exactly like what you typed in, but that works (unlike what you typed).
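One way to sidestep the mismatch is to normalize both names before comparing them. The sketch below assumes you control the lookup code. `find_by_normalized_name` is a hypothetical helper, not part of any web server, and the file name is invented for the example.

```python
import unicodedata

def find_by_normalized_name(requested, available):
    """Return the entry of `available` that equals `requested` once both
    are normalized to NFC, or None if there is no such entry."""
    wanted = unicodedata.normalize("NFC", requested)
    for name in available:
        if unicodedata.normalize("NFC", name) == wanted:
            return name
    return None

# Hypothetical file name as stored in decomposed form (e + combining accent).
stored = ["re\u0301sume\u0301.html"]
# The same name as typed with precomposed characters.
typed = "r\u00e9sum\u00e9.html"

print(typed in stored)                                       # False
print(find_by_normalized_name(typed, stored) == stored[0])   # True
```

The helper returns the stored name rather than the typed one, so whatever opens the file afterwards uses the exact bytes that are on disk.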

Unicode is hrrd.
