If only UTF-8 were the natural enemy of the encoding bug... I'm currently overhauling a PHP-based registration form. Unfortunately, I can't change everything over in one piece due to time constraints. I also can't adapt the old code to play nice with the new code, because the previous coder was a fan of spaghetti-esque PHTML nightmare code without indentation or even consistent line breaks. PHP tags wrapped around HTML blocks (like [tt]<?php if ($foo) { ?> <html stuff /> <?php } ?>[/tt]) are the norm.

Now I'm running into issues: I'm working in and displaying UTF-8, the old code is working in and displaying an unspecified encoding, and the database apparently refuses to accept anything other than latin-1 without somehow corrupting the string. In order to actually store user data without running into an encoding issue, I'm now translating all entered text into HTML entities, storing that in the DB and then translating it back when I need it on the screen. Of course that means modifications to the PHTML mess, but it's still better than the site reliably breaking whenever anyone uses an encoding that is not expected by the script/DB.

Effectively, I now translate all entered text into UTF-8 if it isn't already and then reduce that UTF-8 to ASCII via HTML entities. Now I just need a camera and a wooden table...

PS: Yeah, I could just work in latin-1, only the database still chokes on data sent by browsers which prefer to send UTF-8, so I still need to reduce somewhere - and the HTML entity way does have the advantage of being resistant to injection attacks.
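
For anyone curious, the "convert to UTF-8, then reduce to ASCII via HTML entities" step could look roughly like the following. This is only a minimal sketch, not the code actually running on the site: the function names [tt]to_utf8[/tt], [tt]encode_for_storage[/tt] and [tt]decode_for_display[/tt] are made up for illustration, it assumes the mbstring extension is loaded, and it assumes that any input which is not valid UTF-8 is latin-1 (ISO-8859-1), which is a guess that will not hold for every browser.

```php
<?php
// Hypothetical helpers; names and the latin-1 fallback are assumptions.

// Map every code point above ASCII (0x80..0x10FFFF) to a numeric entity.
const NON_ASCII_MAP = [0x80, 0x10FFFF, 0, 0x1FFFFF];

// Return the input as UTF-8, converting from the fallback encoding
// only if the bytes are not already valid UTF-8.
function to_utf8(string $input, string $fallback = 'ISO-8859-1'): string
{
    if (mb_check_encoding($input, 'UTF-8')) {
        return $input;
    }
    return mb_convert_encoding($input, 'UTF-8', $fallback);
}

// Reduce text to pure ASCII: escape <, >, &, " and ' first, then turn
// every remaining non-ASCII character into a numeric entity (&#NNN;).
function encode_for_storage(string $input): string
{
    $utf8 = to_utf8($input);
    $escaped = htmlspecialchars($utf8, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    return mb_encode_numericentity($escaped, NON_ASCII_MAP, 'UTF-8');
}

// Turn the stored ASCII string back into the original UTF-8 text.
function decode_for_display(string $stored): string
{
    return html_entity_decode($stored, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

// Usage: the same name arriving as UTF-8 or as latin-1 bytes.
$fromUtf8Browser   = "Zo\xC3\xAB <b>";
$fromLatin1Browser = "Zo\xEB <b>";

echo encode_for_storage($fromUtf8Browser), "\n";
echo encode_for_storage($fromLatin1Browser), "\n";
```

Both calls should produce the same ASCII-only string, so the latin-1 column never sees a byte above 0x7F. Note that the stored value can be echoed into an HTML page as-is; decoding is only needed when the raw text is wanted somewhere other than HTML.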