Java HTML encoding?



  • A (probably stupid) question before I start reinventing the wheel:
    Is there a standard class in Java that has a method to HTML-encode a string?
    < ... &lt;
    >... &gt;
    ä....&auml;
    etc.

    but not the URLEncoder thingy, which would translate blanks to +
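
    For reference, a minimal sketch of why java.net.URLEncoder is the wrong tool here: it produces form/URL encoding (percent escapes, blanks as +), not HTML entities. The class name UrlEncoderDemo and the helper urlEncode are made up for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical demo class; only URLEncoder.encode is a real JDK call.
final class UrlEncoderDemo {
    private UrlEncoderDemo() {}

    /** URL-encodes a string as UTF-8; this is form encoding, not HTML encoding. */
    static String urlEncode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every JVM is required to support UTF-8, so this should not happen.
            throw new IllegalStateException(e);
        }
    }
}
```

    Calling UrlEncoderDemo.urlEncode("a < b") yields a+%3C+b, whereas the HTML-encoded form being asked for would be a &lt; b.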



  • Not in the Java API, at least not that I know of. Of course, I haven't Java'd since 1.4, so take that with a grain of salt. Google seems to back me up, though. If you're doing JSP or servlets, I believe there are some utilities in Jakarta that will do the trick; otherwise you'll need to roll your own.



  • Well, if you don't need to use the named HTML entities, it's trivial to convert to the numerical ones:



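    public static String encodeHTML(String s)
    {
        // Collect the output; StringBuilder avoids StringBuffer's needless locking
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++)
        {
            char c = s.charAt(i);
            // Non-ASCII characters, double quotes and angle brackets become numeric entities
            if (c > 127 || c == '"' || c == '<' || c == '>')
            {
                out.append("&#").append((int) c).append(';');
            }
            else
            {
                out.append(c);
            }
        }
        return out.toString();
    }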




  • Thanks brazzy, I will use that. I think it's good enough - if not, at least we might make it to the front page.



  • The usual method for this is in the JTidy package: http://jtidy.sourceforge.net/

    Look at the source for org.w3c.tidy.servlet.util.HTMLEncode if you want to know what it's doing: http://jtidy.sourceforge.net/multiproject/jtidyservlet/clover/org/w3c/tidy/servlet/util/HTMLEncode.html



  • Thanks Otto. I'm still surprised that there is no generally-accepted one-for-all solution to this common task.



  • @ammoQ said:

    Thanks Otto. I'm still surprised that there is no generally-accepted one-for-all solution to this common task.

    Haven't used Java long, have you?

    Java tends to be made up of a lot of user-based packages. The stuff from Sun is useful, but not extremely so. So you have all this open source work like Jakarta and JTidy and Spring and dozens of other packages which are actually used to make real-world Java apps on servers and such.

    Anyway, the JTidy package is "generally-accepted" and "one-for-all", as far as that goes. It's just not "official".



  • @Otto said:

    Java tends to be made up of a lot of user-based
    packages. The stuff from Sun is useful, but not extremely so. So you
    have all this open source work like Jakarta and JTidy and Spring and
    dozens of other packages which are actually used to make real-world
    Java apps on servers and such.




    Tell me, which language are you using where this is different? Compared
    to most other languages, the standard Java API is rather large and
    powerful. To the point where some people are complaining about it being
    bloated and containing too much specialized stuff.



  • @Otto said:


    Haven't used Java long, have you?

    Since 1999 or so. But I've been a terrible wheel-reinventer, so in the past I didn't care that much whether something that easy was available in a standard lib or not. Now that the Sword of Damocles called TDWTF is hanging over my head, I've changed my stance.

    Java tends to be made up of a lot of user-based packages. The stuff from Sun is useful, but not extremely so. So you have all this open source work like Jakarta and JTidy and Spring and dozens of other packages which are actually used to make real-world Java apps on servers and such.

    I know, but IMO there are just too many of those frameworks coming from Apache, so I'd rather try to avoid them.



  • @ammoQ said:


    I know, but IMO there are just too many of those frameworks coming from Apache, so I'd rather try to avoid them.


    Wait, what?  That's a strange justification.  If you'd said "Apache's frameworks tend to be overengineered" or something (which, yeah, some of them really are -- maven, struts, I'm looking at you two here) that'd make sense to me, but because there are "too many"?



  • @Angstrom said:

    @ammoQ said:

    I know, but IMO there are just too many of those frameworks coming from Apache, so I'd rather try to avoid them.


    Wait, what?  That's a strange justification.  If you'd said "Apache's frameworks tend to be overengineered" or something (which, yeah, some of them really are -- maven, struts, I'm looking at you two here) that'd make sense to me, but because there are "too many"?

    Think about it.
    a) The sheer number of them makes them short-lived. It's hard to become good in one framework before it's outdated.
    b) Since the customer decides what framework is today's buzzword, it's rather unlikely he will choose the one you're actually good at.
    c) In their struggle for publicity, it's likely that the developers will add feature after feature so they don't fall behind other frameworks.



  • @brazzy said:

    @Otto said:
    Java tends to be made up of a lot of user-based
    packages. The stuff from Sun is useful, but not extremely so. So you
    have all this open source work like Jakarta and JTidy and Spring and
    dozens of other packages which are actually used to make real-world
    Java apps on servers and such.




    Tell me, which language are you using where this is different? Compared
    to most other languages, the standard Java API is rather large and
    powerful. To the point where some people are complaining about it being
    bloated and containing too much specialized stuff.

    Try taking a peek at the Python Global Module Index once in a while.



  • Have you checked out the StringEscapeUtils class in the Jakarta Commons library?



  • @ron_g said:

    Have you checked out the StringEscapeUtils class in the Jakarta Commons library?


    Thanks, that's what I was looking for.



  • In the code above, you might also want to add a check for the & character and replace it with &amp;
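
    A minimal sketch of that suggestion, assuming the same numeric-entity approach as the encodeHTML method posted earlier in the thread. The class name HtmlEncoder and the switch to code-point iteration (so characters outside the Basic Multilingual Plane are not split into two surrogate halves) are illustrative additions, not part of the original post.

```java
// Hypothetical helper class; it uses only java.lang calls.
final class HtmlEncoder {
    private HtmlEncoder() {}

    /** Escapes &, ", < and > plus all non-ASCII characters for use in HTML. */
    static String encodeHtml(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        // Walk the string by Unicode code point rather than by UTF-16 char.
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp);
            if (cp == '&') {
                // The ampersand starts every entity, so it has to be escaped too.
                out.append("&amp;");
            } else if (cp > 127 || cp == '"' || cp == '<' || cp == '>') {
                // Same rule as the original snippet: emit a numeric character reference.
                out.append("&#").append(cp).append(';');
            } else {
                // Plain ASCII passes through unchanged.
                out.append((char) cp);
            }
        }
        return out.toString();
    }
}
```

    With this version, HtmlEncoder.encodeHtml("a & b") gives a &amp; b, so ampersands in the input can no longer be mistaken for the start of an entity.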



  • @lukejackson said:

    In the code above, you might also want to add a check for the & character and replace it with &amp;

     

    So... You registered and made your first post a response to an almost two-year-old post. Welcome to the WTF.



  • Did someone say Cold Fusion?

     

    HTMLEncode() ???? 

     

     

    Edit: wait why did i help bump this old thread WTF 

