ASP/VBScript text "manglipulations"



  • Hello:

        I've seen a lot of posts commenting on the braindead
    use of simple string operations in ASP scripts, such as string
    concatenation and replacement.  I have, at times, been forced to
    work on ASP scripts using (gasp!) VBScript, and was not really aware of
    the deficiencies in the language that give string manipulation such
    poor performance.  I've already found resources that point out how to
    perform faster string concatenation by using dynamic arrays and the
    Join() function, but now I'm curious about the best way to
    perform repetitive substring replacement.  For example,
    suppose you get a string from a database or some other source, and you
    need to replace every instance of certain terms (abbreviations,
    acronyms, macros, whatever) with an "expanded" string.  How would you do
    this instead of calling Replace() multiple times?
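    Here is a minimal sketch of the array-plus-Join() idea. It is written in JavaScript because classic ASP can also host JScript, but it has only been considered as plain JavaScript and not run inside ASP. In VBScript the same pattern is to size a dynamic array, assign the pieces into it, and call Join(arr, "") once. The function name buildRows and the <li> markup are hypothetical. Modern JavaScript engines already make repeated += cheap. The payoff is mostly in environments like VBScript, where each concatenation copies the whole string built so far.

```javascript
// Hypothetical helper: builds one string from many pieces by collecting
// them in an array and joining once, instead of s = s + piece in a loop.
// Items are assumed to be HTML-safe already (no escaping is done here).
function buildRows(items) {
  var buf = [];                              // buffer of string pieces
  for (var i = 0; i < items.length; i++) {
    buf.push("<li>", items[i], "</li>");     // no intermediate string is built
  }
  return buf.join("");                       // a single final concatenation
}
```

    For example, buildRows(["a", "b"]) produces "<li>a</li><li>b</li>".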



    Would it really be more efficient to use the RegExp object instead of
    Replace() every time you need to replace a substring?  I've used
    RegExps quite often, but mostly when complex patterns need to be
    replaced; for straightforward translations I've used Replace(), like:


        strText = Replace(strText, "FOO", "FooBar") 

    What would you recommend?

    -dZ.


  • In my experience, Replace() has rarely been a bottleneck. I don't believe you'd see a speed difference between RegExp and Replace() for a simple replacement of FOO with FooBar, but for more complex patterns (uBB code like {b}asdf{/b}), RegExp is really the way to go.
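    Here is a sketch of the kind of pattern where a regex pays off. It assumes, for illustration only, that the uBB tags are limited to {b}, {i} and {u} and that the input is already HTML-escaped. The \1 back-reference makes the closing tag match the opening one. The loop repeats the replacement until nothing changes, so nested tags are converted too. It is written in JavaScript (classic ASP can host JScript) and has not been run inside ASP.

```javascript
// Converts simple uBB-style tags such as {b}text{/b} into HTML tags.
// The tag set (b, i, u) is an illustrative assumption.
function ubbToHtml(text) {
  // \1 forces the closing tag to be the same as the opening one;
  // [\s\S]*? lazily matches any text, including line breaks.
  var re = /\{(b|i|u)\}([\s\S]*?)\{\/\1\}/g;
  var prev;
  do {
    prev = text;
    text = text.replace(re, "<$1>$2</$1>"); // one pass converts the outermost pairs
  } while (text !== prev);                  // repeat so nested pairs are converted too
  return text;
}
```

    Mismatched tags such as {i}x{/b} are left untouched.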

    As for complex patterns that a regex cannot handle (scripting, tokenizing within regular text), you'll have to roll your own character-by-character parser if you find Replace() too slow.



  • @Alex Papadimoulis said:

    In my experience, Replace() has
    rarely been a bottleneck. I don't believe you'd see a speed difference
    between RegExp and Replace() for a simple replacement of FOO with
    FooBar, but for more complex patterns (uBB code like {b}asdf{/b}),
    RegExp is really the way to go.

    As for complex patterns that a regex cannot handle (scripting, tokenizing within regular text), you'll have to roll your own character-by-character parser if you find Replace() too slow.



    Thanx, Alex!  And what about something like this week's WTF, where the programmer was using multiple Replace() calls to expand @@...@@ macros in templates?  This is pretty close to what I had to work with before, and I am curious what alternative solutions people in this forum can offer, since they seemed so intent on criticizing it as a completely braindead approach.

        dZ.


  • @DZ-Jay said:

    And what about something like this week's WTF, where the programmer was using multiple Replace() calls to expand @@...@@ macros in templates?  This is pretty close to what I had to work with before, and I am curious what alternative solutions people in this forum can offer, since they seemed so intent on criticizing it as a completely braindead approach.

    I have handled it in the past with a char-by-char parser:

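    Function ExpandMacros(strText)
        Dim parts(), n, pos, openAt, closeAt
        ReDim parts(Len(strText) \ 2)   ' every macro is >= 4 chars, so this is enough slots
        n = 0 : pos = 1
        Do
            openAt = InStr(pos, strText, "@@")
            If openAt > 0 Then closeAt = InStr(openAt + 2, strText, "@@") Else closeAt = 0
            If closeAt = 0 Then Exit Do                  ' no complete @@...@@ left: rest is literal
            parts(n) = Mid(strText, pos, openAt - pos)   ' literal text before the macro
            parts(n + 1) = LookupMacro(Mid(strText, openAt + 2, closeAt - openAt - 2)) ' LookupMacro: hypothetical
            n = n + 2 : pos = closeAt + 2                ' resume after the closing @@
        Loop
        parts(n) = Mid(strText, pos)                     ' trailing literal text
        ReDim Preserve parts(n)                          ' trim unused slots
        ExpandMacros = Join(parts, "")
    End Function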

    This of course assumes strings are mutable ... otherwise you'd use a stringbuffer of sorts.

    I don't know regexes very well; maybe it could be done with a single regex replace? I would think each replace creates a new string, so multiple replaces would use a lot of space (in other words, no better than calling Replace() repeatedly).

    Find - (@@ORDERNUM@@)|(@@ORDERTOTAL@@)|(@@CUSTNUM@@)
    Replace $1 with OrderNum, $2 with OrderTotal, ...
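    Here is a minimal sketch of the single-pass regex idea in JavaScript (classic ASP can also run JScript; not tested inside ASP). Instead of one numbered group per macro, one group captures the macro name and a lookup object supplies the replacement. The values object and the [A-Z_]+ name syntax are assumptions. The replacement text is never rescanned, so an expansion that itself contains @@...@@ is not expanded a second time. A chain of Replace() calls would expand it again.

```javascript
// Expands @@NAME@@ macros in a single regex pass.
// values: hypothetical map from macro name to replacement text.
// Unknown macros are left exactly as they appear.
function expandMacros(text, values) {
  return text.replace(/@@([A-Z_]+)@@/g, function (whole, name) {
    return Object.prototype.hasOwnProperty.call(values, name)
      ? String(values[name])   // known macro: substitute its value
      : whole;                 // unknown macro: keep the original text
  });
}
```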

     



  • Alex,

        Thank you once again for your comments.  The
    parser/tokenizing system you describe is what I normally use in many
    other languages to process text; but currently I am forced to work on
    an ASP website in VBScript (which, in my personal opinion, is the most
    braindead language in existence!), and so have found myself with plenty
    of inherent limitations.  Having recently discovered the WTFy way
    that VBScript handles string concatenation, I am at a loss as to how to
    optimize string manipulation with the limited tools available in the
    language.



        I am not a programming n00b, nor even a beginner
    in VB -- I just never had the misfortune of being exposed to it
    for long enough to be forced to learn its deepest
    secrets and flaws.



        The only thing I can think of to implement a parser
    in VBScript is to explode the string into an array of characters/words,
    or use the Mid() function to traverse the string character by
    character; then use an array of strings as a buffer to store all the
    tokens (since I cannot trust the strings to be mutable); and finally
    call Join() to generate the output string.
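    Here is a sketch of that plan in JavaScript. It does one left-to-right scan, buffers literal runs and expansions in an array, and joins them once at the end. charAt() and substring() play the role of VBScript's Mid(), and indexOf() that of InStr(). The lookup callback is a hypothetical hook for whatever expands a macro name. An unterminated @@ sequence is left as literal text.

```javascript
// Single pass: copy literal text into an array buffer and expand each
// @@NAME@@ macro through a caller-supplied (hypothetical) lookup function.
function expandByScan(text, lookup) {
  var out = [];
  var litStart = 0;                  // start of the pending literal run
  var i = 0;
  while (i < text.length) {
    if (text.charAt(i) === "@" && text.charAt(i + 1) === "@") {
      var closeAt = text.indexOf("@@", i + 2);
      if (closeAt === -1) break;                // no closing @@: rest is literal
      out.push(text.substring(litStart, i));    // flush literal text before the macro
      out.push(lookup(text.substring(i + 2, closeAt)));
      i = closeAt + 2;                          // continue after the closing @@
      litStart = i;
    } else {
      i++;
    }
  }
  out.push(text.substring(litStart));  // trailing literal text
  return out.join("");
}
```

    Copying literal text in runs, rather than one character at a time, keeps the number of buffer entries proportional to the number of macros instead of the length of the text.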



        Would you say this is the way to go?  Or will I
    also discover that Mid() or For Loops have some secret flaw in VBScript
    and that I should use a complete WTFable unintuitive workaround to
    traverse a string?  I am certain that one single pass through
    the original (possibly long) string will be way more efficient than
    multiple passes from each Replace() function.  What I am not
    confident about is VBScript's available methods.



        Thanx!

        dZ.




  • I think VBScript is good at what it was designed for: a scripting language designed as a subset of VB. But at the same time, VBS was never designed to do what you want it to do.

    Have you thought of a different approach? I can infer a few things ...

    1) The project must not be that big if a constraint is VBScript only
    2) Due to #1, traffic cannot be expected to be that great
    3) Given #1 and #2, a bunch of Replace() should work just fine

    If #1 and #2 are not true, then I think you'll have lots of other problems (other than this) if VBS is the platform -- you'll have no choice but to go to COM.

    The biggest problem with trying to hack it in VBS is complexity. It's very complex, and therefore more prone to errors, both now and after future maintenance. That, and if they ever do go to COM, you'll probably have porting issues (unlike a simple copy/paste of the VBS into VB6 code).

    Have you tried Replace()? I think you'll need either incredibly big texts (100k+) or lots of simultaneous hits to bring down performance. Just remember the rule about computers ... they're offensively fast (I forget this too sometimes). As long as they have the RAM to do things, you'll be fine ... it's the page swapping that kills performance.



  • @Alex Papadimoulis said:

    I think VBScript is good at what it
    was designed for: a scripting language designed as a subset of VB. But
    at the same time, VBS was never designed to do what you want it to
    do.


    Well, I think VB is braindead also, but that's another discussion :)  Seriously, I have no inherent hatred for VBScript; as you say, it is good at being a subset of VB.  I just cringe at the thought of some people using it as the be-all and end-all platform for developing complex dynamic web sites, unless it is limited to plain presentation handling.

    @Alex Papadimoulis said:

    Have you thought of a different approach? I can infer a few things ...

    1) The project must not be that big if a constraint is VBScript only
    2) Due to #1, traffic cannot be expected to be that great
    3) Given #1 and #2, a bunch of Replace() should work just fine



    1. False.  The code handles multiple e-commerce sites.  It uses COM to handle *some* of the business logic, but that was written by one of those highly-priced "professional consultants", and it had so many deficiencies and flaws that a second highly-priced "professional consultant" was hired, who decided to fix and work around them in the front-end code.  I was hired by the company that owns the website, originally to re-design the entire codebase, but with so much committee wrangling and management inertia, I have been relegated to being the perpetual maintainer of this awful hybrid monster.

    2. False.  It's a seasonal merchant, and the code is used for 6 websites, which get a large amount of traffic during peak season.

    3. A bunch of Replace() calls are already being used, along with a bunch of RegExps.

    @Alex Papadimoulis said:


    If #1 and #2 are not true, then I think you'll have lots of other problems (other than this) if VBS is the platform -- you'll have no choice but to go to COM.

    The biggest problem with trying to hack it in VBS is complexity. It's very complex, and therefore more prone to errors, both now and after future maintenance. That, and if they ever do go to COM, you'll probably have porting issues (unlike a simple copy/paste of the VBS into VB6 code).


    VBS complex? Well, I guess this is a matter of opinion.  I agree that it is prone to errors, but I believe it's more due to the way it encourages bad coding and a lack of discipline: it is limited in the intrinsic tools available to do simple or common tasks efficiently, it is too constricting, often without offering a positive trade-off, and overall it lowers the barrier to entry by discouraging responsible coding practices (i.e. any two-bit idiot can put a few keywords together, call himself a "Web Developer", and convince management that he knows what he is doing.)
    </rant> :)

    Let me add that I agree with your thoughts that I have bigger problems than just my choice of string replacement tactics, and that the entire design is horrendous and should be replaced.  This monster is a COM/COM+/.Net/VBScript hybrid that seems to do nothing in any particularly good way (at some point in the past, it was using both a MySQL database *and* an Access database to store temporary session information.  Each one used by the code pieces written by the different consultants.)  However, a re-design of the system is not forthcoming for at least another year, as it requires not only convincing management to accept its errors in its choice of consultants, but also getting it to allocate time and resources for the work, which it has not been keen to do so far.  (It's easy to show them how braindead the current system is, and how much it needs changing, but as long as it sorta-kinda works, they feel compelled to keep it -- and add features to it!)

    So, for now, I have been trying to improve the worst parts of the system as much as possible.  I have no access to Visual Studio or any other development framework of the sort (all the COMs were written by external consultants), so short of re-writing the whole thing right now, I am stuck with only being able to refactor the VBScript code, which, by the time I inherited it, had grown to comprise a significant part of the business logic.  Hence my attempt at improving the string handling of the script.

    @Alex Papadimoulis said:

    Have you tried Replace()? I think you'll need either incredibly big texts (100k+) or lots of simultaneous hits to bring down performance. Just remember the rule about computers ... they're offensively fast (I forget this too sometimes). As long as they have the RAM to do things, you'll be fine ... it's the page swapping that kills performance.



    Although I admit that most of the texts that need transforming are only within the 1k to 5k size range, I believe there are enough hits to the server to bring performance down -- particularly when the server is handling 6 websites all running the same code, and all handling roughly the same amount of hits.  Like I said -- since my hands are tied at the moment as to which parts are available to me for optimization -- if in fact the multiple passes on a ~3k text file done by repetitive Replace() calls in VBScript are impacting performance, a better string transformation routine might help.  But if you think it is not worth it, I'll concede.

    In any case, rest assured that I am aware of the necessity to re-design the system, that management has finally been convinced of this, and that work on this will start next year (hopefully!).  In the meantime, the coming peak season is hurrying up on us, and I tremble at the thought of another few months of constant web server crashes, almost daily system reboots, and customer complaints of random performance problems. :(

        Thank you, Alex, for all your suggestions and insightful comments, and I am sorry if some of mine amount to no more than rants or bursts of ignorance. :)

        dZ.




  • @DZ-Jay said:


    Well, I think VB is braindead also, but that's another discussion :)

    One of the greatest things that VB did was offer programmers the ability to easily create business programs and components quickly without all the overhead of memory management, MFC, Win32, etc.

    One of the worst things that VB did was offer programmers the ability to easily create business programs and components quickly.

    Note that I don't consider VB.NET to be in the "VB Family" -- I see no difference between J#.NET, C#.NET, COBOL.NET, etc; no matter what you're in you can make a mess of things if you don't know the framework.

    @DZ-Jay said:


    Seriously, I have no inherent hatred for VBScript; as you say, it is good at being a subset of VB.  I just cringe at the thought of some people using it as the be-all and end-all platform for developing complex dynamic web sites, unless it is limited to plain presentation handling.

    Totally. That's all it was ever really meant for; COM was supposed to do the complex stuff. 

    @DZ-Jay said:

    the entire design is horrendous and should be replaced.  This monster is a COM/COM+/.Net/VBScript hybrid that seems to do nothing in any particularly good way (at some point in the past, it was using both a MySQL database *and* an Access database to store temporary session information.  Each one used by the code pieces written by the different consultants.)

    Ah yes, so it definitely sounds like you have more problems than string replacement :-).

    @DZ-Jay said:


    It's easy to show them how braindead the current system is, and how much it needs changing, but as long as it sorta-kinda works, they feel compelled to keep it -- and add features to it!

    The more places I work at, the more this seems to be the case. Speaking of VBScript, the place I'm at right now has written one of the most complex modules (the thing that handles the stack of Closing Documents for a mortgage) in VBScript: client-side VBScript that uses XMLHTTP to query the database and builds forms based on the results. When the page loads, if you watch IE's memory usage, it shoots up to 250+ megs. Their solution so far has been to buy clients faster computers.

    @DZ-Jay said:


    I have no access to Visual Studio or any other development framework of the sort

    Ouch. Been there, so I know how you feel. One thing to consider is that we've officially entered an employee's market ... so there are definitely other, less absurd places out there.

    @DZ-Jay said:


    Although I admit that most of the texts that need transforming are only within the 1k to 5k size range, I believe there are enough hits to the server to bring performance down -- particularly when the server is handling 6 websites all running the same code, and all handling roughly the same amount of hits. 

    Understandable ... so we're back to the original problem. Let's see, if I absolutely had to do this in VBScript ... given these assumptions: (1) Templates are infrequently updated, (2) On average, templates have more than 3 tags needing replacement, (3) Replacement is page-view specific (order number, total, etc) ... I'd say cache the templates and the indexes to the tags.

    One way you could do it is with a Scripting.Dictionary at application scope ... but now that I think about it, that probably won't work because SD is probably apartment-threaded. I'd say go with a free-threaded XML DOM (MSXML2.FreeThreadedDOMDocument) to store your templates and indexes. From there, you can easily retrieve them and do your replacements much more quickly.
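    Here is a minimal sketch of "cache the templates and the indexes to the tags", assuming @@NAME@@ tags. Each template is parsed once into alternating literal and tag-name pieces. Each request then only fills in values and joins. A plain in-memory object stands in for whatever application-scope store is actually used. templateCache, compileTemplate, renderTemplate and the loadText callback are all hypothetical names. The sketch is written in JavaScript for illustration and has not been run inside ASP.

```javascript
// Hypothetical stand-in for an application-scope cache of parsed templates.
var templateCache = {};

// Parse once: even indexes hold literal text, odd indexes hold tag names.
function compileTemplate(text) {
  var re = /@@([A-Z_]+)@@/g;
  var parts = [];
  var last = 0;
  var m;
  while ((m = re.exec(text)) !== null) {
    parts.push(text.substring(last, m.index), m[1]);
    last = re.lastIndex;               // continue after the closing @@
  }
  parts.push(text.substring(last));    // trailing literal text
  return parts;
}

// Per request: fill in the tag values; parse only on first use of a template.
function renderTemplate(name, loadText, values) {
  if (!Object.prototype.hasOwnProperty.call(templateCache, name)) {
    templateCache[name] = compileTemplate(loadText(name));
  }
  var parts = templateCache[name];
  var out = [];
  for (var i = 0; i < parts.length; i++) {
    if (i % 2 === 0) {
      out.push(parts[i]);                                 // literal text
    } else if (Object.prototype.hasOwnProperty.call(values, parts[i])) {
      out.push(String(values[parts[i]]));                 // known tag
    } else {
      out.push("@@" + parts[i] + "@@");                   // unknown tag left as-is
    }
  }
  return out.join("");
}
```

    The design trades memory for speed. Each cached template holds its pieces in memory, but a request never rescans the text for tags.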

    @DZ-Jay said:


    Thank you, Alex, for all your suggestions and insightful comments, and I am sorry if some of mine amount to no more than rants or bursts of ignorance. :)

    Happy to help!



  • @Alex Papadimoulis said:

    @DZ-Jay said:


    Although I
    admit that most of the texts that need transforming are only within the
    1k to 5k size range, I believe there are enough hits to the server to
    bring performance down -- particularly when the server is handling 6
    websites all running the same code, and all handling roughly the same
    amount of hits. 

    Understandable ... so we're back to the original problem. Let's see, if I absolutely had to do this in VBScript ... given these assumptions: (1) Templates are infrequently updated, (2) On average, templates have more than 3 tags needing replacement, (3) Replacement is page-view specific (order number, total, etc) ... I'd say cache the templates and the indexes to the tags.

    One way you could do it is with a Scripting.Dictionary at application scope ... but now that I think about it, that probably won't work because SD is probably apartment-threaded. I'd say go with a free-threaded XML DOM (MSXML2.FreeThreadedDOMDocument) to store your templates and indexes. From there, you can easily retrieve them and do your replacements much more quickly.


    Yes, the SD is apartment-threaded (I learned this the hard way, since using it at application scope was another bonehead idea of one of the consultants: "But I don't understand what happened! It worked fine on my computer. Why, yes, I was testing it by myself, why do you ask?") :)  However, I have installed a third-party replacement that does not have threading issues, so loading commonly used things into application scope using a dictionary is an option.

    But let me explain the nature of the templates.  On the e-commerce site, they are snippets of text containing product descriptions.  Each one is saved in a separate file.  So,
    (1) True - templates are infrequently updated
    (2) True - on average, templates have about 5 tags that need replacement
    (3) False - replacement is not page-view specific.  The replacement is just simple macro expansion of company policies or additional info related to a particular product.  In any case, they are self contained (i.e. the macros contain all the information needed for their expansion.)  Here's an example:

    @@ITEM_LINK:1234@@

    Will expand to an <A HREF> hyperlink linking to the product page of item #1234.
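    Here is a sketch of handling parameterised macros like @@ITEM_LINK:1234@@ with one regex and a table of handlers. The handler table, the allowed argument characters and the product.asp?item= URL format are hypothetical. A real handler would build whatever link the site actually uses. The sketch is in JavaScript for illustration.

```javascript
// Hypothetical handler table: macro name -> function(argument) -> HTML.
var macroHandlers = {
  ITEM_LINK: function (itemId) {
    // Hypothetical product URL scheme.
    return '<a href="product.asp?item=' + itemId + '">Item #' + itemId + '</a>';
  }
};

// Expands @@NAME@@ and @@NAME:ARG@@ in one pass; the argument is restricted
// to letters, digits, '_' and '-' so it can be placed in a URL unescaped.
function expandParamMacros(text) {
  return text.replace(/@@([A-Z_]+)(?::([A-Za-z0-9_-]+))?@@/g, function (whole, name, arg) {
    if (!Object.prototype.hasOwnProperty.call(macroHandlers, name)) return whole; // unknown: keep
    return macroHandlers[name](arg === undefined ? "" : arg);
  });
}
```

    Because each macro carries its own argument, the expansion needs no per-request data. That matches the "self contained" description above.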

    The one potential problem I see with pre-loading the templates into memory, if that is what you were suggesting, is that each site has on average 5k product files, averaging 4 KB each, and there are 6 sites, all running on the same server. (Forgetting for a moment all the possible memory leaks and the obscene memory/CPU consumption of the entire codebase monster, this is still pretty big, no?)
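    Here is a rough sizing of the raw text. It assumes "4kb" means about 4,000 bytes of single-byte text per file. VBScript strings are UTF-16 BSTRs, so each character takes two bytes once loaded into memory. The figures below do not include any per-object overhead.

$$
5{,}000 \times 6 \times 4{,}000\ \text{bytes} = 1.2 \times 10^{8}\ \text{bytes} \approx 120\ \text{MB on disk},
\qquad
2 \times 120\ \text{MB} \approx 240\ \text{MB as UTF-16 strings}
$$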

    But I think we are getting outside the scope of our original discussion.  I didn't set out to solve the site's problems.  I was just curious to see if anybody had an alternative to multiple Replace() calls, since everyone was so quick to say how "wrong" it was in previous WTF posts.  I appreciate that you are taking the time to discuss this with me, and it has been very interesting, but I really do not think it would be wise to add another layer of complexity to the code if the performance benefit is not going to be substantial.  (I fear finding my own code posted out of context on this site after I leave this place.)  He he he :)

        Thanks,
        dZ.


