GetFileName using the worst possible method



  • On a new job "Cleaning up" some code and getting familiar with things... This is in C# .NET, but the WTF is good in any language.  The cycles wasted here are criminal.

    First our guy wants the filename from a fully qualified path.  So creates a regex in order to split the string on a backslashes...
    regex = new Regex (@"\\");
    string[] parts = regex.Split(xFile.PostedFile.FileName.ToString());

    next he does a for each (without curlies{} for shame) assigning the current element to our variable... Untill we fall out.  You heard me, we simply fall out of the for statement. and the last element is the filename.  This guy does the same thing for extensions he just splits on the dot.  argh! not even a parts[parts.Length - 1]

    foreach (string part in parts)
    strFileReviewName=part.ToString();

    in .NET for those who may not use it the following is all thats needed
    sfName = Path.GetFileName(xFile.PostedFile.FileName)
    sExt = Path.GetExtension(xFile.PostedFile.FileName)

    Thanks for sharing my pain,
    Joe Johnston



  • Don't forget the unneeded ToString() on a string (which just does what you expect: return this; )


  • Considered Harmful

    So he does in O(n) time what could be done in... O(n) time.  And it takes a few milliseconds longer.  The most deeply nested path I could imagine would be about 64 levels deep, wth the majority being much more shallow.  Yes, it would have been clearer and more concise to use the built-in functions, but as for performance, I imagine it's not bad enough to bother changing.  He could have just as easily changed the regular expression to match the last segment rather than using a foreach; but any of these methods yield the same result in roughly the same amount of time.

    I'd still change it, but just to improve readability.



  • I'm pretty sure that all recent releases of Windows allow a forward slash as a path separator. Oops.
    Try running this: "c:/windows\system32/cmd.exe" and then consider what the regex makes of it.
    Java does the same at a API level, and C# most likely also (since it's somewhat cross platform compatible with Mono etc.)

     



  • @joe.edwards@imaginuity.com said:

    So he does in O(n) time what could be done in... O(n) time.  And it takes a few milliseconds longer.  The most deeply nested path I could imagine would be about 64 levels deep, wth the majority being much more shallow.  Yes, it would have been clearer and more concise to use the built-in functions, but as for performance, I imagine it's not bad enough to bother changing.  He could have just as easily changed the regular expression to match the last segment rather than using a foreach; but any of these methods yield the same result in roughly the same amount of time.

    I'd still change it, but just to improve readability.

    I see your point, but if this is performance code, then the OP is perfectly correct imho. I'm a C++ coder, and I'd do something like this:

    // I have no idea if this would work in C# =) 

    const char *GetFilename(const char *pPath)
    {
        const char *pRetVal=pPath+strlen(pPath);
        while(pRetVal>=pPath)
        {
            if (*pRetVal=='\\' || *pRetVal=='/')
            {
                break;
            }
            pRetVal--;
        }
        return pRetVal+1;
    }

    Which will be considerably faster because it doesn't:

    Allocate a new RegEx object

    Parse a string to a  RegEx data structure (though the compiler might do this, I guess)

    Allocate a dynamic array to store intermediate data

    Fill the dynamic array with more allocated objects to store said data

    Like you say, this might not add up to very much, but if the function gets called a few thousand times per second it could certainly start having an impact.

    Fake Edit: Ouch, Firefox 2.0.0.4 doesn't seem to like the code tags in this editor much o_O
     



  • Wait a minute, did you say milliseconds? O_O


  • Considered Harmful

    @Devi said:

    @joe.edwards@imaginuity.com said:

    So he does in O(n) time what could be done in... O(n) time.  And it takes a few milliseconds longer.  The most deeply nested path I could imagine would be about 64 levels deep, wth the majority being much more shallow.  Yes, it would have been clearer and more concise to use the built-in functions, but as for performance, I imagine it's not bad enough to bother changing.  He could have just as easily changed the regular expression to match the last segment rather than using a foreach; but any of these methods yield the same result in roughly the same amount of time.

    I'd still change it, but just to improve readability.

    I see your point, but if this is performance code, then the OP is perfectly correct imho. I'm a C++ coder, and I'd do something like this:

    // I have no idea if this would work in C# =) 

    const char *GetFilename(const char *pPath)
    {
        const char *pRetVal=pPath+strlen(pPath);
        while(pRetVal>=pPath)
        {
            if (*pRetVal=='\\' || *pRetVal=='/')
            {
                break;
            }
            pRetVal--;
        }
        return pRetVal+1;
    }

    Which will be considerably faster because it doesn't:

    Allocate a new RegEx object

    Parse a string to a  RegEx data structure (though the compiler might do this, I guess)

    Allocate a dynamic array to store intermediate data

    Fill the dynamic array with more allocated objects to store said data

    Like you say, this might not add up to very much, but if the function gets called a few thousand times per second it could certainly start having an impact.

    Fake Edit: Ouch, Firefox 2.0.0.4 doesn't seem to like the code tags in this editor much o_O
     

    I'll agree with you, if it were being called that frequently.  It seems to be ASP.NET code referencing an uploaded file, and, unless it's a huge batch upload, that would be a one-off deal.

    Oh, and, Microsoft junkie that I (unfortunately) am, I failed to consider forward slashes as well.


  • Considered Harmful

    Here's the code as compiled in .NET 2.0 (Reflector disassembly):

    internal static void CheckInvalidPathChars( string path ) {
        for ( int i = 0; i < path.Length; i++ ) {
            int num2 = path[i];
            if (((num2 == 0x22) || (num2 == 60)) || (((num2 == 0x3e) || (num2 == 0x7c)) || (num2 < 0x20))) {
                throw new ArgumentException( Environment.GetResourceString( "Argument_InvalidPathChars" ) );
            }
        }
    }
    

    public static string GetFileName( string path ) {
    if( path != null ) {
    CheckInvalidPathChars( path );
    int length = path.Length;
    int num2 = length;
    while( --num2 >= 0 ) {
    char ch = path[ num2 ];
    if ( ( ( ch == DirectorySeparatorChar ) || ( ch == AltDirectorySeparatorChar ) ) || ( ch == VolumeSeparatorChar ) ) {
    return path.Substring( num2 + 1, ( length - num2 ) - 1 );
    }
    }
    }
    return path;
    }

    It looks like the built-in method may even be less efficient, since it scans the whole string for one of 5 forbidden characters before going back to find the last of three separator characters.



  • @Nandurius said:

    I'm pretty sure that all recent releases of Windows allow a forward slash as a path separator. Oops.
    Try running this: "c:/windows\system32/cmd.exe" and then consider what the regex makes of it.
    Actually, DOS (verified on v6.22) and Windows support both \ and / as path separator, although Windows NT only supports \ if you use \?\ syntax to access paths longer than MAX_PATH.



  • @Devi said:

    I'm a C++ coder, and I'd do something like this:

    // I have no idea if this would work in C# =) 

    const char *GetFilename(const char *pPath)
    {
    [snip]

    No, you're not. You're a C coder who happens to use a C++ compiler. 



  • The fact that this code in particular is not "that worse" in terms of performance compared to the good way to find the filename is not the point. If the guy who did atrocity could do that to something as simple as getting the name of a file, imagine what he'd do in more complex scenarios, like searching for a file in a disc. I absolutely don't want to see this guy parsing XML...


Log in to reply