Reinventing pretty much everything

alias

Found my first WTF in a system written in C++ that we have:

#define BOOL int
#define TRUE 1
#define FALSE 0

... ... ...

BOOL stringstartswith(const char* str1, const char* str2)
{
    if (strlen(str2) > strlen(str1))
    {
        return FALSE;
    }

    BOOL equal = TRUE;

    while (*str2 && equal)
    {
        if (*str1 != *str2)
        {
            equal = FALSE;
        }

        str1++;
        str2++;
    }

    return equal;
}

Nice to see that they have reinvented the boolean and didn't realise that they could use strstr("xxyyyxx", "xx") == 0
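One caveat on the strstr idea, as a rough sketch: strstr returns a pointer to the first match, or a null pointer when there is no match, so comparing against 0 is true exactly when "xx" does not occur at all. A prefix test would compare the result against the start of the string instead. The name starts_with_strstr is made up for this illustration and is not from the original system.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: a prefix test built on strstr.
// strstr returns a pointer to the first occurrence of prefix in str,
// or nullptr if it does not occur, so "starts with" means
// "the first occurrence is at the very start of str".
bool starts_with_strstr(const char* str, const char* prefix)
{
    return std::strstr(str, prefix) == str;
}

// Small self-check of the behaviour described above.
void check_starts_with_strstr()
{
    assert(std::strstr("xxyyyxx", "xx") != nullptr);  // found, so "== 0" is false here
    assert(std::strstr("xxyyyxx", "zz") == nullptr);  // not found: this is what "== 0" detects
    assert(starts_with_strstr("xxyyyxx", "xx"));
    assert(!starts_with_strstr("yyxx", "xx"));        // a match exists, but not at the start
}
```

This only addresses correctness; strstr still has to search the rest of the string when the prefix is not at the start.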

ammoQ

@alias said:

Nice to see that they have reinvented the boolean and didn't realise that they could use strstr("xxyyyxx", "xx") == 0

The BOOL thingy is obviously a useless WTF.

If speed matters, they should get rid of those expensive calls to strlen; otherwise, in terms of speed, it's not too bad.

strstr(x,y)==0 is not a good idea if speed matters.

qbolec

bool beginswith(char* a, char* b)
{
    while (*a == *b && *a++) b++;
    return !*b;
}
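
A quick sanity check of that loop, as a sketch only. The function is repeated here taking const char* so that string literals can be passed, and the test strings are chosen for illustration (empty prefix, exact match, prefix longer than the string).

```cpp
#include <cassert>

// Same loop as above, taking const char* so string literals can be passed.
bool beginswith(const char* a, const char* b)
{
    // Advance both pointers while the characters match and a has not ended.
    while (*a == *b && *a++) b++;
    // b points at its terminator only if every character of b was matched.
    return !*b;
}

// Edge cases picked for illustration.
void check_beginswith()
{
    assert(beginswith("hello", "he"));
    assert(beginswith("hello", "hello"));   // exact match
    assert(beginswith("hello", ""));        // empty prefix always matches
    assert(!beginswith("hello", "help"));   // mismatch on the last character
    assert(!beginswith("he", "hello"));     // prefix longer than the string
    assert(beginswith("", ""));
}
```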

Michael_J

#include <cstring>

bool stringstartswith(const char* str, const char* startswith)
{
    return 0 == std::strncmp(str, startswith, std::strlen(startswith));
}

If you have a decent compiler/runtime library, built-in functions are usually well optimised and probably faster than anything you can write.

Michael J

Ixpah

A lot depends on how long the startswith string is. The break-even point for the strncmp/strlen version is probably quite a long string in reality. So it really depends on how the application uses the function.

VGR

The definition of BOOL is just a holdover from C, where it's quite
common to do that. Even the ancient and venerable Xlib.h does it.

I am inclined to think it's just cruft --- very old code that no one
ever bothered to refactor out of whatever other code is calling
it. I'm pretty sure strstr and strncmp were later additions to
string.h, so at one time it might have made some sense.

I know of a project that's been using Java 1.4 for years but still uses
a third-party regex package, because they just didn't feel like
updating the code that uses regex.

ammoQ

@asuffield said:

@qbolec said:

bool beginswith(char* a, char* b)
{
    while (*a == *b && *a++) b++;
    return !*b;
}

bool beginswith(const char *str, const char *suffix)
{
  size_t suffix_len = strlen(suffix);
  return strncmp(str, suffix, suffix_len) == 0;
}
It's not just easier to read, it's probably faster (most compilers have optimised versions of the string functions - the naive while loop is actually quite slow on modern processors, up to a factor of 10 on early athlons).

It's easy to create a case where this version is slower (much slower), no matter how well optimized the runtime libs are - choose "a" for str and "bbb(100000 times)..b" for suffix. strlen(suffix) will eat any advantage from the libs. (Though any real-world usage of this function will most likely use a short string for suffix)
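
The worst case described above can be sketched roughly as follows. The counting helper chars_examined_single_pass is hypothetical and exists only to make the difference visible; it is not a benchmark and says nothing about real-world timings.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical instrumented loop: counts how many characters of the prefix
// a single-pass comparison looks at before it can answer.
std::size_t chars_examined_single_pass(const char* str, const char* prefix)
{
    std::size_t examined = 0;
    while (*prefix) {
        ++examined;
        if (*str != *prefix) break;  // mismatch, or str has ended: stop immediately
        ++str;
        ++prefix;
    }
    return examined;
}

// The strlen + strncmp version from the quote, for comparison.
bool beginswith_strncmp(const char* str, const char* prefix)
{
    return std::strncmp(str, prefix, std::strlen(prefix)) == 0;
}

void check_worst_case()
{
    const std::string long_prefix(100000, 'b');  // 'b' repeated 100000 times
    // Single pass: one comparison ('a' vs 'b') is enough to answer.
    assert(chars_examined_single_pass("a", long_prefix.c_str()) == 1);
    // strlen alone has to walk all 100000 characters before strncmp even starts.
    assert(std::strlen(long_prefix.c_str()) == 100000);
    // Both approaches agree on the result; only the amount of work differs.
    assert(!beginswith_strncmp("a", long_prefix.c_str()));
}
```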

eimaj2nz

@Michael J said:

If you have a decent compiler/runtime library, built-in functions are usually well optimised and probably faster than anything you can write.

I agree that it's best not to re-invent the wheel, and that built-in functions are usually well-optimized.

However, it is entirely possible to write a function that's as fast as, if not faster than, a library function. For example, qbolec's function is about as optimal as you can get. I doubt it's even possible to get the same result in fewer processor cycles.

eimaj2nz

Ok, the forum won't let me edit my post. It's telling me I hit the edit time limit, after only a few minutes. WTF?

Anyway, here's what I was going to change to:

@Michael J said:

If you have a decent compiler/runtime library, built-in functions are usually well optimised and probably faster than anything you can write.

I agree that it's best not to re-invent the wheel, and that built-in functions are usually well-optimized.

However, it is entirely possible to write a function that's as fast as, if not faster than, a library function. For example, qbolec's function is about as optimal as you can get.

@asuffield said:

It's not just easier to read, it's probably faster (most compilers have
optimised versions of the string functions - the naive while loop is
actually quite slow on modern processors, up to a factor of 10 on early
athlons).

This is possible, I guess, if the code wasn't pipelined. However, most
modern compilers will take care of this for you. They'll unroll the
loop a little in order to try and get more instructions executing at
once.

However, this shouldn't change the C++ code that's used to define the
function. If there's any better way to do this than a simple loop, I
can't think of it.

TheDauthi

@eimaj2nz said:

This is possible, I guess, if the code wasn't pipelined. However, most
modern compilers will take care of this for you. They'll unroll the
loop a little in order to try and get more instructions executing at
once.

However, this shouldn't change the C++ code that's used to define the
function. If there's any better way to do this than a simple loop, I
can't think of it.

In VS2005, at least, strstr (and a lot of string.h) appears to be written in assembler, which would allow for a bit more tweaking over what the compiler would easily allow. I don't know if this is the case for the C++ string functions. Maybe he needed a slower version?

Michael_J

@eimaj2nz said:

(... beginning snipped by MJ)

@Michael J said:


If you have a decent compiler/runtime library, built-in functions are usually well optimised and probably faster than anything you can write.

I agree that it's best not to re-invent the wheel, and that built-in functions are usually well-optimized.

However, it is entirely possible to write a function that's as fast as, if not faster than, a library function. For example, qbolec's function is about as optimal as you can get.

(... rest snipped by MJ)

My assembler programming is a bit rusty, so I can't say for sure. I've seen very fast implementations of std::memcpy() that utilised special op-codes for fast copying of data. This was much faster than a naïve loop implementation. Knowing the specialised op-codes and the way the optimiser works can yield some amazing speed improvements.

Whether this can be done for std::strncmp() or not, I don't know. In general, I prefer to use system functions rather than roll my own, as they've usually been written by smart folks who know the platform intimately - but that's just my prejudice.

As with any advice I give, this is probably worth every penny you paid for it.

Michael J

ammoQ

@TheDauthi said:

In VS2005, at least, strstr (and a lot of string.h) appears to be written in assembler, which would allow for a bit more tweaking over what the compiler would easily allow. I don't know if this is the case for the C++ string functions. Maybe he needed a slower version?

Even if the assembler version of strstr is much faster than a hand-written version, you should not use it to implement startsWith(). It's stupid to search the whole (probably large) string for the prefix, just to check whether it is found at the beginning.

TheDauthi

@ammoQ said:

@TheDauthi said:

In VS2005, at least, strstr (and a lot of string.h) appears to be written in assembler, which would allow for a bit more tweaking over what the compiler would easily allow. I don't know if this is the case for the C++ string functions. Maybe he needed a slower version?

Even if the assembler version of strstr is much faster than a hand-written version, you should not use it to implement startsWith(). It's stupid to search the whole (probably large) string for the prefix, just to check whether it is found at the beginning.

Ok, agreed, my mistake. I was thinking in terms of the earlier "However, this shouldn't change the C++ code that's used to define the function," not in terms of solving the problem above.

Missing-the-forest-for-the-trees syndrome.

Offhand, since everyone's having a go, I'd probably try the following

bool startsWith(const char* a, const char* b)
{
    // which is probably buggy, to embarrass myself
    int i = strspn(a, b); // it is strspn and not strsnp, right?
    return (b[i] == 0) && (i > 0);
}

and when that didn't work, write the original WTF =)

Edit: Immediately realized that it's not strspn that I was thinking of. Ah, well. I was pretty confident that there was a standard function that did almost exactly this...
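
For what it's worth, a sketch of why strspn does not fit here: it returns the length of the initial run of characters that appear anywhere in its second argument, treating that argument as a set rather than as a sequence. The example strings are made up for illustration, and the strncmp form is the one posted earlier in the thread.

```cpp
#include <cassert>
#include <cstring>

void check_strspn_is_not_prefix_match()
{
    // "banana" does not start with "ab", yet its first two characters
    // ('b', 'a') are both members of the set {'a', 'b'}, so strspn says 2.
    // With i == 2, b[i] is the terminator of "ab" and the attempt above
    // would wrongly report a match.
    assert(std::strspn("banana", "ab") == 2);

    // The run can also be longer than the second argument itself,
    // in which case b[i] would index past the end of b.
    assert(std::strspn("aaaa", "a") == 4);

    // strncmp compares character by character in order, so it gets both cases right.
    assert(std::strncmp("banana", "ab", std::strlen("ab")) != 0);
    assert(std::strncmp("abacus", "ab", std::strlen("ab")) == 0);
}
```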

byte_lancer

@alias said:

Found my first WTF in a system wrote in C++ that we have:

#define BOOL int
#define TRUE 1
#define FALSE 0

... ... ...

Maintenance code I presume.
My good fer nuthing engineering school apparently had Turbo C++ 1988 build (sic) installed on all machines and students had to write the same code to make the compiler 'recognize' bool values.
Needless to say, some of the students found internships to be pretty exhausting when they had to relearn most of the concepts.
Some of the students even went to the extent of asking Turbo C++ to be installed on their workstations at the companies instead of using gcc or VC++. Arrh!
Come to think of it, some of the chaps were just not made for engineering.