The $120/h consultant



  • A while ago I was hiring programmers/consultants through recruitment agencies. We worked with several of those, but one of them particularly expensive and famous. They found a good candidate for a position we had open, for both a C(++) programmer and a very application specific skill set. During the interview I focussed mainly on the latter, and I wasn't disappointed though, to be fair, I knew very little on the subject. I didn't dare ask questions about C(++), because this was an expensive consultant, having passed a test by the famous recruiting company easily, and me being new and insecure in the field (it was my very first interview).
    We did ask him for some of his source code, and he proudly sent it. Here are some of his more impressive pieces of code (slightly anonymized):

    struct String_eq {
    	bool operator()(const String *bb1, const String *bb2) const
    	{
    		uint64_t len1 = bb1->getLength();
    		uint64_t len2 = bb2->getLength();
    
    		uint64_t len = len1 > len2 ? len = len2 : len = len1;
    
    		uint8_t *p1 = bb1->c_str();
    		uint8_t *p2 = bb2->c_str();
    		for(uint32_t i = 0; i < (uint32_t)len; i++) {
    			if(p1[i] != p2[i]) {
    				return false;
    			}
    		}
    		return true;
    	}
    };
    

    Where String is a custom class. I'm particularly fond of this line out of that:

    uint64_t len = len1 > len2 ? len = len2 : len = len1;
    

    Or the fact that it doesn't even compare for equality, but only the first min(len1, len2) elements.

    Or:

    	if(outfile == NULL) {
    
    	} else {
    		fflush(outfile);
    		fclose(outfile);
    	}
    

    This code is run several times in the application that doesn't use C++ IO or types because it's too slow, in a performance critical portion of the code, for the entire buffer zise (sic):

    	uint64_t tmp;
    	uint8_t *ptr;
    	[…]
    
    	if(!runData) {
    		for(int i = 0; i < BUFFER_ZISE; i++) {
    			tmp =ptr[i];
    		}
    	}
    

    Where runData is set to true and never changed in the code. I also think that's about the only time he ever uses "int" in the application.

    Also, threads speed things up, right?

    void *initFunction(void *in)
    {
    	[… snip: Read some data from file …]
    
    	initDone = 1;
    }
    
    	[…]
    
    	uint64_t previous_time = get_time_millisec(NULL);
    	uint64_t startInit = previous_time;
    
    	initDone = 0;
    	ret = pthread_create(&thread, NULL, initFunction, NULL);
    
    	uint64_t endInit = get_time_millisec(NULL);
    	uint64_t totalInit;
    	uint64_t delta0 = endInit - startInit;
    	totalInit = delta0;
    	while(initDone == 0) {
    		usleep(100);
    	}
    

    I guess his tests showed that initialization took a lot less time…

    This configuration file reading wasn't used. I guess that may have been because he couldn't find out why it wasn't working:

    void Config::read(const char *filename)
    {
    	FILE *fh = fopen(filename, "r");
    	char *read;
    	char line[1024];
    	while((read = fgets(line, 1024, fh)) != NULL) {
    		char key[64];
    		char middle[8];
    		char value[64];
    
    		int num = sscanf(line,"%s%s\n", key, value);
    		if(num != 2) {
    			break;
    		}
    
    		keys.push_back(key);
    		values.push_back(value);
    	}
    }
    

    It may have been due to the fact he's pushing pointers to the stack, not only overwriting what it points to for every line in the file, but after the function returns it no longer points to anything (valid).

    Sadly to say we never hired the guy...



  • @Evo said:

    void Config::read(const char *filename)
    {
    FILE *fh = fopen(filename, "r");
    char *read;
    char line[1024];
    while((read = fgets(line, 1024, fh)) != NULL) {
    char key[64];
    char middle[8];
    char value[64];

    	int num = sscanf(line,"%s%s\n", key, value);
    	if(num != 2) {
    		break;
    	}
    
    	keys.push_back(key);
    	values.push_back(value);
    }
    

    }


    It may have been due to the fact he's pushing pointers to the stack, not only overwriting what it points to for every line in the file, but after the function returns it no longer points to anything (valid).

    Sadly to say we never hired the guy...

    That's not entirely true... If keys and values are containers of std::string, the copy constructor for std::string will (silently) convert a C-style char * into a std::string instance, taking a copy of the original data in the process. However, since he's rolling his own String classes, who knows....



  • @mdc said:

    @Evo said:
    void Config::read(const char *filename)
    {
    FILE *fh = fopen(filename, "r");
    char *read;
    char line[1024];
    while((read = fgets(line, 1024, fh)) != NULL) {
    char key[64];
    char middle[8];
    char value[64];

    	int num = sscanf(line,"%s%s\n", key, value);
    	if(num != 2) {
    		break;
    	}
    
    	keys.push_back(key);
    	values.push_back(value);
    }
    

    }


    It may have been due to the fact he's pushing pointers to the stack, not only overwriting what it points to for every line in the file, but after the function returns it no longer points to anything (valid).

    Sadly to say we never hired the guy...

    That's not entirely true... If keys and values are containers of std::string, the copy constructor for std::string will (silently) convert a C-style char * into a std::string instance, taking a copy of the original data in the process. However, since he's rolling his own String classes, who knows....

    You're right, I forgot to add the class definition:

    class Config {
    	std::vector<char *>keys;
    	std::vector<char *>values;
    
    	Config()
    	{
    	}
    
    	~Config()
    	{
    	}
    
    public:
    	static Config* getConfig(const char *filename);
    
    private:
    	void read(const char *);
    };
    

    Great use of static functions as well. All it does is allocate the Config object, read the passed filename, and returned the new object.



  • @mdc said:

    If keys and values are containers of std::string,

    If.

    If keys and values are containers of std::string and if the file can always be opened and if no lines are longer than 1022 characters and if no keys or values are longer than 63 characters and if push_back never fails to allocate memory and if all file handles are closed afterwards...



  • From uint8_t, am I led to believe that this is an embedded system (with a file system ;-) ?

    BTW, who seriously uses (s)scanf?



  • @Daniel Beardsmore said:

    From uint8_t, am I led to believe that this is an embedded system (with a file system ;-) ?

    BTW, who seriously uses (s)scanf?

    No, the hardcoded paths ("C:/Users/...", not included in my post) lead me to believe it is to be ran on Windows... However, it processes data received over the internet, so I guess it's fair enough to use uint*_t for those variables. Of course, he simply started using them everywhere, even in for-loop counters or other completely unrelated variables.



  • What is C(++) ???

    You mean you want C++?

    Or you are not sure whether you want C or C++?

    There is a lot wrong with this person's code, but you get what you ask for. Incidentally the file one has the immediate noticeable bug that he fails to close the file handle. The vector push_back is a bug if it is a vector of pointers not of strings.

     

     



  • No, it tells me he is a C programmer who tries a bit of the ++, but that's what they asked for.

    uint8_t is a standard C type to mean an unsigned value of (at least, and on a 8-bit-per-byte system exactly) 8 bits.

     


  • Discourse touched me in a no-no place

    @Cbuttius said:

    uint8_t is a standard C type to mean an unsigned value of (at least, and on a 8-bit-per-byte system exactly) 8 bits.
    No. You appear to be confusing it with uint_least8_t. If the system cannot support a type that is exactly 8 bits (without padding,) then uint8_t doesn't exist on that system. (C99 7.8.1.1 - Exact-width integer types.)



  • @Evo said:

    No, the hardcoded paths ("C:/Users/...", not included in my post) lead me to believe it is to be ran on Windows... However, it processes data received over the internet, so I guess it's fair enough to use uint*_t for those variables. Of course, he simply started using them everywhere, even in for-loop counters or other completely unrelated variables.

    Oops, my C is a bit rusty. Still, he's built a new string system that's not Unicode (which is just insanely wrong), and he's using uint8_t for string pointers? I'd have been a lot less confused if he at least typedef'd a readable type for what he was doing ... My favourite though is (s)scanf, unless anyone's since released a version that doesn't go bananas when it receives bad data. (Caveat: I haven't touched C in a loooong time.)


  • Discourse touched me in a no-no place

    @Daniel Beardsmore said:

    My favourite though is (s)scanf, unless anyone's since released a version that doesn't go bananas when it receives bad data. (Caveat: I haven't touched C in a loooong time.)
    sscanf() tends to be pretty sane; scanf() is the one with issues.



  • @Cbuttius said:

    What is C(++) ???

    You mean you want C++?

    Or you are not sure whether you want C or C++?

    There is a lot wrong with this person's code, but you get what you ask for. Incidentally the file one has the immediate noticeable bug that he fails to close the file handle. The vector push_back is a bug if it is a vector of pointers not of strings.

     

     

    We asked for a C or C++ programmer. Not C with a bit of ++ ;-).

    And I added the class definition in a later post: they were indeed vectors of char*. I'll edit the original post.

    EDIT: Or not, I can't edit the original post... Ahh, well. See the third post here.



  • Incidentally, it is good practice to flush your output handle before you close it, rather than relying on the close to do the flush.

    This really applies though only if you check the result of the call and report an error.  You should never throw C++ exceptions from a destructor and the same would go for an fclose() function. Therefore it is ideal that you flush the buffer and check the write has succeeded in completion before you close / destroy your objects.

    The code as actually written is WTF'y style in that he puts an empty if with all the code in the else rather than negating the condition to put the logic in the if block.

     



  • Do you want the project developed in C++, as in with the classes, exceptions, the standard library, boost libraries etc.

    or in C?

    Why do you think one will be good at the other?

    Unless parts of your project are developed in C++ and parts are developed in C and you are seeking multiple positions.

     



  • @Evo said:

    Great use of static functions as well. All it does is allocate the Config object, read the passed filename, and returned the new object.

    There is a school of thought that constructors shouldn't do expensive operations like read files.



  • incidentally your "dreaded and" would almost have certainly ruled me out as a possible candidate for the job (even if you could hire someone from the UK)

     



  • @Evo said:

    struct String_eq {
    bool operator()(const String *bb1, const String *bb2) const
    {
    uint64_t len1 = bb1->getLength();
    uint64_t len2 = bb2->getLength();

    	uint64_t len = len1 &gt; len2 ? len = len2 : len = len1;
    
    	uint8_t *p1 = bb1-&gt;c_str();
    	uint8_t *p2 = bb2-&gt;c_str();
    	for(uint32_t i = 0; i &lt; (uint32_t)len; i++) {
    		if(p1[i] != p2[i]) {
    			return false;
    		}
    	}
    	return true;
    }
    

    };

    Or the fact that it doesn't even compare for equality, but only the first min(len1, len2) elements.


    Comparing two strings past the length of the shortest one would be a form of "buffer overflow" error. What's shown is a typical K&R style ASCII string compare function shoe-horned into C++ class. But the fuck-ups come from not kicking out as false if the lengths are not equal, and casting down from uint64_t to uint32_t for the string lengths.



  • @Cbuttius said:

    Do you want the project developed in C++, as in with the classes, exceptions, the standard library, boost libraries etc.

    or in C?

    Why do you think one will be good at the other?

    Unless parts of your project are developed in C++ and parts are developed in C and you are seeking multiple positions.

     

    Either C or C++ for the project would have been fine. It would've been a one-man part of the project. The other requirements were harder to find, and were more important. Had this guy just applied for only those requirements, he may have been acceptable (though I'm not experienced enough in the field to tell for sure). He didn't, though, he claimed to be a good programmer as well...

    To be fair, it would probably have been better to get the two people, one as a coder and one as a consultant about the other requirements. My managers wouldn't have any of that though. The project was disbanded by those same managers shortly after this incident anyway, and we never got started.

    Those managers are a full WTF story in itself, with the standard manager who thinks they know better than the CTO and overruling all decisions, only to claim they did no such thing time and time again...

    @Cbuttius said:

    incidentally your "dreaded and" would almost have certainly ruled me out as a possible candidate for the job (even if you could hire someone from the UK)

     

    What do you mean by "dreaded and"?


  • @OzPeter said:

    @Evo said:

    struct String_eq {
    	bool operator()(const String *bb1, const String *bb2) const
    	{
    		uint64_t len1 = bb1->getLength();
    		uint64_t len2 = bb2->getLength();
    
    	uint64_t len = len1 &gt; len2 ? len = len2 : len = len1;
    
    	uint8_t *p1 = bb1-&gt;c_str();
    	uint8_t *p2 = bb2-&gt;c_str();
    	for(uint32_t i = 0; i &lt; (uint32_t)len; i++) {
    		if(p1[i] != p2[i]) {
    			return false;
    		}
    	}
    	return true;
    }
    

    };

    Or the fact that it doesn't even compare for equality, but only the first min(len1, len2) elements.

    Comparing two strings past the length of the shortest one would be a form of "buffer overflow" error. What's shown is a typical K&R style ASCII string compare function shoe-horned into C++ class. But the fuck-ups come from not kicking out as false if the lengths are not equal, and casting down from uint64_t to uint32_t for the string lengths.

    Even more of a WTF is using a uint64_t in the first place. I mean, if you're using a naked int (or unsigned int if your prefer), and on your system that happens to be 64 bits, fine. But if you explicitly need to declare 64 bits to store the length of a string, something has gone horribly wrong. uint32_t wraps at 2 GB. Do you really need a 2 GB string? Like, you honestly expect to be working with strings that big?

    Because if you are, a flat string is probably the wrong data structure to deal with it.



  • @Evo said:

    What do you mean by "dreaded and"?

    As you just explained it. Being a strong C++ developer wasn't enough, they had to have this "other requirement" too.

     



  • @OzPeter said:


    Comparing two strings past the length of the shortest one would be a form of  "buffer overflow" error.  What's shown is a typical K&R style ASCII string compare function shoe-horned into C++ class.  But the fuck-ups come from not kicking out as false if the lengths are not equal, and casting down from uint64_t to uint32_t for the string lengths.

    The WTF is rolling out your own string comparison function when even if you want to use C, the library comes with two perfectly good methods, strcmp and memcmp.

    Writing your own string class is usually a WTF too (although having said that I have written my own).

     

     

     



  • @Cbuttius said:

    The WTF is rolling out your own string comparison function when even if you want to use C, the library comes with two perfectly good methods, strcmp and memcmp.

    The WTF is anyone in the 21st century still using 8-bit encodings. And no, strcmp doesn't work on Unicode: it won't handle precomposed vs decomposed characters for starters, but it gets a lot worse, especially if you want case-insensitive comparison. There is a reason that every OS provides these services for you!



  • @Daniel Beardsmore said:

    @Cbuttius said:

    The WTF is rolling out your own string comparison function when even if you want to use C, the library comes with two perfectly good methods, strcmp and memcmp.

    The WTF is anyone in the 21st century still using 8-bit encodings. And no, strcmp doesn't work on Unicode: it won't handle precomposed vs decomposed characters for starters, but it gets a lot worse, especially if you want case-insensitive comparison. There is a reason that every OS provides these services for you!

    *cough*UTF-8*cough*



  • @pkmnfrk said:

    coughUTF-8cough

    UTF-8 to C, is like seeds that pass through your intestines unharmed.



  • @curtmack said:

    uint32_t wraps at 2 GB. Do you really need a 2 GB string?


    4 gigs. Signed 32-bit ints wrap at 2 gigs. Which makes your point stronger.



  • @pkmnfrk said:

    coughUTF-8cough
    That still doesn't help you when you're comparing decomposed and precomposed characters: č and č should compare as the same string, even though their binary representation is different.



  • @Daniel Beardsmore said:

    @Cbuttius said:

    The WTF is rolling out your own string comparison function when even if you want to use C, the library comes with two perfectly good methods, strcmp and memcmp.

    The WTF is anyone in the 21st century still using 8-bit encodings. And no, strcmp doesn't work on Unicode: it won't handle precomposed vs decomposed characters for starters, but it gets a lot worse, especially if you want case-insensitive comparison. There is a reason that every OS provides these services for you!

    No, it is a WTF writing your code to deal with UNICODE or extended character sets and locales when your code is never going to use anything other than regular ASCII for its strings.

    The libraries are there for those who really do need it, although it's pretty badly supported in C++, even though there is a whole locale section in the standard library.

    In most cases, it is not important where extended characters or any other non-alphabetic characters fit in an "alphabetic order" as long as it remains consistent so that you maintain strict ordering if you want to use sorted collections for searching.

    Not having a standard case-insensitive comparison is a bit of a WTF but any such comparison should be part of the std::locale library.

     



  • @Cbuttius said:

    No, it is a WTF writing your code to deal with UNICODE or extended character sets and locales when your code is never going to use anything other than regular ASCII for its strings.

    You know for certain that all text usage is to be 100% self-contained with no possibility of input of any non-ASCII characters and no localisation of any text ever?

    The problem is not Unicode, the problem is that it's not the complete no-brainer that it should be to just have string = UTF-8 and nothing else, and forget that such horrors as 7- and 8-bit character sets ever existed. Not just for language support but for quality typography! Real minus signs for example.

    (In fairness, this does assume free text protocols and storage types, as fixed size fields will overrun and have an undefined character limit.)



  • @Daniel Beardsmore said:

    You know for certain that all text usage is to be 100% self-contained with no possibility of input of any non-ASCII characters and no localisation of any text ever?
     

    If that's what the specs say - and you add routines to sanitise (or reject) characters falling outside of the acceptable range - then yes.

    I think what cbuttius is alluding to is the tendency to expend unnecessary effort and cost on over-engineering products to cater for situations that won't arise for the foreseeable future. It's just as dangerous a situation as the short-sightedness of building a limited product that requires vast re-engineering when requirements change in future.



  • You're saying this in the context of a guy who's spent time implementing an 8-bit string library for no reason at all? :-P

    What I was trying to say is that when you declare an object of class String, or use a string literal, it should just automatically be UTF-8, as should be all APIs. Text and UTF-8 should be virtually synonymous. These libraries should of course all already exist!

    The anglo-centric bent of IT is quite depressing though – all the writhing and squirming about anyone who might want to use any language besides American English, or for that matter all characters that got left off early typewriters. English may never be the One True Language™ and it certainly isn't right now.



  • @Daniel Beardsmore said:

    English may never be the One True Language™
     

    I never really got the impression it was. I mean, a lot of English is derived from other languages, so it's quite an amalgamation.

    Sentence constructs of other languages draw attention to the foibles and flaws of English. I really pity those trying to learn English and coping with idioms and inconsistencies. I'm quite shocked at just how much better Europeans are at the English language, compares to our local natives.

    But, yeah - English words for coding constructs. Kinda arrogant to believe other nations may want to code in a language other than...



  • I meant that I doubt that the whole world willl switch to English, ever. Probably we'll all end up learning Chinese :)

    I don't mean variables and syntax, so much as being able to press a key on your keyboard and feel sure that it will be saved and retrieved correctly (on your own PC or anywhere) even if the majority of people in the US or the UK have no idea what language it is, or what it means.


  • ♿ (Parody)

    @Daniel Beardsmore said:

    I meant that I doubt that the whole world willl switch to English, ever. Probably we'll all end up learning Chinese :)

    No, they're learning English, too. Their language might have a chance if they abandon their ridiculous writing system, but I'm not sure the future demographics for them are favorable enough even if they adopt the Latin alphabet.



  • @Daniel Beardsmore said:

    all the writhing and squirming about anyone who might want to use any language besides American English
    I can't tell you how many compile time error's I've had from using XNA's "Color" class and typing it in "Colour" by habit. Same with HTML, though I never use it now-a-days.

    I even tried to make a class "Colour" to extend from "Color", didn't work.

    I honestly don't care about the "Colour" vs "Color" debate. But I don't want to be penalized for using the term I've been taught my whole life.

    That's got to be one of the most pathetic rants on this board, with plenty of non-english-speakers hating me for complaining, but I DONT CARE.



  • English may not be the One True universally spoken language, but it is the formal language of programming. And that is American English so you should initialize rather than initialise, and you should use color rather than colour.

    That doesn't mean you should print all your output in American English or use American date locale or decimal points. The user-interaction bit, that the user sees, remains international.

    For those parts, use extended character sets by all means if you are going to need them. However, the majority of string handling within applications is often "under the scenes" - log files, config files, exception and error messages seen only internally. Databases that are local and store UTF-8 strings as keys.

     



  • @Cbuttius said:

    For those parts, use extended character sets by all means if you are going to need them. However, the majority of string handling within applications is often "under the scenes" - log files, config files, exception and error messages seen only internally.

    Which of these would truly require writing a custom implementation, instead of your framework's existing, or a widely available, UTF-8-aware implementation?

    Besides, log files may need to log user display names, filenames, data etc that exist in a language other than English. Not a lot of good if your log files are full of "Missing object: ???????? ?? ?????? ????????????" or "Error creating file C:\????????\?????\???????".

    Configuration could easily reference non-English data. Error messages are most definitely localised, and how do you define an "internal" error message? The ones that say "You should never see this message"? There's no distinct boundary between internal and external messages, and generally the more information a user can be presented (even if it's inside an "More Details >>" panel) the better, so that at least they have some idea what's happened, otherwise tech support get the joy of another "An unexpected error occurred", or some generic message with a myriad of possible unrelated causes.



  • I agree with Beardsmore and now I feel dirty.



  • lol


Log in to reply