CodeSOD collection

Kamil Podlesak

@BernieTheBernie said in CodeSOD collection:

We can use those pseudo-hungarian type indicators for causing confusion. What do you think of string dtX = "Blah!";? If you see that variable later on in the code, you may think it's some DateTime. But it isn't, it is a string (and also Intellisense could tell you).

That is a correct use of Hungarian notation. It's purpose is to define that even when it's string, the string actually contains date-time in some format.

O course, you might say that "Blah!" is not a valid date/time... but maybe it is, in some culture. Maybe it's equal to "¡Mañana!" ?

HardwareGeek

@Kamil-Podlesak said in CodeSOD collection:

"Blah!" is not a valid date/time... but maybe it is, in some culture. Maybe it's equal to "~~¡Mañana!~~Monday" ?

dkf

@Bulb said in CodeSOD collection:

format specification

Because using a useful method name like Format in that case is too hard?

TwelveBaud

@dkf Wow! You guessed the name of the formal parameter! First try, too!

Zecc

@dkf I figure they figured they'd reuse the same name as the default ToString() method from class Object, which gets called when you use any object in string interpolation without a format specifier.

class T: IFormattable
{
	// From Object
	public override string ToString()
	{
		return $"ToString()";
	}
	
	// From IFormattable
	public string ToString(string format, IFormatProvider provider)
	{
		return $"ToString('{format}')";
	}
}

void Main()
{
	var t = new T();
	Console.WriteLine($"{t}");       // ToString('')
	Console.WriteLine($"{t:bleh}");  // ToString('bleh')
}

Whether that's a good idea...

Bulb

@dkf The trouble with Format is, is it formatting the invocant according to the parameter or the parameter according to the invocant? I also think that IFormattable.ToString(format, provider) basically just calls provider.Format(this, format) to add to that confusion.

dkf

@Bulb When there's a confusing mess, adding to the confusion isn't a good plan.

Benjamin Hall

@dkf said in CodeSOD collection:

@Bulb When there's a confusing mess, adding to the confusion isn't a good plan.

But it absolutely seems to be SOP just about everywhere.

Bulb

@dkf said in CodeSOD collection:

@Bulb When there's a confusing mess, adding to the confusion isn't a good plan.

In this case, naming it Format would add more confusion than naming it ToString and overriding the Object implementation. It does have the same purpose as the Object.ToString after all, just adds some configurability.

JBert

@topspin said in CodeSOD collection:

@Bulb what an extremely click-baity title. Not going to watch a 70 minute video if they can't bother to tell me what it's about.

From some random jumping around it seems to be an enjoyable talk, like this bit here:

https://youtu.be/FyCYva9DhsI?t=3621

I'm not going to get rid of them just yet.

The first thing I'm going do is I'm going to correct all those damn spelling and punctuation errors.

Right.

Now I've corrected it, now I delete them, okay. Okay, but... You know, this is... Do the right thing before you do the right thing, okay?

dkf

@JBert said in CodeSOD collection:

I'm not going to get rid of them just yet.

The first thing I'm going do is I'm going to correct all those damn spelling and punctuation errors.

Right.

Now I've corrected it, now I delete them, okay. Okay, but... You know, this is... do the right thing before you do the right thing, okay.

I am in this post. I am OK with this.

remi

@JBert said in CodeSOD collection:

From some random jumping around it seems to be an enjoyable talk, like this bit here:

It's nice to listen to, with just enough (programming) humour in there to make it easy.

But with a tiny bit of bad faith, the gist of that part (until the end) really reads as:

Oh that code is bad, look, if I just move all of it to other functions elsewhere, then I'm left with the cleanest code of all i.e. no code. Sure, it's up to the caller to do the error handling. And the parameters conversion. And everything else. Let them all do the work so we don't have to. But hey, now this specific function is so clean you can't even see it!

Which is, well, , but not very helpful either. So a good fit for both TD and this thread specifically

dkf

@remi said in CodeSOD collection:

Oh that code is bad, look, if I just move all of it to other functions elsewhere, then I'm left with the cleanest code of all i.e. no code. Sure, it's up to the caller to do the error handling. And the parameters conversion. And everything else. Let them all do the work so we don't have to. But hey, now this specific function is so clean you can't even see it!

Which is, well, , but not very helpful either. So a good fit for both TD and this thread specifically

The point was more about moving the damn verification and error generation to where it belongs, and how lots of code just goes away when you do that. Little patterns like checking whether an object has been made in the OK state are just horrible.

The bits I hate most of all are where you have parallel hierarchies of objects because the things you have in your database interface/ORM simply cannot be the things that you hand out over the service interface/web. Because they're just nastily different in some subtle ways (e.g., the web version needs this kind of decoration, and the DB version needs that but has to have some things that are in there but which shouldn't be handed out to the wider world, etc.) So you build code to copy lots of properties over, back and forth, and it really makes me grind my teeth.

PleegWat

@dkf I'm pretty sure the product we'll be moving to is autogenerating that code from a yaml definition. I do not yet know why they're checking the result into source control instead of it being a build artifact.

Bulb

@PleegWat said in CodeSOD collection:

I do not yet know why they're checking the result into source control instead of it being a build artifact.

Well, sometimes if changes in the ultimate source are rare and the code generator is big it makes sense to check in the generated code so the developers don't all need to have the code generator installed and don't need to run it if it is slow.

HardwareGeek

@Bulb said in CodeSOD collection:

code generator is big

Or if it's not part of the build chain. E.g., in my project, everything is on Linux, except there's a code generator that has to run on Windows, because . So the output of that generator is checked into the repo with all the other code.

Bulb

@HardwareGeek I've done builds that (on the CI server) split or chained themselves across multiple platforms several times already in different shapes and forms, but that way also makes sense.

BernieTheBernie

The concept of an Enum is rather simple in C#. Actually, you just create a bundle of integer constants.

enum E
{
    A,
    B,
    C
}

Here, A is 0, B is 1, and C is 2. All of them are signed 32 bit integers.
You may define the values of the enum members explicitly, like A = 42, but that's not required. And, you are allowed to mix implicit and explicit value assignments. Duplicate values are also allowed, but not duplicate keys.

Originally, the values in this enum were ordered according to their importance. Later on, Johnny added 2 more values, at the end:

enum E
{
    A,
    B,
    C,
    D,
    E
}

But the "natural" order should be E,D,A,B,C.

How can we get there without manipulating the enum?

We need a mathematical manipulation which places 4 (E) before 3 (D), and 3 (D) before 0 (A), while not affecting the order of 0,1,2.

Hint:
Arithmetic overflow. Let's use some unchecked statement.
65535 and -1 are equivalent - with the right selection of number encodings. And 65536 gets easily truncated to 0 when you cast an Int32 to an Int16.
Next, Math.Abs will change their order.

So, we could add the value of 2 to all values, and define D sufficiently large to yield -1 afterwards:

enum E
{
    A,
    B,
    C,
    D=65533,
    E
}

with the formula:

E[] e;
unchecked { e = ((IEnumerable<E>)Enum.GetValues(typeof(E))).ToList().OrderBy(x => Math.Abs((Int16)(x + 2))).ToArray(); }

Enum.GetValues(typeof(E))just produces a non-generic (i.e. old-style) array of the values of the enum in the order they are defined. We can cast that to an IEnumerable<E>. For the fun of it, let's add a non-necessary ToList() (it really works without that, too. But why omit it? It is not wrong at all!).
Now we can do the ordering with LinQ. The predicate is a simple x => Math.Abs((Int16)(x + 2). It will lift A from 0 to 2, and D from 65533 to 65535 which gets cast to -1 when we change that to an Int16. And E is lifted from 65534 to 65536 which is all 0s in its first 16 bits. Eventually let's add a ToArray() so we can later use the index in the array for the importance of the enum value.

Still, one of the enum values had to be manipulated. Can we do better?
Yes, we can!

Hint: terns.
In the formula, we used the magic number 2. We can replace it with a tern: ((int)x < 3 ? 2 : 65532). A,B,C will get 2 added, and D,E 65532 (do not forget: without an explicit value defined, D is 3).
So we arrive at our conclusion:

E[] e;
unchecked { e = ((IEnumerable<E>)Enum.GetValues(typeof(E))).OrderBy(x => Math.Abs((Int16)(x + ((int)x < 3 ? 2 : 65532)))).ToArray(); }

And everyone who will touch the enum, must also touch this little gem.
Bugs will arise.
Yes, I like bugs.
But I do not write them. Only code that bug-gers you.
:pride:

remi

@dkf said in CodeSOD collection:

The point was more about moving the damn verification and error generation to where it belongs,

Yes, I agree with that part.

and how lots of code just goes away when you do that.

But it doesn't really "go away," it's just moved elsewhere. Well, code that is factored out in a separate function avoids duplication so the duplicated code goes away. But when it's about e.g. the string to int conversion, yes it makes sense (maybe, depending on context, but I can accept his premise here) to move it elsewhere but it's somewhat misleading to then point at the function that no longer does that and say "look how small it is" because you haven't removed code, just moved it. Which, again, might be a good thing. But not in terms of "less code is better code"-ideal, which is what his way of showing it implies.

I'm not saying he's wrong, but that he is kind of cherry-picking amongst the consequences of his changes and doing so is possibly slightly misleading.

Not related to the point above, I'm not convinced by his simplification of the strings (in the error logging bits) when they are just constants. to watch again but IIRC his argument is something like "there are lots of repeated blocks, each one builds a string by adding some constants bits and some variable bits, but when the string actually only has constant bits then a different syntax is more appropriate (with string literals) so I'm gonna change some of the repeated blocks to use that different syntax."

I get his point about building strings being less efficient (clean, bug-free...) than using literals, and I agree with that. But given how the code he's changing is really doing the same thing in all blocks, just with different arguments, I feel that "doing the same thing but with slightly different syntaxes" is a bad idea. I know that, as a developer discovering that code, I might spend a couple more seconds trying to understand why those blocks are not visually exactly the same and wondering if they do the exact same thing. Plus if I needed to modify one such block (and either add a variable bit where there wasn't before, or removing all the variables bits), I would have to think a bit more about how to do it than if everything uses the same format.

In that specific case, I would say that consistency across all blocks that functionally do the same thing is better than "optimal" code in each of those blocks. Unless of course there are good reasons (e.g. you can show a significant performance hit or whatever). And of course the best solution is to shove all that in a single function rather than duplicate it. But if you have to duplicate, I'd rather keep all blocks as similar to each other as possible.

remi

Also I'm not entirely convinced by the "move string to int conversion" elsewhere part. It may make sense in this case, I'm not saying it never would.

(for those who haven't watched: the function connects to a server. Initially it reads server/port from some configuration class, doing some string-to-int conversion along the way because the configuration class is string-based, he changes it so that server/port are passed as int's to the function, moving the conversion "elsewhere." There are other things he changes, but I'm focusing on that one here.)

If that function is called in several places, and the server/port is the same in all cases (which is implied from the original function that read the same configuration in every call!), then now every place that calls this function has to do the fetch-from-config-then-convert dance. So you've in effect duplicated code by removing it from that function.

Now obviously you're going to say that you should make another function that reads the config, converts the strings, and then calls the function we're looking at. But doing so was just splitting the function in several cascading ones, and the only justification is the whole "simple code" idea that each function should only ever do one thing. When pushed a bit like here, it ends up with each function being just one line. Well done, you've multiplied by 4 (at least) the number of lines of code, and in many cases maybe obfuscated it (by hiding that single line of code inside a function whose name may or may not accurately represent what that line does).

I'm putting a bit of bad faith here, granted (there's a world of compromise and good practices between a single humongous function, which is Very Bad, and all one-liners, which is IMO also Bad), but still, I'm not entirely convinced by his arguments. It may be valid in his specific case, but I don't think you can really make any sort of rule out of it, there are too many other aspects to consider.

dkf

@remi said in CodeSOD collection:

If that function is called in several places, and the server/port is the same in all cases (which is implied from the original function that read the same configuration in every call!), then now every place that calls this function has to do the fetch-from-config-then-convert dance. So you've in effect duplicated code by removing it from that function.

What you'd do is make a constructor that takes a configuration (or part thereof) and make the configuration have the string-to-int conversion code if it needs it (i.e., the conversion of the value to a type that is internally acceptable happens when the configuration is read, not when the value is consumed). Little helper functions that everyone has to use are a sign that you got the original API really wrong.

And that's most definitely not a C++-specific observation.

remi

@dkf That might work, yes. I agree that if your only choice is between duplicating stuff, or putting it in a place that doesn't make sense, then the design is probably wrong.

More generally speaking though, I'm not a big fan of the "lots of tiny classes/functions" style of coding, because I've been bitten by it a few times. I remember one case where some 3D data (i.e. a float-array + 3 dimensions (nx/ny/nz), essentially) was described with a Data, a DataCube, a DataInfo, a DataDimension, a DataValue, a ValueInfo... and that's without going into the DataUpdater or DataPersistence, or actually accessing the numbers which required going through a DataReader or a DataWriter...

OK, I'm might be making some of it up, there weren't that many classes. But there were enough, each one being just one or two lines of code, that trying to understand the logical flow through it was impossible because you had to jump around so many classes. And I'm not making up that part, I remember getting many headaches trying to understand that code.

(To be clear, this code was badly designed, but not all code that uses that pattern are necessarily so. It's just that I encountered some bad examples and therefore I'm a bit wary of seeing it promoted too blissfully!)

Parody

@dkf said in CodeSOD collection:

The bits I hate most of all are where you have parallel hierarchies of objects because the things you have in your database interface/ORM simply cannot be the things that you hand out over the service interface/web. Because they're just nastily different in some subtle ways (e.g., the web version needs this kind of decoration, and the DB version needs that but has to have some things that are in there but which shouldn't be handed out to the wider world, etc.) So you build code to copy lots of properties over, back and forth, and it really makes me grind my teeth.

I'm having scripting interface flashbacks...something something COM something something object lifetime something something horrific macros for (near-)duplicate code something something lazy programmer something something long compile times. sdjakflp4839%#$@%g9 NO CARRIER

Kamil Podlesak

@BernieTheBernie said in CodeSOD collection:

The concept of an Enum is rather simple in C#. Actually, you just create a bundle of integer constants.
...
So we arrive at our conclusion:
E[] e;
unchecked { e = ((IEnumerable<E>)Enum.GetValues(typeof(E))).OrderBy(x => Math.Abs((Int16)(x + ((int)x < 3 ? 2 : 65532)))).ToArray(); }
And everyone who will touch the enum, must also touch this little gem.
Bugs will arise.
Yes, I like bugs.
But I do not write them. Only code that bug-gers you.
:pride:

My conclusion is: the way enums were added to Java was freaking genial.

topspin

@BernieTheBernie

dkf

@remi said in CodeSOD collection:

I'm not a big fan of the "lots of tiny classes/functions" style of coding

QFFT! I term that “ravioli coding” because it's still pasta but now you've put it all in lots of little objects and surrounded them with ~~sauce~~ glue code.

dkf

@Kamil-Podlesak said in CodeSOD collection:

the way enums were added to Java was freaking genial

Java had the advantage of looking at the way things were done, both in practice in Java as it was before then and in other languages, and being able to say “what are we really trying to do here anyway?”

Python's enums copy Java's, except they also decided to make IntEnums for when you want your special things to also be numbers because multiplying two enum values together is an obviously great plan.

Applied Mediocrity

@dkf Not sure what's wrong with IntEnums (and IntFlags), if for one reason: they can have undefined values that are not, however, invalid values. By nature of being integers they are also more lightweight. There's only one instance per assembly that holds all the blah.

BernieTheBernie

@topspin said in CodeSOD collection:

@BernieTheBernie

Hey, I am just providing an interesting lesson here at the University of What The Fuck.
Far beyond the quality of the beginner lessons provided on the front page. This here is really Highly Advanced Matter for Experienced Professionals ().

dkf

@BernieTheBernie said in CodeSOD collection:

@topspin said in CodeSOD collection:

@BernieTheBernie

Hey, I am just providing an interesting lesson here at the University of What The Fuck.

Don't look at me. WTF-U has mostly been behaving themselves recently, at least when it comes to IT.

BernieTheBernie

@BernieTheBernie said in CodeSOD collection:

(int)x < 3

That no good! The human brain struggles with neagtive statements. So let's just turn it round and negate it:
!((int)x >= 3)

A little better now.

BernieTheBernie

And e is then used in:

foreach (var r in configuredItems)
{
    var s = e[r.GetThings(null).Max(x => Array.IndexOf(e, x.Property.EnumProperty) as int?) ?? 0];
    rs.Add(r);
    ss.Add(s);
}

var maxS = e[ss.Max(x => Array.IndexOf(e, x))];
var maxR = rs[Array.IndexOf(ss.ToArray(), maxS)];
return !(Array.IndexOf(e, maxS) < Array.IndexOf(e, configuredMinimum));

in order to collect the highest value from some components which were set during configuration.
Unfortunately, r.GetThings and x.Property.EnumProperty still have well readable names.
Of course, you could stop the collection process (which takes time: TCP) when the first Thing was found which fulfills the threshold criterion. But Fritz prescibed the complicated way (likely he does not have that basic knowledge of logic to see that).

This snippet contains another little gem. GetThingsmay return an empty list, so Max would fail. But by explicitly casting the value to int? (i.e. a nullable integer), Max would return null for an empty list, and that can then be replaced with a 0 with the ?? 0 residue.

dkf

@BernieTheBernie said in CodeSOD collection:

Of course, you could stop the collection process (which takes time: TCP) when the first Thing was found which fulfills the threshold criterion.

That reminds me of code that would laboriously fetch a big list of tens of thousands of IDs of objects from a web service, and then fetch the full details for each of those objects to flesh them out, before returning the list to the caller. Who would almost invariably just get the length of the list and compare that against zero, and that was the only use of that call anywhere.

When the code was replaced with a simple hoisted existence check (which could be essentially just passed straight through to the DB on the back end with only minimal wrapping), the application suddenly went a lot faster…

Applied Mediocrity

@dkf said in CodeSOD collection:

the application suddenly went a lot faster…

Damnit, who dun altogether removed the speed-up ~~loop~~ query?

Bulb

@dkf said in CodeSOD collection:

That reminds me of code that would laboriously fetch a big list of tens of thousands of IDs of objects from a web service, and then fetch the full details for each of those objects to flesh them out, before returning the list to the caller. Who would almost invariably just get the length of the list and compare that against zero, and that was the only use of that call anywhere.

Databases and SQL were created for processing data, and are processed inside the database engine to allow specifying what exactly is needed to avoid any extra work. But it's “modern” to load the data up into the application and do processing there, in a general purpose programming language much less suited for the purpose and passing up the opportunity for many optimization the database engine could do if it was just given the whole job.

izzion

@Bulb said in CodeSOD collection:

@dkf said in CodeSOD collection:

That reminds me of code that would laboriously fetch a big list of tens of thousands of IDs of objects from a web service, and then fetch the full details for each of those objects to flesh them out, before returning the list to the caller. Who would almost invariably just get the length of the list and compare that against zero, and that was the only use of that call anywhere.

Databases and SQL were created for processing data, and are processed inside the database engine to allow specifying what exactly is needed to avoid any extra work. But it's “modern” to load the data up into the application and do processing there, in a general purpose programming language much less suited for the purpose and passing up the opportunity for many optimization the database engine could do if it was just given the whole job.

Though there are certain operations (sorting / pagination) that the database server is less good at than the front end... so if you really do need all the data, then you should pull it all down and let ~~God~~ the web server sort it out. And don't get me started on reporting / data roll-up queries.

dkf

@izzion said in CodeSOD collection:

Though there are certain operations (sorting / pagination) that the database server is less good at than the front end...

^{[citation needed]}

izzion

@dkf
. https://www.brentozar.com/archive/2019/10/how-to-think-like-the-sql-server-engine-adding-an-order-by/

https://www.brentozar.com/archive/2013/02/7-things-developers-should-know-about-sql-server/ (#2)

https://stackoverflow.com/a/326652 (and the answer two further down, with 5 upvotes from Draemon)

Bulb

@izzion said in CodeSOD collection:

pagination

The purpose of pagination is to avoid sending all the data at once, so that does not make sense to do anywhere but the server. If you already got the data on the client, just show them.

@izzion said in CodeSOD collection:

sorting

Yes, provided pagination isn't getting in the way. I've seen a couple systems that load the first page of data only, then re-sort or filter client side. Cue puzzled user looks.

Zecc

@Bulb said in CodeSOD collection:

The purpose of pagination is to avoid sending all the data at once, so that does not make sense to do anywhere but the server. If you already got the data on the client, just show them.

Don't forget the database server and the application server are two different things. You may decide that it's okay to get all data from the database, but still not want to render it all.

dkf

@Zecc said in CodeSOD collection:

You may decide that it's okay to get all data from the database, but still not want to render it all.

Data that is never moved at all is data that is not slowing your code down.

turingmachine

@Bulb said in CodeSOD collection:

Yes, provided pagination isn't getting in the way. I've seen a couple systems that load the first page of data only, then re-sort or filter client side. Cue puzzled user looks.

You mean like AWS when listing ECS services? Well, I guess Amazon is just a small startup who can't afford to do it right...

Bulb

@turingmachine Or like Azure when selecting logs. Or like JDEdwards OneWorld when generating any bigger report. I guess Microsoft and Oracle are also just small startups that can't afford to do it right …

dkf

@Bulb Sounds like there's a lot of chuckleheads who can't get it right (and yes, I've seen a coworker of mine make the same blunder). Don't know why it's such an attractive mistake…

Kamil Podlesak

@dkf said in CodeSOD collection:

@izzion said in CodeSOD collection:

Though there are certain operations (sorting / pagination) that the database server is less good at than the front end...

^{[citation needed]}

There is is one area where DB engines usually struggle: collated sorting. Which makes sense - to do it really efficiently, you pretty much need one index for each possible locale. And not every DB engine allows you to do that, so in the end you have to create special columns with "sort key" (ie the value processed through collator).

Paging, OTOH, is exactly the thing that has to be done in database for any non-trivial amount of rows.

Bulb

@dkf I think the reason lies in the proverbial efficiency of web browsers together with web UI design still being a bit of a cowboy operation even in big companies with shortage of well made data components and everybody rolling their own. And notorious underappreciation for QA for business applications (I've never seen a web shop get this wrong).

dkf

@Kamil-Podlesak said in CodeSOD collection:

Paging, OTOH, is exactly the thing that have to be done in database for any non-trivial amount of rows.

And paging has to be done after sorting for it to be correct. If paging isn't applied to items in a consistent repeatable order, the results will be intensely confusing to users.

Bulb

@dkf If you want to apply sorting and/or filtering client-side, you should get the first page, display it, but then proceed to fetch the rest of the data in the background, or fetch them when the user reaches for the sort or filter buttons. But many fail to do that, either because they simply forget, or because they fear they'll slow things down.

dkf

@Bulb A lot of things you actually want to sort by aren't localized in the first place. Either because they're not data that changes with locale (such as the quantity in stock of something), or because they're user-supplied data and there ain't nobody got time for correctly localizing all that (machine localization being a good source of laughs for us here).

Bulb

@dkf … or because the application does not deal with multiple locales in the first place. Business applications rarely need to.

The proverbial performance of web browsers is part of the issue here. A report with ten thousand records is on order of megabyte of raw data, which should be nothing on modern computers. But if you just feed it into the DOM, the browser will probably grind to halt and even the javascript objects may be too much on slower computers. Which is why the application developers do page the data and then get into the fix.