Sample Code Reviews....



  • Wherein some proposed changes are presented, and people can comment on them (Approve/Reject)...

    A simple one to start:

    Is it acceptable to replace the "Before" implementation with the "After"???

    public class CodeReview1
    {

        public int Before(List<int> src)
        {
            int total = 0;
            for (int i = 0; i <= src.Count; ++i)
            {
                total += src[i];
            }
            return total;
        }
    
    
        public int After(List<int> src)
        {
            int total = src.Sum();
            return total;
        }
    }
    

  • Considered Harmful

    @thecpuwizard Given that that's literally the implementation, I'd say yes, if not removing the variable entirely:

    public int After(List<int> src) => src.Sum();
    

    Not to mention the fact that an iterator is faster than indexing, and unless you actually expect the contents to change inside the loop, a variable should always be used as a for-loop's upper limit, rather than a method call or property access.
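(For illustration, a minimal sketch of the cached-upper-limit pattern described above; the class and method names are made up for this example.)

```csharp
using System.Collections.Generic;

public class LoopLimitExample
{
    // Cache the upper limit in a local, as the post above suggests:
    // src.Count is read once instead of being re-read on every iteration.
    public static int SumWithCachedCount(List<int> src)
    {
        int total = 0;
        int count = src.Count; // snapshot taken once, before the loop
        for (int i = 0; i < count; ++i)
        {
            total += src[i];
        }
        return total;
    }
}
```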


  • Notification Spam Recipient

    @thecpuwizard IInteerresting, looks like your <code> tag didn't work.


  • And then the murders began.

    @pie_flavor said in Sample Code Reviews....:

    ...unless you actually expect the contents to change inside the loop, a variable should always be used as a for-loop's upper limit, rather than a method call or property access.

    Method call, agreed. But a properly-implemented property is supposed to be lightweight; creating another variable just to copy the value out smacks of premature optimization.


  • Considered Harmful

    @unperverted-vixen Nearly every property should just be { get; } or { get; set; } anyway - why make them properties instead of fields? Same concept - there will eventually come along something that breaks it, and it's better just to have the rule in place already. Never assume that the code using yours will use it correctly - if that was taking IList instead of List, then you have no idea what the implementation of Count is.



  • @pie_flavor said in Sample Code Reviews....:

    @unperverted-vixen Nearly every property should just be { get; } or { get; set; } anyway - why make them properties instead of fields? Same concept - there will eventually come along something that breaks it, and it's better just to have the rule in place already. Never assume that the code using yours will use it correctly - if that was taking IList instead of List, then you have no idea what the implementation of Count is.

    Not going to respond to the original (yet), but changing a field to a property (or vice versa) is 100% a breaking change. So if one is going to follow SOLID [specifically OCP], then once a field, always a field; once a property, always a property.


  • And then the murders began.

    @pie_flavor said in Sample Code Reviews....:

    Never assume that the code using yours will use it correctly - if that was taking IList instead of List, then you have no idea what the implementation of Count is.

    You also shouldn't assume that the content and size of an IList you take in won't change mid-flight. Using the property increases the likelihood of completing without an IndexOutOfRangeException. (Which I'm not mistaking for correctness.)

    Both are incredibly worthless implementations of the interface (IMO), but taking the performance hit from the property is less likely to end in tears and sadness.


  • Considered Harmful

    @unperverted-vixen And in both situations, a foreach loop should be used instead. That's actually my primary gripe with Before.
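(The foreach rewrite argued for above would look something like this; there is no index variable and no Count check, so no room for the off-by-one in "Before".)

```csharp
using System.Collections.Generic;

public class ForeachExample
{
    // Sum via enumeration rather than indexing: the enumerator walks the
    // list directly, so no index can ever run past the end.
    public static int SumWithForeach(List<int> src)
    {
        int total = 0;
        foreach (int value in src)
        {
            total += value;
        }
        return total;
    }
}
```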


  • And then the murders began.

    @pie_flavor Agree, that's definitely the best approach.



  • A null check wouldn't be out of place here 🙂


  • Discourse touched me in a no-no place

    @pie_flavor said in Sample Code Reviews....:

    Given that that's literally the implementation,

    Except for it being more careful with checked (plus using an iterator and doing some sensible pre-flight checks).


  • Dupa

    @agentdenton said in Sample Code Reviews....:

    A null check wouldn't be out of place here 🙂

    Sum already does that. ;)



  • @kt_ It would do that for the items in the list. But what if the list (src, in this instance) is null? 😉


  • Considered Harmful

    @agentdenton Sum does that.


  • Dupa

    @agentdenton said in Sample Code Reviews....:

    @kt_ It would do that for the items in the list. But what if the list (src, in this instance) is null? 😉

    Check out the Exception section. ;)
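(For the record: Enumerable.Sum is documented to throw ArgumentNullException when its source is null, so the null guard being asked about is already inside it. A small demonstration; the wrapper method is made up for this sketch.)

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class NullCheckExample
{
    public static string DescribeSumOfNull()
    {
        List<int> src = null;
        try
        {
            // An extension-method call compiles even on a null reference;
            // the null check happens inside Sum itself.
            return src.Sum().ToString();
        }
        catch (ArgumentNullException)
        {
            return "ArgumentNullException";
        }
    }
}
```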


  • Discourse touched me in a no-no place

    @pie_flavor said in Sample Code Reviews....:

    @thecpuwizard Given that that's literally the implementation, I'd say yes, if not removing the variable entirely:

    public int After(List<int> src) => src.Sum();
    

    Not to mention the fact that an iterator is faster than indexing, and unless you actually expect the contents to change inside the loop, a variable should always be used as a for-loop's upper limit, rather than a method call or property access.

    Agreed. I'd go one further and eliminate this method entirely. They should be calling Sum directly.


  • ♿ (Parody)

    @thecpuwizard said in Sample Code Reviews....:

    Is it acceptable to replace the "Before" implementation with the "After"???

    First question: why are we spending time on this (this change, not reviewing code)?



    OK, now that this has had some time to percolate (and I did get the type of responses expected)...

    Given that that's literally the implementation, I'd say yes, if not removing the variable entirely:

    No, look at the material in the link. Sum() is implemented as a foreach; the Before is a for. This is the key difference (because of compiler/runtime issues, the code from locally implementing the same source as shown is not exactly identical, but I am willing to ignore that for the first pass).

    Not to mention the fact that an iterator is faster than indexing

    foreach uses an enumerator, and the implementation of the enumerator has code in it which actually makes it slower in most cases. For a List<T> (where the data elements are contiguous), this is true 100% of the time. For some other collection types, the enumerator may be faster. So the rationale is false.

    You also shouldn't assume that the content and size of an IList you take in won't change mid-flight. Using the property increases the likelihood of completing without an IndexOutOfRangeException. (Which I'm not mistaking for correctness.)

    Here we have the key point, and is basically what I am looking for. Given the provided code, it is impossible to determine that the list will not be modified. The class and method are public, so anyone in the world who has access to the assembly may be calling this method under any possible circumstance.

    With the Count check as part of the loop, there is a very small (but real) window where a delete (remove) of an object at the very end of the processing will create an IndexOutOfRangeException.

    When switched to a foreach, any change to the list at any time will (because of the check of the version number inside the enumerator, which contributes to the enumerator being slower) result in a CollectionModifiedException.

    =====
    Bonus: Write a test to ensure that the behavior in "Before" remains, and will fail (reliably) if the behavior changes to that in "After"...
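(One detail worth pinning down: the exception List<T>'s enumerator actually throws on mid-enumeration modification is InvalidOperationException, with the message "Collection was modified"; a minimal demonstration, with an illustrative wrapper method:)

```csharp
using System;
using System.Collections.Generic;

public class VersionCheckExample
{
    // Mutating a List<int> while foreach is enumerating it trips the
    // enumerator's internal version check on the next MoveNext call.
    public static string ModifyDuringForeach()
    {
        var list = new List<int> { 1, 2, 3 };
        try
        {
            foreach (int v in list)
            {
                list.Add(v); // invalidates the live enumerator
            }
            return "no exception";
        }
        catch (InvalidOperationException)
        {
            // .NET reports this as InvalidOperationException
            // ("Collection was modified; enumeration operation may not execute.")
            return "InvalidOperationException";
        }
    }
}
```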



  • @boomzilla said in Sample Code Reviews....:

    @thecpuwizard said in Sample Code Reviews....:

    Is it acceptable to replace the "Before" implementation with the "After"???

    First question: why are we spending time on this (this change, not reviewing code)?

    Because of the other thread discussing code reviews.

    If this simple little change can require such a level of discussion, what will happen when there are hundreds (or thousands) of lines involved in the change? Can you be sure that a subtle, but potentially critical, change such as this will be caught?


  • ♿ (Parody)

    @thecpuwizard said in Sample Code Reviews....:

    @boomzilla said in Sample Code Reviews....:

    @thecpuwizard said in Sample Code Reviews....:

    Is it acceptable to replace the "Before" implementation with the "After"???

    First question: why are we spending time on this (this change, not reviewing code)?

    Because of the other thread discussing code reviews.

    If this simple little change can require such a level of discussion, what will happen when there are hundreds (or thousands) of lines involved in the change? Can you be sure that a subtle, but potentially critical, change such as this will be caught?

    Oh, jeez. You're missing the whole point of my question, which was supposed to be part of the review!

    IOW, why are you wasting my time with this? What's the point of this change? People brought up all sorts of potential issues but that's all wankery because none of us have the context of the change.

    Were any of those things actually problems? That's the question that has to be answered before the code can be reasonably reviewed, because it looks a lot like someone wasting time and money on stuff that doesn't matter.



  • @boomzilla @TheCPUWizard aren't small edge cases like that why you have unit tests anyway? If it's a problem/requirement, the tests should catch it. If not, people probably won't catch it either.

    Unless you're running code that requires provable correctness, in which case you're in a whole different world of pain.


  • ♿ (Parody)

    @benjamin-hall said in Sample Code Reviews....:

    aren't small edge cases like that why you have unit tests anyway?

    Race conditions are very difficult to test for.



  • @boomzilla But you can generate race conditions and make sure they are handled properly. Can't catch them all, sure, but "collection modified"? That one's pretty easy I'd think.

    And that level of code review requires everyone to be intimately familiar with the implementation details of every piece of framework code. For 99.99999999....999999...99999999% of all programmers, that's both unwarranted and impossible.

    It also doesn't scale--things that work when everyone is a perfect, fanatical expert don't work when scaled out. In fact, they tend to work worse than the status quo ante. That's the case in education, anyway.



  • @benjamin-hall said in Sample Code Reviews....:

    aren't small edge cases like that why you have unit tests anyway?

    As I posted, special bonus for anyone who can provide such a test (reliably pass and fail - no intermittent). It is quite difficult (but not impossible).

    But let's presume it has been done. Then there would have been a review on the testing (and the requirements). Those are different types of reviews.

    Regardless, the "smaller is more manageable" principle holds. The time to do a review is superlinear [has a Big O greater than O(n)] relative to the amount of information being reviewed. Ten reviews each looking at 10 lines of code will take much less time than one review of 100 lines of code as a general rule [things like long but simple array initializations that are simply skimmed are a common exception to that rule]. Cyclomatic complexity is almost always a good indicator.



  • @thecpuwizard but you're still requiring a level of expertise that is unsustainable for anything outside of framework code. You're also requiring that the reviewers pierce the encapsulation and worry about the implementation details of the framework/library code instead of the contract.

    That's important if the new code must be bug-for-bug identical or in life-or-death code, but that's a tiny minority of code. There's a place for different styles. One size fits none.

    This is why I say you come across (intentionally or not) as a bragging evangelist--"look at me all you wicked people who are :doing_it_wrong:, convert to the right way (my way) and be saved!" It's a black-and-white, my-way-or-perdition attitude that turns people off and makes them less likely to adopt your practices, even if they're correct.



  • @thecpuwizard Did anybody mention the possibility of an overflow.



  • @benjamin-hall said in Sample Code Reviews....:

    This is why I say you come across (intentionally or not) as a bragging evangelist--"look at me all you wicked people who are :doing_it_wrong:, convert to the right way (my way) and be saved!" It's a black-and-white, my-way-or-perdition attitude that turns people off and makes them less likely to adopt your practices, even if they're correct.

    It's like a question you'd get asked at an interview that would instantly convince me never to work there.

    In any case, the bigger issue is this code doesn't do anything to handle an overflow. That's a way more obvious bug than the race condition thing. (Although I admit the race condition thing would be more of a heisenbug, harder to solve.)



  • @benjamin-hall said in Sample Code Reviews....:

    This is why I say you come across (intentionally or not) as a bragging evangelist--"look at me all you wicked people who are :doing_it_wrong:, convert to the right way (my way) and be saved!" It's a black-and-white, my-way-or-perdition attitude that turns people off and makes them less likely to adopt your practices, even if they're correct.

    Definitely not intentionally, but thanks for the feedback. Those who know me in the real world know that I am a firm proponent of "There are no universal best practices - even the 'worst anti-pattern' can be the 'right' solution under various circumstances."

    but you're still requiring a level of expertise that is unsustainable for anything outside of framework code. You're also requiring that the reviewers pierce the encapsulation and worry about the implementation details of the framework/library code instead of the contract.

    And yet, many companies are adopting it (completely independently from them even knowing I exist) and succeeding [the feedback one gets at conferences is enlightening]. One element is "the reviewers" and differentiating the CODE reviewers (who, IMPO need to be at that level) from the CONTRACT (aka Requirements) and VALIDATION (aka Testing) reviewers. If/When that differentiation is clear, many of the issues (statistically speaking) are eliminated.



  • @thecpuwizard said in Sample Code Reviews....:

    And yet, many companies are adopting it (completely independently from them even knowing I exist) and succeeding [the feedback one gets at conferences is enlightening].

    But how many commits do they put in an hour?


  • Banned

    @thecpuwizard said in Sample Code Reviews....:

    foreach uses an enumerator, and the implementation of the enumerator has code in it which actually makes it slower in most cases. For a List<T> (where the data elements are contiguous), this is true 100% of the time. For some other collection types, the enumerator may be faster. So the rationale is false.

    Source? I seriously doubt that in 2018, one of the most used JITs in the world would make one of the most used operations on one of the most used data structures suboptimal.


  • Banned

    @thecpuwizard said in Sample Code Reviews....:

    @benjamin-hall said in Sample Code Reviews....:

    aren't small edge cases like that why you have unit tests anyway?

    As I posted, special bonus for anyone who can provide such a test (reliably pass and fail - no intermittent). It is quite difficult (but not impossible).

    I heard there are fuzzers that can repeatedly run a test in literally every single possible order in which instructions can run in multithreaded code.


  • BINNED

    @thecpuwizard said in Sample Code Reviews....:

    Here we have the key point, and is basically what I am looking for. Given the provided code, it is impossible to determine that the list will not be modified. The class and method are public, so anyone in the world who has access to the assembly may be calling this method under any possible circumstance.
    With the Count check as part of the loop, there is a very small (but real) window where a delete (remove) of an object at the very end of the processing will create an IndexOutOfRangeException.
    When switched to a foreach, any change to the list at any time will (because of the check of the version number inside the enumerator, which contributes to the enumerator being slower) result in a CollectionModifiedException.

    But why is that relevant? (Note, not the same question as @boomzilla's.)

    If you're modifying the list from outside without synchronization, you got a race condition, which is a bug. In both cases you get an exception when it happens, and with the revised code arguably the more appropriate exception, too.
    It shouldn't really matter which of these two exceptions you get, unless you put in a handler for this exact exception to mask the bug, instead of fixing it.

    My review would've been "sure, looks better". (Hypothetically, I don't speak C#)


  • BINNED

    @blakeyrat said in Sample Code Reviews....:

    @thecpuwizard Did anybody mention the possibility of an overflow.

    That's common to both the "Before" and "After" implementation, I think?



  • I'd definitely use the After, as the Before gives an IndexOutOfRangeException at every use (when i == src.Count)
    But :pendant: aside: this is another reason for me to prefer the simpler construct.


  • ♿ (Parody)

    @topspin said in Sample Code Reviews....:

    @blakeyrat said in Sample Code Reviews....:

    @thecpuwizard Did anybody mention the possibility of an overflow.

    That's common to both the "Before" and "After" implementation, I think?

    This is a way that "Why are we doing this?" can help. We have no idea if overflow is really a problem or if we'd want to deal with such a thing here.



  • @topspin said in Sample Code Reviews....:

    If you're modifying the list from outside without synchronization, you got a race condition, which is a bug

    Not really, there are plenty of cases where iterating over a dynamically changing collection makes perfect sense and can be a desired design choice.


  • BINNED

    @thecpuwizard said in Sample Code Reviews....:

    @topspin said in Sample Code Reviews....:

    If you're modifying the list from outside without synchronization, you got a race condition, which is a bug

    Not really, there are plenty of cases where iterating over a dynamically changing collection makes perfect sense and can be a desired design choice.

    Just to be sure I understand you: are you disagreeing it's a race condition or that a race condition is a bug?



  • @topspin said in Sample Code Reviews....:

    That's common to both the "Before" and "After" implementation, I think?

    So is the threading race condition CPU Wizard was so proud of pointing out.



  • @thecpuwizard said in Sample Code Reviews....:

    Not really, there are plenty of cases where iterating over a dynamically changing collection makes perfect sense and can be a desired design choice.

    You'd want to use a collection type that communicates that intent, like a BlockingCollection.


  • ♿ (Parody)

    @blakeyrat said in Sample Code Reviews....:

    BlockingCollection

    [image]



  • @topspin said in Sample Code Reviews....:

    @thecpuwizard said in Sample Code Reviews....:

    @topspin said in Sample Code Reviews....:

    If you're modifying the list from outside without synchronization, you got a race condition, which is a bug

    Not really, there are plenty of cases where iterating over a dynamically changing collection makes perfect sense and can be a desired design choice.

    Just to be sure I understand you: are you disagreeing it's a race condition or that a race condition is a bug?

    Depends on usage of terminology. Having a "dependency on order of execution with the potential of different orders occurring in a non-deterministic manner" is one definition of "race condition". Such situations can be normal and proper or something that is unexpected and causes problems. It is in the latter case that I would use the word "bug".



  • @blakeyrat said in Sample Code Reviews....:

    @thecpuwizard said in Sample Code Reviews....:

    Not really, there are plenty of cases where iterating over a dynamically changing collection makes perfect sense and can be a desired design choice.

    You'd want to use a collection type that communicates that intent, like a BlockingCollection.

    Would not a BlockingCollection indicate the opposite intent? [ just seeking clarity ]

    One aspect I see as part of it is the question of does "reading the detailed documentation" count as "communication of intent"?



  • @thecpuwizard said in Sample Code Reviews....:

    Would not a BlockingCollection indicate the opposite intent? [ just seeking clarity ]

    Fine; you pedant. Something in the System.Collections.Concurrent namespace, if not specifically BlockingCollection.

    Point is, if you're using the collection concurrently, use a collection that's designed to be used concurrently. ConcurrentQueue, ConcurrentStack, BlockingCollection, whatever one fits your need.

    @thecpuwizard said in Sample Code Reviews....:

    One aspect I see as part of it is the question of does "reading the detailed documentation" count as "communication of intent"?

    Look, you might have a List<Poop> and tell all your developers "hey guys this list should be sorted all the time, 'kay?" And you might do all these code reviews where you get to feel really smart and smug by pointing out to developers when they added something to the list and forgot to re-sort it, and now you're a software development superhero for "saving" all these bugs from the production code.

    Or you can just use a SortedList<Poop> in the first place.

    Oh noes. TheCPUWizard doesn't get to show off how smart and clever he is anymore, but, oh wait, guess what? the code's fine, every time, because the fucking compiler does a way better job at enforcing the sorting than CPUWizard ever could.

    Got it?

    If you want the list to always be sorted, use a SortedList in the first damned place. If you want the collection to be accessed concurrently by multiple threads, use the ConcurrentQueue in the first damned place.
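(A sketch of the "use the type that advertises the intent" point above, using ConcurrentQueue from System.Collections.Concurrent; the producer/consumer shape and method name are illustrative, not from the thread.)

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

public class ConcurrentExample
{
    // The type itself communicates (and enforces) safe concurrent access:
    // many producers enqueue in parallel, then a consumer drains the queue.
    public static int ProduceAndDrain(int n)
    {
        var queue = new ConcurrentQueue<int>();
        Parallel.For(0, n, i => queue.Enqueue(i)); // concurrent producers

        int total = 0;
        while (queue.TryDequeue(out int value))
        {
            total += value;
        }
        return total; // sum of 0..n-1
    }
}
```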


  • BINNED

    @thecpuwizard said in Sample Code Reviews....:

    Having a "dependency on order of execution with the potential of different orders occurring in a non-deterministic manner" is one definition of "race condition". Such situations can be normal and proper

    If deterministic results don't matter, I propose the following simplified implementation:

    public int After(List<int> src)
    {
        return 0;
    }
    
    // somewhere else
    public void stuffThatHappensOnDifferentThread(List<int> src)
    {
    List<int> original = src;
        for (int i = 0; i < src.Count; ++i)
            src[i] = 0;
    
        // After happens to execute here
    
        src = original;
    }
    


  • @blakeyrat - Kind of missing the point.... At any point in time code does something. If the something changes, then there is a risk of unintended consequences. Code Review has as one of its goals the identification of these items.

    During the design (not implementation) activities, the decision of List<T>, IList<T>, or any of the other concrete implementations or interfaces/contracts, as well as the public (or internal or protected) choices for both the class and the method are all important.

    Doesn't it make sense for those aspects (I am asking seriously) to be reviewed and agreed upon before doing the implementation?



  • @thecpuwizard No; you're coming at this from the wrong angle.

    If the "bug" here is "well if you edit that collection in another thread, you'll get an OutOfBoundsException", the solution to the code review isn't to add a lock in this particular function or whatever you recommend, the solution is to ensure that types expected to be accessed concurrently are the correct types.

    So while you do review the code, and do identify an issue, the fix has nothing to do with the code quoted above. It has to do with using the correct types in the first place.

    Also the potential overflow is a way bigger deal.



  • @blakeyrat said in Sample Code Reviews....:

    If the "bug" here is "well if you edit that collection in another thread, you'll get an OutOfBoundsException",

    But the scenario could easily be "I am replacing items in the list, or adding them, and it is not important exactly which elements are summed" [perhaps we are trying to get the average of the latest 1,000,000 samples - though this would be a suboptimal way].

    The behavior of the original is correct and expected by some consumer (remember public class/public method) who the developer does not even know exists.



  • @blakeyrat said in Sample Code Reviews....:

    Also the potential overflow is a way bigger deal.

    Apologies, you did mention this before, and I meant to ask for clarification because I am not following....



  • @thecpuwizard said in Sample Code Reviews....:

    But the scenario could easily be "I am replacing items in the list, or adding them, and it is not important exactly which elements are summed" [perhaps we are trying to get the average of the latest 1,000,000 samples - though this would be a suboptimal way].

    This just confuses me. The only thing the method does is sum the integers. How is it adding or replacing items in the list? How is it trying to get an average?

    @thecpuwizard said in Sample Code Reviews....:

    Apologies, you did mention this before, and I meant to ask for clarification because I am not following....

    It's not obvious? You're summing an arbitrary number of integers and putting the result in an integer. You're not doing anything to handle an overflow. You either need to handle overflows smartly (presumably by raising an exception), or put the result in a data type that can handle arbitrarily large integers (like System.Numerics.BigInteger.)
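(Worth noting here: LINQ's Enumerable.Sum for int accumulates in a checked context and throws OverflowException, while the hand-rolled "Before" loop silently wraps in a default unchecked build. A hedged sketch of making the overflow explicit; the class and method names are invented for the example.)

```csharp
using System;
using System.Collections.Generic;

public class OverflowExample
{
    // Summing inside a checked block makes overflow raise OverflowException
    // instead of silently wrapping around to a negative total.
    public static int CheckedSum(List<int> src)
    {
        int total = 0;
        checked
        {
            foreach (int v in src)
            {
                total += v;
            }
        }
        return total;
    }

    // Helper for demonstrating the overflow case.
    public static bool OverflowsOn(List<int> src)
    {
        try
        {
            CheckedSum(src);
            return false;
        }
        catch (OverflowException)
        {
            return true;
        }
    }
}
```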



  • @blakeyrat - I like using the ostrich algorithm to handle integer overflows.

