Unit Fighting



  • @dkf said in Unit Fighting:

    Yes, except then you've got to make your code in such a way that it is reasonably practical to plug that mock in without doing a horrendous amount of other work too. It's not too bad if you have in mind doing this from the beginning… but otherwise it's pretty awful.

    I am going to avoid the "M" word - because that actually refers to a specific technique [one of many] to break fixed coupling.

    Like most things, it is easier if designed in, so let's look at that first: if you adopt the principle of "Dependency Inversion" (which is different from Dependency Injection or Inversion of Control, and does not necessarily involve the use of containers), then the battle is already (largely) won.

    For existing code, the techniques depend very much on the language. As an example, .NET allows for IL rewriting, so you can have a hard reference to "x = new Foo()" and yet have the code return a different type at runtime.



  • @kt_ said in Unit Fighting:

    The problem with mocking is, you need to hardcode values and thus internal stare

    Self-loathing?

    Filed under: :nelson:


  • Dupa

    @thecpuwizard said in Unit Fighting:

    @dkf said in Unit Fighting:

    Yes, except then you've got to make your code in such a way that it is reasonably practical to plug that mock in without doing a horrendous amount of other work too. It's not too bad if you have in mind doing this from the beginning… but otherwise it's pretty awful.

    I am going to avoid the "M" word - because that actually refers to a specific technique [one of many] to break fixed coupling.

    Like most things, it is easier if designed in, so let's look at that first: if you adopt the principle of "Dependency Inversion" (which is different from Dependency Injection or Inversion of Control, and does not necessarily involve the use of containers), then the battle is already (largely) won.

    For existing code, the techniques depend very much on the language. As an example, .NET allows for IL rewriting, so you can have a hard reference to "x = new Foo()" and yet have the code return a different type at runtime.

    If you ever do weaving or ILEmit, you should be shot.



  • @kt_ said in Unit Fighting:

    If you ever do weaving or ILEmit, you should be shot

    Depends on the context... For the case of "ever", I would have been bullet-ridden over a decade ago [it would have been like The Gauntlet].


  • Banned

    @kt_ said in Unit Fighting:

    The problem with mocking is, you need to hardcode values and thus internal stare of the class under test. If I set up this property, it means it’ll be used, right?

    You only have to mock the public interface - the externally visible stuff. If you have to mock internal state, it means that internal state is externally visible, and you have a much bigger design problem than a fragile mock.

    And if by fragile you mean that you have to update the mock whenever the public interface (or how the public interface works) changes, well, duh.


  • Banned

    @thecpuwizard said in Unit Fighting:

    @kt_ said in Unit Fighting:

    If you ever do weaving or ILEmit, you should be shot

    Depends on the context... For the case of "ever", I would have been bullet-ridden over a decade ago [it would have been like The Gauntlet].

    There's a fair chance you'd deserve it.



  • @gąska said in Unit Fighting:

    @thecpuwizard said in Unit Fighting:

    @kt_ said in Unit Fighting:

    Depends on the context... For the case of "ever", I would have been bullet-ridden over a decade ago [it would have been like The Gauntlet].

    There's a fair chance you'd deserve it.

    Why? As one example I have a program that is highly dynamic based on "user" [human and device] inputs. Dynamically generating tightly optimized routines via ILEmit has been proven to be a powerful technique.


  • Banned

    @thecpuwizard in 99% of cases you don't need that much performance. Of the remaining 1%, in 99% of cases the CLR itself incurs too much overhead for whatever you're doing, so ILEmit isn't the way to go anyway.


  • Dupa

    @gąska said in Unit Fighting:

    @kt_ said in Unit Fighting:

    The problem with mocking is, you need to hardcode values and thus internal stare of the class under test. If I set up this property, it means it’ll be used, right?

    You only have to mock the public interface - the externally visible stuff. If you have to mock internal state, it means that internal state is externally visible, and you have a much bigger design problem than a fragile mock.

    It is the internal state of the class under test. You need to know that it uses properties X and Y, but not Z. That's what I'm getting at.

    And if by fragile you mean that you have to update the mock whenever the public interface (or how the public interface works) changes, well, duh.

    Not only that. Whenever your usage of the interface changes, you need to update your mock. And if the way the interface works changes while you're mocking it, you won't take this into account in your tests. Or it's an additional step you might forget about.

    So these are the cons. What are the pros?



  • @kt_ said in Unit Fighting:

    It is the internal state of the class under test.

    More precisely, it is the state of something external to the class under test which in turn impacts the state/behavior of the class itself (internally or externally).

    Let's say you have a requirement to implement y = m*x + b...

    int f(int m, int x, int b) { return g(m, x) + b; }
    int g(int m, int x) { return m + x; }  /* bug: should be m * x */

    From a functional point of view, f(...) does not work properly... but there is no bug in the body of code represented by f(...); the bug is in the code represented by g(...)


  • Banned

    @kt_ said in Unit Fighting:

    @gąska said in Unit Fighting:

    @kt_ said in Unit Fighting:

    The problem with mocking is, you need to hardcode values and thus internal stare of the class under test. If I set up this property, it means it’ll be used, right?

    You only have to mock public interface - the externally visible stuff. If you have to mock internal state, it means this internal state is externally visible, and you have much bigger design problem than fragile mock.

    It is the internal state of the class under test. You need to know, that it uses properties X and Y, but not Z. That’s what I’m getting at.

    Oh, that. Yeah, you're right. That's why I usually prefer stubs and fakes to mocks. With them, you usually don't need to change anything unless the UUT logic changes significantly. Also, small classes make it all less painful.


  • Banned

    @kt_ said in Unit Fighting:

    So these are the cons. What are the pros?

    Fine-grained control over test scenarios. E.g. there are two mocks provided to the UUT, A and B, and A.foo() returns X which has to be passed to B.bar(), and then X.baz() and B.qux() have to be called in that order. Mocks are a very powerful tool, but have some problems that make them not a good idea most of the time. Just like dynamic code generation (especially of the ILEmit kind).



  • @gąska said in Unit Fighting:

    stubs and fakes to mocks

    The reason I "avoided the M word" in my initial posts... There are two distinct elements:

    1. Using "real implementation" vs. "something else"
    2. The technology/design/approach used to trigger the usage of "something else"

    I am a big fan of using "something else" to break coupling. As for the specific technique, that can vary greatly, with the single largest influence (in my experience) being where in the design cycle the need for "something else" is first identified.


  • Banned

    @thecpuwizard the real implementation is never a good idea. The whole point is to test units in isolation.



  • @gąska said in Unit Fighting:

    @thecpuwizard the real implementation is never a good idea. The whole point is to test units in isolation.

    I despise absolutes, at least usually :)

    Do you use the real implementation of the runtime? How about the real implementation of the CPU?

    Boundaries and balance are (with perhaps very, very rare exceptions) quite important.


  • Discourse touched me in a no-no place

    @gąska said in Unit Fighting:

    The whole point is to test units in isolation.

    If you isolate too much, you end up having tests which don't help as much as you'd like. Often, while you can easily test the higher-level layers of an application, checking that the baseline real implementations of critical classes behave correctly is both important and difficult. For example, a class for launching a particular subprocess: you can easily check that arguments are assembled meaningfully in isolation, but not easily check that the arguments that you assemble are actually correct, yet that's the far more valuable check as that's the one which gates whether things will work at all.

    Because of this sort of thing, small-scale unit tests are not nearly as valuable as one might hope. Large-scale unit tests (where the unit under test is the entire software module, not just an individual class) are usually more meaningful and useful (though that does depend on what the code is doing; complex bits may need their own special tests anyway). It's only once you get multi-component systems (e.g., several processes in different security contexts) that real integration testing rears its ugly head.



  • @dkf said in Unit Fighting:

    @gąska said in Unit Fighting:

    The whole point is to test units in isolation.

    If you isolate too much, you end up having tests which don't help as much as you'd like.

    I would say that depends very much on what one "likes".

    you can easily check that arguments are assembled meaningfully in isolation, but not easily check that the arguments that you assemble are actually correct

    But perhaps the "bug" is in the argument parsing of the launched process.... Being able to differentiate the two scenarios is not without value.

    Because of this sort of thing, small-scale unit tests are not nearly as valuable as one might hope.

    Again, it depends on what one values.

    Large-scale unit tests (where the unit under test is the entire software module, not just an individual class)

    While the adoption of "unit" as referring to the test rather than the SUT has become common, it really causes more problems than it solves. Being a purist, "unit" refers to the smallest testable piece of code that will eventually become part of a "system"; using other, more appropriate terms such as Acceptance Test [as applied at different levels for a PBI/Story/Feature], Behavioral Test, or Functional Test keeps the semantics much cleaner and also helps identify the value proposition of each type of testing.


  • Banned

    @thecpuwizard said in Unit Fighting:

    Do you use the real implementation of the runtime?

    Well, duh. It's the unit under test. Why would you stub out the thing you're testing?

    How about the real implementation of the CPU?

    Wherever the target platform isn't x86-64, unit tests are most likely to be run on x86-64 and not on the target platform.

    My point is, there's the UUT, and there's the environment. The UUT is the only real thing, and the entirety of the environment is stubbed/faked/mocked, so the UUT can be tested independently of everything else. Except for the System Integration/Verification stage, where the entire application and its environment are tested together to see if it all works - a very important stage, but (I hope) not the only one. That's pretty much the opposite of the topic we're discussing, which is unit tests specifically.

    @dkf said in Unit Fighting:

    Because of this sort of thing, small-scale unit tests are not nearly as valuable as one might hope. Large-scale unit tests (where the unit under test is the entire software module, not just an individual class) are usually more meaningful and useful (though that does depend on what the code is doing; complex bits may need their own special tests anyway).

    These large-scale unit tests shouldn't be called "unit" tests IMO. Depending on how large exactly the scale is, they ought to be called feature tests, integration tests, component tests, or something similar. They are just as important as unit tests, yet their purpose is slightly different (testing the whole program's behavior vs. preventing accidental changes in the future). That's why you should have both small-scale unit tests and large-scale feature tests.



  • @gąska said in Unit Fighting:

    @thecpuwizard said in Unit Fighting:

    Do you use the real implementation of the runtime?

    Well, duh. It's the unit under test. Why would you stub out the thing you're testing?

    Is it the "unit under test"? If I write a method/function and want to test my work, then anything that is external is not [IMPO] part of "the unit".

    But I do see a major distinction on what that external item is. If it is "the runtime" then I will most likely trust it; I will not be (overly) concerned that the root cause of a failure in testing my code would be triggered by a defect in the runtime. If it is "other code under development" then I would not have that level of trust, and would (most likely) want to be able to differentiate between a problem in my code and a problem in "the other code".

    How about the real implementation of the CPU?

    Wherever the target platform isn't x86-64, unit tests are most likely to be run on x86-64 and not on the target platform.

    This is an interesting scenario (and I believe worthy of an independent conversation). Yes, it is often easier to evaluate code in a rich (typically x86/x64) environment, but I regularly run granular unit tests natively on many target environments.

    For example, I am working on an Arduino-based system right now. During a build (which runs on x64), the "product code" is compiled along with "test code" (implementing granular class/method tests). Each of these tests is then invoked (via the x64 build agent) so that my build report contains the pass/fail status of each one as part of the build [and if the tests fail, then the build fails, and the code does not get committed to the repository].

    My point is, there's the UUT, and there's the environment. The UUT is the only real thing, and the entirety of the environment is stubbed/faked/mocked, so the UUT can be tested independently of everything else. Except for the System Integration/Verification stage, where the entire application and its environment are tested together to see if it all works - a very important stage, but (I hope) not the only one. That's pretty much the opposite of the topic we're discussing, which is unit tests specifically.

    This I agree with, though we may differ on exactly which components are "UUT" and which are "Environment"; I also add the category of "Trusted Material".

    These large-scale unit tests shouldn't be called "unit" tests IMO....

    100% agreement there :)


  • Dupa

    @dkf said in Unit Fighting:

    @gąska said in Unit Fighting:

    The whole point is to test units in isolation.

    If you isolate too much, you end up having tests which don't help as much as you'd like. Often, while you can easily test the higher-level layers of an application, checking that the baseline real implementations of critical classes behave correctly is both important and difficult. For example, a class for launching a particular subprocess: you can easily check that arguments are assembled meaningfully in isolation, but not easily check that the arguments that you assemble are actually correct, yet that's the far more valuable check as that's the one which gates whether things will work at all.

    Because of this sort of thing, small-scale unit tests are not nearly as valuable as one might hope. Large-scale unit tests (where the unit under test is the entire software module, not just an individual class) are usually more meaningful and useful (though that does depend on what the code is doing; complex bits may need their own special tests anyway). It's only once you get multi-component systems (e.g., several processes in different security contexts) that real integration testing rears its ugly head.

    That's exactly my point. @Gąska says that "the whole point is to test the unit in isolation". The thing is, code rarely works in isolation. It's better to make the tests small, i.e. test for this one little thing in a certain scenario, but still use the actual code whenever possible. Otherwise you're just testing that an implementation detail didn't change, i.e. that it calls certain methods in a certain order.

    Of course, you still need to be wary of conditional logic. So if you know there's an if (returnedValue > 0) in there, you'll want to mock returnedValue. But unless that's necessary, why bother?


  • Banned

    @kt_ said in Unit Fighting:

    The thing is, code rarely works in isolation.

    I'd even go as far as to say it NEVER works in isolation (except in Haskell). But that doesn't mean TESTING in isolation is worthless.

    @kt_ said in Unit Fighting:

    It's better to make the tests small, i.e. test for this one little thing in a certain scenario, but still use the actual code whenever possible. Otherwise you're just testing that implementation detail didn't change, i.e. it calls certain methods in a certain order.

    It all comes down to what we mean by "unit". For some people, a unit is a single function or method. For others, it's a single class. For me, it's an abstract term for a single independent piece of code that can reasonably be tested in isolation. Usually a single class, but sometimes a couple of classes that are very close together from an architectural standpoint. Occasionally a unit might be just a fraction of a single class, but then it's usually violating Single Responsibility and needs refactoring.

    Personally, I design my code, along with file and class hierarchies, in a way where every piece is composed of smaller pieces, and each has clearly defined inputs (arguments for functions and methods; constructor arguments and DI services for classes) and clearly defined outputs (return values, service calls, messages, events). I also minimize dependencies between classes, and always try to make dependencies one-way wherever possible. This makes it easier to reason about the code at any abstraction layer, makes it very testable, and minimizes the number of stubs and mocks needed.

    Also, there are cases where "calls certain methods in a certain order" is a functional requirement. This happens especially often when doing IPC or when interfacing with 3rd-party code.



  • @gąska said in Unit Fighting:

    Personally, I design my code, along with file and class hierarchies, in a way where every piece is composed of smaller pieces,
    [well, it can't be every piece -- there has to be a bottom somewhere - unless you are designing turtles :)]

    So let's take a case where you have two "little pieces", A and B, which are composed into C. When testing C, the critical focus is on the composition, since A and B can each have been tested independently. If a defect is ever found at this (or a higher) level that is not related to the composition, then it means there are one or more missing tests for A or B.

    Now add F, which is a composite of D and E... then G, which is a composite of C and F. The same logic applies.


  • Banned

    @thecpuwizard depending on what exactly the code looks like, I might provide A and B to C as stubs/mocks, or test C as a single unit. It really depends on the fine details and my mood on a particular day.



  • @ixvedeusi said in Unit Fighting:

    Mainly what he does is complain that unit tests can't verify that your system behaves according to spec. That seems rather obvious to me. Unit tests verify that a specific unit behaves (and continues to behave) as the developer intended it to behave, no more, no less; and in my personal experience they have been very valuable for that. Unit tests are somewhat like scaffolding on a construction site: they provide additional rigidity for the code base and "somewhere to stand on" during development. They are not a QA tool but a development tool.

    This.

    And the complaint in the article really is that many companies give unit tests far more weight for QA than they actually merit.

    Basically, every time somebody requires a specific coverage target, they are doing it wrong. Coverage is a great tool for discovering cases that should be tested, but how much hunting for the remaining cases is worth the effort basically depends on how much confidence in that layer it gives to the developer. Because it does not really give much confidence to anybody else.

    Of course, functional and integration tests are another matter. Those are an important QA tool. Except many companies don't understand the difference. They think if they have this coverage, it's fine, and don't see that what really matters is feature coverage.

    @gąska said in Unit Fighting:

    These large-scale unit tests shouldn't be called "unit" tests IMO. Depending on how large exactly the scale is, they ought be called feature test, integration test, component test, or something similar. They are just as important as unit tests, yet their purpose is slightly different (test the whole program behavior vs. prevent accidental changes in future).

    Preventing accidental changes in the future is also mainly done by the feature, component and integration tests. The unit tests are most useful for debugging - including debugging when an accidental change occurs - but they are not reliable enough for catching such changes.

    @thecpuwizard said in Unit Fighting:

    If it is "other code under development" then I would not have that level of trust, and would (most likely) want to be able to differentiate between a problem in my code and a problem in "the other code".

    To the extent you can actually tell, though. The problem is that the separation is on an interface that does not have any external requirement, so it may not be clear what should be done on which side. This kind of interface also tends to change a lot, resulting in breakage that does not come up in the unit tests—because the test for one side gets updated with the component it tests, but the mock on the other side does not. That's why you still need integration tests to find regressions.



  • @bulb - Great post!!!

    As to:

    separation is on an interface that does not have any external requirement,

    While there is no external requirement, having formal definitions can have significant value. This is especially true in environments that strongly foster code re-use. Class X may have originally been written for consumption only by class Y, but once it exists there is always the possibility that it will also be consumed by Z (unbeknownst to the original author of X and Y).



  • @thecpuwizard As with all the other things about this, there is a continuum here. The smaller the units you make, the more arbitrary the interfaces become. So the art is to split the code base into units small enough that you can test them thoroughly, but large enough that their interfaces make some higher-level sense, so it is possible to tell what behaviour is sensible, and so they are at least moderately stable and you don't end up updating the test with each update to the code.



  • @bulb said in Unit Fighting:

    @thecpuwizard As with all the other things about this, there is a continuum here. The smaller the units you make, the more arbitrary the interfaces become. So the art is to split the code base into units small enough that you can test them thoroughly, but large enough that their interfaces make some higher-level sense, so it is possible to tell what behaviour is sensible...

    Again, many good points. It is interesting to compare this to the ISP [Interface Segregation Principle],
    which (for those not familiar) advocates creating the smallest interfaces possible for specific targets.

    In a particular project I am working on, "Customer" has 11 distinct targeted interfaces. Since I also utilize DIP [Dependency Inversion Principle], the code is consistently written so that a Producer (a custom design pattern that has traits similar to Factory and Repository) is invoked and only the appropriate interface is returned.

    ...and so they are at least moderately stable so you don't end up updating the test with each update to the code.

    Depending on the goal of the test, this can vary wildly in applicability. For "pinning" tests, any observable change in behavior should require a change in the test! And what counts as observable depends very much on the observer.

    As another example, I am working on an IoT-type project written in straight "C". Due to constraints it is necessary to measure memory usage patterns at a very granular level. This includes stack and heap allocations. Functions are divided into "non-allocating" (heap) and "allocating". If a "non-allocating" function is changed to make a call to malloc(), the test must fail. For "allocating" functions, the allocation must be one of a fixed set of standard sizes (to avoid fragmentation), so an allocation which does not match one of the sizes must fail a test.

