We have JUnit tests



  • A colleague was tasked with making sure that every aspect of every one of our POJOs had a corresponding JUnit test. After all, it's important to make sure that the logic contained within them works correctly.

    He wrote a code generator:

    // class SomePojo {
    //     private Integer i;
    //     private String s;
    //     public void setI(Integer i) { this.i = i; }
    //     public Integer getI() { return i; }
    //     public void setS(String s) { this.s = s; }
    //     public String getS() { return s; }
    // }

    import junit.framework.TestCase;

    public class TestSomePojo extends TestCase {
        public void testSomePojo() {
            SomePojo x = new SomePojo();
            assertNotNull(x);
        }

        public void testSetI() {
            SomePojo x = new SomePojo();
            x.setI(1);
            assertNotNull(x);
        }

        public void testGetI() {
            SomePojo x = new SomePojo();
            x.setI(1);
            Integer y = x.getI();
            assertNotNull(y);
            assertEquals(new Integer(1), y);
        }

        public void testSetS() {
            SomePojo x = new SomePojo();
            x.setS("s");
            assertNotNull(x);
        }

        public void testGetS() {
            SomePojo x = new SomePojo();
            x.setS("s");
            String y = x.getS();
            assertNotNull(y);
            assertEquals("s", y);
        }
    }

    We now have tens of thousands of JUnit tests that all verify that Java can, in fact, instantiate objects, accept parameters to procedures, return values from functions, and that these features can be used to implement setters and getters.

    At least it looks good on reports.

     



  •  We've got tests like that too. Meaningful tests for any methods are conspicuous by their absence, whereas you can be 100% sure that all of the public static property getters work.



  • @snoofle said:

    public void testSetS() {
        SomePojo x = new SomePojo();
        x.setS("s");
        assertNotNull(x);
    }

    Man, I've been doing it wrong -- I've missed this test case! I didn't realize I needed to test for CRAP (Complete Removal Access Pattern)... I need to add this to my tests, as you never know when a method call on an object will suddenly cause that object to be removed and/or garbage collected.



  • While they're at it, you should tell them they forgot to unit test the finalize(), notifyAll(), and wait() methods. That will keep them busy.



  • @OhNoDevelopment said:

    While they're at it, you should tell them they forgot to unit test the finalize(), notifyAll(), and wait() methods. That will keep them busy.

    Just make sure the guy you task with that is not the guy who wrote the existing test code generator. It would be a shame to miss out on watching some poor schlub copy-paste-modify the same useless tests into 10,000 auto-generated source files.



  • Does the code generator have generated unit tests? Otherwise you can't know it's generating correct code....

     

     



  • "Stupid" tests are not necessarily a WTF... Consider a large codebase: 30% complex code, 70% "fluff" (auto-properties, etc.). There is 100% really good testing on the 30% of the code. As a side effect of this, half of the 70% gets invoked... 65% code coverage.

    Now a change is made... The code coverage drops from 65% to 62%... Was the change in the "complex" or the "fluff" code? There is NO way to tell.

    On the other hand, if a generator (OK, a bit better than the one shown) gets 100% of the fluff, then it is MUCH easier to find exactly what code is not being covered in a test run and how best to address it.



  • Firstly, the main WTF is testing only the fluff; but the pattern for testing it is also distinctly sub-optimal. We test all our simple getters and setters by passing a list of property names to an abstract test superclass which exercises them properly. So when you look at the test, all the test methods test something substantial about the class, and the fluff is referenced in a concise and unobtrusive manner; also, if we decide there's something about simple getters and setters which needs additional testing, there's one place to implement it.
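    The abstract-superclass pattern described above isn't shown in the thread, but the idea can be sketched in plain Java. Every name below (SimplePropertyTester, assertSimpleProperties, the sample values) is hypothetical, and a real version would live in a shared test superclass rather than a standalone class:

    ```java
    import java.lang.reflect.Method;

    // Hypothetical sketch of the pattern: one reusable routine that
    // round-trips a sample value through setX/getX for each named
    // simple property, so the per-class test is just a list of names.
    public class SimplePropertyTester {

        static void assertSimpleProperties(Object bean, String... properties) {
            for (String name : properties) {
                try {
                    String suffix = Character.toUpperCase(name.charAt(0)) + name.substring(1);
                    Method getter = bean.getClass().getMethod("get" + suffix);
                    Method setter = bean.getClass().getMethod("set" + suffix, getter.getReturnType());
                    Object sample = sampleValue(getter.getReturnType());
                    setter.invoke(bean, sample);          // exercise the setter
                    Object result = getter.invoke(bean);  // and read it back
                    if (!sample.equals(result)) {
                        throw new AssertionError(name + ": set " + sample + " but got back " + result);
                    }
                } catch (ReflectiveOperationException e) {
                    throw new AssertionError("could not exercise property '" + name + "'", e);
                }
            }
        }

        // Minimal sample values for the types this sketch supports.
        static Object sampleValue(Class<?> type) {
            if (type == String.class) return "sample";
            if (type == Integer.class) return Integer.valueOf(42);
            throw new IllegalArgumentException("no sample value for " + type);
        }

        // Demo bean matching the SomePojo from the original post.
        public static class SomePojo {
            private Integer i;
            private String s;
            public void setI(Integer i) { this.i = i; }
            public Integer getI() { return i; }
            public void setS(String s) { this.s = s; }
            public String getS() { return s; }
        }

        public static void main(String[] args) {
            assertSimpleProperties(new SomePojo(), "i", "s");
            System.out.println("all simple properties round-trip correctly");
        }
    }
    ```

    The per-class test then reduces to a single call naming the simple properties, leaving the rest of the test class free for substantial assertions.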



  • @TheCPUWizard said:

    "Stupid" tests are not necessarily a WTF... Consider a large codebase: 30% complex code, 70% "fluff" (auto-properties, etc.). There is 100% really good testing on the 30% of the code. As a side effect of this, half of the 70% gets invoked... 65% code coverage.

    Now a change is made... The code coverage drops from 65% to 62%... Was the change in the "complex" or the "fluff" code? There is NO way to tell.

    On the other hand, if a generator (OK, a bit better than the one shown) gets 100% of the fluff, then it is MUCH easier to find exactly what code is not being covered in a test run and how best to address it.

    Code coverage is an important metric. Including "fluff" code in that metric is the easiest way to distort it and render it meaningless. Cover the complex code that matters, not the code that will never fail.

    If you explain this to your PHB and s/he still demands coverage of everything, set up two different coverage profiles: one that covers everything, and one that covers relevant code. Use the former for reporting stats to the PHB, the latter for actually determining whether coverage has changed.
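    One concrete way to realize the two-profiles idea, assuming a Java build with Maven and the JaCoCo coverage plugin (neither is mentioned in the thread, so this is purely illustrative and the package patterns are hypothetical), is a dedicated profile whose coverage configuration excludes the fluff:

    ```xml
    <!-- Illustrative only: a Maven profile that measures coverage with
         JaCoCo while excluding "fluff" packages, run alongside the
         default build that covers everything. -->
    <profile>
      <id>coverage-relevant</id>
      <build>
        <plugins>
          <plugin>
            <groupId>org.jacoco</groupId>
            <artifactId>jacoco-maven-plugin</artifactId>
            <configuration>
              <excludes>
                <!-- generated POJOs and other fluff stay out of this profile -->
                <exclude>**/generated/**</exclude>
                <exclude>**/*Pojo.class</exclude>
              </excludes>
            </configuration>
          </plugin>
        </plugins>
      </build>
    </profile>
    ```

    The default build then reports the all-inclusive number for the PHB, while `coverage-relevant` tracks only the code whose coverage changes actually mean something.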



  • @TheCPUWizard said:

    "Stupid" tests are not necessarily a WTF
     

    Depends upon your definition of "stupid".

    The quality of the finished product is directly related to the quality of the testing. If the tests are flawed, then they'll potentially miss flaws during testing which will then be found once in production (with disastrous effects).



  • And while he's setting up those 20,000 tests, he's missing the typo-ed word "cmputer" on the program's splash screen.



  • @The_Assimilator said:

    Code coverage is an important metric. Including "fluff" code in that metric is the easiest way to distort it and render it meaningless. Cover the complex code that matters, not the code that will never fail.

    If you explain this to your PHB and s/he still demands coverage of everything, set up two different coverage profiles: one that covers everything, and one that covers relevant code. Use the former for reporting stats to the PHB, the latter for actually determining whether coverage has changed.

    I am definitely NOT talking about a PHB... I am talking about the very real world, and something directly important to the development team...

    Again, if you DON'T have full coverage, changes are made, tests pass without modification [because fluff was deleted, and new critical code was added but not tested] and the code coverage stays the EXACT SAME... How the ... are you going to be able to even use code coverage to DETECT that something has changed?

    And can you be 100% sure that deleting that "fluff" really didn't impact an edge case [perhaps it was only rarely accessed, and the only access was via reflection]?

    If someone makes a change [ANY CHANGE] to the code, the goal [IMO] is that the existing tests MUST fail [if there are changes to things other than brand-new code] AND the code coverage must drop [assuming there is new code]...

    On critical projects [think safety-related], we have actually set these up as rules that must pass before the modified code can even make it into the repository [which is extreme for the "normal" case, but appropriate when a simple bug can kill someone].



  • This is why I don't trust people who use # of tests as a metric. One company I worked for boasted, proudly, that their tests had 1.5 times the lines of code of the code base. Choosing a few tests at random showed that most of them weren't even testing anything at all.



  • @TheCPUWizard said:

    kxdjhvds [xkjdhbvds].... kdsjhgf  [jksdf]

    I think your mind is breaking, or you're drunk.



  • @TheCPUWizard said:

    If someone makes a change [ANY CHANGE] to the code, the goal [IMO] is that the existing tests MUST fail [if there are changes to things other than brand-new code] AND the code coverage must drop [assuming there is new code]...

    On critical projects [think safety-related], we have actually set these up as rules that must pass before the modified code can even make it into the repository [which is extreme for the "normal" case, but appropriate when a simple bug can kill someone].

    If something is that critical, you should have corresponding goad tests: an inverse test that should fail, with assert.fail(...) on success of the actual test. This makes sure that your tests don't get neutered and end up always passing. Even with 100% code coverage, if a test gets neutered it won't start failing when changes are made, but it also won't change the coverage! If a goad test fails, you know your test always succeeds and needs to be looked at (or your goad isn't goading anymore). This is easier to detect.
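    A minimal sketch of the "goad" idea described above, in plain Java so it stands alone; isValidId and the other names are hypothetical stand-ins, not code from the thread:

    ```java
    // Sketch of a "goad" (inverse) test: alongside the real test, run
    // the same check against input that is known to be wrong, and fail
    // if the check does NOT reject it. A check that has been neutered
    // into "always true" passes the real test but trips the goad.
    public class GoadTestSketch {

        // The production check under test (hypothetical example).
        static boolean isValidId(String id) {
            return id != null && !id.isEmpty();
        }

        // Real test: known-good input must pass the check.
        static void realTest() {
            if (!isValidId("abc123")) throw new AssertionError("valid id rejected");
        }

        // Goad test: deliberately bad input must make the check fail.
        // If it doesn't, the check can no longer distinguish anything.
        static void goadTest() {
            if (isValidId("")) {
                throw new AssertionError("check accepted bad input - test is neutered");
            }
        }

        public static void main(String[] args) {
            realTest();
            goadTest();
            System.out.println("check still distinguishes good input from bad");
        }
    }
    ```

    The point is the pairing: a code change that quietly makes the real test unfailable leaves coverage untouched, but the goad test catches it.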



  • @dhromed said:

    @TheCPUWizard said:

    kxdjhvds [xkjdhbvds].... kdsjhgf  [jksdf]

    I think your mind is breaking, or you're drunk.

    Mind broken for decades is always a possibility.... the other is not.



  • @esoterik said:

    @TheCPUWizard said:

    If someone makes a change [ANY CHANGE] to the code, the goal [IMO] is that the existing tests MUST fail [if there are changes to things other than brand-new code] AND the code coverage must drop [assuming there is new code]...

    On critical projects [think safety-related], we have actually set these up as rules that must pass before the modified code can even make it into the repository [which is extreme for the "normal" case, but appropriate when a simple bug can kill someone].

    If something is that critical, you should have corresponding goad tests: an inverse test that should fail, with assert.fail(...) on success of the actual test. This makes sure that your tests don't get neutered and end up always passing. Even with 100% code coverage, if a test gets neutered it won't start failing when changes are made, but it also won't change the coverage! If a goad test fails, you know your test always succeeds and needs to be looked at (or your goad isn't goading anymore). This is easier to detect.

    So you are saying that it would be impossible (or at least extremely difficult) for a person to make a change to the code, have all unit tests passing, have code coverage remain the same (mid-high value), and introduce a bug that would not be caught by a review of changes to the tests?

    If you are, I would love to come to your team and demonstrate just how easy it is. About 1/3 of my company's contracts involve this exercise in the early days. Nearly every customer who thought they had "robust testing" is shocked when they see how mangled the code can be made without there being any way to notice it by looking solely at test information. With only one exception (out of 50+ cases), they made significant changes to their methodology that included what I have been talking about here.



  • @TheCPUWizard said:

    So you are saying that it would be impossible (or at least extremely difficult) for a person to make a change to the code, have all unit tests passing, have code coverage remain the same (mid-high value), and introduce a bug that would not be caught by a review of changes to the tests?

    What I am saying is that you can guarantee that your tests make distinctions by coding each test in a way that lets you verify it can fail. If a code change causes the test to never fail, this will be detected. This is the kind of failure that can be invisible to traditional unit testing. A new defect can still fall through the cracks, but it won't be because of a neutered test. If your tests make meaningless distinctions, then there will be lots of cracks defects can fall into.



  • @esoterik said:

    @TheCPUWizard said:
    So you are saying that it would be impossible (or at least extremely difficult) for a person to make a change to the code, have all unit tests passing, have code coverage remain the same (mid-high value), and introduce a bug that would not be caught by a review of changes to the tests?

    What I am saying is that you can guarantee that your tests make distinctions by coding each test in a way that lets you verify it can fail. If a code change causes the test to never fail, this will be detected. This is the kind of failure that can be invisible to traditional unit testing. A new defect can still fall through the cracks, but it won't be because of a neutered test. If your tests make meaningless distinctions, then there will be lots of cracks defects can fall into.

    I use such tools [note: most of my work is in .NET, but that does not influence the concepts, only the choice of tools]. They are of some help, but are not "magic bullets". So I do not "count on them", but instead use them as simply ONE of the tools I use to do QA against my testing. But again, this does NOTHING to make scenarios where a critical piece of code is added (rather than changed) and the test is NOT updated detectable simply by looking at information from the "test side of things" [e.g. you are testing compiled code (in my case, assemblies) without access to the source code changes; you simply get a "new binary" and need maximum assurance that your tests have "kept up"].



  • @t_wheeler said:

    Does the code generator have generated unit tests? Otherwise you can't know it's generating correct code....

    I worked with a contractor who was tasked with writing automated UI tests (yes, I know...). He spent the first half of his contract writing the framework (a WTF, since there are a number of them out there), then the second half writing unit tests for the framework (great, but...). At the end of the contract, he had a working framework, but no actual tests for the application he was meant to be testing.

    And so it sits - no one knows how to write tests using this framework, so no automated tests and he's gone.



  • @Seol said:

    Firstly, the main WTF is testing only the fluff; but the pattern for testing it is also distinctly sub-optimal. We test all our simple getters and setters by passing a list of property names to an abstract test superclass which exercises them properly. So when you look at the test, all the test methods test something substantial about the class, and the fluff is referenced in a concise and unobtrusive manner; also, if we decide there's something about simple getters and setters which needs additional testing, there's one place to implement it.

    Cool idea; could you use reflection to find the setters/getters?  And do you have overrides/other tests for getters/setters that have complex business logic?  (Thinking of a getter that returns the full name of a person from the firstname/lastname properties, for example)



  • @DrPepper said:

    Cool idea; could you use reflection to find the setters/getters?  And do you have overrides/other tests for getters/setters that have complex business logic?  (Thinking of a getter that returns the full name of a person from the firstname/lastname properties, for example)

    It doesn't actually create a test method for each property: there's just one test method which iterates over all properties and reports failures. So you can't override the test, but you can implement a test and remove that property from the initial configuration list of "simple properties". That's one reason why you don't find the properties reflectively - sometimes the naive getter/setter test doesn't fit what your methods do - the other reason being you want to make assertions about the properties that exist.

    For example, we bind our domain objects to screen components reflectively, and enable and disable them reflectively using property names - so we have constants in the classes for each property name, which we use for the arguments in the tests. This way if there's a typo in the constant, the tests catch that.
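    A minimal sketch of that constant-per-property check, with hypothetical names (Person, PROP_FIRST_NAME) since the actual domain classes aren't shown:

    ```java
    // Sketch: property names used for reflective binding live as
    // constants on the class, and a test verifies each constant names
    // a real getter - so a typo in the constant fails the build rather
    // than silently breaking a screen binding at runtime.
    public class PropertyConstantSketch {

        public static class Person {
            public static final String PROP_FIRST_NAME = "firstName";
            private String firstName;
            public String getFirstName() { return firstName; }
            public void setFirstName(String firstName) { this.firstName = firstName; }
        }

        // Fails if the constant does not correspond to a real getter.
        static void assertPropertyExists(Class<?> type, String property) {
            String suffix = Character.toUpperCase(property.charAt(0)) + property.substring(1);
            try {
                type.getMethod("get" + suffix);
            } catch (NoSuchMethodException e) {
                throw new AssertionError("no getter for property '" + property + "' on " + type);
            }
        }

        public static void main(String[] args) {
            assertPropertyExists(Person.class, Person.PROP_FIRST_NAME);
            System.out.println("property constant names a real getter");
        }
    }
    ```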


  • @DrPepper said:

    @t_wheeler said:

    Does the code generator have generated unit tests? Otherwise you can't know it's generating correct code....

    I worked with a contractor who was tasked with writing automated UI tests (yes, I know...). He spent the first half of his contract writing the framework (a WTF, since there are a number of them out there), then the second half writing unit tests for the framework (great, but...). At the end of the contract, he had a working framework, but no actual tests for the application he was meant to be testing.

    And so it sits - no one knows how to write tests using this framework, so no automated tests and he's gone.

    The whole idea here was probably job security by lock-in. "I'm the only one who knows how to write tests for my homebrewed testing suite, so they'll have to renew my contract if they want any tests written!"

    Except, for once, sanity prevailed and both the guy and his junk were tossed out the door.



  • @joe.edwards said:

    and both the guy and his junk were tossed out the door.
     

    .. without getting paid for what he'd done, since he was contracted to write automated UI tests and failed to deliver.



  • @Cassidy said:

    @joe.edwards said:

    and both the guy and his junk were tossed out the door.
     

    .. without getting paid for what he'd done, since he was contracted to write automated UI tests and failed to deliver.

    All depends on how the contract was written. There are two basic ways:

    1) [rare these days] The contract is for actual deliverables; there is typically no "per hour" billing.

    2) [common] Ongoing work is performed and "reviewed" by the client. Typically the acceptance is the signing of a timesheet for the week (or occasionally a slightly longer period). In this model, it is the customer/client's responsibility to track whether the goal is being approached. At worst, the consultant may not get paid for the last period, if it can be demonstrated that it deviated wildly from expectations based on the previous period(s).



  • @TheCPUWizard said:

    2) [common] Ongoing work is performed and "reviewed" by the client. Typically the acceptance is the signing of a timesheet for the week (or occasionally a slightly longer period). In this model, it is the customer/client's responsibility to track whether the goal is being approached. At worst, the consultant may not get paid for the last period, if it can be demonstrated that it deviated wildly from expectations based on the previous period(s).

    It's worse than that -- we had daily standups so the manager knew every day just what was being worked on, including where the test tool was at.  There were demos on a regular basis.  At first I think we were caught up in the "shiny object" thing; later on when it became obvious that it would be hard to write actual UI tests using this tool, we were in the midst of trying to push a product (the thing the tool was intended to test) out the door. 

    There were certainly opportunities to redirect the efforts of this particular individual.  I'm thinking that the manager got caught up in the need to deliver our product, and so just let this particular person work on, as long as he appeared to be making some progress.



  • Yeah, I was fantasising that his attempts at lock-in backfired and he starved as a result of expending effort on preparation rather than on actual deliverables.

    To my mind, if he was contracted to write the tests, writing a framework for it should be outside of the scope. If I contract someone to build me a set of gates, I don't expect to be paying for them to build a forge and tools for the job first - I'd have expected that to already be sorted before offering their services as a gatemaker.

    But as said.. it's down to the content of the contract, amongst other things.



  • @TheCPUWizard said:

    Again, if you DON'T have full coverage, changes are made, tests pass without modification [because fluff was deleted, and new critical code was added but not tested] and the code coverage stays the EXACT SAME... How the ... are you going to be able to even use code coverage to DETECT that something has changed?

    Did you even read what I wrote? If you cover the critical code separately, then that coverage will change when the critical code is changed.

    @TheCPUWizard said:

    And can you be 100% sure that deleting that "fluff" really didn't impact an edge case [perhaps it was only rarely accessed, and the only access was via reflection]?

    Fluff code doesn't cause edge cases. Code that does weird stuff with fluff code (reflection etc.) causes edge cases. That is critical code and should be tested.

    @TheCPUWizard said:

    If someone makes a change [ANY CHANGE] to the code, the goal [IMO] is that the existing tests MUST fail [if there are changes to things other than brand-new code] AND the code coverage must drop [assuming there is new code]...

    Not sure what you're trying to say here... how will making a code change cause tests to fail?

    Don't make the logical fallacy of equating code coverage to code correctness. Just because it's covered doesn't mean it's correct, and just because it isn't covered doesn't make it wrong. Generated unit tests do not verify correctness in any way, shape, or form; they're merely verifying that your programming language works, which is an utterly useless exercise.



  • @The_Assimilator said:

    @TheCPUWizard said:

    If someone makes a change [ANY CHANGE] to the code, the goal [IMO] is that the existing tests MUST fail [if there are changes to things other than brand-new code] AND the code coverage must drop [assuming there is new code]...

    Not sure what you're trying to say here... how will making a code change cause tests to fail?

    I don't understand that thought either. If you're refactoring code, passing the same tests shows deliverable functionality has not changed.

    It's down to the tests, mind.

     



  • Some good comments, which I will try to respond to:

    1) I agree that code coverage and code (or test) quality have little or no direct relationship. My point has been that if there is 100% code coverage at a given point in time, then many more types of changes to the code and/or tests will make a noticeable difference in this number (i.e. it is not 100% anymore), and therefore it is easier to see when additional review is necessary. If the number is something else (random choice: 83%), then seeing a change (down to 81% or up to 85%) provides no useful information in and of itself.

    2) I see three types of "refactoring". There is some which has ZERO effect [e.g. renaming a local variable]. There is some which has side effects [e.g. a change in memory allocation]. And there is a whole bunch that has changes in effect [e.g. reordered operations]. IMHO the first (while it will not impact tests) is extremely rare; however the number of cases that fall into the third category, but are treated as "deliverable functionality has not changed" represents a major risk that is often ignored.

    3) It is also important to realize that testing has a cost, and in many cases the right "business decision" is to not expend the effort; yet in other cases, the cost of a potential bug (not even a real one) is so high that the testing dollars are well spent.



  • @TheCPUWizard said:

    however the number of cases that fall into the third category, but are treated as "deliverable functionality has not changed" represents a major risk that is often ignored.
     

    But your point was that post-change, existing tests MUST fail. I'd have thought that post-change, existing tests that passed must still continue to hold.

    (at least that's how I read it)



  • @Cassidy said:

    @TheCPUWizard said:

    however the number of cases that fall into the third category, but are treated as "deliverable functionality has not changed" represents a major risk that is often ignored.
     

    But your point was that post-change, existing tests MUST fail. I'd have thought that post-change, existing tests that passed must still continue to hold.

    (at least that's how I read it)

    1) I thought my original post had included "ideally" before the bulleted list, but perhaps it did not...

    2) My point is that "post change" there is likely to be SOME difference [perhaps a new event raised, or some other subtle change]. If the existing tests were 100% robust [granted, 100% can never be achieved], then SOMETHING should fail. If nothing does [as is often the case], then the existing tests are incomplete. So... BEFORE making the change, add tests which will pass on current behaviour but fail on modified behaviour. This first set clearly identifies to the test team what is about to change. Then the "refactor" will either cause tests to fail [and not make it into the repository] OR will include changes to the tests [which give the visibility].
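    What this describes is essentially a characterization (sometimes "pinning") test. A minimal sketch, with a hypothetical formatGreeting standing in for the code about to be refactored:

    ```java
    // Sketch of a characterization ("pinning") test: before changing
    // code, record its current observable output verbatim, so any
    // behavioural drift introduced by the refactor either fails here
    // or forces a visible change to the test itself.
    public class PinningTestSketch {

        // Legacy routine about to be refactored (hypothetical).
        static String formatGreeting(String name) {
            return "Hello, " + name + "!";
        }

        // Pins the current output, quirks and all. A refactor that
        // changes behaviour must either fail this test or update it,
        // making the change visible to the test team.
        static void pinCurrentBehaviour() {
            String actual = formatGreeting("World");
            if (!"Hello, World!".equals(actual)) {
                throw new AssertionError("behaviour changed: " + actual);
            }
        }

        public static void main(String[] args) {
            pinCurrentBehaviour();
            System.out.println("current behaviour is pinned");
        }
    }
    ```

    Written before the change, such a test passes on the current behaviour and fails on the modified behaviour, which is exactly the gate described above.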



  • @TheCPUWizard said:

    So... BEFORE making the change, add tests which will pass on current behaviour but fail on modified behaviour.
     

    Okay, that part makes sense.

    @TheCPUWizard said:

    My point is that "post change" there is likely to be SOME difference

    Oh, granted - else there's no point in making the change. I was thinking of situations where "under-the-bonnet" changes would result in more efficient back-end processing or refactored (and more maintainable) code, but the end-user results remained the same - I would expect some UI tests to still hold true. I misunderstood the point of designing further tests that would discern the differences introduced by the change.



  • Cassidy - good, it seems we are understanding each other. Once there is understanding, there can be either agreement or disagreement; but without understanding, both are meaningless...

    The way I see it is:

    If you are going to make a change (any change), then there must be something "wrong" (less than ideal) about the current situation. Otherwise, where is the benefit of making the change?

    If there is something "wrong", then why not create some artifact (a test) that illustrates the problem? This also has the benefit that if the change ever regresses [and we all know people sometimes take out code they don't understand], there is already a build-failure mechanism that will prevent such activity.

