Why does the C# compiler think I'm accessing unassigned variables?
-
Consider the simplified class at the end of this example. Right now, the compiler claims that in my call to
B()
, parametersb
andc
are accessing a variable that's unassigned.If they're not assigned, then I would have bailed out in the
d == null
check.Asides from defaulting
b
andc
to null, is there any way to get it to realize that no, I'm not using unassigned variables?Or alternately, am I the real , and it's possible for them to end up unassigned at the time of the call to
B()
that I'm missing?namespace WantToGoToSpace { class Test { public static string GetA(int key) { return null; } public static string B(string a, string b, string c, string d) { return null; } public static bool Busted(int key) { var a = GetA(key); string b; string c; string d = null; if (a != null) { b = GetA(key); if (b != null) { c = GetA(key); if (c != null) { d = GetA(key); } } } if (d == null) { // Do some shared "we failed" logging, which is why I didn't just return immediately. return false; } if (d.Length > 0) { B(a, b, c, d); } return true; } } }
-
@unperverted-vixen could they be being garbage collected before then? I've had such weirdness before when I wasn't tracking the lifetime of variables and they were going out of scope somewhere before that. That's my only idea.
-
@benjamin-hall I’m getting compile failures, not runtime failures.
-
@unperverted-vixen Eric Lippert had a couple very interesting articles on a related case.
They're not specifically about this, but they deal with the C# compiler's analysis to prove things about method calls. The relevant conclusion is that trying to give the compiler a way to prove things in the general case turns out to be equivalent to solving the Halting Problem, and so their decision was to use heuristics that might have false positives, which are annoying to the developer, but can't have false negatives, which could cause runtime issues and be annoying to the end-user.
What you're dealing with is one of those false positive cases. The compiler doesn't have a heuristic such that in one context you can define "if x then y" and then outside of that context you can say "if x" and it knows that
y
holds true.
-
@unperverted-vixen Checking it in VS2017 I see the same behavior. I think what it's seeing is the code path
a == null
(so the first if fails to execute). Because of this, it's hitting the "unassigned variables" check before it hits the "well,d
is null" check and bailing out. That is, since it can't assume that a isn't null and thatd
hasn't been adjusted elsewhere (maybe in another thread?) it bails on the first thing it sees.One potential fix (WOMM): put the
if (d.Length)
branch into the nested ifs:if (a != null) { b = GetA(key); if (b != null) { c = GetA(key); if (c != null) { d = GetA(key); } if (d.Length > 0) { B(a, b, c, d); } } } if (d == null) { return false; }
This compiles fine.
-
@masonwheeler Basically, at the point where the semantic rule assigned-before (i.e., the safety guard for reading a variable) is evaluated, the hasn't been any flow-graph rewriting done, as that rewriting is a lot easier to do if you start with things that are known to be semantically correct. The standard way of describing the assigned-before rule for a converging flow graph is that a variable must be assigned-before on all paths through the graph without considering what the results of evaluating conditional expressions actually are, since that gets expensive (and is done later during the refinement/optimisation stages). It's relatively simple to convert the conceptual level for that into its implications for an
if
(loops are different).The optimiser will be able to throw out the dummy initialisations of
b
andc
once it has proved that the call ofB
must always follow the non-initialisation assignment tod
, but they do need to be there until that reorganisation is done.
-
@benjamin-hall said in Why does the C# compiler think I'm accessing unassigned variables?:
This compiles fine.
But moves the “we failed” check to after the length check, and in this case will cause the code to generate a
NullReferenceException
. Ooops!
-
@dkf said in Why does the C# compiler think I'm accessing unassigned variables?:
@benjamin-hall said in Why does the C# compiler think I'm accessing unassigned variables?:
This compiles fine.
But moves the “we failed” check to after the length check, and in this case will cause the code to generate a
NullReferenceException
. Ooops!That's a micro-optimization. Really.
So rewrite that as
if (d != null && d.Length > 0) { #stuff } else { return false; }
and remove the other branch. That seems to meet all the criteria and show the basic point of having to have checked
b
andc
before getting to theB(a, b, c, d)
call.
-
@masonwheeler, @dkf Thanks for the links and background. Annoying, but understandable.
@Benjamin-Hall, @dkf Thanks for the thoughts on code fixes. The last version looks good to my eyes. I'll give it a whirl and make sure csc agrees. :)
-
@masonwheeler I scrolled down on that text website and got an unskippable video ad. WHAT THE FUCK IS GOING ON THAT THOSE WORDS MAKE ANY SENSE?
-
@ben_lubar Argh. I'm sorry that happened. Maybe use the contact info (I think there is some somewhere) to email him about bad advertisements on his site? :(
-
@masonwheeler said in Why does the C# compiler think I'm accessing unassigned variables?:
@ben_lubar Argh. I'm sorry that happened. Maybe use the contact info (I think there is some somewhere) to email him about bad advertisements on his site? :(
I found a Chrome extension because that pushed me over the edge:
Now I'll never have to worry about it ever again!
-
@benjamin-hall said in Why does the C# compiler think I'm accessing unassigned variables?:
That seems to meet all the criteria and show the basic point of having to have checked
b
andc
before getting to theB(a, b, c, d)
call.Alas, csc disagrees. :(
I ended up rewriting it a bit more - the four GetX methods (simplified version had one, but there's separate GetA/GetB/GetC/GetD methods) now have null checks built into them, instead of relying on the caller doing that check. Slightly more inefficient, but it's more reliable and readable code.