Excel is hard. Actually, no, coding is hard.

Maciejasjmj

So I'm back at work, and I get the project to work on. The project's backstory is not very encouraging - back in the dark days of the company, they threw together a bunch of fresh graduates with no project management in place and made them do stuff.

It went about as well as you'd expect. It has since grew into a huge behemoth, too, which means any refactoring effort is out of the question - it would simply take way too much time and effort to bash this code into sanity. End result - the client got mighty pissed, a lot of putting out fires commenced, the new programmers tried not to break the old code, but brought new libraries into the mix, and somehow they managed to make the thing somewhat work.

I, unfortunately, got tasked with Excel import - one of the places with legacy code. Basically, it uses COM interop to get the data and stuff them into strongly-typed datasets (why strongly-typed datasets? I like to think there's a good reason for them, but judging by the amount of wrong in the code, the reason is probably closer to "it didn't occur to us that you can just make a model class"), which then go to the database. But the Excel reports' format has changed since, and we need to accommodate the code for the new version.

First slap in the face - most variables and function names are in Polish. Smelly, but not too bad.

Then the insanity begins - at first mildly...

private static void HandleException(Action action)
        {
            try
            {
                action();
            }
            catch (Exception ex)
            {
                Log.SaveException(ex);
            }
        }

...then fully-blown.

 private static void ThrowException(string cMsg)
        {
            throw new Exception(cMsg);
        }

There are 10 different things to be extracted - and each needs a very slightly different approach. Obviously, it ended with 10 functions - but how do you determine which one to call where?

private enum ExcelAlgorithm
        {
            WidgetsA = 1,
            WidgetsB = 2,
            //...
        }

private struct Header
{
    public ExcelAlgorithm Alg;
    public string Name;
    public int rowMin;
    public int rowMax;
    public int colMin;
    public int colMax;
}

private Header[] headers;

Okay, so there are multiple header strings, and for each one there's an algorithm. Now for the "filling" part:

headers[1].Alg = ExcelAlgorithm.WidgetA;
headers[1].Name = "Widget A";
headers[2].Alg = ExcelAlgorithm.WidgetB;
headers[2].Name = "Widget B";
//snip: 10 more of that, and a code that then searches a column for the magic strings here to determine the areas
//it doesn't help that three of the strings have typos in them

And then, we run each part in order, right? Well...

foreach (var header in headers)
{
    switch(header.Alg)
    {
        case ExcelAlgorithm.WidgetA:
            getWidgetAData();
            break;
        case ExcelAlgorithm.WidgetB:
            getWidgetBData();
            break;
         //...
    }
}

So, the good for-switch paradigm (except for headers[0], which just never gets set - counting from zero is hard!).

And some more fun:

            try
            {
                Thread.CurrentThread.CurrentCulture = new CultureInfo("pl-PL");
                _workbooks = _app.Workbooks;
                _workbook = _workbooks.Open(file, true);
            }
            catch (Exception)
            {
                Thread.CurrentThread.CurrentCulture = new CultureInfo("en-US");
                _workbooks = _app.Workbooks;
                _workbook = _workbooks.Open(file, true);
            }

private static bool IsDigitNonZero(string strValue)
        {
            var ret = false;
            if (strValue != null)
            {
                if (strValue.Length > 0)
                {
                    try
                    {
                        var value = Convert.ToDecimal(strValue);
                        if (value != 0)
                        {
                            ret = true;
                        }
                    }
                    catch (Exception)
                    {
                        ret = false;
                    }
                }
            }
            return ret;
        }

private void CloseFile(string excelFile, bool save)
        {
            HandleException(() =>
            {
                if (save)
                    _workbook.Save();
                _workbook.Close(false, excelFile);
            });
            HandleException(() => _app.Quit());
            _workbook = null;
            _workbooks = null;
            _app = null;
            HandleException(() =>
            {
//apparently, this is actually the correct solution...
                GC.Collect();
                GC.WaitForPendingFinalizers();
            });
        }

Oh, and there are barely any comments (and they mostly specify for which report version the piece of code was intended). And instead of magic cell positions (which would at least let me go on with writing my code), there are magic offsets to the positions of the magic strings. Which means you can move the header a few fields down, or up, but any other change to the worksheet breaks the code. Defensive programming!

Oh, and that's just one codefile. In a 40-project solution. I could probably fork TDWTF just based on this code...

Onyx

@Maciejasjmj said:

So, the good for-switch paradigm

No, actually, if you look carefully at the anatomy here, it has mutated into a foreach-switch.

Hopefully, this mutation will prove to be non-advantageous. If the natural selection starts favouring it... Well, that's half way to the theoretical while-switch paradigm. How could humanity survive such a creature is yet unknown.

Eldelshell

TIL C# has structs.

Jaime

@Maciejasjmj said:

why strongly-typed datasets? I like to think there's a good reason for them, but judging by the amount of wrong in the code, the reason is probably closer to "it didn't occur to us that you can just make a model class"

Strongly typed datasets are model classes - from twelve years ago. They're worse than most modern alternatives, but better than hand coding.

blakeyrat

I use structs all the time, am I doing it wrong?

Onyx

@blakeyrat said:

I use structs all the time, am I doing it wrong?

They don't have methods, so they are not OOP-y enough? So, maybe?

Honestly, I don't realize what's so weird about that either. But I don't use .NET so I thought I'm just ignorant. Again.

Eldelshell

It's not that it's wrong, I just didn't think C# had structs because of its copy/paste nature from Java, that doesn't have them. Honestly, the only .Net code I've seen in recent years has been produced by people I wouldn't trust to even write me a FizzBuzz so that might explain why I haven't seen it.

Gąska

@blakeyrat said:

I use structs all the time, am I doing it wrong?

If they're passed between functions and over couple dozen bytes big then yes you are.

FrostCat

@Eldelshell said:

It's not that it's wrong, I just didn't think C# had structs

IIRC structs are for when you want value semantics instead of object semantics?

Download Visual Studio 2003 Retired Technical documentation from Official Microsoft Download Center

Maciejasjmj

@Onyx said:

Honestly, I don't realize what's so weird about that either.

Structs themselves are fine (if a bit weird in C#). It's what's done with them later that makes me want to find whoever's responsible for the code and have him never program anything more complicated than a washing machine in his life.

Onyx

@Maciejasjmj said:

Structs themselves are fine (if a bit weird in C#).

I'm still confused by this slightly. I rarely use structs in C++ too (usually I need some functions to manipulate them, so making it a class makes sense), but I don't think they are in any way weird... oh well.

@Maciejasjmj said:

It's what's done with them later that makes me want to find whoever's responsible for the code and have him never program anything more complicated than a washing machine in his life.

I wouldn't let them interact with anything but a cluebat. And I wouldn't let them operate it.

FrostCat

@Onyx said:

I'm still confused by this slightly. I rarely use structs in C++ too (usually I need some functions to manipulate them, so making it a class makes sense), but I don't think they are in any way weird... oh well.

In C++, structs and classes are exactly the same, except the default access specifier is different. In C#, things derived from Object have object semantics, and things derived from Struct have value semantics, which means this: http://msdn.microsoft.com/en-us/library/aa664472(v=vs.71).aspx

Eldelshell

What bothers me about structs in C++ is they aren't objects so I find them weird, but I always attributed them to its C heritage and didn't care. OTOH, C# having structs is a nod for C++ devs to come aboard?

JBert

@Maciejasjmj said:

Structs themselves are fine (if a bit weird in C#). It's what's done with them later that makes me want to find whoever's responsible for the code and have him never program anything more complicated than a washing machine in his life.

You know, this might explain why I lose a sock every once in a while.

Please give them an abacus, that's easy enough to reset.

Eldelshell

Just you wait for the IoT and it will Tweet you the geolocation of your missing sock. That shall bring a new life to this site... Washing machines posting to FB, the future!

FrostCat

@Eldelshell said:

structs in C++ is they aren't objects

Wat. The only difference between structs and classes in C++ is the default access specifier. Read the C++ FAQ, section 7.8.

In .Net they are different as I outlined above.

PleegWat

I expect the example struct given in the doc is a good usecase - you keep an X and Y coordinate together, but there's no separate memory region or management.

Eldelshell

Go figure, the C# structs are closer to C structs than C++ ones. Again, TIL.

blakeyrat

I used to do coding for an old MUD based on SMAUG. Some previous coder had a bunch of C structs with functions in them that he treated basically as classes (with no access controls of course). I had no idea such a thing was possible until I'd seen it done.

Of course, over the next 6 months we converted the whole codebase to C++, which in retrospect was a really stupid thing to do but oh well.

Gied

Structs do have methods in C#. Also constructors.

HardwareGeek

@Maciejasjmj said:

have him never program anything more complicated than a washing machine in his life.

My employer makes chips that go in washing machines and the like. Programming a washing machine is harder than you think.

PleegWat

You can do some nice things with function pointers, and they get at their most useful if you store them in a struct somewhere, because you can initialize the same struct to have different parsing, next-layer processing, etc. It's a limited class concept though - there's no this, and no good way (that I know of) to do inheritence.

boomzilla

@Onyx said:

No, actually, if you look carefully at the anatomy here, it has mutated into a foreach-switch.

And really, it's not the classic anti-pattern at all.

dkf

@PleegWat said:

no good way (that I know of) to do inheritence

You can do it by making the type struct of the subclass start with a literal copy of the type struct of the superclass, so that a pointer to the subclass type can be safely cast to a pointer to the superclass type and everything will work. C actually does guarantee enough that this can work. (You can then wrap it all up in a union to make the converting between types a little less heinous, but that's totally unnecessary.)

I used to have this quite a bit waaay back when I used Xt and Motif. I'd rather not remember the details any further.

disieh

You can emulate this and inheritance. The trick is implementing vtables and whatnot by yourself. That's actually how Win32 API used to do things, since it was developed at the time when there was basically no C++ as we know it. See http://blogs.msdn.com/b/oldnewthing/archive/2011/10/24/10229097.aspx

To add on what nifty things you can do in C, Linux (i.e. kernel) has pretty good examples of how to do OO-ish stuff. Look for various *_ops structs, for example http://lxr.linux.no/linux+v3.17.3/drivers/usb/core/usb.c#L342

PleegWat

Huh, I've actually used a bunch of those. _ops structs in particular, for interfaces (mine was called _api).

scrib

That just gave me a flashback to playing around with SAS/C in high school on the Amiga. I was programming crappy MUI (Magic User Interface, a UI skin library, donchaknow) toolets, and the way to interface with the Workbench UI was something like that. Most of the details completely elude me (so I'm probably wrong in some way here), except that I learned that UI programming is something I prefer to avoid at all costs.

disieh

@Eldelshell said:

Go figure, the C# structs are closer to C structs than C++ ones. Again, TIL.

Actually, it's more complicated than that. Structs in C terms are PODs in C++. Basically they are a special case of a class that only has no private nor protected members, no user defined copy-constructors, destructors or assignment operators and has no non-POD static members or references. The primary use case is when you want to know exactly the memory layout of the object. See the following link for a more thorough explanation.

What are aggregates and trivial types/PODs, and how/why are they special?

This FAQ is about Aggregates and PODs and covers the following material: What are aggregates? What are PODs (Plain Old Data)? More recently, what are trivial or trivially copyable types? How are t...

I'm not totally clear on how structs behave in terms of memory layout in C#, but I'd imagine they are in the language for the same reason C and C++ PODs exist.

EvanED

The key general point is that almost everything in C++ has a very lightweight translation to something in C. (Or, in C++11, there's a very lightweight translation to something in simpler C++ that has a very lightweight translation to C. :-)) Most of these would actually be fairly reasonable to actually hand-write in C if you were forced to use that language.

The one real exception is exceptions, where the best implementations do things that have no reasonable C equivalent.

(You could also argue templates fall into the above category, but I disagree for reasons that I'm too lazy to type out now.)

Eldelshell

Wow guys, thanks, I've learned more of this stuff with your posts than with a 100 books.

EvanED

I guess dynamic_cast is nontrivial to implement too. I actually don't really know how that's done by C++ compilers in much detail.

CreatedToDislikeThis

Couldn't you implement dynamic_cast via a bunch of enums & comparison logic hand-coded to match the (also hand-coded) inheritance structure? (+ a class ID in the hand-coded vtable).

I know that the itanium ABI implements dynamic_cast via a horridly complicated traverse over a tree of class infos saved in the binary, but is that really required?
After all the input is an object with a class ID in the vtable, and a class ID to compare against. Each class ID has a fixed set of class ID it can dynamically be cast to, so can't a "trivial" (potentially huge) double-switch statement suffice?

(In the multiple & virtual inheritance case, you would also need to adjust the object pointer or do whatever dark magic your manual multiple & virtual inheritance implementation needs)

EDIT: Granted, the Itanium ABI's dynamic_cast handling is only horrendously complicated once multiple & virtual inheritance (gag) come to play. If you're only using single inheritance, you could trivially traverse a simple hand-made tree structure represnting the classes as well.

EvanED

There are a number of issues with this. For example, the compiler doesn't have the whole inheritance tree available, so you'd need either one dynamic_cast function per compilation unit or for the linker to have smarts to merge the dynamic_cast functions across compilation units. Even if you do the latter, shared libraries means that even the linker doesn't have the whole inheritance tree, so you'd have to somehow arrange that common classes have the same IDs between EXEs and SOs but non-common classes use mutually distinct ways.

It might be possible, but it certainly isn't easy.

Besides, the traversal for virtual & multiple inheritance doesn't need to be that bad seemingly: here is what Pathscale's compiler does: https://github.com/pathscale/libcxxrt/blob/master/src/dynamic_cast.cc#L146. I haven't looked enough to really understand it, but it doesn't look all that horrid.

CreatedToDislikeThis

Oh - that's indeed pretty simple.

The implementation I previously looked at managed to complicate this quite a bit by distinguishing between public & private inheritance (and making sure the target class was accessible publicly from the static class (possibly going through the actual class)) and by checking for duplicate inheritance of the target class (failing if it's publicly inherited multiple times, but not if only one of those inheritances was publicly accessible without going through the actual class...)

[size=9](Details might not be 100% right)[/size]

FrostCat

@EvanED said:

The key general point is that almost everything in C++ has a very lightweight translation to something in C.

And the reason for that, is that the first C++ compiler, CFront, worked by converting C++ into equivalent C.

Maciejasjmj

Exception was unhandled by user code:
"The template has been successfully updated"

Because we wanted to display an alert on the webpage, and there was already a nice code for that in the default exception handler, so the UpdateTemplate function just throws at the end.

Fucking hell.

Deadfast

@Maciejasjmj said:

Exception was unhandled by user code: "The template has been successfully updated"

Because we wanted to display an alert on the webpage, and there was already a nice code for that in the default exception handler, so the UpdateTemplate function just throws at the end.

Fucking hell.

Or maybe they really didn't expect their code would work? ;)

Steve_The_Cynic

@disieh said:

Actually, it's more complicated than that. Structs in C terms are PODs in C++. Basically they are a special case of a class that only has no private nor protected members, no user defined copy-constructors, destructors or assignment operators and has no non-POD static members or references.

To be C-like in C++, a struct must have no C++ elements in it at all. No non-POD members (static or otherwise, and bool is not a POD type for this purpose), no member functions, no static members, no access specifiers (not even "public:"), no inheritance list. I think the standards might even hint at such a struct being guaranteed to be passable between C and C++ (but it's been a long time so I'd be reluctant to assert this). In essence, a C-like struct is one that would be accepted by a C compiler without any special hand-waving.

@disieh said:

The primary use case is when you want to know exactly the memory layout of the object.

For fuck's sake, NO.

If you want to know exactly the memory layout of the object, make an array of char and pack stuff in by hand. The most you know in "standard" C is that the fields will be in the same order in memory that they are in the struct definition, that they won't overlap each other, and that there are no hidden data members. There will be unknown amounts, including none at all, of dead space between any two adjacent fields, and between the last field and the end of the struct's memory footprint. Hell, it's theoretically possible that any non-struct field (i.e. int, double, etc., but especially 16:32 segmented pointers on x86) may itself contain unknown amounts of dead space.

However, all that pales into insignificance compared to what happens in C++ objects (whether defined using the keyword "class" or the keyword "struct"). There will be hidden dead space as in C, but also hidden invisible member variables. A class that has virtual functions will have a hidden member somewhere that helps the code find the true function to call (usually a "vtable pointer" but it doesn't have to be done that way). A class that has a virtual base class ("class X : virtual public Y") will have an embedded offset to help trampoline virtual functions of Y that are overridden in X find (based on the location of Y) the part of X that is not shared with Y.

Caveat lector: C99 might have changed this for C, and C++11 might have changed it for C++, but I doubt it.

dkf

@Steve_The_Cynic said:

C99 might have changed this for C

This is the sort of thing that's usually locked down harder in the platform ABI. While compilers might do odd things with structure packing, if the generated code hopes to interact with the OS it had better use the same struct packing model as the OS and the standard libraries. Only programs content to live in splendid isolation need ignore such issues.

There's often some #pragmas (or equivalent) for when precise control is needed.

Steve_The_Cynic

@dkf said:

There's often some #pragmas (or equivalent) for when precise control is needed.

Yes, indeed. I know that. That's why I said 'in "standard" C' rather than 'in C'.

tarunik

@Steve_The_Cynic said:

To be C-like in C++, a struct must have no C++ elements in it at all. No non-POD members (static or otherwise, and bool is not a POD type for this purpose), no member functions, no static members, no access specifiers (not even "public:"), no inheritance list. I think the standards might even hint at such a struct being guaranteed to be passable between C and C++ (but it's been a long time so I'd be reluctant to assert this). In essence, a C-like struct is one that would be accepted by a C compiler without any special hand-waving.

Check your definitions (N3337 9 (class)/7):

A *standard-layout class* is a class that:

has no non-static data members of type non-standard-layout class (or array of such types) or reference,

has no virtual functions (10.3) and no virtual base classes (10.1),

has the same access control (Clause 11) for all non-static data members,

has no non-standard-layout base classes,

either has no non-static data members in the most derived class and at most one base class with non-static data members, or has no base classes with non-static data members, and

has no base classes of the same type as the first non-static data member.

and in 9 (class)/6:

A *trivially copyable class* is a class that:

has no non-trivial copy constructors (12.8),

has no non-trivial move constructors (12.8),

has no non-trivial copy assignment operators (13.5.3, 12.8),

has no non-trivial move assignment operators (13.5.3, 12.8), and

has a trivial destructor (12.4).

A trivial class is a class that has a trivial default constructor (12.1) and is trivially copyable.

In other words, A in this example:
[code]

class B
{
static double j;
};

template <typename T>
class A: private B
{
protected:
bool b;
void (A::foo*) (int, int*, T);
public:
void frobnicate (int, int*, T);
};
[/code]

is standard-layout and trivial, which makes it a POD class as per 9 (class)/10:

A *POD struct* is a non-union class that is both a trivial class and a standard-layout class, and has no non-static data members of type non-POD struct, non-POD union (or array of such types). Similarly, a *POD union* is a union that is both a trivial class and a standard layout class, and has no non-static data members of type non-POD struct, non-POD union (or array of such types). A *POD class* is a class that is either a POD struct or a POD union.

dkf

@Steve_The_Cynic said:

Yes, indeed. I know that. That's why I said 'in "standard" C' rather than 'in C'.

But you never build in pure standard C. You're always using a particular implementation on a particular platform, and C has always had a major goal of allowing you to interact with the OS on the platform (or even of allowing the implementation of the OS) so it compromises quite a bit. In reality, most structure packing is done pretty much the same by virtually everyone by default; it's too much of a hassle when dealing with networked applications otherwise. (Byte-swapping is easy by comparison, especially as there's decent macros for it.)

The C Standard doesn't require this, but reality has this habit of intervening and keeping shit real.

tarunik

Yeah -- there are only so many ways to lay out a structure, given target alignment requirements/recommendations, data type sizes, and the alignment requirements that are imposed by the C and C++ standards. Most of the disagreements you'll run into are either in data type sizes, or absolute minutiae such as bitfield and char signedness.