I asked for the data. But I got this.



  • (Anonymized via being vague on details.)

    So, I'm still at the same job I've been at for a while. It's mostly been non-WTF so far, but this year I sort of got promoted (going from being a subordinate on three different projects, to in charge of 1, without a pay rise to match). (The theory is that it's around the same amount of work; so far, I agree with that.) This isn't the WTF.

    Anyway, in order to do this job, I need some data. (It's the sort of thing that could reasonably be stored in a non-relational database, and that might reasonably be placed into one table of a relational database, which is what I'm planning to do with it.) So I email the person whose job it is to track the data asking for it.

    I get back an Excel spreadsheet. OK, so that's a reasonable way to transmit this sort of data; spreadsheets aren't the same thing as databases, but Excel had enough (non-relational) database features snuck into it that it can be used like that. I open it in LibreOffice (the work computers here run Linux by default, although I could request Windows if I needed it), and it looks reasonable at first glance. Then I try to sort it, and it refuses to. Huh. Something about merged cells.

    Closer investigation reveals something odd. You know those old websites with a fixed layout, where someone has used tables for absolutely everything and merged cells together in order to create rectangular regions? This Excel sheet was the same. It looks like a table, but one cell of the table doesn't correspond to one cell of the underlying spreadsheet, and there are merged cells everywhere as a result. (LibreOffice refused to sort it because it couldn't figure out what to do with the merged cells. That's reasonable too.)

    So, how do I get the data with the sort order I want? Well, it turns out that this spreadsheet is a workbook with multiple tabs. The tabs show various subsets of the data, with various sort orders. (It does do a reasonable job of anticipating all the sort orders and subsets I might reasonably want, actually. Although it'd make more sense to just let me filter and sort it myself; Excel and LibreOffice can both do that, and I can easily export the data into something else in order to do that too.) Two of them are basically identical, with marginally different formatting. I think it's because it's something that I might reasonably need two copies of. Or because I might not know how to reformat it, I guess.

    Scratch the bit about anticipating all the sort orders I might reasonably want. The one I was originally trying? It ain't on there.

    There's also an "Instructions" tab, which is basically a word processed document, stored with one paragraph per cell. It contains instructions for running the report that produced the Excel spreadsheet I've been given, complete with warnings that the instructions will become inaccessible at a certain point in the report-running process. And a warning that the report works best in PDF format. (My pet current theory: the reason the Excel spreadsheet I've got is in such a strange state is that it was automatically converted from a PDF, somehow.) It also contains descriptions of each of the other tabs, in case I somehow can't see how they work by myself. (BTW, I can't run the report myself; I don't have access to the main database or the software used to generate it, which is entirely reasonable.)

    Luckily, the last tab on the spreadsheet actually contains a reasonably sane tabular view of the data, so I will be able to do my job after all. Just wanted to enumerate the WTFs that occurred on the way to a happy but slightly confused ending.

    (Bonus WTF: the spreadsheet is missing one column I requested. The only reason I requested it was because the instructions left behind from the person previously in my position said that it was necessary, but not why, and I wanted to stave off a potential self-WTF down the road. I can't think of any reason I'd need to know that information. So I guess I'll just try to make do and see what happens.)



  • Maybe I missed something, but if you're going to put the data in a database table anyway, why does it matter how the Excel sheet is sorted?

    Of course the merged cells might screw up the SQL data importer, but that's a different issue.



  •  Well, what happens if you try to save each seperate sheet as a csv file from Excel?



  • @blakeyrat said:

    Maybe I missed something, but if you're going to put the data in a database table anyway, why does it matter how the Excel sheet is sorted?

    Of course the merged cells might screw up the SQL data importer, but that's a different issue.

    I was trying to sanity-check it by eye before putting it in the database, and was also trying to get an idea of how it was distributed. (You're right, it doesn't matter for the database itself.)

     



  • @ais523 said:

    Something about merged cells.
    Seems normal for an Excel spreadsheet. In database terms, this is a one-to-many relationship, meaning your database would need multiple tables. Exactly how many tables you'd need would depend on how many columns there are with merged cells, and how cleanly the merging pattern appears.

    If there's only one column with non-merged cells (or a simple pattern where it's always the same columns that are non-merged, and they are always non-merged at the same time as each other), then you only need two database tables. Table#1 would contain all columns that span two (or more) rows, as well as a key. Table#2 would contain the contents of the remaining rows, and a reference to which entry in table#1 the data relates to.

    Of course, it's possible to have merged cells in all sorts of horrendous patterns, and in those cases, you'd need more tables. It's possible they could represent a many-to-many relationship, which would require a minimum of three tables for the cleanest solution - one extra table that gives the links between the other two, referencing each table's data by their relevant keys in order to link them.



  • I used to see this sort of thing all to often, the mis use of spreadsheets that is.



    Someone, somewhere, perhaps a long time ago in a galaxy far far away decided that they needed a table and thought "Well, Excel has rows and columns, this makes sense.". For transfering simple tables or lists between departments where the data would most probably be processed manually, it kind of worked and so it became the standard. A flat CSV file would probably have done the same job, but when you click Save, it gets saved as a spreadsheet by default.



    Skip forward a few years and someone starts adding macros to automate some of the processing, conditional formatting and other little tweaks, suddenly the spreadsheet that should have been a CSV has become a database. If anyone dares suggest that it is getting a bit silly the answer is always "Well DEPARTMENT asks for it in this format, if we change it will just be more work converting it every time we need to send them a report.". Of course DEPARTMENT only asks for it in this format because way back when the system was started, that's the format you sent to them and so they worked with it. Now no-one wants to change.



    If any pesky IT person tries to suggest that it could be done better it suddenly gets political. The idea of putting it into one unified database is out of the question because, how DARE that IT guy from over in OTHEROFFICE try to mess with the systems in DEPARTMENT. The people back home get all uppity because taming the spreadsheet beast has become someone's job, if IT guy simplified and automated the whole process secretary would be out of a job!




    Its a symptom of something I've probably ranted about before, IT is supposed to do what the name suggests, employ technology to manage information. However there are layers and layers of management in any large enough organisation who exist solely to manage information and they see IT as a threat. Even the managers managing the IT don't want to take full advantage of the possibilities for this very reason.



    In my last job there were multiple examples of this:



    To get a new reservation on the DHCP server the details of the equipment would need to be put into an excel spreadsheet, this spreadsheet then had to be saved and emailed to a secretary who would open it up and copy the information into a web based form, the web server would actually update the records and the new DHCP configuration would be generated from those records nightly.

    Getting an IP reserved required all this effort and waiting around because the central IT people who managed the DHCP server didn't like the peripheral department IT people messing with their stuff, years back it used to be a paper form that you were required to fill in and send to secretary in the internal mail and it just evolved from there.



    To purchase something, you would have to fill in a form and get it approved by the boss before it could get purchased, nothing unusual there. However the system evolved, it became a process of opening the blank requisition template and typing the details in, emailing it to the boss, who would then email it onto someone else with the word "approved" tagged on. The form would then get printed anyway because the accounts people are used to working with paper forms and that is what their system is built around. Of course long ago software had been bought for purchasing so the paper form would get handed from one side of the office to another where it would get typed into that system.



    My boss was always requesting information for reports, or requesting we tour the building taking an inventory of something or checking the electrical safety test stickers. I started building databases and reports to automate all this, you didn't have to go and check if the electrical safety test was out of date because you'd get an email telling you. You didn't have to take an inventory of all the software installed on a computer because it would be queried remotely. The same went for all devices attached to our network, end of month reports on printer usage and so on...

    It all got pretty much ignored, if the reports generated and emailed themselves, what was his job? If his subordinates didn't have as much work to do, why does he need so many? If he has less than the required number of people under him he no longer qualifies for the pay grade he is on.




    In the end the IT department wasn't helping the organisation manage its information more efficiently with the use of technology, it just became another department juggling spreadsheets.



  • @phonon said:

     Well, what happens if you try to save each seperate sheet as a csv file from Excel?

     

    You die.

     


  • ♿ (Parody)

    @EncoreSpod said:

    I used to see this sort of thing all to often, the mis use of spreadsheets that is.

    Speaking of which...Your wall of text had too many paragraphs. Try merging them together next time for maximum effect.



  • @EncoreSpod said:

    I used to see this sort of thing all to often, the mis use of spreadsheets that is.
     

    AOL, Brotha.

    However, you can't disagree with the premise that small datasets are easily managable when presented in tabular format with the ability to search and update at your fingertips - and that's something Excel (or any spreadsheet software) is quite good at. I would formerly rail at people who use Excel as a database, but really they're using it as a tool that provides great accessibility to that data.

    What other tool exists to mine and manipulate data so easily?  And how does the way in which they're used differ from the way in which people (mis)use Excel?

    @EncoreSpod said:

    In my last job there were multiple examples of this:

    Many organisations still see IT as a department of "doers" so never involve them during the thinking side. Part of this is attributable to the IT culture in general, the stereotypical geeks that work their magic in darkened rooms to keep things working but never really interact with the rest of the business - possibly because they don't speak the same language as the business, but also because the business don't see learning a whole new jargonistic language just to be able to understand nerds as a worthwhile investment. IT has ostracised itself, and wonders now why it's fighting against this perception of being insular and separatist.

    I know business isn't blameless, and bridges need to be built. I've always advised geeks to introduce themselves to the boardroom with a simple question: "how can we serve the business?" - something as simple as listening to problems without rushing to provide solutions can easily engage the business and show that IT can easily be an enabler.



  • So, there's a database-backed system, that generates a report into this Excel spreadsheet based on the contents of said database ... yet you're taking that spreadsheet as input to build yet another database? Isn't that the WTF in and of itself?

    /confused

    Couldn't that 'report' be, instead, a read-only, materialized view which had wider access than the rest of the database? I don't understand taking data out of a perfectly good data structure and putting into a different data structure; we do that shit a lot here and it angers me (there are joins across multiple Oracle schemas, multiple Oracle servers -- no clustering -- that are pulled only to fill in a different schema's tables.)

     

     



  • I'm guilty of Excel (mis)use. For me, Excel gets used for everything. Writing a script to query a list of machines? If it's a simple 1-line command, I won't bother getting it to read a file of machines - I'll paste the list of machines into Excel, and use Concatenate to write the command. Autofill down, copy, paste into notepad, save as .bat, and go.

    Ditto for pretty much any simple data analysis. In most cases, a database is a better tool for the job, but generally, I'm being asked to produce a specific result for a given case, and once done, the source data will never be referred to again, nor will any fresh data be added to it. I just don't need the additional power provided by a database, and I'm more proficient in Excel than SQL or Access. In the time it would take me to put together a database to hold the data, I can paste the lot into multiple Excel sheets and get it to spit out the results I want, and only those results. That gets fired off th $boss, and I get back to the internet.



  • @zelmak said:

    So, there's a database-backed system, that generates a report into this Excel spreadsheet based on the contents of said database ... yet you're taking that spreadsheet as input to build yet another database? Isn't that the WTF in and of itself?

    It's but a minor WTF from my point of view. Due to some apparent deeper hidden layers of WTF I haven't even begun to appreciate yet, the main database doesn't support permissions checking at any sort of reasonable granularity, so to get data from it, I have to ask one of my superiors to get it for me, because it would be unwise to give everyone here access to the whole thing. And they check to see if I should be allowed it before handing it back. It's sort-of a case of delegating your security model to human secretaries.



  • @timbstoke said:

    Ditto for pretty much any simple data analysis. In most cases, a database is a better tool for the job, but generally, I'm being asked to produce a specific result for a given case, and once done, the source data will never be referred to again, nor will any fresh data be added to it.
     

    Take a look at OLAP tools. There are very powerfull and expensive ones, as well as free less powerfull options (I'm not aware of free very powerfull ones, if there is one, I'd love to know it). They are made to solve exactly that kind of thing.



  • @Mcoder said:

    Take a look at OLAP tools.

    OLAP? To do a one-off analysis you'll never look at again?

    Maybe I'm using some weird-ass OLAP tools, but are ANY OLAP tools optimized for this use-case? Any? Ever?



  • @blakeyrat said:

    OLAP? To do a one-off analysis you'll never look at again?

    Maybe I'm using some weird-ass OLAP tools, but are ANY OLAP tools optimized for this use-case? Any? Ever?

     

    No, not for ONE one-off analisys that you'll never look again. The tools are optimized for making several one-off analisys, each one you won't look again.

    If this were a one-off case, it would be overkill. But timbstoke seems to have a routine of making one-off analisys.

     



  • @Mcoder said:

    No, not for ONE one-off analisys that you'll never look again. The tools are optimized for making several one-off analisys, each one you won't look again.

    If this were a one-off case, it would be overkill. But timbstoke seems to have a routine of making one-off analisys.

    No?

    It takes longer to build and query your OLAP cube than it would to just query the damned thing directly each time. OLAP only saves you time if you're running the same small set of queries repeatedly on a hourly/daily/monthly/whatever basis.


Log in to reply