Moneyballing Movies



  • @Scarlet_Manuka said:

    And at that point, it's by no means certain that replacing it with a monster business-critical custom application that no-one fully understands is going to be any sort of improvement.

    Agreed. I've seen my share of these as well and they're rarely an improvement.

    Even if a maintainable like-for-like replacement were designed, it would be tricky to pull off. But what happens (in my experience) is that the shiny new application is seen as an opportunity to throw in a bunch of extra functionality.

    I don't know what the answer is.


  • Discourse touched me in a no-no place

    @RTapeLoadingError said:

    I don't know what the answer is.

    Accept that it is going to happen and roll with the punches. Try to do stuff in a comprehensible way so that when you come back to it, the WTF-index is relatively low.



  • @dkf said:

    Accept that it is going to happen and roll with the punches. Try to do stuff in a comprehensible way so that when you come back to it, the WTF-index is relatively low.

    I'm a DBA now so there are a whole different set of punches I have to roll with.


  • Discourse touched me in a no-no place

    @RTapeLoadingError said:

    I'm a DBA now so there are a whole different set of punches I have to roll with.

    Fruit punch? Rum punch?



  • @dkf said:

    @RTapeLoadingError said:
    I'm a DBA now so there are a whole different set of punches I have to roll with.

    Fruit punch? Rum punch?

    There's a very strict "no drinking" policy at work but I get to use this 3 or 4 times a day.



  • @dkf said:

    @RTapeLoadingError said:
    I'm a DBA now so there are a whole different set of punches I have to roll with.

    Fruit punch? Rum punch?

    I want to say 'berry punch', but that running joke is from a different message board entirely.



  • @xaade said:

    Moneyballing doesn't account for one big thing in sports.

    At the end of the day, it's the last game you want to win.

    No weight was put on winning the final game, and so they lost.

    All the Seattle teams must use that.



  • @dse said:

    If the Excel sheet is complicated, it's time to address that by taking it out of prototype and into a product, and you're right, it's a programmer's job.

    So... time to rebuild it in Microsoft Access?


  • BINNED

    Why not? Even Access + VB would be better than Excel.
    If a rebuild is an option, it will depend on the long-term strategy, budget, and the caliber of your team: do you plan to do any fancy analysis and real data mining, or just off-the-shelf plots? Do you want a web front-end, with interactive plots? Visualization is most important for exploring data, if you want to find that hidden variable no one else found, to make money from.

    MS Access is perfect for small mom-and-pop businesses that want a report at the end of the month to see how many napkins they sell for every yogurt. I earned some of my earliest cash as a youth doing minor maintenance on a file-backed Access database, and it was fun (I remember some :wtf: quirks about its security though).


  • Grade A Premium Asshole

    We had a client (that I ended up having to fire) that bid all of their jobs on these massive spreadsheets. Behemoths. This was probably 2010, and their two estimators both had what amounted to gaming laptops (in order to also run AutoCAD), and the spreadsheets had to be switched to manual calculation, as they would bring a modern gaming laptop to its knees. Excel would just lock up for long periods of time if you tried to do real-time calculations. Calculations would cascade over many sheets, with many cells all changing at once, and it would bring an i7 with 8GB of RAM to a standstill.

    They were a landscaping contractor...

    I attempted to show them better ways to estimate their jobs, but the control freak business owner would have none of it. I used to bid multi-million dollar jobs with spreadsheets that were a scant fraction of the size of those. I bid entire corporate developments that included infrastructure (water, storm drainage, sanitary sewer, force main, etc.), earthmoving, paving, concrete flatwork (sidewalks, aprons, etc.), the whole lot with Excel spreadsheets that never needed manual calculation. These guys killed high-end machines estimating how many bags of mulch and how many petunias they would need.



  • @Polygeekery said:

    bring an i7 with 8GB of RAM to a standstill

    I do this regularly at work. But it's usually in the case of "here are two data sets that I want to compare against each other" (regression testing hooray), when the data sets get large.

    Of course I make it just do the hard part (initial MATCH or VLOOKUP) once, then paste values over the results so that everything after that is fast.

    I really wish, though, that there was a built-in way to say to Excel "I'm using a sorted list but I want an exact value match". You can only do one or the other, so you only get binary searching efficiency if you don't need an exact match - which I do at least 99% of the time.

    For particularly nasty spreadsheets I've resorted to doing the sorted list match first, then checking to see if the found value matches the search value. It's a little cumbersome, but when you're searching a list of 300k items, very much worth it.
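    The sorted-list-then-verify trick described above translates directly outside Excel too. A minimal Python sketch using the standard `bisect` module (the sample keys are illustrative): binary-search the sorted list for speed, then reject the hit unless it's an exact match.

```python
import bisect


def exact_match_in_sorted(sorted_keys, key):
    """Binary-search a sorted list, but only accept an exact match.

    Mirrors the Excel trick of doing an approximate MATCH on a sorted
    range and then checking the found value against the search value.
    Returns the index of key, or None if it is absent.
    """
    i = bisect.bisect_left(sorted_keys, key)  # O(log n) instead of O(n)
    if i < len(sorted_keys) and sorted_keys[i] == key:
        return i
    return None


keys = ["A|001", "A|002", "B|001", "B|003"]
print(exact_match_in_sorted(keys, "B|001"))  # 2 (exact hit)
print(exact_match_in_sorted(keys, "B|002"))  # None (nearest neighbour rejected)
```

    The second call is the important one: a plain approximate lookup would silently return the neighbouring row, which is exactly the failure mode the extra check guards against.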


  • Java Dev

    That does not sound like an Excel problem.



  • Sounds like "when all you have is a hammer..."


  • Discourse touched me in a no-no place

    “If I had a hammer” then all my problems would look like nails



  • @Scarlet_Manuka said:

    (regression testing hooray

    At first, I read this as "aggression" testing.... after reading the whole post, maybe I was right after all.



  • I'm open to suggestions. For this task, the basic workflow goes like this:

    • Run query in DB#1 and extract results.
    • Run similar query in DB#2 and extract results.
    • Compare results both ways for each transaction.
    • Investigate discrepancies. Where an issue is found, often this involves writing a query to pick up other places where the same issue has occurred, extracting the results and comparing these against the outstanding discrepancies in the original result sets.
    • Summarise results by type and cause of issues.

    The investigation part is the main one, the rest is just setup. In Excel I can accomplish this part easily using a combination of autofiltering and COUNTIF-style functions. (Typically: start with an autofilter for a specific issue type and a blank cause, examine the first case and establish the cause, write a query for similar cases, add the results to a worksheet, use COUNTIF in the cause column to see which cases match, autofilter on the results to assign a cause, autofilter on the non-matching cases to clear the COUNTIF formula, repeat until done.)
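    The COUNTIF step in that loop amounts to counting, per outstanding discrepancy, how many hits the follow-up query returned for its key. A rough Python sketch of the same pass (the data shapes and names are my own, not from the actual workbook):

```python
from collections import Counter


def assign_cause(discrepancies, issue_keys, cause):
    """COUNTIF-style pass: for each still-unexplained discrepancy, count
    how often its key appears in the results of an issue-specific query,
    and assign the given cause where there is at least one hit.

    discrepancies: dict mapping key -> cause (None while unexplained)
    issue_keys: iterable of keys returned by the follow-up query
    """
    hits = Counter(issue_keys)  # per-key occurrence count, like COUNTIF
    for key, existing in discrepancies.items():
        if existing is None and hits[key] > 0:
            discrepancies[key] = cause
    return discrepancies


open_items = {"TXN1": None, "TXN2": None, "TXN3": "rounding"}
assign_cause(open_items, ["TXN2", "TXN2"], "missing rate")
# TXN2 is now explained; TXN1 remains outstanding for the next pass
```

    Repeating this with a fresh query and cause per pass until nothing is left unexplained matches the "repeat until done" loop above.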

    The third step (cross-comparison) is the only one which sometimes takes a significant amount of time, depending on the load. As I said, this is only because Excel doesn't let you combine exact match searching and ordered list searching, and if necessary I can get around this by writing a compound formula that uses ordered list searching and discards the result if it's not an exact match. I just don't do this by default, because laziness.

    If it matters, I'm using TOAD for all the query work. (Much of the investigation work is done through database queries, but in some of it I also run reports out of the target application, which are downloaded as tab-separated text files.)


  • Discourse touched me in a no-no place

    @Scarlet_Manuka said:

    • Run query in DB#1 and extract results.
    • Run similar query in DB#2 and extract results.
    • Compare results both ways for each transaction.

    I'm wondering if it is possible to get both result sets streamed in, doing the cross comparison as you go. That would require your cross-match IDs to be the same and for you to sort by them so that you can both detect a mismatch and resynchronise after a problem.

    Streaming solutions are pretty neat, especially once you stop thinking in terms of simple linear pipelines.
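    The streamed cross-comparison being suggested is essentially a merge-walk over two streams sorted by the same key. A minimal sketch, assuming each input yields `(key, value)` pairs already sorted by key (the status labels are my own):

```python
_END = object()  # sentinel marking an exhausted stream


def _advance(it):
    return next(it, _END)


def compare_sorted_streams(a, b):
    """Merge-walk two result sets sorted by a shared key, yielding
    (key, status) where status is 'match', 'differs', 'only_in_a', or
    'only_in_b'. Sorting by the shared key is what lets the walk detect
    a gap on either side and resynchronise immediately afterwards.
    """
    ia, ib = iter(a), iter(b)
    ra, rb = _advance(ia), _advance(ib)
    while ra is not _END or rb is not _END:
        if rb is _END or (ra is not _END and ra[0] < rb[0]):
            yield ra[0], "only_in_a"        # key missing from stream b
            ra = _advance(ia)
        elif ra is _END or rb[0] < ra[0]:
            yield rb[0], "only_in_b"        # key missing from stream a
            rb = _advance(ib)
        else:
            yield ra[0], "match" if ra[1] == rb[1] else "differs"
            ra, rb = _advance(ia), _advance(ib)
```

    Because only the current row of each stream is held in memory, this scales to result sets far larger than a worksheet would comfortably hold.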



  • @Scarlet_Manuka said:

    TOAD

    RUNNNNNNNNNNNNNNNNNNNNNNNN AWAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAY


  • Discourse touched me in a no-no place

    @Arantor said:

    @Scarlet_Manuka said:
    TOAD

    RUNNNNNNNNNNNNNNNNNNNNNNNN AWAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAY

    ++



  • @dkf said:

    I'm wondering if it is possible to get both result sets streamed in, doing the cross comparison as you go.

    Hmmm. I could build something like this in Informatica if I wanted to. The problem then is that Informatica doesn't speak Excel, so I'd have to spit out a set of flat files and import them into the workbook to set up the investigation. It's not clear that this would buy me any time overall - it'd be faster for the big runs but it might well be slower for the small ones.

    There are probably ways I could streamline the process though, for instance having it mail me the files so I don't have to SCP them off the server and having a template workbook that I could just paste them into. I do a monthly audit process that does something similar and the "open attachment, paste contents into relevant sheet, repeat" process is fairly efficient. I may give this a go.

    @dkf said:

    That would require your cross-match IDs to be the same and for you to sort by them

    Already doing this: the data extraction query builds the key as a composite of half a dozen fields (separated by a character that cannot appear in any of them) and sorts by it.
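    For illustration, the composite-key idea boils down to something like this (the separator and field names are hypothetical; the real key is built inside the extraction query):

```python
SEP = "|"  # stand-in for a character guaranteed absent from every field


def composite_key(*fields):
    """Concatenate several fields into one sortable cross-match key,
    the way the extraction query does with its own separator."""
    return SEP.join(str(f) for f in fields)


rows = [("B", "X", "2015"), ("A", "Y", "2014")]
rows.sort(key=lambda r: composite_key(*r))
```

    One caveat with this string form: numeric fields compare lexically, so they'd need zero-padding to a fixed width for the sort order to stay consistent between both extracts.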

    @Arantor said:

    RUNNNNNNNNNNNNNNNNNNNNNNNN AWAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAY

    Because SQL Developer is so much better?

    Hmm, running it gives me a great feeling already. First I get a UAC prompt, then this:

    OK, I'll find what looks like the most appropriate java...

    Well, it would have been nice if you'd told me you wanted that.

    CLEARLY THIS IS SUPERIOR SOFTWARE THAT I MUST USE FROM NOW ON

    Filed under: Options, I have them


  • Discourse touched me in a no-no place

    @Scarlet_Manuka said:

    The problem then is that Informatica doesn't speak Excel, so I'd have to spit out a set of flat files and import them into the workbook to set up the investigation.

    CSV not good enough for you? All of these tools (and many more) will speak it as it's that simple, and all our users are happiest when that's the format that they've got the data in.

    It's only going to be a problem if you need lots of complicated formulæ set up and you can't handle things by just importing the CSV onto one sheet of the workbook. If that's what's going on, I'd contend that it's probably better to switch to using a different tool; Excel is definitely nice for ad hoc investigation, but gets more annoying as you want to take on more complex structured tasks (such as where you're no longer working with tables of data all in memory).



  • @dkf said:

    CSV not good enough for you?

    The flat files I was referring to would be CSV. 😄

    I still need a multiple-worksheet setup to do the investigation. (For example, I see a particular result hasn't been generated in the Test environment and I might autofilter for other results in both environments related to the same asset or revenue.)

    The formulae aren't complicated at all. Apart from the lookup to get the corresponding result from the other environment, all I'm doing is basically one column in each of the data sheets for the status, which is either "Missing in {other env}", "OK", or "Variance". Then for anything that's not OK I manually assign a cause after investigation (which often involves doing a query and using COUNTIF to check for records that match a given cause). I have a summary sheet that tabulates status for each data sheet and also tabulates causes assigned to the records that were not OK. It's all very basic.
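    That status column is simple enough to sketch directly; a minimal Python version of the per-record logic, assuming each environment's extract is a mapping from the composite key to the result value (the environment names are placeholders):

```python
def statuses(env_a, env_b, other_env_name="Test"):
    """Compute the status column for one side of the comparison:
    'OK', 'Variance', or 'Missing in <other env>', per record.
    env_a / env_b map the composite cross-match key to the result value.
    """
    out = {}
    for key, value in env_a.items():
        if key not in env_b:
            out[key] = f"Missing in {other_env_name}"
        elif env_b[key] == value:
            out[key] = "OK"
        else:
            out[key] = "Variance"
    return out


prod = {"K1": 100, "K2": 200, "K3": 300}
test = {"K1": 100, "K2": 250}
statuses(prod, test)  # K1 OK, K2 Variance, K3 Missing in Test
```

    Running it once in each direction gives the "compare results both ways" step; the summary sheet is then just a tally of these statuses and the manually assigned causes.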

    @dkf said:

    I'd contend that it's probably better to switch to using a different tool

    Still open to suggestions here, if you think this is the case.


  • Discourse touched me in a no-no place

    @Scarlet_Manuka said:

    I still need a multiple-worksheet setup to do the investigation.

    I'm still not at all convinced that Excel is the right tool for the scale of task you're doing, and suspect that trying to get more of it into the DB or into some sort of streaming processing would help. The aim wouldn't be to put it all in there, but rather to do the initial filter-out of uninteresting stuff in an efficient way. The interesting cases ought to be in a minority: if they are a majority, then your life is probably too exciting…

    But what exactly the right tool is… that's a bit trickier. We're probably talking actually programming something for this.


  • Discourse touched me in a no-no place

    @Scarlet_Manuka said:

    Because SQL Developer is so much better?

    Yes. By miles.


  • Discourse touched me in a no-no place

    It's Oracle, if the data comes from two databases can't you use a database link between the two?


  • BINNED

    @Scarlet_Manuka said:

    SQL Developer

    Yes

    Oh, you mean Oracle. Well yeah, TOAD is the least of your problems then. It still is a piece of shit though.



  • @Scarlet_Manuka said:

    Hmmm. I could build something like this in Informatica if I wanted to.

    I realised in the early hours of this morning that this won't work, because the test and production environments are firewalled off from each other for precisely the reason that we don't want dev/test apps accidentally talking to prod DBs or vice versa. So there's no Informatica instance that could pull data from both the test and prod environments.

    @dkf said:

    We're probably talking actually programming something for this.

    This is in no way going to be easier than what I'm doing now.

    I'm thinking the better solution is to just build a template workbook with the more complicated lookup formula in place (the one that does the fast binary non-exact match lookup, then checks to see if the result is an exact match). That'll solve the performance issue and it will take less time to do than any of the other solutions we've discussed.

    @loopback0 said:

    @Scarlet_Manuka said:
    Because SQL Developer is so much better?

    Yes. By miles.

    What are the highlights, bearing in mind that I have no need to connect to non-Oracle databases?

    My attempts to use it aren't going well. There appears to be no way to get all my saved connection data from TOAD into SQL Developer, which isn't too surprising but is a pain. But trying to set up connections manually isn't working too well either. Enter data for the first one, click the Test button and get a message saying "Status : Failure -Test failed: IO Error: SO Exception was generated". No idea what that exception means.

    Attempting to connect gives basically the same error message, but with some extra information: "Vendor code 17002". Looking at ORA-17002 discussion on the web now, looks like it's a fun one with a variety of obscure possible causes, yay.



  • The I-Hate-Oracle Club is -------> that way.



  • @loopback0 said:

    @Scarlet_Manuka said:
    Because SSDS is so much better?

    Yes. By miles.

    FTFY

