Interviewing for tech stacks you don't know


  • Discourse touched me in a no-no place

    So. I need a Hadoop guy. Specifically, a Hadoop guy with very specific field knowledge. Fortunately, there was a local company in that field using Hadoop that recently vanished in a merger and dumped all its employees.

    Problem: Nobody in my entire company (not just this office, the whole thing) knows the first practical thing about any big data technologies.

    How do you interview for that?



  • No experience with that.

    If they have something published on github, you can always go into the issues and see what others are saying / complaining about.

    Other than that, and doing a hadoop tutorial yourself, IMO you're out of luck.


  • Discourse touched me in a no-no place

    If I needed tutorial level expertise we wouldn't be hiring. We're actually looking for someone to come in and literally solve an entire problem class for us, likely bootstrapping a specialist team. Maybe I'll do the tutorial anyway.

    We know the solution is possible down this avenue (previous VPs have had highly paid consultants develop the solution and then balked at the maintenance costs), we just don't have the expertise to figure out who knows what they're talking about.

    We have the names of the "heroes" from the other company and will be dispatching our headhunters shortly, but in the event we don't get those....



  • I wonder if you can hire a well known name in Hadoop circles to vet your candidates. Like, find an author of a Hadoop book and offer them some $$$ to speak to the top 3 candidates for an hour each, over Skype.



  • If the highly paid consultants have helped on this stuff but the problem was maintenance costs, then could you use one of them to vet the hire, like @cartman82 was saying?



  • Man, you're in trouble if you think Hadoop is a single thing. It's only one of a lot of interchangeable parts... Maybe you can spend a few hours learning about it and then listen to whether what the interviewees say sounds too far off. Also, what exactly are you looking for? A solution architect to implement the thing, or a data scientist to retrieve and work on the data? The two profiles (IT vs Math/Statistics) are very different. I actually did set up a few big data solutions some time ago and they aren't very hard. Understanding how and what data to retrieve is another monster.


  • Discourse touched me in a no-no place

    This proves what I was about to say about our previous highly paid consultants. I was deeply involved in that process and I still have no fucking clue what I'm talking about.

    We aren't really looking at this for any sort of Big Data reasons. We have a very well understood set of frequently updated finite datasets that need to be rubbed together in a very well understood and specific way (it amounts to taking a Cartesian product between two enormous tables with some mild mathematics on each result row, and then filtering that result in a variety of creative ways). Difficulty is that the datasets are frakking huge and it takes days to crunch through linearly, even on the fattest of fat boxes.

    The problem, however, is highly divisible and, we have been led to believe, lends itself well to distribution via map/reduce.
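    The shape described above (a cross product of two tables, mild math on each pair, then filtering) maps onto map/reduce by partitioning one table into chunks. A minimal single-process sketch; the tables, `score` function, and threshold are all illustrative stand-ins, not the real workload:

```python
from itertools import product

def score(a, b):
    # stand-in for the "mild mathematics" on each result row
    return abs(a - b)

def map_chunk(chunk, other_table, threshold):
    # emit only pairs that survive the filter, so the reduce step stays small
    return [(a, b, s)
            for a, b in product(chunk, other_table)
            if (s := score(a, b)) <= threshold]

def reduce_results(partials):
    # merging filtered partial results is just concatenation here
    return [row for part in partials for row in part]

table_a = [1, 5, 9, 14]
table_b = [2, 6, 10]
chunks = [table_a[i:i + 2] for i in range(0, len(table_a), 2)]
partials = [map_chunk(c, table_b, threshold=1) for c in chunks]
print(reduce_results(partials))  # → [(1, 2, 1), (5, 6, 1), (9, 10, 1)]
```

    Because each chunk is independent, the `map_chunk` calls can run on separate nodes; only the small filtered partials travel back for the reduce.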



  • Yeah, map-reduce (as an architecture) ought to work, since you're mapping each row, and reducing via a pair operation.

    So what's the problem?


  • Discourse touched me in a no-no place

    Various GIS issues surrounding the distances between points on a map. Usually variations on "find the closest n whatevers" and "find all whatevers within x miles" with lots of sprinkles to it so you usually have to take the union of both answers to have a qualifying dataset.

    Some low precision variations on the problem can be computed at runtime on our current MSSQL geography stack if the datasets are small enough, but once the datasets pass a certain size (roughly the size of an average east coast state) it starts taking days.

    The laser precise version is what our customers truly want, and that problem is basically impossible beyond a certain size threshold in the MSSQL stack (you cannot cache answers if you're being arbitrarily precise).



  • Are you really interested in the closest n whatevers outside of the state? (In other words, can you filter out results that are prima facie irrelevant before beginning the n^2 loop?)


  • Discourse touched me in a no-no place

    Yep.

    Broadly, there are two sane ways to approach the problem of geographic indexing.

    The first is to index records on a 2d grid of a given precision, using successively more granular indexes to get to the needed level of precision, and just determine which squares contain the records to do the actual detailed math with.

    This is the approach MSSQL uses.

    This works quite well for the "all x within y miles" problem.
    The "nearest n" problem requires a lot of cross referencing with other indexes on that model and since geography isn't a core function MS hasn't added good query plans for that yet.

    The other approach is to hash the coordinates in such a way that the characters in the hash string are interleaved representations of lat and long. Lat encoded on evens, long encoded on odds for instance. The deeper into the string you go, the more precise.

    This has the neat effect of being a 1d index sorted more or less by how close things are to each other (though things get pathological around the edges of the coordinate system).

    This system is ideal for "nearest n" (so long as you're guaranteed to stay within the same quadrant of the planet, which we are) but shit for anything involving an exact distance. We could theoretically implement it on SQL, but we're pretty sure we'd still run into scalability problems (particularly because grid indexing still has performance limits)

    Since we are usually unioning both problems to get "all x within y, but with a minimum of z in case the target area is underserved", we need to do both.
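    The interleaved-hash idea can be sketched in a few lines. This is a hypothetical toy that emits one bit per bisection of the coordinate ranges (real geohash implementations base32-encode these bits); longitude lands on even bit positions, latitude on odd ones:

```python
def interleaved_hash(lat, lon, bits=16):
    # repeatedly bisect the lat/long ranges, emitting one bit per step,
    # interleaved: longitude on even positions, latitude on odd ones
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    out = []
    for i in range(bits):
        rng, val = (lon_range, lon) if i % 2 == 0 else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if val >= mid:
            out.append("1")
            rng[0] = mid   # narrow to the upper half
        else:
            out.append("0")
            rng[1] = mid   # narrow to the lower half
    return "".join(out)

# nearby points share a long common prefix; distant ones diverge early
a = interleaved_hash(40.7128, -74.0060)   # New York
b = interleaved_hash(40.7306, -73.9352)   # also New York
c = interleaved_hash(51.5074, -0.1278)    # London
```

    Sorting on these strings gives the 1d "roughly sorted by proximity" index described above, with the same pathological behavior at cell boundaries, where two points a few feet apart can fall on opposite sides of an early bisection.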



  • Here's a four step plan for an interview. (Warning, actual advice below. Those looking for the usual @Lorne_Kates trollshit, the Trans thread is that way...)

    Step 0: The usual bullshit
    Tell me about your past experience. What's your five year plan? What did you do at Job X? Why did you leave job Y? Were your last boss and coworkers raging fuckwits? What's your biggest weakness?

    Step 1: Test her general tech knowledge
    This falls under "make them write code in the interview". Give them a fairly generic problem, one that can and should be solved in < 30 minutes. Ask them to write out the solution in pseudocode only. Review the solution with them.
    If they've given you a fucking insane, incomplete, or wrong design, you can end things there. They aren't the candidate for you. Otherwise talk to them about why they solved it the way they did, their code philosophy, etc. If you think they have a firm technical base, move to Step 2.

    Step 2: Test her Hardup Knowledge
    Ask her to take the design you just sketched out, and implement it in Hardup. I know you don't know jack or shit about it. But she will. And you know the business requirements, and the tech requirements. You know what inputs will produce what outputs, what errors should be caught, etc, etc. You probably can review the Hardup code itself, to see if it's documented, written nicely, etc.
    This will check if she can take business requirements and turn them into working code (thus letting her interface with the business units). It will also check if she can take business requirements, turn them into pseudocode, and discuss that pseudocode-- which will be required for her to interface with any other coder in your company.
    If she can produce a working Hardup program that takes input as it should, produces output as it should, handles errors as it should and doesn't completely crash the computer or network, then you know she's technically competent in Hardup.

    Step 3: Test her expectations of what she'll be doing
    Simple question: What tools does she expect to have available to her to do the work she'll need to do in Hardup? If this was a .net interview, you'd expect them to ask for a PC, enough CPU and RAM, a copy of Visioning Studioland, a local Microsoft Apache server, etc. You may not know everything she's asking for, but as long as SHE knows what she needs, that's a huge boon. Write down everything she says she'll need. Make sure it's available to her on the first day. Provision whatever hardware and software licenses you'll need.
    If she says "I don't know" or gives any other sort of vague answers, be wary. You are testing her ability to be autonomous and to be an expert self-advocate in her field. Fucking no one else in your company will be able to proactively get her the tools she needs. You need to know she knows what she needs, and is able to demand the proper tools be available.

    There you go. 4 step process. Follow that, and you can interview for a Hardup programmer without knowing anything about it yourself.


    Filed under: And now back to your regularly scheduled **Bullshit**

  • Discourse touched me in a no-no place

    Looks like a fantastic plan.

    Of course there's a WTFcorp wrinkle: coding in the interview is verboten (or at least effectively so. No computers allowed, and I'm not about to make somebody use a pen and paper).

    We'd have to stop at pseudocode.

    And ain't no way anyone's gonna have a working computer on day 1. Although with this stack a mac might be kosher and we can get those slightly faster.



  • If coding isn't possible, talk through your actual problem. What strategy will she use to implement map reduce? Mock up the architecture on the white board.



  • Either that, or try to get an exception to that stupid policy. You know that your company needs someone with rather specialized skills in the not too distant future, so it's in the company's best interest to hire the best Hadoop developer that's available.

    Try to come up with some reasons for why sidestepping the policy would be in the company's best interest ("the cost of letting go and re-hiring is bigger than the re-hiring process alone", "we'll lose lots of time and probably won't make deadlines if...", and so on) and you might get an exception.

    But I don't know WTFcorp (and I probably don't want to either) - if your place is anything like some of the places I've worked in the past, all those efforts are probably in vain, because process must be followed. Even if the process is shitty or utterly insane.


  • Discourse touched me in a no-no place

    WTFcorp is unusual in that everybody knows our hiring process is shit and nobody cares.

    I should make a thread some time.



  • Please do. :smile:



  • @Weng said:

    Problem: Nobody in my entire company (not just this office, the whole thing) knows the first practical thing about any big data technologies.

    Big data is mostly hype. Most companies don't need it. The majority of companies that think they need it actually instead need "statistically significant data". Very few companies actually need it, and those are ones you've heard of like Facebook, Google, or Microsoft. Possibly Netflix.

    @Weng said:

    How do you interview for that?

    There's not much to Hadoop. You just feed it little micro-programs and tell it to run them. The programs can be written in a ton of different languages... so really what this guy would actually be doing is:

    1. Writing micro-programs in Python or Java or whatever his preference is

    2. Knowing how to admin a Hadoop cluster, including setting one up

    With the vast vast vast majority of the work being number 1. Based on this, you can ask the "standard set" of DBA questions and programming questions and get a pretty solid opinion of the guy's capabilities.
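    The "micro-program" in 1. really is tiny. A hedged sketch of the mapper half in the shape Hadoop Streaming expects (a stdin-to-stdout filter emitting tab-separated key/value pairs; the word-count logic here is just the classic illustration, not the OP's workload):

```python
#!/usr/bin/env python3
# Mapper half of a Hadoop Streaming job: read lines from stdin, emit
# key<TAB>value pairs on stdout. A matching reducer would read the
# same pairs back, grouped/sorted by key, and aggregate them.
import sys

def run_mapper(lines):
    # classic word count: one (word, 1) pair per token
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

if __name__ == "__main__":
    for pair in run_mapper(sys.stdin):
        print(pair)
```

    The cluster handles distribution, sorting, and fault tolerance; the candidate's day-to-day work is mostly writing these small filters well.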

    The Hadoop-specific admin stuff, frankly, anybody in your office who isn't completely useless can probably learn in a couple weeks.


    You'll know he's a really good candidate if he says something like, "you know Big Data is mostly just a buzzword right now, I can get you the same result on a lot fewer servers if we just bucket data like X and Y, and do statistical analysis on Z."

    Those are the guys you want: the ones who are dedicated to solving the problem with whatever tool is available, instead of the "when all you know is Hadoop, everything looks like a big data problem" guys.



  • @Weng said:

    Difficulty is that the datasets are frakking huge and it takes days to crunch through linearly, even on the fattest of fat boxes.

    The problem, however, is highly divisible and, we have been led to believe, lends itself well to distribution via map/reduce.

    Any decent C# developer ought to be able to write a map/reduce solution in a couple weeks. Map/reduce is really, really simple. You don't need something off-the-shelf for it.

    At a previous company, we had a quick-and-dirty taskbar app a developer wrote to do map/reduce stuff, it just ran on every employee's desktop as a Startup item. It couldn't possibly have taken more than a week to set up. When you're in a company environment, you're not fucking Google, you just have a central HTTP server/fileshare where the workers can "check in" and get assignments.
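    The homebrew approach really is that small. A hedged single-process sketch of the coordinator/worker split (a thread pool stands in for the fleet of desktop workers checking in; the chunk size and map/reduce functions are illustrative):

```python
# Minimal homebrew map/reduce: a coordinator splits the data into chunks,
# a pool of workers maps each chunk, and the partial results are reduced
# centrally. In the desktop-app version, each worker would fetch its
# chunk from a central HTTP server instead of a local queue.
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def map_fn(chunk):
    return sum(x * x for x in chunk)        # per-chunk work

def reduce_fn(a, b):
    return a + b                            # merge partial results

def run_job(data, chunk_size=3, workers=4):
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_fn, chunks))
    return reduce(reduce_fn, partials, 0)

print(run_job(list(range(10))))             # sum of squares 0..9 → 285
```

    Swapping the thread pool for remote workers only changes how chunks are handed out and results collected; the map and reduce functions stay the same.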


  • Discourse touched me in a no-no place

    Yeah. We've thought about homebrewing it. The problem is getting nodes. Our current VP is a self professed "technologist" and has a huge bias against Microsoft. We can hardly get new nodes to scale our existing cluster app (in order to do so I had to spend a week proving it wasn't economically viable to retool to a Linux stack).



  • @Weng said:

    Yeah. We've thought about homebrewing it. The problem is getting nodes.

    Buy cloud services with a corporate credit card nobody looks at too closely. Shove it on all the desktops in your department as a Startup item, and limit it to 1 core.

    Look, the real problem here is you have major organizational problems preventing you from creating good software. You can try to work under the table, like for example finding a corporate credit card that isn't watched too closely, but it's going to cripple you in the long run.

    Let's say you do find the master super Hadoop wizard. Why the fuck would he want to work for YOUR company? How do you expect him to set up a Hadoop cluster if people who have worked there for YEARS have failed at provisioning servers? How can ANYBODY be productive in an environment where senior developers have to spend a week convincing some moron to not throw money away?

    I've changed my mind: your major objective in this interview is to convince this good candidate to tolerate your shitty company environment. Everything else is secondary.


  • Discourse touched me in a no-no place

    Yeah that's a good portion of the stock interview. It doesn't ward off as many people as you'd think.

    This particular initiative has a good chance of success because we've convinced the VP of the week that it's his idea.

    As far as cloud services go, every single corporate card gets every single purchase approved at the VP level.



  • @Weng said:

    This particular initiative has a good chance of success because we've convinced the VP of the week that it's his idea.

    Then why can't you provision servers? Your words do not match your previous words.


  • Discourse touched me in a no-no place

    Can't provision Microsoft servers without a massive argument.

    Words like Hadoop, Java and Linux turn this guy on.



  • Whatever. I have little patience for shitty companies and people who choose to work at them. Frankly, I think you deserve to fail and I feel sorry in advance for the Hadoop expert you trick to work at that career-ending depressing place.



  • There are consulting firms that provide high-end technical screenings for various stacks. These are independent of the rest of the supply chain and charge a fee that is independent of whether the person gets hired or not. These can be a very good value.



  • Have you thought about turning off the fire suppressant system, then burning the entire data room to the ground?

    Then hack the VP's wifi, so that when he googles for how much replacement servers will cost-- intercept the HTML, and multiply all prices by 25x.

    Then go to him with a sane data setup that doesn't use Hardup.

    It may be easier than interviewing.


    Filed under: I gave you my serious answer. Back to jokes. Yes, "jokes". click me.



  • Why do they always want something so specific? People can learn, and lots of them can learn a new NoSQL database in a few hours.

