A generic coworker WTF

jetcitywoman

I've noticed over the years among the many people I've worked with that there are always a handful who... I like to describe it as "working by rote". Remember in college there were always people who typed the examples from the textbook into the computer verbatim? That's ok unless the examples are buggy. I've found people in business with the same mindset. They can't seem to handle critical thinking or troubleshooting, or thinking outside the box even a tiny bit.

One guy I met (and occasionally have to grit my teeth and work with) did exactly this at a customer site: he needed to back up the system before performing an OS upgrade. He shut down the production system, sat down and proceeded to read the backup manual. He was perplexed when the examples he was typing in didn't work. When I tried to clue him in that the actual device names weren't the same as the examples in the manual, he got annoyed and said he wanted to do it "his way".

Once I was asked to help a fellow lady programmer debug a COBOL program (this was in the 1990s). I started just by sitting next to her watching but letting her do her thing. It became apparent to me that she was just reading the code on the screen, trying to guess where it might be failing. I showed her how to set a breakpoint and explained that she should set it close to where the problem might be happening. Ok, says she. Stares at the screen some more. She couldn't even guess where to put the breakpoint. So I picked a place and we ran it through. I was off, but now we had a better idea where the failure might be. I told her to start over and set the breakpoint closer. More staring at the code. In a Socratic fashion, I asked her where she thought we should set the breakpoint. After some thought, she pointed to a line of code *before* the previous breakpoint.

Now, I know some people are faster than others, and everybody has their own way of thinking things through. But I was amazed that her style was mentally walking through the code, guessing what the variable contents could be and which paths might be followed. Having the debugger show her what is actually happening was a foreign concept to her. I've met others like that since then; they seem to take forever to accomplish tasks that are trivial to others.

Last week, we asked a DBA to install a database for a customer after we upgraded his platform and software. She has a reputation for being good and reliable. She emailed us a list of requirements for the version of Oracle she was going to install. Minimum OS version 8.2-1 and a list of patches on top of that. I wrote back to say that we're installing the latest, which is 8.3, so all the patches for 8.2-1 will already be bundled into it. She wrote back and just repeated that the Oracle manual states minimum OS version 8.2-1 and this list of patches.....

Has anybody else noticed this in their environments? I can't imagine how people who can't analyze, can't troubleshoot, and can't extrapolate or reason things out - get jobs in the IT industry. Isn't the job all about those things?

ammoQ

I admit I also prefer to look at the code over the use of breakpoints, watches etc. But at least I know such things exist ;-)

Back to the main topic: Such narrow-minded persons exist, and I guess it's mostly a lack of self-confidence.

lpope187

@ammoQ said:

Back to the main topic: Such narrow-minded persons exist, and I guess it's mostly a lack of self-confidence.

In some cases, a lack of self-confidence could be the cause, as in the COBOL programmer's case; but I don't think it is the cause for everything. In the other two cases, I'd say it is a matter of pride causing the inability to admit when one's wrong.

I've met quite a few IT professionals who are incompetent and unwilling/unable to learn and try new things. In that regard, IT is just like any other industry.

Lingerance

Quality control on the drones appears to be severely lacking these days. The worst thing I've seen is a "programmer" who installed phpBB to the root web directory, effectively destroying my custom portal system. At that point my "boss" got his contact at our hosting provider to help him fix the error, which involved using Google cache to generate "backup data" to restore from. When I finally got the email (at the time I wasn't checking my email regularly, as I break habit occasionally), it was about a three-hour job to repair everything.

Note: I volunteer at the aforementioned place; programming is a hobby for me, not a job. Hence the slow response times on my part.

jetcitywoman

@ammoQ said:

I admit I also prefer to look at the code over the use of breakpoints, watches etc. But at least I know such things exist ;-)
Back to the main topic: Such narrow-minded persons exist, and I guess it's mostly a lack of self-confidence.

Heh, sure! I didn't mean to imply that I've never just read through the code. You obviously need to do that in order to decide/guess where to set your breakpoint. Or if you're old enough ;-) where to put print statements for debugging purposes. But in this case, I could tell she was way in over her head. It might have been a language barrier though. English wasn't her first language, and this WAS COBOL... Which is another challenge in and of itself.

I think what puzzles me more than anything is not just how these people get hired when they obviously don't have the skills, but how they KEEP their jobs. For example, the backup situation was about 6 or 7 years ago and he STILL works for us, STILL as a programmer/engineer and STILL with onsite customer assignments.

ammoQ

@jetcitywoman said:

Heh, sure! I didn't mean to imply that I've never just read through the code. You obviously need to do that in order to decide/guess where to set your breakpoint. Or if you're old enough ;-)

I am

where to put print statements for debugging purposes.
I think what puzzles me more than anything is not just how these people get hired when they obviously don't have the skills, but how they KEEP their jobs. For example, the backup situation was about 6 or 7 years ago and he STILL works for us, STILL as a programmer/engineer and STILL with onsite customer assignments.

Maybe the manager knows that it is always a bad idea to fire someone who blindly follows the orders? Such people can be very useful (at least for the manager).

Independent

@jetcitywoman said:

I think what puzzles me more than anything is not just how these people get hired when they obviously don't have the skills, but how they KEEP their jobs. For example, the backup situation was about 6 or 7 years ago and he STILL works for us, STILL as a programmer/engineer and STILL with onsite customer assignments.

Look on the bright side: it is thanks to such people that the more intelligent of us progress in our careers.

Devi

@ammoQ said:

Maybe the manager knows that it is always a bad idea to fire someone who blindly follows the orders? Such people can be very useful (at least for the manager).

I agree, initiative can be a double-edged sword. It is a known fact that the system works with the version and patches described; whether it works with the next version that contains those patches is unknown. If she recommends version 8.3 and it doesn't work, then her manager can quite reasonably hold her responsible for all the problems that caused. On the other hand, without anyone taking that initiative in some form, you'll never know if it does work with that version. It's striking a balance between innovation and cautiousness that is key (sorry, I'm reading Dune at the moment and it's encouraging me to think in grand-sounding axiomatic truths)

This reminds me of a conversation I saw in these forums a while ago about how some people might be inherently unable to program and how programming courses once had an unusually high failure rate, because some students were simply incapable of learning how to comprehend and write code. Someone mentioned that universities had deliberately let the passing grade fall below a reasonable level, so that enough of these incompetent (or shall we say, disadvantaged?) people could pass the course, in order to stop the poor grades making the university look bad. The direct consequence of this being a large number of people whose qualifications hid their lack of actual ability. I have no idea if this is true, but when I was trying to look it up right now, I stumbled upon this essay http://www.psy.gla.ac.uk/~steve/localed/jenkins.html and at one point it states:

"It might appear at first sight that deep learning is vital for programming,
providing understanding that can be applied in new problem areas. However,
it could equally be argued that programming can be learned as essentially
a process that amounts to simple "pattern matching" where common
problems are spotted and known working solutions applied."

Which does make sense, I'm working at the moment with integrating a physics engine into a game simulation. My understanding of physics is awful, as is my higher mathematics (i.e. I failed calculus in school and that probably doesn't even count as "higher") but I can recognize what the engine does and what situations in the system that behavior can be applied to, so I can still integrate the engine. Because of this my current title is Physics Programmer, though woe betide anyone who actually hired me as such. I don't even know Newton's laws of motion without a webpage to remind me.

jetcitywoman

@Devi said:

@ammoQ said:
Maybe the manager knows that it is always a bad idea to fire someone who blindly follows the orders? Such people can be very useful (at least for the manager).
I agree, initiative can be a double-edged sword. It is a known fact that the system works with the version and patches described; whether it works with the next version that contains those patches is unknown. If she recommends version 8.3 and it doesn't work, then her manager can quite reasonably hold her responsible for all the problems that caused. On the other hand, without anyone taking that initiative in some form, you'll never know if it does work with that version. It's striking a balance between innovation and cautiousness that is key (sorry, I'm reading Dune at the moment and it's encouraging me to think in grand-sounding axiomatic truths)

True about initiative being a double-edged sword. Managers inherently love yes-men because they're just easier to manage. I've also long noticed that the more intelligent employees, while very valuable to the company, tend to argue and debate more and thus are trouble for management and usually seen as "difficult". I just think it's a shame that less skilled people (we could be harsh and call them trained monkeys, because some of the ones I've known are little better than that) are seen as valuable to the company. They slow projects down and often cause problems from bad coding or whatever. I guess I'm an idealist - in my world, it would be more than lip service that "only the best" are hired and retained - AND VALUED by the company. In my experience they're usually quickly alienated, politically targeted and quickly laid off. In fact, in my current company, when the trained monkeys can't handle the job, the workload is largely shifted over to the harder workers, so the hard workers are overloaded and the trained monkeys are protected and play Freecell all day.

Regarding this lady and the Oracle specs, it really didn't matter. The new hardware comes preloaded with OS 8.3 and all the licensing for that. We don't have a choice to go with the older version - unless it IS possible to order it, but it would be a lot of trouble. When I questioned it to her, I was mainly trying to reassure her that although we don't have the version specified, we have something newer. What bugged me was that instead of a reasonable argument like you gave above, she simply echoed herself and "threw the book at me". It wasn't helpful, and you're welcome to call me a snob, but I think it showed lack of intelligence.

@Devi said:

This reminds me of a conversation I saw in these forums a while ago about how some people might be inherently unable to program and how programming courses once had an unusually high failure rate, because some students were simply incapable of learning how to comprehend and write code. Someone mentioned that universities had deliberately let the passing grade fall below a reasonable level, so that enough of these incompetent (or shall we say, disadvantaged?) people could pass the course, in order to stop the poor grades making the university look bad. The direct consequence of this being a large number of people whose qualifications hid their lack of actual ability. I have no idea if this is true, but when I was trying to look it up right now, I stumbled upon this essay http://www.psy.gla.ac.uk/~steve/localed/jenkins.html and at one point it states:
"It might appear at first sight that deep learning is vital for programming,
providing understanding that can be applied in new problem areas. However,
it could equally be argued that programming can be learned as essentially
a process that amounts to simple "pattern matching" where common
problems are spotted and known working solutions applied."

In many of the people I've met, I've noticed a pattern that some people seem incapable of thinking in a logical, analytical, methodical way. There's nothing wrong with those people, except that they shouldn't be in lines of work that require logical, analytical thinking or methodical practices. Many years ago I got the opportunity to help my company promote someone internally. It was in a 911 dispatch center and we had several dispatchers as candidates for the IT support opening. Two very different jobs with very different skillsets. The only similarity was the ability to reason and huge attention to detail. I did sort of a one-on-one Computers 101 class with the candidates and it enabled me to see how well they'd do in the job as well as if they even had the aptitude for it. One of the candidates not only sat patiently for a few hours while I taught him binary math, he "got it". Sometimes I drove him nuts with my Socratic method of teaching programming, but it helped him learn to reason problems out and work them through methodically. The other candidates were simply bored with the Computers 101 and lost interest.

CPound

@jetcitywoman said:

In fact, in my current company, when the trained monkeys can't handle the job, the workload is largely shifted over to the harder workers, so the hard workers are overloaded and the trained monkeys are protected and play Freecell all day.

It's the hard worker's fault for not objecting to the work redistribution. If you're a hard worker, why would you sit idly by when management says to you: "Johnny's kinda slow and not getting it all done, so we're going to reassign you some of his projects. Some tasks may not be in your job description, but go ahead and take them on. Take one for the team." If I were the hard worker, I would quit. It sounds like a management problem. Your managers obviously have no clue how to manage a programming team.

@jetcitywoman said:

In many of the people I've met, I've noticed a pattern that some people seem incapable of thinking in a logical, analytical, methodical way. There's nothing wrong with those people, except that they shouldn't be in lines of work that require logical, analytical thinking or methodical practices.

That's like saying: "These people aren't idiots, they're just stupid and incompetent."

stratos

@jetcitywoman said:

I did sort of a one-on-one Computers 101 class with the candidates and it enabled me to see how well they'd do in the job as well as if they even had the aptitude for it. One of the candidates not only sat patiently for a few hours while I taught him binary math, he "got it". Sometimes I drove him nuts with my Socratic method of teaching programming, but it helped him learn to reason problems out and work them through methodically. The other candidates were simply bored with the Computers 101 and lost interest.

I try to do that too, and give them a lot of freedom to find it out on their own. The biggest reason for that is that if they don't learn to find solutions for themselves, they will keep asking me, and while I know more than them, I certainly don't know everything, or even the best way to do things.

I've never found anyone who just "got it" though; mostly they just stare blankly as if I'm trying to explain the impossible. What troubles me a bit though is that most of these are guys fresh out of school, and they don't seem to know anything about data structures. And I'm not talking about the theory behind it or anything, but just binary tree sorting, bubble sorting, linked lists. Which I think is basic for a programmer to know really. It's like they only know list/array, string and int, and that's it.

CPound

@stratos said:

What troubles me a bit though is that most of these are guys fresh out of school, and they don't seem to know anything about data structures. And I'm not talking about the theory behind it or anything, but just binary tree sorting, bubble sorting, linked lists. Which I think is basic for a programmer to know really.

Nowadays there's no need for the average programmer to understand what a "data structure" is. It's all drag and drop in the IDE. Regarding "binary tree sorting" and "bubble sorting"...I did that sort of thing in college and never ever touched it afterwards. It's just not relevant to business needs. Finally, "linked lists". I mentioned this in a previous post, but to this day I have to look up the definition. What are they anyways? Hyperlinked <li> tags?

zedhex

Duh... is this supposed to be flamebait or what??? There is a very simple reason why a programmer needs to know the difference between the different sorting algorithms - performance.

For example, if you want to do a search for an item in an indexed collection, you need something that will do a b-tree search, not a bubble sort. Just go away and read something about algorithms. This means that a good programmer will choose the data structure he needs according to the data he is working on. If he doesn't, he will get a program that may work, but it will be several orders of magnitude slower than it needs to be.

So, CPound, if you came for an interview at our company - I wouldn't hire you. You are just not a real programmer...

ammoQ

@zedhex said:

Duh... is this supposed to be flamebait or what??? There is a very simple reason why a programmer needs to know the difference between the different sorting algorithms - performance.

For example, if you want to do a search for an item in an indexed collection, you need something that will do a b-tree search, not a bubble sort. Just go away and read something about algorithms. This means that a good programmer will choose the data structure he needs according to the data he is working on. If he doesn't, he will get a program that may work, but it will be several orders of magnitude slower than it needs to be.

So, CPound, if you came for an interview at our company - I wouldn't hire you. You are just not a real programmer...

To be fair, hardly anyone has to write sorting algorithms themselves nowadays. You get quicksort and/or heapsort out of the box.
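As a rough illustration of the "out of the box" point, here is a minimal Java sketch that leans entirely on the standard library: java.util.Arrays.sort to order the data and Arrays.binarySearch to look something up afterwards. The class and method names (SortSearchDemo, sortedIndexOf) are made up for this example. Knowing that the lookup only works on sorted data is exactly the kind of thing zedhex is talking about.

```java
import java.util.Arrays;

// Hypothetical demo class: nothing here is hand-written sorting code.
class SortSearchDemo {

    // Sorts a copy of the input with the library sort, then looks the key up
    // with the library binary search. Returns the index of the key in the
    // sorted copy, or a negative value if the key is absent.
    static int sortedIndexOf(int[] values, int key) {
        int[] sorted = Arrays.copyOf(values, values.length);
        Arrays.sort(sorted);                      // library sort, no hand-rolled algorithm
        return Arrays.binarySearch(sorted, key);  // only valid because 'sorted' is sorted
    }

    public static void main(String[] args) {
        int[] data = {42, 7, 19, 3, 25};
        System.out.println(sortedIndexOf(data, 19));    // prints 2 (sorted: 3, 7, 19, 25, 42)
        System.out.println(sortedIndexOf(data, 8) < 0); // prints true (8 is not present)
    }
}
```

Calling Arrays.binarySearch on the unsorted array would give an unspecified result, so even with library routines you still need to know what they expect.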

ammoQ

@CPound said:

It's just not relevant to business needs. Finally, "linked lists". I mentioned this in a previous post, but to this day I have to look up the definition. What are they anyways? Hyperlinked <li> tags?

Maybe I'm feeding a troll, but anyway: linked lists are a rather simple data structure, comprising a number of nodes, where each node consists of the payload and a pointer to the next node.

For example, in Java, it looks like that:

class LinkedList {
    Object data;
    LinkedList next; // this is the pointer to another LinkedList object
    // I'm leaving out the methods for appending, inserting, searching etc. to keep the example short
}

zedhex

Re. data structures: my point wasn't that you have to know how to write them (although that helps) - just that any decent programmer should know (at least) how and where they are best used.

CodeWhisperer

Since we seem to be on the topic of data structures and algorithms, and anticipating the question "Yeah, but why would you ever need one?"...

Assume you have a list of entries, but you don't know how long the list is going to be. You have a couple ways you can go:

1) Set a 'max size' up front

Problems with this: you use the same amount of memory for 1 entry as you do for your max amount; if you happen to need more than that max size in the future, you're out of luck unless you recompile; and if you need to do inserts or deletions in that list, you'll do a lot of moving things around. If you've ever run into a piece of software and thought "Why do they only let me have 10 of these, what if I need 11?", there's a good chance you're looking at software with a hard-coded limit on an array.

2) Reallocate the array every time you resize it

Start with an array of 10, say. If you want another entry, you have to allocate a new array of size 11, and maybe move things over manually and delete the original. Sometimes those details are taken care of for you, but that doesn't mean it's free.

Both of these also have the problem that they require a contiguous chunk of memory of the right size. If you want a really large chunk of memory, that could get tricky depending on the situation, OS, etc.
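To make option 2 concrete, here is a small Java sketch of the "reallocate every time" approach. The class name GrowableIntArray, the starting capacity of 10 and the copy counter are all assumptions for illustration. It grows by exactly one slot whenever it is full, as described above, and counts how many elements had to be copied along the way.

```java
import java.util.Arrays;

// Hypothetical illustration of option 2: grow the array by one slot when full.
class GrowableIntArray {
    private int[] items = new int[10]; // assumed starting capacity
    private int size = 0;
    private long copiedElements = 0;   // how many elements reallocation has moved so far

    void add(int value) {
        if (size == items.length) {
            copiedElements += size;                          // every existing element gets copied
            items = Arrays.copyOf(items, items.length + 1);  // new array, one slot bigger
        }
        items[size++] = value;
    }

    int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return items[index]; // direct jump, no walking
    }

    int size() { return size; }

    long copiedElements() { return copiedElements; }
}
```

Adding 13 entries copies 10 + 11 + 12 = 33 elements in this sketch, and the cost keeps growing with the list. Library classes such as java.util.ArrayList hide this by growing in bigger steps, but the copying still happens behind the scenes.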

The solution is to use something like a linked list. If you want to add an entry to the end, no problem: just create a new node, point the current tail node's 'next' at it, and voilà, a new tail. If you want to remove a node, it's about as easy. Inserting? Point the new node's 'next' at the node you want to insert in front of, then change the 'next' pointer on the node before it to point at the new node. No reallocation needed, no moving around of elements.

Linked lists also lead to other, more interesting data structures. You can join the head of the list to the tail of the list and get a 'circular linked list' which can be useful in buffering. If you let each node have 2 links to the next nodes, you get a binary tree.

There are times when you want an array over a linked list. Linked lists are tougher to 'index into'. If you want to be able to say "give me the 30th node", that'll take 30 steps, whereas with an array, you can jump right to it, for instance.
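A minimal Java sketch of those tradeoffs, building on ammoQ's node idea. SimpleLinkedList and its method names are made up for this example, and it leaves out removal to stay short. Appending and inserting only touch a couple of pointers, while get(index) has to walk from the head.

```java
// Hypothetical singly linked list, just enough to show the tradeoffs.
class SimpleLinkedList<T> {

    private static final class Node<T> {
        final T data;
        Node<T> next; // pointer to the following node, null at the tail
        Node(T data) { this.data = data; }
    }

    private Node<T> head;
    private Node<T> tail;
    private int size;

    // Append at the end: no reallocation, just relink the tail.
    void append(T value) {
        Node<T> node = new Node<>(value);
        if (head == null) {
            head = node;
        } else {
            tail.next = node;
        }
        tail = node;
        size++;
    }

    // Insert right after the node at 'index': two pointer changes, nothing moves.
    void insertAfter(int index, T value) {
        Node<T> previous = nodeAt(index);
        Node<T> node = new Node<>(value);
        node.next = previous.next;
        previous.next = node;
        if (previous == tail) {
            tail = node;
        }
        size++;
    }

    // Indexed access is the weak spot: reaching position n takes n steps.
    T get(int index) {
        return nodeAt(index).data;
    }

    int size() { return size; }

    private Node<T> nodeAt(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        Node<T> current = head;
        for (int i = 0; i < index; i++) {
            current = current.next;
        }
        return current;
    }
}
```

Note that insertAfter still has to walk to the insertion point in this sketch; the cheap part is the relinking once you are there, which is why code that already holds a reference to the right node benefits most.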

But it's that understanding of when to use what structure, what the tradeoffs are, etc., that seems to be the point here. Not too many .NET developers need to write their own linked list class anymore when you can just say LinkedList<string> and get one 'out of the box', but knowing the right time to use it is important.

-cw

CPound

@zedhex said:

So, CPound, if you came for an interview at our company - I wouldn't hire you. You are just not a real programmer...

What are you going to do now? Ask all of your incoming candidates to give you a dictionary definition of "linked list"? They will get that answer right if they're right out of college. Which was really the point I was trying to make. Linked lists, bubble sorting, etc. is all academic stuff. (Come to think of it, I learned that stuff in my Freshman year C++ intro course and haven't used it since!) It's all academic and is not used at all in the real world of business.

There is the slim chance that you might be working for a university or be doing some sort of intense computer research which would require such trivia. But for the average, day-to-day programmer, it's completely useless information. I would rather drag-and-drop all day, not knowing what goes on behind the IDE, making $90k a year, than have to spout on about esoteric programming topics, just to keep my job under a CompSci professor for $30k a year.

CodeWhisperer

@CPound said:

There is the slim chance that you might be working for a university or be doing some sort of intense computer research which would require such trivia. But for the average, day-to-day programmer, it's completely useless information.

That's an amazingly sad statement.

Knowing stuff like this comes up all the time outside the 'academic' setting; and it's attitudes like this that result in systems with poor performance and high maintainability.

Sure, if all you need to do is get data from a DB, display it to a page, take the changes, push them back to the DB...well, then you won't need this. But that sort of programming is to 'software engineering' what 'the oil change guy' is to 'mechanic'. Sure, there's a market for it, and if that's all you aspire to...well, good on ya. But if you were to go to Microsoft, or Google, or Amazon, or Apple, or ... anywhere, really...they'd laugh you out of the building -- because that stuff does come up, and often.

Where? Well, from my own career:

- A search engine that had to answer questions like "what is geographically close to my location". Stuff like that isn't good for standard dbs typically, you need a specialized data structure.

- A year before that it was a specialized search engine that had to be able to tell if one or more of the results "superseded" the others and exclude the superseded ones.

- A few years before that I was building a simulator for people to test error correction algorithms against, that required a 'signal path' that could be strung together out of arbitrary blocks of functionality.

- Before that a network security product that needed to maintain a list of rules that took different paths depending on what happened before.

- Before that more mapping systems, and developing a custom DB product for the military.

And that's just the last 7 or 8 years. I could go on.

None of those examples would have been well served by an "array of whatever" or a simple database request, they required some thought about the data structures underneath and the algorithms that played on top of them. I'm sure any number of people here will have similar stories to tell.

This isn't 'academic' programming, this is solving real world problems. And if I'm interviewing for a job to solve those sorts of problems you'd better believe I'm going to ask someone what a linked list is. Or what polymorphism is. Or a design question that actually requires them to think about pointers/references between nodes of data in some 'esoteric' form -- because they're going to have to write code that, damn it, needs to have nodes of data that are linked together in some interesting way.

Where do you think operating systems, compilers, word processors, spreadsheets, web browsers, web servers, databases, audio players, games, etc., come from anyway? Do you think they're written by devs like you who go home at night, then the algorithm elves come out and write all the interesting parts?

Sheesh. The real WTF is we've been having this discussion for over a year now and you haven't even tried to learn.

-cw

CodeWhisperer

Err...that should have been "high maintenance cost" or "low maintainability".... sigh.

CPound

@CodeWhisperer said:

Sheesh. The real WTF is we've been having this discussion for over a year now and you haven't even tried to learn.

That's not it at all. I would re-learn the stuff if I needed to. Yes, I knew all those concepts way-back-when, but there's really no point now in my current career. And I'm sure it's the same for the average coder.

Think about it. If you are a .NET programmer, and you have a task to do, why reinvent the wheel? Have you ever heard of enterprise code blocks before? Everything you need to build the majority of your applications is already out there. Just drag and drop it! Why make it hard on yourself creating what's already been created? Besides, I don't know about you, but I have deadlines to meet on my job, and I don't have the time to ponder the intricacies of the CLR and how I can improve on it. People in academia have that sort of time, to sit around and chat about how they would create this or that new component. I don't. Programming is all about providing solutions, and if you can't deliver (and fast) then you're looking for a new job. The business could care less about the latest theorizing on bubble sorts. They want to make money and you're there to help them do that.

I know you're going to come back with something like "What if you're faced with a truly difficult issue that's outside of the realm of what you're used to?"

All I can say to that is, that's what Google is for. And people like you hate to hear that. Because the information is so readily available. Just Google it, then cut and paste the solution that someone already came up with, and you're done! It really frosts your cookies I know. But that's real-life programming for you. Drag and drop, cut and paste, point and click. The easier the better.

CodeWhisperer

You really should stop trying to guess what it is that "frosts my cookies". You don't have a clue.

Go ahead and try your own suggestion. Take one of my examples. Geographic search engine, for instance. You've got a list of 1 million locations (with latitude & longitude) and you want to return a list of which of those locations are within 20 miles of your location -- without searching through all one million of them. Go to google and come back and tell me how to do it. For bonus points, it has to handle 'regions' as well...like zip codes. Can you tell me which zipcodes are within 50 miles of a given location using the same mechanism?

-cw

CPound

@CodeWhisperer said:

Go ahead and try your own suggestion. Take one of my examples. Geographic search engine, for instance. You've got a list of 1 million locations (with latitude & longitude) and you want to return a list of which of those locations are within 20 miles of your location -- without searching through all one million of them. Go to google and come back and tell me how to do it. For bonus points, it has to handle 'regions' as well...like zip codes. Can you tell me which zipcodes are within 50 miles of a given location using the same mechanism?

I'm not going to do your job for you. That's your assignment. Have fun with that.

CodeWhisperer

hahahaha. Classic troll. It's sad that I actually anticipated that would be your response.

If you had been paying attention, you'd have understood that I already did do this, about a year ago.

-cw

ammoQ

@CodeWhisperer said:

Can you tell me which zipcodes are within 50 miles of a given location using the same mechanism?

-cw

I've had to "solve" that problem, and the solution was nothing more than a database table ("from_zip","to_zip","distance"). I guess even CPound might be able to do that.

Yes, Austria is a small country, it's perfectly possible to solve the problem like that.

CodeWhisperer

That was only the bonus marks. If you take the first half of the problem (1 million locations), that table becomes unfeasible.

It is also imprecise based on whatever point in the zip code you use as the 'center'. It could well be, for larger zips especially, that some part of the zip code will be considerably closer/further away than that center point.

-cw

ammoQ

@CPound said:

What are you going to do now? Ask all of your incoming candidates to give you a dictionary definition of "linked list"? They will get that answer right if they're right out of college. Which was really the point I was trying to make. Linked lists, bubble sorting, etc. is all academic stuff. (Come to think of it, I learned that stuff in my Freshman year C++ intro course and haven't used it since!) It's all academic and is not used at all in the real world of business.

At least I know that stuff, and it's been more than a decade since the last time I've seen the inside of a classroom (*). As CW has pointed out, these are the basics you just have to know to know what you are doing. Btw, without going too much into details, there can be circumstances where it makes sense to write e.g. a hand-crafted hash table algorithm.

(*) TBH, I've seen the inside of the classroom of my son's primary school.

ammoQ

@CodeWhisperer said:

That was only the bonus marks. If you take the first half of the problem (1 million locations), that table becomes unfeasible.

Now that harddiscs are so cheap, why not have a database table with 10^12 rows?

It is also imprecise based on whatever point in the zip code you use as the 'center'. It could well be, for larger zips especially, that some part of the zip code will be considerably closer/further away than that center point.

True, but in that application, all I have is the zip codes (no detailed addresses), so it can hardly get any better than that.

CPound

@CodeWhisperer said:

hahahaha. Classic troll. It's sad that I actually anticipated that would be your response.

LOL - we make a good team, you and I.

I set 'em up, you knock 'em down. Or vice versa.

CodeWhisperer

@ammoQ said:

Now that harddiscs are so cheap, why not have a database table with 10^12 rows?

Because disk seek time is still in the ms range. If on every request you had to go through a million of those, you'd have incredibly painful performance. And if you add a new location, you have to create a million new entries. If you get 1000 new locations, you have to write a billion new rows.

Also, because it doesn't solve the problem. The question didn't involve "this distance between two recorded points", but between an arbitrary location or region and those that are recorded. (Say, between where you are in your car and restaurant locations or homes for sale; or if you're drawing a map and need to know what to display)

-cw

CodeWhisperer

@CPound said:

I set 'em up, you knock 'em down. Or vice versa.

You set yourself up, I just challenged you and you fled.

-cw

ammoQ

@CodeWhisperer said:

@ammoQ said:
Now that harddiscs are so cheap, why not have a database table with 10^12 rows?

Because disk seek time is still in the ms range. If on every request you had to go through a million of those, you'd have incredibly painful performance. And if you add a new location, you have to create a million new entries. If you get 1000 new locations, you have to write a billion new rows.

Also, because it doesn't solve the problem. The question didn't involve "this distance between two recorded points", but between an arbitrary location or region and those that are recorded. (Say, between where you are in your car and restaurant locations or homes for sale; or if you're drawing a map and need to know what to display)

-cw

I was just kidding when I suggested the 10^12 row table, but then, it strikes me that we are already fscking close to being able to have that on a household PC with a few harddisks.

The right answer to this problem is: It has already been solved, use google to find the algorithm very clever people have created.

CodeWhisperer

>g< Well, that is the CPound answer, but whether or not that's the right answer has yet to be demonstrated.

-cw

ammoQ

@CodeWhisperer said:

>g< Well, that is the CPound answer, but whether or not that's the right answer has yet to be demonstrated.

-cw

kd tree?

CodeWhisperer

kd-tree is feasible. There were some trade-offs to be made.

I wanted to be able to keep data in the db until someone wanted it (no sense loading up New York's houses for sale before someone wants them, or keeping Vienna's an hour after someone stopped caring), and a kd-tree could require rebalancing as that data got loaded and dropped.

kd trees have the following characteristics: Building the tree O(n log n); Inserting/removing a node is O(log n); lookup on a range is approx O( sqrt(n) ) for a 2d query.
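For reference, a 2-d kd-tree range lookup of the kind described above can be sketched like this. It is a minimal illustration only, not anyone's production code from this thread: the names `build` and `range_search` are made up, coordinates are plain (x, y) tuples on a flat plane, and this naive build re-sorts at every level, so it costs O(n log^2 n) rather than the O(n log n) quoted above.

```python
import random


def build(points, depth=0):
    """Build a 2-d kd-tree as nested (point, left, right) tuples."""
    if not points:
        return None
    axis = depth % 2                      # alternate between x (0) and y (1)
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                # median point becomes this node
    return (points[mid],
            build(points[:mid], depth + 1),
            build(points[mid + 1:], depth + 1))


def range_search(node, lo, hi, depth=0, out=None):
    """Collect every point p with lo[i] <= p[i] <= hi[i] on both axes."""
    if out is None:
        out = []
    if node is None:
        return out
    point, left, right = node
    axis = depth % 2
    if lo[0] <= point[0] <= hi[0] and lo[1] <= point[1] <= hi[1]:
        out.append(point)
    if lo[axis] <= point[axis]:           # query box reaches into the left half
        range_search(left, lo, hi, depth + 1, out)
    if point[axis] <= hi[axis]:           # query box reaches into the right half
        range_search(right, lo, hi, depth + 1, out)
    return out


random.seed(7)
locations = [(random.uniform(0, 1000), random.uniform(0, 1000)) for _ in range(5000)]
tree = build(locations)
found = range_search(tree, (100, 100), (140, 140))
print(len(found), "locations inside the box")
```

Note that nothing here rebalances the tree after inserts or deletes, which is the cost being weighed in this post.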

My solution: building the structure was O(n), inserting/removing a node was O(1), and lookup really depended on a 'resolution' factor and the size of the range you were searching, but O(sqrt(n)) would have been a good high estimate.

-cw

ammoQ

@CodeWhisperer said:

kd-tree is feasible. There were some trade-offs to be made.

I wanted to be able to keep data in the db until someone wanted it (no sense loading up New York's houses for sale before someone wants them, or keeping Vienna's an hour after someone stopped caring), and a kd-tree could require rebalancing as that data got loaded and dropped.

kd trees have the following characteristics: Building the tree O(n log n); Inserting/removing a node is O(log n); lookup on a range is approx O( sqrt(n) ) for a 2d query.

My solution: building the structure was O(n), inserting/removing a node was O(1), and lookup really depended on a 'resolution' factor and the size of the range you were searching, but O(sqrt(n)) would have been a good high estimate.

-cw

That was my first thought before I looked up the kd tree. Split the whole area into m*m tiles, where m is chosen so that every tile contains roughly m*m locations; so m ~ 4th root of n. In the location table in the database, just add a field with the tile number (or two fields tile_x, tile_y, just in case someone is looking for a location west of Salzburg).

For close distance searches, I have to search at most 4 tiles.
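The tile scheme above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `TileIndex` and its methods are invented here, the tiles live in a dict instead of tile_x/tile_y database columns, and coordinates are treated as flat x/y values, not latitude/longitude. The "at most 4 tiles" property holds as long as the search diameter is no larger than one tile.

```python
import math
import random
from collections import defaultdict


class TileIndex:
    """Uniform grid of m*m tiles over a flat rectangle (planar coordinates assumed)."""

    def __init__(self, points, min_x, min_y, max_x, max_y):
        # m is roughly the 4th root of n, so each tile holds about m*m points.
        self.m = max(1, round(len(points) ** 0.25))
        self.min_x, self.min_y = min_x, min_y
        self.tile_w = (max_x - min_x) / self.m
        self.tile_h = (max_y - min_y) / self.m
        self.tiles = defaultdict(list)  # (tile_x, tile_y) -> list of points
        for x, y in points:
            self.tiles[self.tile_of(x, y)].append((x, y))

    def tile_of(self, x, y):
        """Map a coordinate to its (tile_x, tile_y), clamped to the grid."""
        tx = int((x - self.min_x) // self.tile_w)
        ty = int((y - self.min_y) // self.tile_h)
        return (min(max(tx, 0), self.m - 1), min(max(ty, 0), self.m - 1))

    def tiles_for(self, cx, cy, radius):
        """All tiles touched by the bounding box of the search circle."""
        tx0, ty0 = self.tile_of(cx - radius, cy - radius)
        tx1, ty1 = self.tile_of(cx + radius, cy + radius)
        return [(tx, ty) for tx in range(tx0, tx1 + 1) for ty in range(ty0, ty1 + 1)]

    def within(self, cx, cy, radius):
        """Points no farther than radius from (cx, cy)."""
        return [
            (x, y)
            for tile in self.tiles_for(cx, cy, radius)
            for (x, y) in self.tiles.get(tile, [])
            if math.hypot(x - cx, y - cy) <= radius
        ]


random.seed(1)
points = [(random.uniform(0, 100), random.uniform(0, 100)) for _ in range(10_000)]
index = TileIndex(points, 0, 0, 100, 100)   # 10,000 points -> m = 10
nearby = index.within(42.0, 17.0, 3.0)
print(index.m, len(index.tiles_for(42.0, 17.0, 3.0)), len(nearby))
```

Only the points in the touched tiles get the exact distance check; everything else is skipped without being looked at.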

stratos

You say data structures don't come into play in "real life" programming. Well, I'm about as low on the ladder as it gets, making webpages in PHP. PHP doesn't even distinguish between an array and a list. Yet this stuff still comes up: just this week I asked someone to create a table design and some code to store a folder structure to map the content of the site to. If he had known about data structures, and specifically linked lists, it would have taken him an hour or two, tops. However, he didn't, and it took him the whole day until I helped him. And I'm still not sure he actually understands how it works.

This isn't a disaster, because he will learn, whether he likes it or not. But my original point was that I expect people just out of school to know and understand these basic data structures; however, it seems that at his school, at least, they didn't even go into it. Just basic data types and the equivalent of hello world or something, which distresses me.

asuffield

@CodeWhisperer said:

I wanted to be able to keep data in the db until someone wanted it (no sense loading up New York's houses for sale before someone wants them, or keeping Vienna's an hour after someone stopped caring), and a kd-tree could require rebalancing as that data got loaded and dropped.

kd trees have the following characteristics: Building the tree O(n log n); Inserting/removing a node is O(log n); lookup on a range is approx O( sqrt(n) ) for a 2d query.

My solution: building the structure was O(n), inserting/removing a node was O(1), and lookup really depended on a 'resolution' factor and the size of the range you were searching, but O(sqrt(n)) would have been a good high estimate.

I don't think you can actually get lookup down to O(sqrt(n)) for this problem using any method in reasonable space, so I suspect you're thinking about the amortised performance instead (although it's too late in the evening for me to work it out). I'd expect the lower bound on a range query to be something like O(n log n).

For this sort of problem, you usually want to quote the size of the structure as well as the cost of the major operations. There's a simple solution that gives constant time lookup in exponential space (the old "honking great big table of precomputed results" approach), for example, and we aren't interested in that one.

CodeWhisperer

@ammoQ said:

Split the whole area into m*m tiles...

That's not far from my solution. In our case we wanted a lot of it to be kept in memory to avoid repeated db hits (mapping system, if you slide the map one mile to the west 5 times, there isn't much sense asking the db for much the same data 5 times in a row).

We had m*n 'buckets'; the choice of m and n was a tradeoff between a) having a small enough number of buckets that we could maintain the set in memory without too much waste even if most of them hadn't been loaded yet, and b) having a large enough number of buckets that each would contain a relatively small number of items. The buckets could be loaded and unloaded individually depending on when they got used, so you weren't forced to load up the entire country at once.

Which buckets to look in could be easily determined. Each bucket contained a linked list of items (some of which were 'points', others rectangles or more complex regions), and each class had its own implementation of an "overlaps()" method, so scanning those buckets to see which items/regions were in the range I was looking for (and met whatever search criteria were requested) was a simple process.
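A rough sketch of that layout, for readers who want to see the moving parts. Everything named here is hypothetical (`Item`, `Point`, `Rect`, `Node`, `BucketGrid`, `fake_db_loader`); the original system's classes are not shown in this thread. It assumes flat x/y coordinates, box-shaped queries, and a loader callback standing in for the database.

```python
from abc import ABC, abstractmethod


class Item(ABC):
    """Base class: every stored shape decides for itself whether it hits a box."""

    def __init__(self, name):
        self.name = name

    @abstractmethod
    def overlaps(self, x0, y0, x1, y1):
        """True if this item touches the query box [x0, x1] x [y0, y1]."""


class Point(Item):
    def __init__(self, name, x, y):
        super().__init__(name)
        self.x, self.y = x, y

    def overlaps(self, x0, y0, x1, y1):
        return x0 <= self.x <= x1 and y0 <= self.y <= y1


class Rect(Item):
    def __init__(self, name, left, bottom, right, top):
        super().__init__(name)
        self.left, self.bottom, self.right, self.top = left, bottom, right, top

    def overlaps(self, x0, y0, x1, y1):
        return (self.left <= x1 and self.right >= x0
                and self.bottom <= y1 and self.top >= y0)


class Node:
    """One cell of a singly linked list; a bucket is a reference to its head."""

    def __init__(self, item, next_node):
        self.item, self.next = item, next_node


class BucketGrid:
    def __init__(self, cols, rows, cell_size, loader):
        self.cols, self.rows, self.cell_size = cols, rows, cell_size
        self.loader = loader   # callable (bx, by) -> items; stands in for the db
        self.heads = {}        # (bx, by) -> head Node (or None) for loaded buckets

    def _clamp(self, value, limit):
        return min(max(int(value // self.cell_size), 0), limit - 1)

    def _load(self, key):
        if key not in self.heads:              # first use: fetch from the "db"
            head = None
            for item in self.loader(*key):
                head = Node(item, head)        # O(1) insert at the list head
            self.heads[key] = head
        return self.heads[key]

    def unload(self, key):
        self.heads.pop(key, None)              # drop a bucket nobody needs anymore

    def search(self, x0, y0, x1, y1):
        found = {}                             # id -> item, so regions appear once
        for bx in range(self._clamp(x0, self.cols), self._clamp(x1, self.cols) + 1):
            for by in range(self._clamp(y0, self.rows), self._clamp(y1, self.rows) + 1):
                node = self._load((bx, by))
                while node is not None:
                    if node.item.overlaps(x0, y0, x1, y1):
                        found[id(node.item)] = node.item
                    node = node.next
        return list(found.values())


ITEMS = [Point("cafe", 1.5, 1.5), Point("far away", 35.0, 35.0),
         Rect("zip 12345", 8.0, 8.0, 12.0, 12.0)]


def fake_db_loader(bx, by):
    """Pretend database query: every item touching bucket (bx, by), 10 units wide."""
    x0, y0 = bx * 10.0, by * 10.0
    return [item for item in ITEMS if item.overlaps(x0, y0, x0 + 10.0, y0 + 10.0)]


grid = BucketGrid(cols=4, rows=4, cell_size=10.0, loader=fake_db_loader)
print(sorted(item.name for item in grid.search(0.0, 0.0, 9.0, 9.0)))
print(sorted(grid.heads))   # only the buckets the search touched were loaded
```

A region that straddles several buckets is listed in each of them, which is why the search collects results by identity before returning them.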

There was a bit of trickiness -- up in Alaska, a degree of longitude covers less distance than down in southern Texas -- but it wasn't too hard to account for.
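To put numbers on the longitude issue, here is a small sketch. It assumes a spherical Earth with a mean radius of 3958.8 miles, uses the standard haversine formula for the distance, and the two latitudes are just rough stand-ins for Anchorage and Brownsville; none of it is taken from the system described in this post.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean radius; the Earth is treated as a sphere here


def miles_per_degree_longitude(lat_deg):
    """Width of one degree of longitude at the given latitude, in miles."""
    return math.radians(1.0) * EARTH_RADIUS_MILES * math.cos(math.radians(lat_deg))


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (latitude, longitude) pairs in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


anchorage = miles_per_degree_longitude(61.2)    # roughly Anchorage, Alaska
brownsville = miles_per_degree_longitude(25.9)  # roughly Brownsville, Texas
print(f"1 degree of longitude: {anchorage:.1f} mi (AK) vs {brownsville:.1f} mi (TX)")
print(f"20 miles spans {20 / anchorage:.2f} deg (AK) vs {20 / brownsville:.2f} deg (TX)")
```

The practical consequence is that a fixed search radius covers more degrees of longitude the further north you go, so the set of buckets to scan has to be computed from the latitude and not from a constant.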

Wow, we worked linked lists and polymorphism -- not to mention trigonometry in the form of the distance formula -- into the same problem...a real, non-academic problem at that.

-cw

CodeWhisperer

@asuffield said:

I don't think you can actually get lookup down to O(sqrt(n)) for this problem using any method in reasonable space, so I suspect you're thinking about the amortised performance instead (although it's too late in the evening for me to work it out). I'd expect the lower bound on a range query to be something like O(n log n).

The statement about kd-trees comes from wikipedia which claims search on a range is O(n^(1-1/d) +k) [where d is the dimension of the space and k is the size of the returned set]. The statement about the performance of my solution wasn't based on analysis but on observation. With a million entries, I very seldom had to search through more than a couple hundred entities with a thousand being the highest I saw. It certainly doesn't qualify as mathematically rigorous. It would be possible for one of the 'buckets' to contain 100,000 entries and totally blow the estimate, but in practice that didn't happen.
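To spell out how that bound relates to the sqrt(n) figure quoted earlier, plug d = 2 into the expression (n = 10^6 is just the million-entry example from this thread):

```latex
O\!\left(n^{1-1/d} + k\right)\Big|_{d=2} = O\!\left(\sqrt{n} + k\right),
\qquad n = 10^{6} \;\Rightarrow\; \sqrt{n} = 10^{3}.
```

So for a million entries the part of the bound that doesn't depend on the output grows like a thousand, before adding the k results themselves; big-O hides the constant factors, so this is a growth rate and not an actual visit count.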

And of course if that bucket wasn't loaded yet, there is the incurred time of going to disk to retrieve it. However, in practice, this system ran about 3 orders of magnitude faster than the pure DB-based system it replaced.

-cw

CPound

@stratos said:

You say data structures don't come into play in "real life" programming. Well, I'm about as low on the ladder as it gets, making webpages in PHP. PHP doesn't even distinguish between an array and a list. Yet this stuff still comes up: just this week I asked someone to create a table design and some code to store a folder structure to map the content of the site to. If he had known about data structures, and specifically linked lists, it would have taken him an hour or two, tops. However, he didn't, and it took him the whole day until I helped him. And I'm still not sure he actually understands how it works.

Ok, I'll take the bait.

I don't know about PHP (it's been years since I even glanced at it - I briefly used it in the mid-90s) but .NET has a thing called generics. I think that's what you're getting at. Once again, it's a few lines of cut-and-paste code...no thought involved!

It amazes me in this forum how people try to over-complicate what they do for a living. You all try to make it sound so dramatic, like you're doing something so inventive, or you make it sound so overly important...like you're saving lives or something.

Let me break it down for you:

Programming is not as difficult/complex as you make it out to be.
Whatever problem you are trying to solve, has already been solved, by someone else much smarter than you, and is freely available via Google.
If you are thinking too much about a solution, then you're probably going down the wrong path. Don't over-complicate the issue. (Think: cut-and-paste)
.NET offers most of what you need right out of the box. If you are struggling (i.e. reinventing the wheel), then that means you need to spend more time learning what the framework has to offer. After you do that, it's all cut-and-paste or drag-and-drop.
Theorizing has no place in the business world. Management simply does not care.

There ya go!

stratos

@CPound said:

@stratos said:
You say data structures don't come into play in "real life" programming. Well, I'm about as low on the ladder as it gets, making webpages in PHP. PHP doesn't even distinguish between an array and a list. Yet this stuff still comes up: just this week I asked someone to create a table design and some code to store a folder structure to map the content of the site to. If he had known about data structures, and specifically linked lists, it would have taken him an hour or two, tops. However, he didn't, and it took him the whole day until I helped him. And I'm still not sure he actually understands how it works.
Ok, I'll take the bait.
I don't know about PHP (it's been years since I even glanced at it - I briefly used it in the mid-90s) but .NET has a thing called generics. I think that's what you're getting at. Once again, it's a few lines of cut-and-paste code...no thought involved!

Nope, sorry, I wasn't really getting at anything, besides pointing out that even if you're working in a language where you supposedly hardly have to worry about data structures, they pop up from time to time.

It amazes me in this forum how people try to over-complicate what they do for a living. You all try to make it sound so dramatic, like you're doing something so inventive, or you make it sound so overly important...like you're saving lives or something.

Sorry if it looked like that, but no, not at all. wouldn't even want to really, i could get someone killed.

Let me break it down for you:
Programming is not as difficult/complex as you make it out to be.

I would say that what I was talking about indeed isn't complex.

Whatever problem you are trying to solve, has already been solved, by someone else much smarter than you, and is freely available via Google.

Indeed it is. However, without knowing what the problem is called, it's pretty hard to google.

If you are thinking too much about a solution, then you're probably going down the wrong path. Don't over-complicate the issue. (Think: cut-and-paste)

Words of truth, indeed, solutions shouldn't be difficult, you need to think about the problem until the solution is obvious.

.NET offers most of what you need right out of the box. If you are struggling (i.e. reinventing the wheel), then that means you need to spend more time learning what the framework has to offer. After you do that, it's all cut-and-paste or drag-and-drop.

Well my language doesn't really offer a framework, but still sound advice.

Theorizing has no place in the business world. Management simply does not care.

That's why there's a lot that you simply don't tell management.

There ya go!

Indeed I do, I'm off to Wacken: Open Air. See you lot in a week.

CodeWhisperer

CPound,

I never did hear what your answer to the search engine questions was, or where you would have cut-and-pasted it from. Or where operating systems, compilers, word processors, spreadsheets, web browsers, web servers, databases, audio players, games, search engines, etc. come from.

Just keep on believing in those algorithm elves.

You're right about one thing...management doesn't care about theorizing. They care that someone is going to come along and provide an answer that runs 1000 times faster than yours. Or costs 1/3 as much to maintain. Or lets them adapt to changing business solutions. And that's that.

-cw

CPound

@CodeWhisperer said:

I never did hear what your answer to the search engine questions was, or where you would have cut-and-pasted it from. Or where operating systems, compilers, word processors, spreadsheets, web browsers, web servers, databases, audio players, games, search engines, etc. come from.

Just keep on believing in those algorithm elves.

Somehow all of my projects get done, so I owe the elves a great deal of gratitude.

@CodeWhisperer said:

You're right about one thing...management doesn't care about theorizing. They care that someone is going to come along and provide an answer that runs 1000 times faster than yours. Or costs 1/3 as much to maintain. Or lets them adapt to changing business solutions. And that's that.

Well put.

asuffield

@CodeWhisperer said:

@asuffield said:
I don't think you can actually get lookup down to O(sqrt(n)) for this problem using any method in reasonable space, so I suspect you're thinking about the amortised performance instead (although it's too late in the evening for me to work it out). I'd expect the lower bound on a range query to be something like O(n log n).
The statement about kd-trees comes from wikipedia which claims search on a range is O(n^(1-1/d) +k) [where d is the dimension of the space and k is the size of the returned set].

Ah - the k is important here, because it's probably going to be larger than sqrt(n) fairly often. O(sqrt(n) + k) is far more believable.

The statement about the performance of my solution wasn't based on analysis but on observation. With a million entries, I very seldom had to search through more than a couple hundred entities with a thousand being the highest I saw. It certainly doesn't qualify as mathematically rigorous. It would be possible for one of the 'buckets' to contain 100,000 entries and totally blow the estimate, but in practice that didn't happen.

That makes more sense. Remember that O() is talking about the worst-case performance, not the typical performance.

CodeWhisperer

@asuffield said:

Ah - the k is important here, because it's probably going to be larger than sqrt(n) fairly often. O(sqrt(n) + k) is far more believable.

Hmm? k is, as I understand it, the size of the set returned from the search. Most searches, given the sort of application we're discussing here, are only going to return a small number of entries compared to n -- though clearly, worst case performance can push up to O(n) given that definition.

I suppose, then, that my own solution -- given this definition -- would be O(k). Hey, Ok performance isn't that bad! :)

(groan)

-cw

kirchhoff

@CPound said:

Somehow all of my projects get done, so I owe the elves a great deal of gratitude.

This thread is proof that CPound is in fact, KING INTERWEBS DOUCHEBAG. We bow before you.

You like how I did the Bold and the Italics there? I thought it was pretty cool myself.

Ixpah

@CPound said:

Whatever problem you are trying to solve, has already been solved, by someone else much smarter than you, and is freely available via Google.

Without understanding it, how do you know that it functions correctly?

Even with obscure terms google seems to find millions of pages these days. Personally I find a lot of the programming information/examples on the net to be extremely poor quality, often inaccurate, and sometimes just plain wrong. If you don't know about say XSS or SQL injection how would you know that the code you are copying into your company's website isn't vulnerable to these exploits?
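As a concrete case of what that can look like, here is a small sketch using Python's standard `sqlite3` module. The table, the function names and the input string are all made up for illustration; the point is only the difference between splicing input into the SQL text and passing it as a bound parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("alice", "a-secret"), ("bob", "b-secret")])


def lookup_unsafe(name):
    # Typical copy-paste pattern: user input becomes part of the SQL statement.
    query = "SELECT secret FROM users WHERE name = '" + name + "'"
    return [row[0] for row in conn.execute(query)]


def lookup_safe(name):
    # The ? placeholder keeps the input as data; it is never parsed as SQL.
    query = "SELECT secret FROM users WHERE name = ?"
    return [row[0] for row in conn.execute(query, (name,))]


malicious = "nobody' OR '1'='1"
print(lookup_unsafe(malicious))   # the injected OR clause matches every row
print(lookup_safe(malicious))     # no user has that literal name
```

Both functions give the same answer for ordinary names, which is exactly why the unsafe one survives a quick test after being pasted in.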

asuffield

@Ixpah said:

Even with obscure terms google seems to find millions of pages these days. Personally I find a lot of the programming information/examples on the net to be extremely poor quality, often inaccurate, and sometimes just plain wrong.

I cannot think of any occasion when I have seen programming advice from google that was not actively bad and interspersed with misinformation. I believe this is most probably because good developers are not interested in creating websites telling ignorant people how to pretend that they know what they're doing.

dhromed

Those who can; do.

Those who can't; write websites about it.