Where does test data come from?



  • Up until a few months ago, I worked for a very small web development company that produces a web-based product for office management. Basically, it keeps track of things like customers, invoices, receivables, and a ton of other stuff. The company itself is run on the software, so all our customers and payments and everything were stored there. That was not really a big deal, since you need a login to get in and stuff.

    Our test database was a copy of the production database, with emails changed (so that customers wouldn't get random emails when we're testing stuff) and a lot of test entries added. It was on an internal server and only accessible by the three developers, so it wasn't a huge problem. Maybe not the smartest idea, but not terrible.

    While navigating our external website, I noticed that I could sign up for a demo account. I was curious how torn-up our demo would be, since everybody can add their own stuff, so I created an account and logged in. Looking around, the data looked familiar: the publicly-accessible database was a copy of our company's production database! (Luckily just a copy, changes weren't reflected -- that would have been much worse).

    Since I don't have direct access to the production-server's databases, I added it to our bug-tracker with the highest possible priority and changed the demo account password. At the time when I left the company, that's how it was. And it wouldn't surprise me a bit if they eventually realized their demo was broken and reset the password.....



  • Wow. Talk about a severe case of "breach of confidentiality". Let's hope they never realize their demo is busted. Chances are they won't. Who actually tries to use a demo, fails, then emails the company asking why the demo is busted, anyway? Usually they just think it's a crap product (you know, first impressions...) and go look for something else that'll suit their needs.



  • I usually generate my test data with a nice script that is fed by a few nice little pieces of data.

    Look up the census information on names: you get male, female, and last names, each with its percentage of occurrence in the population. Use that percentage to randomly pick a first, middle, and last name from those lists.

    For addresses, I just use a Verb/Adjective Noun Postfix type format.  For example:

    Verb/Adjective:

    Running, Flowing, Blowing, Shady, Red, etc

    Noun:

    Creek, Brook, Tree, Fox, Bear, Cup, etc

    Postfix:

    Lane, Road, Street, Alley, etc.

    Picking one from each list gives you neat little addresses when combined with a random number and a zip code lookup for city/state.

    Yeah, I had too much fun putting all that together, but it kept our test data truly as test data. Run this as a replace on all copied live data if you need to and you are good to go.
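
    A rough sketch of that approach in Python, for anyone who wants to try it. The name lists and weights here are made-up placeholders (a real run would load the census files), the function names are hypothetical, and the zip code lookup for city/state is left out:

```python
import random

# Placeholder data: a real run would load the census name files
# (name, percentage of population) instead of these made-up weights.
FIRST_NAMES = [("James", 3.3), ("Mary", 2.6), ("John", 3.2), ("Linda", 1.0)]
LAST_NAMES = [("Smith", 1.0), ("Johnson", 0.8), ("O'Brien", 0.1)]

# Word lists taken from the Verb/Adjective, Noun, Postfix examples above.
DESCRIPTORS = ["Running", "Flowing", "Blowing", "Shady", "Red"]
NOUNS = ["Creek", "Brook", "Tree", "Fox", "Bear", "Cup"]
POSTFIXES = ["Lane", "Road", "Street", "Alley"]


def pick_weighted(rng, weighted_names):
    """Pick one name with probability proportional to its weight."""
    names = [name for name, _ in weighted_names]
    weights = [weight for _, weight in weighted_names]
    return rng.choices(names, weights=weights, k=1)[0]


def fake_name(rng):
    """Build a 'First Middle Last' name from the weighted lists."""
    first = pick_weighted(rng, FIRST_NAMES)
    middle = pick_weighted(rng, FIRST_NAMES)
    last = pick_weighted(rng, LAST_NAMES)
    return f"{first} {middle} {last}"


def fake_street_address(rng):
    """Combine a random house number with one word from each list."""
    number = rng.randint(1, 9999)
    descriptor = rng.choice(DESCRIPTORS)
    noun = rng.choice(NOUNS)
    postfix = rng.choice(POSTFIXES)
    return f"{number} {descriptor} {noun} {postfix}"


# A fixed seed keeps the generated test data reproducible between runs.
rng = random.Random(42)
for _ in range(3):
    print(fake_name(rng), "-", fake_street_address(rng))
```

    Passing in a seeded random.Random means a failing test can be re-run against exactly the same generated data.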
     



  • This company didn't happen to be associated with an architecture firm did it? If so, I know the product, and when I left there just under a year ago, it was the same deal, though I think the demo account worked.



  • I remember at a previous company, our boss would get random user names from none other than our online phonebook. Well, until they banned his IP address 'cause he was hammering the server with his script: 555-0000, 555-0001, etc. At least they were believable names.

     



  • @un.sined said:

    This company didn't happen to be associated with an architecture firm did it? If so, I know the product, and when I left there just under a year ago, it was the same deal, though I think the demo account worked.

    It is, actually. But it's a rather small company in a rather small city (in a big-ass country: Canada). I doubt anybody here has even heard of it.



  • Working for an airline, we'd typically build our own test PNRs, a lot of times in scripts. They'd have names like TEST/ONE MR, TEST/TWO MRS, TEST/THREE, I/TEST/FOUR and so on (mr, mrs, child, infant). Anyway, you would not believe how many times I'd look on the production system and find one or two TESTs booked on a flight. They'd be obvious test cases because there'd never be any contact information: no phone number. Occasionally Mr Test would fly first class too.




  • re: Where does test data come from?

    When a mommy test script and a daddy test script love each other very much...



  • @rbowes said:

    @un.sined said:

    This company didn't happen to be associated with an architecture firm did it? If so, I know the product, and when I left there just under a year ago, it was the same deal, though I think the demo account worked.

    It is, actually. But it's a rather small company in a rather small city (in a big-ass country: Canada). I doubt anybody here has even heard of it.

    I bet some of us in Calgary know it :) 



  • @webzter said:

    re: Where does test data come from?

    When a mommy test script and a daddy test script love each other very much...

    Must be a lot of that going around at the moment....

     

     Actually, as long as either the data isn't sensitive or the people with access to the test data have access to the live data anyway, the best test data is real data. It's quite common for changes to go wrong when they go live, and have the developer say "oh, well I didn't expect *that*!"

    As long as there's not a reason not to, I always use real data for testing. Of course, making real & sensitive data world visible in the guise of a test system is a big ol' wtf.



  • @skippy said:

    I bet some of us in Calgary know it :) 

    Ha, fooled you by adding my location! I just moved here from Winnipeg a couple months ago.

    (Crap! I gave away the real city :P ) 



  • @webzter said:

    re: Where does test data come from?

    When a mommy test script and a daddy test script love each other very much...

    Wait! Tell me more... do they merge?



  • Using a script to generate test data is extremely foolhardy. The main reason you're testing is to check that there won't be failures in exceptional cases, but if such a case is unforeseen, your script likely won't create such data, and thus you'll miss the error.



  • @rbowes said:

    @skippy said:

    I bet some of us in Calgary know it :) 

    Ha, fooled you by adding my location! I just moved here from Winnipeg a couple months ago.

    (Crap! I gave away the real city :P ) 

     

    insert standard Winnipeg bashing comment here

    Welcome to the city that's about to become the most expensive in Canada. 



  • @skippy said:

    insert standard Winnipeg bashing comment here

    Welcome to the city that's about to become the most expensive in Canada. 

    Yeah, thanks.. $900/month for a small apartment on 14ave. imagines what he could buy in Winnipeg for that pricetag

    Of course, it could be much, much worse... 



  • @rbowes said:

    @skippy said:

    insert standard Winnipeg bashing comment here

    Welcome to the city that's about to become the most expensive in Canada. 

    Yeah, thanks.. $900/month for a small apartment on 14ave. imagines what he could buy in Winnipeg for that pricetag

    Of course, it could be much, much worse... 


    You could be in New York, where $900 a month won't even get you a cardboard box in an alley.



  • @m0ffx said:

    Using a script to generate test data is extremely foolhardy. The main reason you're testing is to check that there won't be failures in exceptional cases, but if such a case is unforeseen, your script likely won't create such data, and thus you'll miss the error.

    Using a script to generate all your test data may be inadequate for some cases, but you can script some pretty exhaustive test cases for some sorts of things: coming up with every possible permutation of the valid data, in some instances. And, of course, there's always fuzz testing (feed your application lots of random garbage and see if you can crash it).
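
    Both ideas fit in a few lines of Python. This is only a sketch: the field values are hypothetical stand-ins, and parse_quantity is a toy function with a deliberate bug, not anything from a real system:

```python
import itertools
import random
import string

# Hypothetical field values; substitute the valid values for your own data.
TITLES = ["MR", "MRS", "CHILD", "INFANT"]
CABINS = ["FIRST", "ECONOMY"]
HAS_PHONE = [True, False]


def all_valid_combinations():
    """Yield every combination of the valid field values above."""
    yield from itertools.product(TITLES, CABINS, HAS_PHONE)


def random_garbage(rng, max_len=40):
    """Build a random string of printable characters to use as fuzz input."""
    length = rng.randint(0, max_len)
    return "".join(rng.choice(string.printable) for _ in range(length))


def fuzz(target, runs=1000, seed=0):
    """Call target with random strings and collect the inputs that crash it."""
    rng = random.Random(seed)
    failures = []
    for _ in range(runs):
        data = random_garbage(rng)
        try:
            target(data)
        except ValueError:
            pass  # treated here as a clean rejection of bad input
        except Exception as exc:  # anything else counts as a crash
            failures.append((data, exc))
    return failures


def parse_quantity(text):
    """Toy function under test: parse a whole number, then divide by it."""
    value = int(text)  # raises ValueError on garbage, which is acceptable
    return 100 // value  # deliberate bug: ZeroDivisionError when value is 0


if __name__ == "__main__":
    print(len(list(all_valid_combinations())), "valid combinations")
    print(len(fuzz(parse_quantity)), "crashing inputs found")
```

    Purely random strings rarely hit a specific bad value like "0", so a fuzzer this naive is a complement to the hand-picked nasty cases, not a replacement for them.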

     



  • @morry said:

    Working for an airline, we'd typically build our own test PNRs, a lot of times in scripts. They'd have names like TEST/ONE MR, TEST/TWO MRS, TEST/THREE, I/TEST/FOUR and so on (mr, mrs, child, infant). Anyway, you would not believe how many times I'd look on the production system and find one or two TESTs booked on a flight. They'd be obvious test cases because there'd never be any contact information: no phone number. Occasionally Mr Test would fly first class too.

     
    We have a person working part-time just to find and delete crap PNRs. Well, "find" is the wrong word: we have a program that searches for suspect PNRs, but it's always amazing how many bad or duplicate bookings are hanging around.



  • @Carnildo said:

    @rbowes said:
    @skippy said:

    insert standard Winnipeg bashing comment here

    Welcome to the city that's about to become the most expensive in Canada.

    Yeah, thanks.. $900/month for a small apartment on 14ave. imagines what he could buy in Winnipeg for that pricetag

    Of course, it could be much, much worse...


    You could be in New York, where $900 a month won't even get you a cardboard box in an alley.

    Yes, but in New York, that's $900 in real money, not that goofy stuff they use up north.



  • Given how weak the US dollar is against the pound lately, I'm not sure I'd call it 'real' money either :p



  • If I can't use real production data, I prefer to use a list of things that have bitten me on the arse in the past. "O'Brien" is a great one for finding where you haven't properly escaped quotes in JavaScript or SQL code. Single-space, trailing-space and leading-space are also good traps.

     "Christopher" is the longest name in common (English) usage - I don't know any other name longer than 9 characters, except for some Indian friends. This is good for checking the width of columns in mailing labels, reports and HTML tables. The one Christopher we have in our employee database also has an inconveniently long last name.

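    A small sketch of keeping such a list around and running it through a database round trip, here with Python's built-in sqlite3 module. The table name, the function name and the exact list entries are illustrative assumptions; the point is that a bound parameter stores "O'Brien" and the stray spaces unchanged:

```python
import sqlite3

# Awkward values of the kind described above; extend with your own past bugs.
NASTY_NAMES = [
    "O'Brien",      # embedded single quote
    " ",            # single space
    "Smith ",       # trailing space
    " Smith",       # leading space
    "Christopher",  # long first name for column-width checks
]


def round_trip(names):
    """Insert each name with a bound parameter and read them back in order."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (name TEXT)")
    for name in names:
        # The ? placeholder lets the driver handle quoting, so the
        # apostrophe in O'Brien cannot break the statement.
        conn.execute("INSERT INTO people (name) VALUES (?)", (name,))
    rows = conn.execute("SELECT name FROM people ORDER BY rowid")
    stored = [row[0] for row in rows]
    conn.close()
    return stored


# Every value should come back exactly as it went in.
assert round_trip(NASTY_NAMES) == NASTY_NAMES
```

    The same list can be fed to form handlers, report generators and label printers to catch the trimming and width problems as well.
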


  • @bstorer said:

    Yes, but in New York, that's $900 in real money, not that goofy stuff they use up north.

    But at least up here, denominations look distinct. When I'm in the US, I have to go through my wallet looking for the right green bill, whereas here I just pull out the right bill first try.

    And don't get me started on $1 bills. That's so old fashioned! :)
     



  • I don't think I've seen it exposed on a "demo" site like that before, but I have worked places where *way* too many employees had access to copies of production data. Contractors who may be here today, gone tomorrow, could (if they wanted) leave with copies of production databases containing identities to steal, bank numbers to try to use, etc.


     



  • @rbowes said:

    And don't get me started on $1 bills. That's so old fashioned! :)
     

    And don't let any non-canadians get started on our heavy pockets, littered with tons of $1 and $2 coins.  Wallets are almost obsolete here.  One good thing that comes from all this change is that it's really promoting the "cashless" society, cause no one wants to haul around all that change. 



  • @skippy said:

    @rbowes said:

    And don't get me started on $1 bills. That's so old fashioned! :)
     

    And don't let any non-canadians get started on our heavy pockets, littered with tons of $1 and $2 coins.  Wallets are almost obsolete here.  One good thing that comes from all this change is that it's really promoting the "cashless" society, cause no one wants to haul around all that change. 

    Have you considered one of [url=http://www.coindispenser.com/mc75.jpg]these?[/url]



  • @bstorer said:

    Have you considered one of [url=http://www.coindispenser.com/mc75.jpg]these?[/url]

    Now you can look like a carhop attendant for [url=http://www.sonicdrivein.com]Sonic[/url] wherever you go!  Brilliant.

