What the Daily WTF?

MarcB

So a client wants to have some basic package tracking abilities integrated into their site, forcing me to dig into the bowels of UPS, FedEx, et al.'s tracking APIs.

They all seem to LOVE using XML (a total conspiracy by storage manufacturers and bandwidth providers to take all our money), and have big fancy DTDs and XSIs.

Browsing through FedEx's documentation showed me one of the perils of attaching human-readable tags to data:

ok tag: <PostalCode>
huh? tag: <StateOrProvinceCode>

In a system where they have tags like <SignatureProofOfDeliveryAvailable>['true' or 'false']</SignatureProofOfDeliveryAvailable>, they give an incorrect tag like <PostalCode>? Surely they mean <PostalOrZipCode>, or perhaps the more US-centric <ZipOrPostalCode>. Or maybe it's the other way around, and they should have taken out the Province option and just gone for a much more compact <StateCode>, or the quite usable <State>.

Oh, wait... that takes up less space and bandwidth... back to the drawing board... <CodeForLocalPoliticalJurisdictionWhichFallsBetweenMunicipalAndNationalInScopeAndSizeWithoutBeingOffensiveToAnyLocalesWhichMayUseDifferentTermsToReferToSuchDivisions>

MarcB

@SpectateSwamp said:

Video is by far the most exciting part of Search. I can video
in a book and go directly to a given page and play that back
in slow motion. I can video in forum screens.

Useless unless you include the obligatory wooden table and then screen-capture each frame of the video.

@SpectateSwamp said:

It has no database

So you're saying your World Class app will have to do a complete scan of the data set for every search, every time? Must be nice to have the patience of Job, because my 2.3 terabytes of drive space here at home would take a while to search.

@SpectateSwamp said:

and can easily be explained to the common man.

Then go explain it to the common men. They don't hang out on this forum. Most of us here are rather uncommon.

MarcB

@SpectateSwamp said:

Laugh all you want. You are just displaying the reasons for your ClueLessNess.

You don't see Desktop Search for the Spelling Errors. You don't see Desktop Search for "goto" statements. You don't see Desktop Search for "Bad Code" But don't be Worry. I haven't given up on you yet.

Personally, I don't care about the "quality" of the code behind an application, as long as it's reasonably stable, does what it's supposed to, and doesn't have too many bugs. Frankly, yours is a steaming pile of unmaintainable spaghetti (nay, scratch that, it's a steaming pile of Alphabits) code, but since we don't have to work with it, it doesn't enter into the equation. So we'll leave the quality (or lack thereof) out of this discussion.

But your assertion that this app will solve all our problems (balance our checkbook, win the lotto, get us that supermodel girlfriend, etc...) is something we CAN work with.

So, let's look at it this way. How exactly do you "search" video? To my mind, searching video would be a matter of "find all videos where person X is present, music Y is playing, filmed on/around date Z, and contains a wooden table". Perhaps a much simpler query, using your example... "what time index is page 297 of War and Peace filmed on, in which video file".

How exactly does your app accomplish this? Where's the code that can parse the video, decode the frames, and figure out where the page flips are? Where's the OCR code to 'read' the text on all those filmed pages? Where's the English parser which can scan for the title and author of the work in question? What about support for non-English books? Can DesktopSearch do its magic on a video of a book in German? French? Arabic? What about Kanji or Cantonese? Can it detect which direction the text is to be read in? Left->Right? Right->Left? Top->Bottom? Alternating directions?

It's quite simple: Your app does none of this. To call it a "Search" program is laughable at best. At most you could try and position it as a cataloguing app. You must enter all the relevant metadata yourself. Watch the video, manually type up time indexes, manually enter page numbers, manually enter author/title information. In other words, what most other TRUE search programs like what comes with Vista or Google Desktop do automatically, your app requires us to do manually.

Like I stated earlier, I have around 2.3TB of storage here. Some's video, some's plain text, some's .tar.gz backups of web applications, etc... If your application were to become my solution to finding anything, I would have to manually go through 2.3 trillion bytes of data and build an index of it all myself. How is this efficient? I don't know how fast Vista's or Google Desktop's indexing actually is, but I'm quite certain both will finish building their index long before I go through the first few minutes of a video of War and Peace.

As such, your app is barely better than building an index manually in notepad and using the standard ctrl-F (Edit|Find) functionality to "search" your "desktop". And given the interface design, undoubtedly it's much less efficient.

Another thought experiment... You advocate the digitization of books by filming them with a camcorder and storing the .avi (or .dv or .mov, whatever format you care to use) as a more efficient "container" for the book's content. Let's pick some reasonable numbers out of a hat. We're digitizing a book of 500 pages. It's pure text, no illustrations or diagrams. We'll figure it takes around 6 seconds for each page to be flipped and stabilized enough to get a clear capture of it on the camera. So that's 50 minutes of video for this book. If we're using a DV camera for this, that'd be around 10 gigabytes of video data (DV is approximately 12 gigabytes per hour of video). Let's convert the video to DivX, at a bitrate sufficient to keep things clear and legible, i.e. DVD quality. That would produce a .avi file of around 400 megabytes.

If the book had been OCR'd into a text file, we'd have perhaps 2 megabytes of data. How is it more efficient to store this book as a .DV video at roughly 10 gigabytes or a 400meg .avi, and have to MANUALLY generate an index of its metadata (time offset, page number, etc...), than to store the book as a .txt file, which can be easily searched by even Notepad and contains all that metadata automatically?
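The arithmetic behind that comparison can be written out as a rough sketch. The page count, seconds per page, DV rate, DivX size and text size are the figures from the post; treating 1 gigabyte as 1000 megabytes is an assumption made here for simplicity.

```python
# Back-of-envelope storage comparison for one filmed 500-page book.
PAGES = 500
SECONDS_PER_PAGE = 6
DV_GB_PER_HOUR = 12      # approximate DV rate quoted in the post
DIVX_MB = 400            # estimated size after transcoding
TEXT_MB = 2              # estimated size of the OCR'd plain text
MB_PER_GB = 1000         # assumption: decimal units

minutes = PAGES * SECONDS_PER_PAGE / 60
dv_mb = minutes / 60 * DV_GB_PER_HOUR * MB_PER_GB

print(f"video length: {minutes:.0f} minutes")
print(f"raw DV:       {dv_mb:.0f} MB ({dv_mb / TEXT_MB:.0f}x the text file)")
print(f"DivX:         {DIVX_MB} MB ({DIVX_MB / TEXT_MB:.0f}x the text file)")
```

Even the compressed video comes out a couple of hundred times larger than the text, and it still carries no searchable index.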

Answer these questions with reasonable answers, and please, do not simply continue to say that we should try desktop search for ourselves. It's up to you to sell it to us, convince us of how useful it is. Explain to us how your code accomplishes what you claim it does, and perhaps we'll give it a try. Until then, we'll stick with our own solutions and methods, as we know they work.

You're the outsider, prove that your way is better.

MarcB

@CodeSimian said:

Apparently, he also doesn't grasp "the art of parsing 1 character at a time." I would hate to see him try to add support, for say, thousands of files:

He also doesn't understand identifying computers properly. I'd hate to see him try to add support, for say, thousands of computers:

    If Left(strBuffer, lngBufSize) = "OEMCOMPUTER" Then
'        MsgBox (strMessage)
    End If                          'january 31 2001 only show it if it my laptop

Found that in a chunk of code he'd obviously copypasta'd, as it's trivially googleable - it looked strange because the variable names actually made sense, once you ignored the usual M$-style Hungarian notation.

MarcB

Swampy PWNED by his own words and challenge.

@SpectateSwamp said:

I get shit for the same answers so sometimes I spice it up a bit. But anyway back to your search of a random file. Test this out for me. How fast is your computer at reading data from your disk without searching Just reading from start to end of file.

Ok. I generated a file that's exactly 2,388,888,897 bytes long. That's 2.3 gigabytes. It's got a simple format:

This is line XXX. Here's some more text to make the file even bigger, even though this text is repeated every line

where XXX is the actual line number (so it goes "This is line 1...", "This is line 2", etc..).

So here's some info on the file:

File size, the first line, and the last line:

marc@panic:~$ ls -l swamp.txt
-rw-r--r-- 1 marc marc 2388888897 2008-02-14 18:24 swamp.txt
marc@panic:~$ head -1 swamp.txt
This is line 1. Here's some more text to make the file even bigger, even though this text is repeated every line
marc@panic:~$ tail -1 swamp.txt
This is line 20000000. Here's some more text to make the file even bigger, even though this text is repeated every line
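The size that ls reports can be cross-checked from the line format alone. This is only a sketch: the helper names are made up for illustration, and it assumes plain ASCII with a single newline byte per line.

```python
# Expected size of swamp.txt, derived from the line format alone.
PREFIX = "This is line "
SUFFIX = (". Here's some more text to make the file even bigger, "
          "even though this text is repeated every line\n")


def digits_up_to(n: int) -> int:
    """Total number of decimal digits written when counting 1..n."""
    total, width, start = 0, 1, 1
    while start <= n:
        end = min(n, start * 10 - 1)      # last number with this many digits
        total += (end - start + 1) * width
        start *= 10
        width += 1
    return total


def bytes_up_to(n: int) -> int:
    """Bytes occupied by lines 1..n (one byte per character)."""
    return n * (len(PREFIX) + len(SUFFIX)) + digits_up_to(n)


print(bytes_up_to(20_000_000))  # 2388888897, matching the ls output
```

The same helper gives the number of bytes that sit in front of any given line, which is handy for sanity-checking throughput figures.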

I'll assume you mean a page to be 24 lines of text, so here's how long it takes to display the last 24 lines of text in that file:

marc@panic:~$ /usr/bin/time tail -24 swamp.txt
This is line 19999977. Here's some more text to make the file even bigger, even though this text is repeated every line
This is line 19999978. Here's some more text to make the file even bigger, even though this text is repeated every line
[... to keep this post short, I'm deleting the next few lines...]
This is line 19999999. Here's some more text to make the file even bigger, even though this text is repeated every line
This is line 20000000. Here's some more text to make the file even bigger, even though this text is repeated every line
0.00user 0.00system 0:00.00elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+208minor)pagefaults 0swaps

It took 0.00 seconds to do that. In other words, it's so fast, and the time span so small, you need more than two decimal places to express the number. Of course, that's not a fair test, because tail is smart enough to open the file FROM THE END and count backwards. Let's try something harder...
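For the curious, the read-from-the-end trick can be sketched in a few lines. This is not tail's actual source, just an illustration of the idea; the function name and block size are arbitrary.

```python
import os


def last_lines(path, count, block_size=8192):
    """Return the last `count` lines of a file, reading backwards in blocks."""
    if count <= 0:
        return []
    with open(path, "rb") as fh:
        fh.seek(0, os.SEEK_END)
        position = fh.tell()
        data = b""
        # Prepend blocks until more than `count` newlines are buffered,
        # which guarantees the first line we keep is complete.
        while position > 0 and data.count(b"\n") <= count:
            step = min(block_size, position)
            position -= step
            fh.seek(position)
            data = fh.read(step) + data
    return data.splitlines()[-count:]
```

Only the final few kilobytes of the file are ever touched, which is why the elapsed time rounds to zero no matter how big the file is.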

How about the 24 lines from 1,000,001 to 1,000,024?

marc@panic:~$ /usr/bin/time head -1000024 swamp.txt|tail -24
This is line 1000001. Here's some more text to make the file even bigger, even though this text is repeated every line
This is line 1000002. Here's some more text to make the file even bigger, even though this text is repeated every line
[... deleting the middle lines. again, to keep things short ...]
This is line 1000023. Here's some more text to make the file even bigger, even though this text is repeated every line
This is line 1000024. Here's some more text to make the file even bigger, even though this text is repeated every line
0.22user 0.12system 0:03.53elapsed 9%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+205minor)pagefaults 0swaps

Not bad. 3.53 seconds to scan through the first 1,000,024 lines and display the last 24 of those. That's approximately 118 megabytes of data in 3.53 seconds, or around 33,000,000 characters per second. Well above your magical 20,000,000.

Now, let's see how long it takes grep to find all lines with the number '9' in it. Note that the time stated will be the time to scan the ENTIRE file, and remember that we're looking for '9', so the first hit will be at line 9, and show up instantly.

marc@panic:~$ /usr/bin/time grep 9 swamp.txt|wc -l
4.41user 3.00system 0:52.51elapsed 14%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+247minor)pagefaults 0swaps
10434062

So. 52.51 seconds, to find 10,434,062 lines of results. That's 52.51 seconds to scan 2.3gigabytes, or approximately 45,493,980cps, more than DOUBLE your magical 20,000,000cps. Of course, I didn't display all 10 million results and paste them here. That would be stupid. But had I not filtered the results through the 'wc' (word count) program, I would have seen all 10 million lines flash by, just the way you like.
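Both numbers can be checked without touching the disk. The sketch below counts the line numbers containing a '9' by simple digit counting, and recomputes the throughput from the file size and elapsed time shown above; the function name is just for illustration.

```python
def count_without_nine(limit: int) -> int:
    """Count integers in 1..limit whose decimal form contains no '9'."""
    digits = [int(ch) for ch in str(limit)]
    total = 0
    for index, digit in enumerate(digits):
        free_positions = len(digits) - index - 1
        # Put a smaller digit here; each later position has 9 choices (0-8).
        total += digit * 9 ** free_positions
        if digit == 9:
            break                 # every remaining number contains this 9
    else:
        total += 1                # limit itself contains no 9
    return total - 1              # the count above includes 0


LINES = 20_000_000
FILE_BYTES = 2_388_888_897
ELAPSED_SECONDS = 52.51

print(LINES - count_without_nine(LINES))    # 10434062 lines contain a '9'
print(int(FILE_BYTES / ELAPSED_SECONDS))    # 45493980 characters per second
```

Only the line number can contain a '9', since the filler text has no digits in it, so the count depends on nothing but the range 1 to 20,000,000.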

In case you start complaining about me having a faster machine, here's the specs:

AMD Athlon 64 3000+ (2 GHz)
2 gigabytes of ram
IBM 120gigabyte ATA drive

An average machine. It's about 5 years old now, I think. Maybe a bit less. Definitely not "top of the line" anymore. And you'll note that my test file is LARGER than the amount of ram in the system, so there's no way the file could be cached - which means grep and head and tail have to scan the entire thing FROM DISK every time.

@SpectateSwamp said:

Is it somewhere in the 20,000,000 cps range? If it is then you'll have proved that I'm not Gee Hawing you

So, my worst result was around 30,000,000cps, and my "best" result was 45,400,000cps. Is that Gee Hawing you? Or will you finally now admit that you're spewing bullshit through that beard of yours? SSDS is not "the fastest". It's merely the fastest that YOU'VE bothered to test.

Oh, and here's one more test. Call it a freebie. How long does it take grep to find the line with the "12345678" text on it? And display 4 lines of context before and 5 lines of context after it?

marc@panic:~$ /usr/bin/time grep '12345678' -A 5 -B 4 -m 1 swamp.txt
This is line 12345673. Here's some more text to make the file even bigger, even though this text is repeated every line
This is line 12345674. Here's some more text to make the file even bigger, even though this text is repeated every line
This is line 12345675. Here's some more text to make the file even bigger, even though this text is repeated every line
This is line 12345676. Here's some more text to make the file even bigger, even though this text is repeated every line
This is line 12345677. Here's some more text to make the file even bigger, even though this text is repeated every line
This is line 12345678. Here's some more text to make the file even bigger, even though this text is repeated every line
This is line 12345679. Here's some more text to make the file even bigger, even though this text is repeated every line
This is line 12345680. Here's some more text to make the file even bigger, even though this text is repeated every line
This is line 12345681. Here's some more text to make the file even bigger, even though this text is repeated every line
This is line 12345682. Here's some more text to make the file even bigger, even though this text is repeated every line
This is line 12345683. Here's some more text to make the file even bigger, even though this text is repeated every line
1.12user 1.06system 0:31.91elapsed 6%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+252minor)pagefaults 0swaps

31.9 seconds, to scan through around 1.4gigabytes of text to find that match, or in other words 46,441,421cps. Wow, a new record. Oh, and it only used 6% of the available CPU power. In other words, this is a disk-bound operation. A faster computer won't help unless you put in a faster hard drive.

Here's the script I used to generate the swamp.txt test file. It's very easy. A VB expert like you can no doubt reproduce it in only a few years:

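#!/usr/bin/perl
use strict;
use warnings;

# Write 20,000,000 numbered lines of filler text to swamp.txt.
open(my $fh, '>', 'swamp.txt') or die "Cannot open swamp.txt: $!";
for my $x (1 .. 20000000) {
    print $fh "This is line $x. Here's some more text to make the file even bigger, even though this text is repeated every line\n";
}
close($fh) or die "Cannot close swamp.txt: $!";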

MarcB

@pitchingchris said:

I can position a stone under a steady drop of water and acheive the same affect. Doesn't make it any more magical

Yesterday I made a sandwich and one of the pieces of bread had a big hole in it. Also, one of the pickle chips had *2* holes in it. Therefore I must be at least 2x the shaman Swampie is, because the stuff I find has more holes. I should've put the sandwich on my wooden table, taken some photos of it, then screened the photos on an LCD and taken wonky off-axis video of it, mangled it through a half dozen transcoding runs, then uploaded it to ebay along with 200 pages of rambling incoherent line noise for sale for $9,999,999,999,999.99 - a reasonable bargain, given the hol(e)y food's awesome powers.

But I was hungry, so I just ate it instead.

MarcB

@SpectateSwamp said:

Final clues

the encryption file cript.txt is at:

http://www.telusplanet.net/public/stonedan/cript.txt

It is used to shuffle the characters around during encryption

the search excutable is at:

http://www.telusplanet.net/public/stonedan/search.exe

Rot-37 (or so) is your encryption? I got more secure encryption out of a cereal box this morning... If you wanted real security, you could XOR the data with the passphrase 'susageP', just like Microsoft did in early versions of WinCE for storing NT passwords. If Microsoft does it, it must be industrial strength, you know.
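To see why XOR with a fixed passphrase is cereal-box grade, here is a small sketch. It illustrates the general weakness only and makes no claim about how any real product stored its data; the sample "password" is invented.

```python
from itertools import cycle


def xor_with_key(data: bytes, key: bytes) -> bytes:
    """XOR every byte of data with the key, repeating the key as needed."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


key = b"susageP"
secret = b"hunter2-hunter2"          # made-up sample data

ciphertext = xor_with_key(secret, key)

# The same operation decrypts, so anyone who knows the key reads everything.
assert xor_with_key(ciphertext, key) == secret

# Worse: one known plaintext/ciphertext pair hands over the key itself.
recovered_key = xor_with_key(ciphertext, secret)[:len(key)]
print(recovered_key)
```

Shuffling characters around with a fixed table has the same basic problem: once the table (or key) is public, so is the data.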

So how's life in Troll land anyways? We don't take enough time to find out enough about each other on this forum... tell us about yourself. Is the sky blue in your world? Birds grounded and pigs airborne? 1 + 1 = 3 for sufficiently large values of 1?

MarcB

I'd like to share a nice(?) little(?) horror story from a job in years gone by... Apologies for the length, but there's a lot of stuff involved with this, and I've no doubt forgotten a fair number of details.

I'd gotten a temporary gig as a sysadmin at the local public library, where, due to union classifications and whatnot, you couldn't be a "systems administrator" or "IT guy" or whatever. You were a "Librarian Assistant".

It was a pretty standard IT gig. Run some servers (one OpenVMS box for the library catalogue/circulation system, and at the time, some NT4 Server systems), and support the desktops (NT4WS). One of the apps we had to try and support was a little gem called InterLEND, used to "automate" the inter-library loans (ILL) process (sending requests, tracking the progress of requests sent out, yada yada).

InterLEND had started its days as a DOS program, using a modem to dial up to various obsolete and (hopefully) long dead services to send requests to other libraries. It wasn't fancy, but it did what it had to with a minimum of fuss. And then the Windows age came along, and InterLEND was re-released as a Windows app, with all the usual bells and whistles. Menu bars, button bars, scroll bars, lots of colors, and all the other good stuff that comes with Windows (resource leaks, unexplained crashes, oh joy).

Now, to make one thing clear, InterLEND for Windows had one nice thing going for it: its data files were 100% backwards compatible with all previous versions, including the pre-history DOS versions. You could use any version to read/write the data files and perform the daily tasks without problem, or so it was thought. This saved our butts a few times.

We found various problems with the application. Some by the users during normal work, some by us techies during testing of new versions. Amongst them:

1) InterLEND for Windows was TCP/IP aware and could send requests via SMTP and retrieve them from POP mailboxes. However, any messages in the mailbox which did not contain the standard ISO interlibrary loan protocol formatted messages could (and pretty much always did) crash the Mailer portion of InterLEND, which then took down the rest of InterLEND. While running the mailer, you'd get a nice little status window showing the application babbling at itself and at the POP and SMTP servers. But it wouldn't tell you which incoming message was causing the crash. You'd have to fire up an email app (Outlook, Eudora, whatever...) to rummage through the mailbox and try to guess which one was causing the problem. All you'd get is Doctor Watson telling you there was a problem.

Sometimes it was obvious: a spam message had crawled in (usually not a problem, as the email addresses for these requests were generally not posted publicly anywhere on the net). Sometimes it wasn't. If a misbehaving mailer at another library had sent out a malformed message, InterLEND would choke and hit the floor retching bits all over the place. To figure out which one was causing the crash, you'd print out a copy from the mail client, delete the message from the mailbox, and re-run InterLEND's mailer, until it managed to get past the spot causing the crash.

Then you'd take the messages you'd printed out (no, there was no wooden table involved) and see if they'd shown up in InterLEND. The one that wasn't in the database was the one causing the crash. Thankfully, the ISO format was plaintext with various padding and delimiter characters (mostly +'s), so extracting the message details wasn't too painful. You could manually enter the details into InterLEND as a new request and from there things would proceed normally.
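What the mailer could have done instead is not exactly rocket science. The sketch below processes each message on its own and reports which one it choked on. The message layout and field names here are invented for illustration; the real ISO ILL format is considerably more involved.

```python
def parse_ill_request(raw: str) -> dict:
    """Hypothetical stand-in parser: 'ILL+<title>+<requesting library>'."""
    fields = raw.strip().split("+")
    if len(fields) < 3 or fields[0] != "ILL":
        raise ValueError("not an ILL request")
    return {"title": fields[1], "library": fields[2]}


def process_mailbox(messages, parse=parse_ill_request):
    """Parse every message; collect failures instead of crashing on them."""
    accepted, rejected = [], []
    for number, raw in enumerate(messages, start=1):
        try:
            accepted.append(parse(raw))
        except ValueError as error:
            rejected.append((number, str(error)))   # tell the user WHICH one
    return accepted, rejected


accepted, rejected = process_mailbox([
    "ILL+War and Peace+Branch 12",
    "Buy cheap pills!!!",
    "ILL+Moby Dick+Branch 3",
])
print(len(accepted), rejected)
```

One bad message costs one line in a report, rather than a morning of printing and deleting mail by hand.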

2) InterLEND for Windows' interface was, to put it mildly, quirky. It had approximately 15 tabs underneath the main menu bar, each representing an area of the program (Mailer, Requests, Clients, Search, etc...). They all had different colors to differentiate them, so that from a distance, InterLEND looked like a rainbow that had a serious personality disorder.

3) Some of the tabs, such as search, would naturally contain a lot of data. If you did a search for any requests whose item in question contained the word 'The', you'd get quite a few hits. So the search results were displayed in a scrollable window. Nothing wrong with that... but for some reason InterLEND presented *two* scrollbars, side-by-side. The outer one was the standard scrollbar any "tall" content will show in a window. The second, inner scrollbar was... different. It also scrolled the content, but *ONLY* the content that had been retrieved from the database so far. If you did a query that returned, say, 5000 results, the outer scrollbar would size itself to scroll through those 5000 results, retrieving data only as you scrolled down through the recordset.

If you'd scrolled down to (maybe) record 1000, the outer bar would be moved down the scroll area by about 1/5th the available area. However, the inner scroll bar would be pinned at the bottom of the scroll area, as you were displaying record #990-1000 of the records retrieved so far.

I don't know why that second scroll bar was there. It completely duplicated the functionality of the outer bar. At best it served as a very bad "zoom" function to view previously retrieved data, at worst it was someone's "hey! that's cool!" idea for stuffing in more features.

4) InterLEND used some kind of private/proprietary database for storing all of its data. They didn't use any of the readily available "private" databases, such as Jet or Borland DE. Maybe this is due to InterLEND's history as a DOS program from the stone ages of computing, created before such fancy DB engines were available. Whatever engine they were using, it did what it had to. Stored data, updated data, deleted data, retrieved data. But on occasion, it would corrupt itself. Being proprietary means, of course, that there are no real repair tools. And whatever its internal construction was, it meant that any corruption left the entire database a steaming pile of random bits. You couldn't rebuild an index, delete a corrupted record, or lop off a chunk of the file to excise the bad spots. A flipped bit somewhere hosed the whole thing.

This meant the database files had to be religiously backed up, lest you lose everything. Thankfully the corruption didn't occur too often, so a backup granularity of 1 day usually did the trick. At most you'd lose that morning's work (incoming requests were retrieved/processed in the mornings, and outgoing requests were done in the afternoons). But when it did hose itself, work stopped until the nightly backup tape could be retrieved and last night's copy of the files restored. So SOP was to print out anything you'd been working on, in case the DB fried itself, which leads to....

5) One of InterLEND's selling features was that it would reduce the amount of paperwork needed to process a request. Interlibrary Loans, by their nature, are paper-intensive. The requesting library has to send a form for the actual request. The lending library has to send forms along with the item to specify due dates and shipping instructions. If the item's going across international borders, customs declarations have to be included. Oy vey. And as well, at both ends of the process, more forms are used for the requestor to tell the requesting library what they want, for the lending library to retrieve it from the stacks, etc... In the end, sending a single book from library A to library B could slurp up 10 or more sheets of paper. And then there's the form letters to the original requestor to tell them the item's arrived and that it can be picked up, and the form letters to tell the same person that it's now overdue and they owe $toomuch in overdue fines, etc...

No way to get around those 10 pages... paper trails must be maintained. But, there's quite a few stages to processing an ILL request. There's requests, returns, queries if the request wasn't specific enough ("did you really want 20 years worth of that magazine? or 20 pages from a specific issue?"), and status updates if either side isn't processing things fast enough. Each one requires digging out the paper trail and ticking off things on the forms to indicate what stage the process was in.

So now we've gone from a paper intensive manual process to a fully automatic .... paper intensive manual process.

Oh well... at least they were saving mad coin on postage due to emailing most of the queries and status updates. They could use some of it to help offset the cost of all that paper...

Now for the oddities in InterLEND...

1) A longstanding never-fixed bug was embedded in the mailer. There was no error handling on the POP end of things. If the mailbox was unavailable for any reason (network burp, typo in the user or server name, bad password, etc..), the InterLEND client would again barf bits and crash.

When this bug was found during new version rollout testing, InterLEND's supplier claimed it was our fault for having triggered the bug. After all, who in their right mind would enter deliberately incorrect information into an application?
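For contrast, here is roughly what minimal error handling on the POP side looks like, sketched with Python's standard poplib module. The function name, the 10 second timeout and the injectable `connect` parameter are choices made for this example, not anything InterLEND had.

```python
import poplib


def fetch_messages(host, user, password, connect=poplib.POP3):
    """Return (messages, error). error is None when everything worked."""
    try:
        box = connect(host, timeout=10)
    except OSError as error:
        # Network burp, typo in the server name, refused connection...
        return [], f"cannot reach {host}: {error}"
    try:
        box.user(user)
        box.pass_(password)
        count, _size = box.stat()
        messages = [b"\n".join(box.retr(n)[1]) for n in range(1, count + 1)]
        return messages, None
    except (poplib.error_proto, OSError) as error:
        # Bad user name, bad password, or the server hung up on us.
        return [], f"mailbox error: {error}"
    finally:
        try:
            box.quit()
        except (poplib.error_proto, OSError):
            pass  # already broken; nothing more to clean up
```

A wrong password then produces an error string the user can read, instead of a visit from Doctor Watson.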

2) InterLEND was supposedly network-aware. This was in the marketroid features checklist somewhere. What this meant was that it could be run off a network drive without problem, and that it could do SMTP and POP email.

What it didn't mean is multi-user concurrency. Only one person could be using it to update the database at any time, or it would chew itself to bits. Reading, however, wasn't a problem. As we had two people working on ILL processing at the time, this presented a problem... Either they'd have to a) take turns running the app, or, horribly, b) be very careful about when either of them would do an update on the database.

So of course, they opted for b). At least the two people had desks side-by-side and could tell each other when they were doing an update. But after a long day staring at forms, mistakes occasionally get made...
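The "shout across the desk" protocol is a human implementation of a lock file. A minimal sketch of the software version follows. The lock file name is hypothetical, and exclusive-create semantics are not equally reliable on every network filesystem, so treat it as an illustration rather than a drop-in fix.

```python
import os
from contextlib import contextmanager


@contextmanager
def exclusive_writer(lock_path):
    """Hold a lock file for the duration of an update.

    os.O_EXCL makes the create fail with FileExistsError if somebody
    else already holds the lock.
    """
    fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)


def try_update(lock_path, update):
    """Run update() only if nobody else is writing. Returns True on success."""
    try:
        with exclusive_writer(lock_path):
            update()
        return True
    except FileExistsError:
        return False   # the colleague at the next desk got there first
```

A second writer simply gets told to wait, which beats finding out after the database has chewed itself to bits.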

3) InterLEND for Windows, as mentioned above, brought in all the fancy GUI features that come with Windows (indeed, it seems like the vendor tried to cram in as many GUI widgets as they possibly could). Along with this it brought in proportionally-spaced fonts for the various printouts it could do. But those printouts were never updated to be "modern". Essentially they just dumped a plaintext ASCII-formatted page to the laser printer and told it to use Times-Roman, instead of sending it to an old dot-matrix the way they used to.

This meant the "nicely" designed forms with their columns and tab-aligned (well, it looked like it was tab-aligned) tables came out, at best, somewhat jagged. And of course there's no report designer of any sort anywhere in InterLEND. At most you could customize a few fields to specify return addresses and basic contact information.

This is where the backwards compatibility of the data files came into play. When it came time to print reports, they'd quit out of the Windows version, fire up the old DOS version, print whatever they had to, then hop out of DOS and back into Windows.

4) The GUI itself was never terribly stable. Simple things like scrolling around in a window could cause it to crash. Perhaps it was a race condition, as this only seemed to occur in windows that were displaying data retrieved from the database. But it wasn't frequent or consistent enough to say for sure.

But again, the vendor's response amounted to the old "Don't do that, then."

Fast forward a few years, and the decision comes down from on-high to replace InterLEND. Oh frabjous joy! Something better! Finally... <insert sound of screeching tires, followed by prolonged car crash sounds>.

Government grants for upgrading library services were found to be available, so after a few rounds of drafts and submissions, some Federal moolah rolls in, and the upgrade process is set loose.

To select which product to migrate to, a... *gasp* CONSULTANT *gasp* was hired to do a study of available alternatives. The various libraries in the province here that used InterLEND were queried as to what their ILL staff would like from a new version, a specification and features requirement was cobbled together from the wish list, and the consultant was set loose.

After a somewhat (and curiously) short study period, the consultant pronounced that product X (I'm not mentioning what, since it's still in use here these days) would fit the bill perfectly. It had all the features, was actively maintained, the licensing costs were rather reasonable for an app of this power, and quite affordable, given the budgets of some of the libraries which would be using it. But that's not what impressed the various library representatives who were present at the dog & pony show. No, their primary comment about the whole process was about how cute the consultant was. This should've been a warning sign...

So with great fanfare, contracts were signed, install media was supplied, test servers were configured and deployed, and training sessions were organized at the provincial library's headquarters for initial testing and basic training. Everything seemed to go quite well, and people were returning from the sessions excited at what they'd be able to do when the system went live.

A few months later, after the initial training sessions were concluded and general approval of product X was noted, the decision was made to go live with it. Cue the car crash sound effects.

Product X was by its nature a client/server application. The server app ran on a server (woah, what a concept), used MS SQL Server as its data store, and generally acted as a front-end for the client apps, so they'd never have to directly interface with the database. As this system would be used by various libraries spread across the whole province, this saved a considerable amount on MSSQL licensing. Only one license would have to be purchased for the server, and a few CALs for the concurrent accesses from the front-end server app.

Everything seemed to be working perfectly. Test requests were humming along and zipping through the system, things were speeding up considerably compared to the old InterLEND workflow, yay yeehaw pass the champagne. And then the rollout started, and the first remote clients began accessing the server to perform requests.

Now things started moving at a v.... e.... r..... y...... ssllloooowwwwwwwww crawl. The client application, which in testing had started up in <5 seconds from double-click to fully started, now took over 10 minutes to display the interface, and left the user stuck with a horrible splash screen that took up 50% of the desktop, and could not be put into the background or otherwise covered up.

If the user had the patience to sit through the glacially slow startup, the client itself took nearly forever to respond to actions. Opening a query screen to search for a particular request would take 2 or 3 minutes to display the query interface. Other actions took similar periods to complete, or even start.

Much head scratching ensued. The administrators of the server could not replicate this behavior on their end, and were starting to pass the blame buck back to the users, or the users' IT infrastructure ("It works fine for us!"). The users' IT people were lobbing the buck back at the server admins, saying everything's fine on their ends, yada yada yada.

A bit of investigating by a coworker finds the real problem. Any actions in the client program are essentially round-tripped through the server front-end program. Think of Windows Vista's User Account Control, implemented in a client app. Performing any action would cause various requests to be sent to the server, i.e. "does this user have permissions for this?" or "what are the contents of table 'massive amounts of data'?".

And worse yet, many of these queries were the equivalent of an SQL "SELECT * FROM SOMETABLE", with the client rummaging through all the returned records for whatever it needed.

Aha! Now the slowdown becomes clear. All the testing and training sessions had been done via local networks at LAN speeds with some test data. There might have been a few minor speed hiccups, but those could be blamed on Windows in general, which everyone knows hiccups louder and harder than anything else in the universe. Fast forward to live deployment, with remote users accessing the system over the public internet over at best T1 lines (major centers), or at worst, 56k dialup (remote rural branches), with large amounts of production data. Suddenly the masses of data that even the simplest action in the client triggers don't show up very fast anymore. And as the system becomes populated with more and more data, the slowdown will only get worse.

Some investigating reveals that product X came in two versions. One was the generally available "shrink wrap" type that the vendor supplied by default. A second version, custom-built for another library system which had also been bitten by the network speed "bug", had been rejiggered and optimized to support slower and higher-latency network links. Oops... the consultant had told everyone that product X was designed entirely with internet usage in mind. He never mentioned this special edition.

Essentially this second version sprinkled a little bit of the new-fangled miraculous power of SQL "WHERE" clauses onto the client->server messages.

Things got.... somewhat less laggy. Now, clicking on a button would get a response in 5 or 10 seconds, rather than 5 or 10 minutes. Woah... a few extra bytes of client->server network overhead reduced wait times by about 1.5 orders of magnitude, and reduced server->client bandwidth usage as well. Bonus!
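For the curious, here is a minimal sketch of the difference, using Python's built-in sqlite3 module and an in-memory database. The table name, column names and row counts are made up for illustration; this is not product X's actual schema or protocol, just the general idea of filtering on the server rather than on the client.

```python
import sqlite3

def build_demo_db(row_count=10_000):
    # Hypothetical "requests" table standing in for the real data.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE requests (id INTEGER PRIMARY KEY, branch TEXT, title TEXT)"
    )
    rows = [(i, f"branch-{i % 100}", f"title {i}") for i in range(row_count)]
    conn.executemany("INSERT INTO requests VALUES (?, ?, ?)", rows)
    return conn

def fetch_all_then_filter(conn, branch):
    # The slow way: pull every row over the wire, rummage through it client-side.
    all_rows = conn.execute("SELECT * FROM requests").fetchall()
    wanted = [row for row in all_rows if row[1] == branch]
    return wanted, len(all_rows)

def fetch_with_where(conn, branch):
    # The "special edition" way: let the server do the filtering.
    wanted = conn.execute(
        "SELECT * FROM requests WHERE branch = ?", (branch,)
    ).fetchall()
    return wanted, len(wanted)

conn = build_demo_db()
slow_rows, slow_transferred = fetch_all_then_filter(conn, "branch-7")
fast_rows, fast_transferred = fetch_with_where(conn, "branch-7")
print("rows transferred without WHERE:", slow_transferred)
print("rows transferred with WHERE:", fast_transferred)
```

Both functions hand back the same matching rows; the only difference is how many rows have to cross the network to get there, which is exactly what hurts on a 56k dialup link.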

So that's where things stand these days. Product X is in full production use by most libraries in this province. A few opted out of it, and are using other systems, or sticking with InterLEND. And as for InterLEND itself, most locations still use it a bit, slowly whittling away at the pile of old requests still active within it, longing for the day when that last request becomes completed and they can nuke the program from their systems with extreme prejudice. And the locations which have opted to stick it out with InterLEND? Well, we don't talk about those places much anymore....

The moral of all this? If the primary comment from people returning from the consultant's presentations on which product to recommend is "He's cute!", perhaps a different consultant (and group junket members) should be chosen. And perhaps the consultant's qualifications should be vetted as well, to see if he's got any library experience in his past. Experience which exceeds the "oh yeah, I've borrowed a book on occasion".

MarcB

Let the smackdown begin. Score so far: TDWTF denizens: Infinity +1 Swampy: NaN

@SpectateSwamp said:

You don't know horseshit. Write a search engine that does half what this one does. If you can. Go ahead show me something worthwhile that you have done that is simplified by your method of coding. Can others extend it? Can others understand it. There isn't too much difficult about this code. Except the line wrap and hi-lite logic and I flowcharted that for you.

Swampy, please accept this: Your software will never catch on with anyone but you. By design it requires the user to adapt their entire computing experience to its demands. This is not how utility software ever should work. You require users to fundamentally change the basis of modern graphical computing. If you bought a video camera that required you to jam it up your left nostril, pound a spike into your head, and walk while standing on your head, using your eyelids for locomotion, all BEFORE you could record even a single second of video, would you use that camera? SSDS is the exact same thing:

Point and click to open a file? Better just type in a number.
Multiple document files? Keep it all in a single text file. Who needs timestamps...
Why bother editing video? Keep all the useless video and make a note explaining where the short 'good' segment in the middle is. Or better yet, just force everyone to watch the whole thing.
Use editing software to extract the good clip and upload a 1st generation copy to Youtube? Sheesh, of course the solution is to video a copy off the computer screen so you get all the joys of screen refresh tearing, reflections, double the compression artifacts, and double the motion blur because you can't even be bothered to follow your own advice and use a tripod.
Use variable names that explain their purpose intrinsically? Better use random alphanumeric sequences.
Functions to encapsulate commonly used code? Better just slap copies of the code everywhere.
Use gotos because they're better? I know of precisely ONE justifiable use of gotos in modern programming languages, and your code does not come anywhere near qualifying for that use.
Comments which explain what the code is intended to do? Better document what the code does. "x = 1; ' Set x to 1" is so very extraordinarily useful later on, because not everyone might know what an assignment statement in Visual Basic looks like.

Your software doesn't even qualify as 'spaghetti code'. At least spaghetti can be a tasty and healthy meal. Your software is composed of worms and maggots, simmered in septic tank residue.

Please, explain to us why your labels, such as "line_380921:", are better and clearer than "process_text_contents:" or "decrypt_file:" might be. What good is a variable named "aaa" or "AAA"? Would "current_scan_position" or "textOffset" not be clearer?
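To make the point concrete, here is a small sketch of what descriptive names and a reusable function look like. It is written in Python rather than Visual Basic, and the function names and the highlighting behaviour are invented for illustration; it is not a rewrite of anything in SSDS. It assumes plain ASCII text, since lowercasing some Unicode characters can change string length.

```python
def find_first_match(text, search_term):
    """Return the offset of the first case-insensitive match, or -1 if none."""
    current_scan_position = text.lower().find(search_term.lower())
    return current_scan_position

def highlight_match(text, search_term, marker="*"):
    """Wrap the first match of search_term in marker characters."""
    text_offset = find_first_match(text, search_term)
    if text_offset == -1:
        # Nothing found: hand the text back unchanged.
        return text
    match_end = text_offset + len(search_term)
    return (
        text[:text_offset]
        + marker + text[text_offset:match_end] + marker
        + text[match_end:]
    )

print(highlight_match("Find the Needle here", "needle"))
```

Nobody needs a flowchart to work out what "text_offset" or "match_end" hold, and the search logic lives in exactly one place instead of being pasted everywhere it's needed.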

If it wasn't for your requirement of being able to play random videos, there is nothing in SSDS that would not be better implemented as a command line utility. Text is better suited to a text-only medium. You're slapping a very bad graphical interface on a very bad text interface, compounding the usability issues. Complain all you want about the "other" search engines, and how none are as good as SSDS, the fact remains that any user, new to your software, will not be able to do anything without extensive training.

By comparison, anyone with the slightest familiarity with Windows will know what to do with a Vista or Google Desktop search. "Type your search terms here, and click this button". And guess what, without having to spend hours and days and months manually cataloguing their files, the "other" search engines will be able to find their data. Installing any real search engine is a few minutes of download time, a few minutes of install time, and an hour or two of automatic indexing, after which "it just works". By comparison, SSDS is only as useful as the time you put into building your own index. A virgin install of SSDS is utterly useless, and will REMAIN useless until you invest a serious chunk of time in building up its "index". A virgin install of Google Desktop is still useful, as you can search even as the index is being built in the background.

Write a search engine? Already done. And it's quite successful, as well. Here's the guts of it:

SELECT items.id AS id, CONCAT(item_search.path, ' - ', items.name) AS name
FROM items
LEFT JOIN item_search ON items.id=item_search.id
WHERE MATCH (item_search.name, item_search.path, item_search.extra) AGAINST ($qkeyword IN BOOLEAN MODE)

(For those in the peanut gallery: yes, $qkeyword is properly quoted/escaped to prevent injection attacks, and no, I can't use parameterized queries, as this is one small part of a large union query which is built dynamically.)

For Swampy, that's an SQL query against a MySQL InnoDB database, using a link flattened MyISAM table with FULLTEXT indexes. The flattened data takes approximately 3 seconds to build and insert into the database, which includes the fulltext indexing on-the-fly during insertion.

Complicated looking? Maybe.
Complicated sounding? Maybe.
Lots of buzzwords? Definitely.
Beyond your comprehension and abilities? Definitely.
Ability of users to understand what to do and find what they need? High

Doesn't matter how complicated or simple a search engine is in the background. The front-end (that's called "the interface", Swampy) hides all of that. An ideal search engine is a single text box to enter search terms, and a button to initiate the search. And guess what, that's what my search engine is. Complicated in the background, utterly simple in the foreground.

And with 3 seconds to build the index, it's fully useful and has the complete data corpus (about 500 megabytes) available. How much could you find with a virgin install of SSDS and 3 seconds of time to work with it?
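For anyone wondering why building an index up front beats rescanning everything on every search, here is a toy inverted index in Python. To be clear, this is not how MySQL's FULLTEXT indexes are implemented and it has nothing to do with my actual tables; the sample documents and function names are made up, and it only handles whitespace-separated words with a plain AND of all terms.

```python
from collections import defaultdict

def build_index(documents):
    """Map each lowercased word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    # Start from the first word's postings, then intersect with the rest.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical sample data.
documents = {
    1: "Fiction - overdue library books",
    2: "Reference - library opening hours",
    3: "Fiction - new books",
}
index = build_index(documents)
print(search(index, "library books"))
```

The index gets built once; after that each search is a handful of dictionary lookups and set intersections, no matter how much text is sitting behind it. Scanning the whole data set on every query scales with the size of the data instead.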

MarcB

@SpectateSwamp said:

That's why I have never had any takers on a desktop search challenge. Even moreso now, when they can look at and test out this wonderful program. They know the Spectate Swamp would make them look bad. There search came from internet search. What rubbish.

You cannot claim a "victory" for your search app merely by the lack of takers in a challenge. Yours is NOT the superior technology merely because no one else cares to "challenge" it. I highly doubt they "know the Spectate Swamp would make them look bad". I am completely certain that they do not even know you exist.

Consider it from this angle: Your DesktopSearch is so miserably inferior, your challenge is the equivalent of an amoeba (DesktopSearch) inviting an elephant (Everyone else) to duke it out. They don't answer your challenge because they don't even realize you exist.

I offer this as a parallel: From time to time, certain companies will try to make a big splash in the tech world by claiming unbreakable encryption, totally secure systems, unhackable firewalls, etc... etc... They "prove" their claim by challenging one and all to do whatever it is their product claims to guard against. Maybe they'll offer a prize, maybe they offer nothing. But invariably, when their challenge (and prize) go unclaimed, they try to make further waves claiming that the lack of winners in the challenge has proved their technological claims are true.

This is exactly the situation you find yourself in. You challenge us all to try your program. None of us do. You take this to mean that we admit you're right. But that's exactly wrong: we don't bother to try your program because we can see (and you yourself have admitted) that it's utterly useless.

And hey, since you're so obviously advanced in search technology, light years beyond the existing stuff, why are you not in line at the patent office? Were your claims true, you'd be sitting on a potential metric buttload of cash from patent royalties and licensing fees. Why aren't you down in Seattle knocking on Bill Gates' door? They just bought FAST to gain search technology... If only they'd known about your program, they might have saved some of the $1.2 billion that FAST cost them. And better yet, you've written the stuff in their own personal toy language! Why are you wasting your time here on us lamebrained search-deprived know-nothings?

MarcB

@jetcitywoman said:

We seem to be caught between working too much and not having any time to spend our money, or getting time off but being too poor to spend it. How do you guys afford travel?

It's a combination of (oddly enough) higher taxes and greater social services. You can look at it this way: America has a "do it yourself" type governmental system, and Europe has "here, let us do it for you". Not particularly accurate, but it's a fair way to look at it on the gross detail level.

If you want to focus in on a particular detail, try public transportation. I know, from personal experience, that public transit in Switzerland is by far far far better than anything available here in Canada or the US. It seems expensive - a universal annual adult pass is 3,100 CHF (CHF/USD are roughly at par these days), but in exchange you get unlimited travel on all public transit systems - bus, rail, tram, boat - anywhere in the country. Think of how much the average American commuter spends on gas and parking and you'll quickly exceed that 3,100 within a few months. And then factor in that it's a national pass - imagine being able to hop on a Greyhound or Amtrak and go from NYC to LA using your local bus pass. Such a thing may never be possible in the US or Canada because of the sheer distances involved.

MarcB

@morbiuswilters said:

Any product can be produced almost anywhere and if it can be done better in one place rather than another, the former will generally reduce the potential revenue of the latter.

True, but some things simply can't be moved. A phone support job can be anywhere, but the technician who does the housecall to replace the sticky R key on the user's keyboard will generally be local, and work for the local contractor/agency who has the best pay/leave package.

@morbiuswilters said:

However, restrictive laws like the European labor laws lower the value of allocating resources there which means countries with less-restrictive laws tend to remain more competitive.

Of course, European labor laws do help out in other ways - there's generally no (or at least, much less) need for corporate health plans, because of universal medicare. That's one major expense out of the way that American companies have to swallow.

MarcB

@merreborn said:

I'm pretty sure all of their modern processors feature Cool'n'Quiet, which should cause the chip to gracefully clock down if cooling fails.

CnQ is more there to throttle down clock speed when the system's idle, though the thermal monitoring and throttling plugs into it.

The original "quake meltdown" video was made to show how inefficient AMD's thermal monitoring was - the older Athlons didn't have a thermistor built into the cpu package. It was up to the motherboard makers to put one in somewhere under the socket. Since it wasn't on the cpu package, it couldn't detect an overheating situation quickly, as the whole cpu package had to get hot, plus heat up the small air gap between the cpu and the sensor, by which time those particular Athlons were smoking. Intel, by comparison, had the thermistor built into the cpu package and could detect overheating far faster and react accordingly.

That being said, my personal preference for CPU brands is based more on bang per buck than fanboi-brandism. I was loyal to AMD for a while, because Athlons kicked Intel ass all over the market, but my current desktop is an Intel (Core 2 Duo E6600), and for the foreseeable future, any computer purchases or upgrades will be Intel as well, unless AMD manages to come out with another Athlon-style beast. The upcoming Nehalem cpu from Intel sounds like it's going to not only kick AMD's ass, but wipe the floor as well. But we'll have to wait and see if the hype and early benchmarks hold up once it actually hits retail.

MarcB

@snoofle said:

thou shalt not use company resources (eg: computer, electricity, internet connection) for non-business (aka personal) use.
...
My friend and I will be taking our full lunch hours outside from now on.

Better make sure you coordinate things so that you step outside just as lunchtime hits - don't want to get dinged for using company air and heat/ac while on personal time. And don't step on any company grass while leaving the grounds - those blades are expensive.

MarcB

I'd add a couple more bits to the Etiquette section:

a) Do not needlessly resurrect old threads. If the last reply in a thread is more than 1 week old, please consider the discussion as "mostly dead", and anything beyond 2 weeks as completely dead.

b) Do not spam the forums with 'me too!' type posts. This is not AOL.

MarcB

@MasterPlanSoftware said:

My guess? All of them. We are talking about FF users, after all.

Exaggerate much?

MarcB

@morbiuswilters said:

My point is that waiting on it to be released is really lame, especially considering the source is already out there.

Yes, but how many of FF's users would have the required stuff to do a full compile of the source, and even know what a compiler was, let alone be able to use the thing?

MarcB

@vt_mruhlin said:

The virus was removed by either their or your mail server. I certainly know I'd prefer it if neither of said mail servers were able to access data on my machine without me explicitly sending it to them....

Nope. I disabled the AVG certification taglines on my system, and the mail server I retrieve this from isn't running an AV scanner. Besides, if it was my system that caught the trojan, it'd be in my virus vault, and that's empty, and I would have gotten a notification from AVG as the mail was retrieved.

MarcB

@MasterPlanSoftware said:

Is this your first time receiving spam?

No, but it's the first time I've seen spam coming from a subverted system that neutralized the main threat of the spam before it ever got out onto the wire.

MarcB

Found this in my Inbox this afternoon:

I find it funny that this person's AVG (which is up-to-date) caught the outgoing virus and killed it, but seems to have missed whatever was sending it on that machine in the first place. Looking at the mail's headers shows it originated from Israel.

I love spam, it's so multicultural. Israeli spam pretending to be Danish, attacking an American mailbox.

MarcB
