Never let a client touch a CSV file! Even if they insist!

wdl

One of our clients wanted an export of the content in their Content Management System, in the form of a CSV file. When asked why they needed it, they replied that they wanted to translate the content and then give it back to us so we could import it back into the system. When asked why they couldn't just use the CMS to do that, they said a CMS is too complicated for a translation agency. This is already a WTF, but hey, they are the client, if they want to pay for it... even after we warned them multiple times that this isn't a good idea...

Ok, so eventually we agreed to export the data. (yes I know, giving a client a CSV file is a WTF)

We created a CSV file with 3 columns: the project ID, the item ID, and the content. That way we could use the CSV to update the existing fields in the database. We explained to them that it was very important we got the file back with all the records intact. We warned them multiple times not to touch the ID columns, and not to swap the order of the content... only translate the content fields.

I knew this would go wrong... and after 2 weeks of silence we got the mail we were all waiting for... Our client had opened the CSV file in Excel, selected the content column, copied it (as 1 long string), pasted it into Notepad, and saved it as 1 HTML file (because... the translation agency had asked them for HTML). Of course this "html" file was a big pile of garbage... but the translation agency went ahead and translated that monster file...

Now our client sends us that "HTML" file and asks us to import it back into the database!!! Oh my... !

snoofle

But.... it appears they heeded your warnings and didn't modify the csv file.

Non-technical clients can not be trusted with even the most basic task; you must be brutally detailed in your instructions:

Dear Client,

Here are all the CMS records, one per file named: PrjId_ItmId.txt, wrapped in a zip file. Please have a corresponding set of IDENTICALLY NAMED FILES containing the translated text prepared and wrapped in a single zip file returned to us.

...and even then, it might not be specific enough.
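
For anyone who wants to try the one-file-per-record idea, here is a minimal sketch. The PrjId_ItmId.txt naming comes from the letter above; the function names and the in-memory record list are assumptions for illustration, not anything the original system is known to have had.

```python
import io
import zipfile


def export_records(records):
    """Pack each (project_id, item_id, content) record into its own .txt file in a zip."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for project_id, item_id, content in records:
            # File name follows the PrjId_ItmId.txt convention from the letter.
            archive.writestr(f"{project_id}_{item_id}.txt", content)
    return buffer.getvalue()


def import_records(zip_bytes):
    """Read a returned zip back into a {(project_id, item_id): content} dict."""
    result = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            stem = name.rsplit(".", 1)[0]
            project_id, item_id = stem.split("_")
            result[(project_id, item_id)] = archive.read(name).decode("utf-8")
    return result


# Hypothetical sample data, round-tripped through the zip.
sample = [(1, 1, "example"), (1, 2, "sample text")]
round_trip = import_records(export_records(sample))
```

The IDs live in the file names rather than in the text, so a translator has nothing to delete by accident. It still does nothing to stop someone from pasting all the files into one document.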

dtobias

Where I work, people will invariably open up CSV files in Excel, even when no changes to them are needed, and if you need a copy of the file to import into something they'll give it to you by saving it out of Excel, which mangles all sorts of things, such as converting account numbers into exponential format and dropping leading zeroes from zip codes.

mott555

Someone needs to invent a programming language and compiler that targets the Retarded Client Instruction Set (RCIS). Create your client instructions exactly as you would a software application, then compile it into client-readable form, send them the result. Then they just need to follow the instructions verbatim.

Bumble_Bee_Tuna

I think the key here is samples. I.e. "I'm going to send you a file like this:

1,1,example,

1,2,sample text,

1,3, cat

And expecting to receive a file like this:

1,1,ejemplo

1,2,muestra texto

1,3,gato

Here is a sample file with 3 entries, please translate this and send me the results so I can be sure we're on the same page."

Even that's not foolproof, since there's nothing to stop them from using an entirely different process to translate the sample file as they will for the real file. But it's a good start.

serguey123

@Bumble Bee Tuna said:

I think the key here is samples. I.e. "I'm going to send you a file like this:

1,1,example,

1,2,sample text,

1,3, cat

And expecting to receive a file like this:

1,1,ejemplo

1,2, texto de muestra

1,3,gato

Here is a sample file with 3 entries, please translate this and send me the results so I can be sure we're on the same page."

Even that's not foolproof, since there's nothing to stop them from using an entirely different process to translate the sample file as they will for the real file. But it's a good start.

FTFY, translation tools suck ass

blakeyrat

@mott555 said:

Someone needs to invent a programming language and compiler that targets the Retarded Client Instruction Set (RCIS). Create your client instructions exactly as you would a software application, then compile it into client-readable form, send them the result. Then they just need to follow the instructions verbatim.

You can't get mad at the customer for following your directions. It's not their fault you provided incomplete directions.

And calling them a "retard" because they didn't have telepathy to know what you meant by "don't touch the CSV file", that's just being an asshole.

wdl

The sad thing is our client DID understand our instructions. Trust me, we explained it thoroughly. But... after receiving the CSV file from us and the translation agency telling them they would prefer HTML files... they decided they could convert the CSV to HTML themselves, by simply copying the content column into Notepad, and then saving it as a .html file. Simple, isn't it? And they honestly thought that after the translation they would be able to just copy the content out of the "html" file from Notepad and paste it back into the CSV file. Sigh...

PJH

Might it have been 'safer' to rename the file to .txt (or .html), with instructions not to touch the numbers at the start of each line?

Or, even better, liaise with the translation company directly (or via a client contact) for a mutually compatible format/process. (Though if the translators can't cope with a CMS, then I think you've lost to start with.)

By the way, what's happening with the crap that did come back? Is it being used? Sent back to the client to be 'fixed?' 'Fixed' in-house?

b_redeker

@wdl said:

in their Content Management System

This is what was bugging me. Does their CMS not support the whole translation process in a proper way, for instance by making a screen available in which you could do that?

Then again, I think many agencies prefer to edit XLS files. You would probably have been safest to create an XLS file with the ID column and original text disabled, and an extra column for the translated text that the agency can edit. And next time, offer to contact the agency directly. It often helps prevent WTFs (or at least you can only blame yourself that way).

PJH

@b-redeker said:

Then again, I think many agencies prefer to edit XLS files. You would probably have been safest to create an XLS file

Since all the client did was copypasta the 3rd column to a separate file for the translation company rather than pass the file intact, I'm sure this wouldn't have worked.

robbak

So, they requested a CSV file .... and then did that to it?

Too dumb to live, too thick to die.

I'd like to see what a translation agency would make of those short strings out-of-context. Sounds like a plan for a wtf website.

dtobias

@PJH said:

Since all the client did was copypasta

Mmmmm... pasta!

RogerWilco

@robbak said:

I'd like to see what a translation agency would make of those short strings out-of-context. Sounds like a plan for a wtf website.

Yeah. I agree that someone who thinks that you can make proper translations of such a mess, doesn't understand the complexities of translations either.

I've supported internationalized applications. The worst was when the boss (who only knew one language), asked a colleague who had lived in the target country for a while to make the translation. The result was terrible. My language skills aren't great, but this violated some of the most basic rules of grammar and spelling in that foreign language.

People often don't have a clue how hard it is to translate something properly.

The_Assimilator

@RogerWilco said:

@robbak said:
I'd like to see what a translation agency would make of those short strings out-of-context. Sounds like a plan for a wtf website.
Yeah. I agree that someone who thinks that you can make proper translations of such a mess, doesn't understand the complexities of translations either.
I've supported internationalized applications. The worst was when the boss (who only knew one language), asked a colleague who had lived in the target country for a while to make the translation. The result was terrible. My language skills aren't great, but this violated some of the most basic rules of grammar and spelling in that foreign language.
People often don't have a clue how [b]difficult[/b] it is to translate something properly.

FTFY.

RTapeLoadingError

I'd definitely factor two things into the estimates:

1) The time required to write some sort of script to test the validity of the returned data;
2) Rework of the inevitably mangled files

Once the client has had it spelled out that there is a cost implication of returning unusable data they should hopefully take the steps required to ensure that it doesn't happen.

Yes, I do have a kind of naive optimism....

I would have given them an Excel sheet and added a column for translated text.
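
The validity-checking script mentioned in point 1 could look roughly like this. It is a sketch under assumptions: the three-column layout (project ID, item ID, content) comes from the original post, while the function name and the specific checks are made up for illustration.

```python
import csv
import io


def validate_returned_csv(exported_text, returned_text):
    """Compare the returned file against the export and list every problem found."""
    exported = list(csv.reader(io.StringIO(exported_text)))
    returned = list(csv.reader(io.StringIO(returned_text)))
    problems = []
    if len(exported) != len(returned):
        problems.append(
            f"row count changed: {len(exported)} exported, {len(returned)} returned"
        )
    # Rows are compared pairwise, in order; any extra rows are covered by the count check.
    for line_no, (sent, got) in enumerate(zip(exported, returned), start=1):
        if len(got) != 3:
            problems.append(f"row {line_no}: expected 3 columns, found {len(got)}")
        elif got[:2] != sent[:2]:
            problems.append(f"row {line_no}: IDs changed from {sent[:2]} to {got[:2]}")
        elif not got[2].strip():
            problems.append(f"row {line_no}: content is empty")
    return problems


# Hypothetical sample data in the project ID, item ID, content layout.
exported_sample = "1,1,example\n1,2,sample text\n"
good_return = "1,1,ejemplo\n1,2,texto de muestra\n"
bad_return = "1,1,ejemplo\n2,1,texto de muestra\n"
```

An empty list means the file is at least structurally safe to import. It says nothing about the quality of the translation, and a single pasted-together "HTML" file would fail on the first check.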

operagost

You will live longer

If you will only note this:

Clients are stupid.

RTapeLoadingError

But that works out well. Either they pay us to do the work or to fix it when they do it themselves.

RayS

This is why people should consider CSV as an export-only format, even if people aren't involved. Throw a " in there and you'll find 4 different applications handle it in 4 different ways. Not even an intelligent user (not that such a thing has ever existed) could reliably navigate the compatibility minefield of CSV.
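
To make the quote problem concrete, here is a small sketch using Python's standard csv module. The sample row is invented; the point is only that a proper CSV reader and a naive split on commas disagree about the same line.

```python
import csv
import io


def write_row(row):
    """Serialize one row with every field quoted, doubling embedded quotes."""
    buffer = io.StringIO()
    csv.writer(buffer, quoting=csv.QUOTE_ALL).writerow(row)
    return buffer.getvalue()


def naive_split(line):
    """What a hand-rolled importer often does: split on every comma."""
    return line.rstrip("\r\n").split(",")


# Hypothetical content field containing both a quote and a comma.
row = ["1", "2", 'He said "hi", then left']
line = write_row(row)

# The csv reader undoes the quoting and returns the original three fields.
parsed = next(csv.reader(io.StringIO(line)))

# The naive split cuts the content field in two at the embedded comma.
naive = naive_split(line)
```

Here `parsed` equals the original row, while `naive` ends up with four fields, each still carrying stray quote characters. Any tool in the chain that takes the naive route will quietly corrupt the data.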

Scarlet_Manuka

@Bumble Bee Tuna said:

I think the key here is samples. [...] Even that's not foolproof, since there's nothing to stop them from using an entirely different process to translate the sample file as they will for the real file. But it's a good start.

Sample files certainly aren't a panacea. We were doing a data cleansing exercise wherein we would generate extracts, send them to an external agency for the cleansing, and update our system with the changes. So we sent them a sample, they cleaned it up, and I wrote an import interface based on the format of the cleaned sample file.

Over the course of using the real files, it became apparent that they only sent the columns that were applicable to a particular extract. So between one extract and the next you might have one or two columns disappear and another couple appear that weren't there before. They also changed the delimiter - I think that was after the first extract.

In the end I wound up writing a pre-processor to put the returned file into the format the import interface was expecting, and manually checking each file for any additional surprises. I was glad when we finished.

A sample file is still a good idea. But some people will screw with you no matter what you do :)

Jaime

@Scarlet Manuka said:

@Bumble Bee Tuna said:
I think the key here is samples. [...] Even that's not foolproof, since there's nothing to stop them from using an entirely different process to translate the sample file as they will for the real file. But it's a good start.
Sample files certainly aren't a panacea. We were doing a data cleansing exercise wherein we would generate extracts, send them to an external agency for the cleansing, and update our system with the changes. So we sent them a sample, they cleaned it up, and I wrote an import interface based on the format of the cleaned sample file.
Over the course of using the real files, it became apparent that they only sent the columns that were applicable to a particular extract. So between one extract and the next you might have one or two columns disappear and another couple appear that weren't there before. They also changed the delimiter - I think that was after the first extract.
In the end I wound up writing a pre-processor to put the returned file into the format the import interface was expecting, and manually checking each file for any additional surprises. I was glad when we finished.
A sample file is still a good idea. But some people will screw with you no matter what you do :)

I stopped giving people sample files. In my experience, the people that are going to be a problem are the same ones that will simply not read the spec and hack out a solution that works on the sample file. My most recent example was when I sent a sample XML file to a team to import. They got the sample working, but our initial rounds of testing failed. It turns out that the sample, which was made by hand with a text editor, had a carriage return after the XML declaration, but the file spit out by the application didn't. Of course, the only way that a carriage return at this location could have caused a problem is if they were reading the XML with a hand written parser.

Zylon

@Jaime said:

I stopped giving people sample files. In my experience, the people that are going to be a problem are the same ones that will simply not read the spec and hack out a solution that works on the sample file. My most recent example was when I sent a sample XML file to a team to import. They got the sample working, but our initial rounds of testing failed. It turns out that the sample, which was made by hand with a text editor, had a carriage return after the XML declaration, but the file spit out by the application didn't. Of course, the only way that a carriage return at this location could have caused a problem is if they were reading the XML with a hand written parser.

So when YOU give the client a hand-generated sample file that doesn't correspond with the actual application data, that's fine. But when THEY use their hand-written parser that correctly handles the sample data but then chokes on the application data, they're a bunch of screwups.

The irony is palpable.

HighlyPaidContractor

@Jaime said:

I stopped giving people sample files. In my experience, the people that are going to be a problem are the same ones that will simply not read the spec and hack out a solution that works on the sample file. My most recent example was when I sent a sample XML file to a team to import. They got the sample working, but our initial rounds of testing failed. It turns out that the sample, which was made by hand with a text editor, had a carriage return after the XML declaration, but the file spit out by the application didn't. Of course, the only way that a carriage return at this location could have caused a problem is if they were reading the XML with a hand written parser.

On my current project, I wrote my solution based off the spec (fixed width). When I ran through the sample file, it crashed and burned because it was comma delimited with quoted strings (and a different field order). The client response? "The documentation was wrong."

Jaime

@Zylon said:

@Jaime said:
I stopped giving people sample files. In my experience, the people that are going to be a problem are the same ones that will simply not read the spec and hack out a solution that works on the sample file. My most recent example was when I sent a sample XML file to a team to import. They got the sample working, but our initial rounds of testing failed. It turns out that the sample, which was made by hand with a text editor, had a carriage return after the XML declaration, but the file spit out by the application didn't. Of course, the only way that a carriage return at this location could have caused a problem is if they were reading the XML with a hand written parser.

So when YOU give the client a hand-generated sample file that doesn't correspond with the actual application data, that's fine. But when THEY use their hand-written parser that correctly handles the sample data but then chokes on the application data, they're a bunch of screwups.

The irony is palpable.

Are you sure you weren't on that other team? My hand built sample met the specification. The application output also met the specification. The problem was that they built their parser around the implicit specification reverse engineered from the sample instead of simply reading the specification.

Also, are you suggesting that it is impossible to agree on an interchange specification before either application is built? Are you defending people that wrote an XML parser? A general rule of data interchange is to be strict when writing and loose when reading. The outcome that happened required at least two rookie mistakes on their part.

Jaime

@HighlyPaidContractor said:

@Jaime said:
I stopped giving people sample files. In my experience, the people that are going to be a problem are the same ones that will simply not read the spec and hack out a solution that works on the sample file. My most recent example was when I sent a sample XML file to a team to import. They got the sample working, but our initial rounds of testing failed. It turns out that the sample, which was made by hand with a text editor, had a carriage return after the XML declaration, but the file spit out by the application didn't. Of course, the only way that a carriage return at this location could have caused a problem is if they were reading the XML with a hand written parser.

On my current project, I wrote my solution based off the spec (fixed width). When I ran through the sample file, it crashed and burned because it was comma delimited with quoted strings (and a different field order). The client response? "The documentation was wrong."

This problem could also have been prevented by not providing samples. Providing both specs and samples can be seen as duplication, if the sample is seen as an implicit spec. My guess is that when they updated from fixed width to comma delimited, they updated the "spec". Of course, they only updated the implicit spec, not the explicit one.

Scarlet_Manuka

@Jaime said:

I stopped giving people sample files. In my experience, the people that are going to be a problem are the same ones that will simply not read the spec and hack out a solution that works on the sample file.

The spec? You get to have a spec for these things? I was just told "This file is what the data will look like, write an interface to import it."

My most recent example was when I sent a sample XML file to a team to import. They got the sample working, but our initial rounds of testing failed. It turns out that the sample, which was made by hand with a text editor, had a carriage return after the XML declaration, but the file spit out by the application didn't. Of course, the only way that a carriage return at this location could have caused a problem is if they were reading the XML with a hand written parser.

Wait, you're blaming them for the failure, even though the bit that changed between sample and implementation was on your end?

OK, I see later you say

My hand built sample met the specification. The application output also met the specification. The problem was that they built their parser around the implicit specification reverse engineered from the sample instead of simply reading the specification.

That makes more sense, but it wasn't very clear from your earlier post. I still think that if you're providing a sample file, you have a responsibility to make sure it's in the same format that you'll be using for the real data. And as we all know, it's easy for a specification to be read differently by different groups, so you can't say "if people just built to the spec we wouldn't have these problems", which seems to be your point. Of course, getting everyone to build to the spec would still be an improvement. :) But do you also get angry with everyone who doesn't accept your special-character, quoted-local-part-with-nested-comments email address? After all, it meets the spec...

I guess the moral of the story is that you can't trust your clients with XML data either. Or really, anything at all, ever.

ender

@Scarlet Manuka said:

That makes more sense, but it wasn't very clear from your earlier post.

It's XML - having a linefeed after the <?xml version="1.0" encoding="utf-8"?> doesn't change the file format.

@Scarlet Manuka said:

I still think that if you're providing a sample file, you have a responsibility to make sure it's in the same format that you'll be using for the real data.

Having linefeeds in XML files can depend on the library that generates them, or a setting in the library - but (not) having linefeeds should have no effect on parsing ability of the file (look at OpenOffice.org - by default it generates XML files with no linefeeds, but you can toggle a setting, and it'll output XMLs with linefeeds and indentation).
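
A quick way to see this is to feed both variants to a real parser. The sketch below uses Python's standard xml.etree.ElementTree; the document content and the `fragile_root_line` function are invented for illustration, the latter standing in for the kind of hand-written reader described earlier in the thread.

```python
import xml.etree.ElementTree as ET

# Same document, with and without linefeeds and indentation (invented content).
compact = b'<?xml version="1.0" encoding="utf-8"?><items><item id="1">cat</item></items>'
with_linefeed = (
    b'<?xml version="1.0" encoding="utf-8"?>\n'
    b"<items>\n"
    b'  <item id="1">cat</item>\n'
    b"</items>\n"
)


def item_texts(document):
    """Parse with a real XML parser and return {id: text} for each item element."""
    root = ET.fromstring(document)
    return {item.get("id"): item.text for item in root.iter("item")}


def fragile_root_line(document):
    """Hypothetical hand-written reader: assumes the root element starts on line 2."""
    return document.decode("utf-8").split("\n")[1]
```

`item_texts` returns the same dictionary for both inputs. `fragile_root_line` works on the version with linefeeds and raises an IndexError on the compact one, which is the failure mode you get when code is built around a sample instead of the format.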

Jaime

@Scarlet Manuka said:

@Jaime said:
I stopped giving people sample files. In my experience, the people that are going to be a problem are the same ones that will simply not read the spec and hack out a solution that works on the sample file.
The spec? You get to have a spec for these things? I was just told "This file is what the data will look like, write an interface to import it."
My most recent example was when I sent a sample XML file to a team to import. They got the sample working, but our initial rounds of testing failed. It turns out that the sample, which was made by hand with a text editor, had a carriage return after the XML declaration, but the file spit out by the application didn't. Of course, the only way that a carriage return at this location could have caused a problem is if they were reading the XML with a hand written parser.
Wait, you're blaming them for the failure, even though the bit that changed between sample and implementation was on your end?
OK, I see later you say
My hand built sample met the specification. The application output also met the specification. The problem was that they built their parser around the implicit specification reverse engineered from the sample instead of simply reading the specification.
That makes more sense, but it wasn't very clear from your earlier post. I still think that if you're providing a sample file, you have a responsibility to make sure it's in the same format that you'll be using for the real data. And as we all know, it's easy for a specification to be read differently by different groups, so you can't say "if people just built to the spec we wouldn't have these problems", which seems to be your point. Of course, getting everyone to build to the spec would still be an improvement. :) But do you also get angry with everyone who doesn't accept your special-character, quoted-local-part-with-nested-comments email address? After all, it meets the spec...
I guess the moral of the story is that you can't trust your clients with XML data either. Or really, anything at all, ever.

Once again, point missed. If I provide a spec that states the file will be XML and then some detail of what would be in the XML file, then I'll get one of two answers: "OK", or "I don't know XML". If I provide an XML sample, then I'll always get an affirmative response, but some people will parse it properly as XML and some people will treat it as a really verbose plain text file (with disastrous results). The spec-only route prevents misunderstandings.

Even better, if I provide nothing but an XML Schema, people who don't understand XML are guaranteed to reject it as garbage while people who know XML will open it in XMLSpy and have their own sample in thirty seconds.

Jaime

@Scarlet Manuka said:

The spec? You get to have a spec for these things? I was just told "This file is what the data will look like, write an interface to import it."

That would be the "example only" route. Look at where that got you. Samples are false security and their existence should be taken as a warning that there is going to be a problem.

Scarlet_Manuka

My point was that some of us don't get to have a choice. I'm happy for you that you have the luxury of an agreed specification to work to, but don't assume everyone else is in a position to make that happen.

In any case, did you miss the bit where they changed the delimiter between one file and the next? If they're going to do that sort of thing, I don't think the presence of a spec would have restrained them.

Jaime

@Scarlet Manuka said:

My point was that some of us don't get to have a choice. I'm happy for you that you have the luxury of an agreed specification to work to, but don't assume everyone else is in a position to make that happen.
In any case, did you miss the bit where they changed the delimiter between one file and the next? If they're going to do that sort of thing, I don't think the presence of a spec would have restrained them.

It might have. Specs define the rules and make it easier to see when they are changed. Examples don't work the same way. If example #1 has comma delimiters and example #2 has tab delimiters, someone could simply say "the delimiters are configurable" or "it's the receiver's responsibility to infer the delimiters" after the fact. As I said before, if all I got was examples and no spec, that would be the first topic I schedule a meeting for. If they can't give me a spec, then I assume it's going to be an absolute mess and triple the estimate.

As for "not having the luxury", of course there are bad projects. However, I don't jump into coding until I get a reasonable idea of what's expected of me. If the project blows up because of what the other side does, it's not the end of the world. The important part is to do your best to get over these humps before you sink a lot of time into code written for the wrong file format.

Scarlet_Manuka

Your job is not the same as mine :) In this particular case, it wasn't some kind of shared development effort - my company was just using this other company's data cleansing service. I didn't have any contact with the other company at all (my manager did, but I don't know if it was much more than sending them the files and authorising payment for the charges). It wasn't ever going to be any kind of ongoing thing; it was essentially a one-off job (but with the data broken up into five or six pieces).

I'm not trying to make some argument against working from specs here. Specs are great. On our internal projects, we at least [i]try[/i] to have everything specced out. Sometimes we even manage to keep the spec up to date right until the production rollout. :) But for something like this - which was also under extreme time pressure, incidentally - it wasn't really an option. Nor was it all that much of a problem. I only mentioned it because it seemed relevant to the conversation, not because it left major scars on my soul.

For an amusing contrast, there's an annual data extract I do which has a spec, but a really annoying and brain-dead one. (I've mentioned it here before, I think.) Basically it's tracking loan accounts over a period of several years. For each account there's some information that is static and some information that varies from month to month. So for this extract, for our internal copy of the data I have one table for accounts and one table for revenue; the revenue table has the account ID and the month as keys. The format I have to send to the company that does the analysis is this: for each month, I create an Access database with a single table in it. That table is a join between the account data and the revenue data for the given month. So I'm sending them around 70-80 Access databases, each with one table, with a significant amount of redundant information. But hey, that's the spec (I've questioned it and offered alternatives, but that's the way they want it).

I have that project to thank for forcing me into the maze of horrifying WTF that is Access VBA (because modifying the query for each month and re-running it gets old once the number of months gets above 40 or so). I used to think that Excel VBA had some strange quirks... Access VBA makes Excel VBA seem elegant.

danixdefcon5

@Jaime said:

My most recent example was when I sent a sample XML file to a team to import. They got the sample working, but our initial rounds of testing failed. It turns out that the sample, which was made by hand with a text editor, had a carriage return after the XML declaration, but the file spit out by the application didn't. Of course, the only way that a carriage return at this location could have caused a problem is if they were reading the XML with a hand written parser.

Sounds like one developer we had at one of my previous jobs. The sad thing is that while all my apps actually used an XML parser, the XMLy stuff wasn't the main part of my programs. However, the dude who was "parsing" the XML was doing so for a specific module where reading/writing XML was its main job!

The parts of that code I inherited made my head scream, and then I spent the next 2 hours ripping out the offensive code and sticking in a generic XML parser. I wasn't allowed to drop-in my replacement, as the original code had not blown up ... yet.

HighlyPaidContractor

@danixdefcon5 said:

@Jaime said:
My most recent example was when I sent a sample XML file to a team to import. They got the sample working, but our initial rounds of testing failed. It turns out that the sample, which was made by hand with a text editor, had a carriage return after the XML declaration, but the file spit out by the application didn't. Of course, the only way that a carriage return at this location could have caused a problem is if they were reading the XML with a hand written parser.

Sounds like one developer we had at one of my previous jobs. The sad thing is that while all my apps actually used an XML parser, the XMLy stuff wasn't the main part of my programs. However, the dude who was "parsing" the XML was doing so for a specific module where reading/writing XML was its main job!

The parts of that code I inherited made my head scream, and then I spent the next 2 hours ripping out the offensive code and sticking in a generic XML parser. I wasn't allowed to drop-in my replacement, as the original code had not blown up ... yet.

At a prior job, the existing system stored massive xml docs in a SQL Server table. When querying for information on a specific user, this wasn't very efficient, but also wasn't much of an issue. (pull xml, parse). When I wanted to pull information on all users who met specific criteria, I had to either pull the entire table, or write queries that internally parsed the xml.

SELECT [userName],[xmlString] FROM userdata WHERE [xmlString] LIKE '%<roles value="%4%">%'

I was also reprimanded for "not properly understanding XML"
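
To show why that LIKE pattern is shaky, here is a sketch that mimics it with a regular expression and compares it to actually parsing the XML. The `<roles value="...">` attribute comes from the query above; the surrounding `<user>` element and the idea that the value holds a comma-separated list of role IDs are assumptions, since the real schema isn't shown.

```python
import re
import xml.etree.ElementTree as ET

# Rough equivalent of LIKE '%<roles value="%4%">%': each % becomes "any characters".
LIKE_PATTERN = re.compile(r'<roles value=".*4.*">', re.DOTALL)


def like_match(xml_string):
    """Substring-style match, the way the LIKE query sees the document."""
    return LIKE_PATTERN.search(xml_string) is not None


def has_role(xml_string, role):
    """Parse the XML and check the role list (assumed to be comma-separated)."""
    root = ET.fromstring(xml_string)
    for roles in root.iter("roles"):
        values = [part.strip() for part in roles.get("value", "").split(",")]
        if role in values:
            return True
    return False


# Hypothetical documents: one user really has role 4, the other has role 14.
user_with_role_4 = '<user><roles value="1,4"></roles></user>'
user_with_role_14 = '<user><roles value="14"></roles></user>'
```

The pattern match flags both users, because "14" contains a "4". The parsed check only accepts the first one. The same false positive would show up for roles 40, 41, 104 and so on.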

wrack

What version of SQL Server? Because 2005 and upwards lets you do XPath and such queries directly inside the table (assuming the xml is in a column of type xml)...

Weng

@bannedfromcoding said:

What version of SQL Server? Because 2005 and upwards lets you do XPath and such queries directly inside the table (assuming the xml is in a column of type xml)...

Cool. Better file that one away for future usage.