CWMail address books



  • My task today is to convert 200 users' address books from the old webmail system (which ran something called CWMail) to the new one. I was wondering why (and how) the hell there was so much variation in the address book formats -- some use commas as delimiters, some use = signs, some use copious numbers of spaces, some use angle brackets to surround e-mail addresses, some don't...

    I just learned the reason, and I'm not impressed: CWMail's idea of an "address book" is a free-form text box into which the user can type whatever they want.


  • Garbage Person

     So just split over every not-valid-for-email character (including whitespace). You should end up with a bunch of this:

     

    [This], [is], [a], [name], [example@example.org] - pick out every valid email address, make everything before it go into the name field (if you have both a first and last name field your new webmail client is annoying and should be killed) and drop the email address into the email address field. Store the original file somewhere so you can go fishing and manually fix ones on a per-complaint basis.
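    A minimal sketch of that split-and-pick approach, in Python. The delimiter set (whitespace, commas, semicolons, = signs, angle brackets) is an assumption based on the formats described above, the "looks like an address" pattern is deliberately loose rather than a spec-grade validator, and the name `parse_address_book` is made up for illustration:

```python
import re

# Assumed delimiters: whitespace, commas, semicolons, '=' and angle brackets.
SPLIT_RE = re.compile(r"[\s,;=<>]+")
# Loose "looks like an address" check; NOT a full RFC validator.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def parse_address_book(text):
    """Return (entries, leftover).

    entries  -- list of (name, email) pairs, where name is every token
                seen since the previous address
    leftover -- trailing tokens that were never followed by an address
    """
    entries = []
    name_parts = []
    for token in SPLIT_RE.split(text):
        if not token:
            continue  # split() yields '' at the edges of the text
        if EMAIL_RE.match(token):
            entries.append((" ".join(name_parts), token))
            name_parts = []
        else:
            name_parts.append(token)
    return entries, name_parts


sample = (
    "This is a name <example@example.org>\n"
    "Bob = bob@example.com, Alice Smith alice@example.net"
)
entries, leftover = parse_address_book(sample)
for name, email in entries:
    print(f"{name!r} -> {email}")
```

    Anything that lands in `leftover`, or any entry with an empty name, is a good candidate for the go-fishing-in-the-original-file pile.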



  • Thanks, that sort of worked. I've gotta learn to keep in mind that not everyone cares about data integrity -- if they did, they wouldn't use inconsistent formatting in their freeform address books :)


  • Garbage Person

    @thetrivialstuff said:

    Thanks, that sort of worked. I've gotta learn to keep in mind that not everyone cares about data integrity -- if they did, they wouldn't use inconsistent formatting in their freeform address books :)

    Most people think the computer is a magic box and have no idea what data integrity might even begin to mean. ... Hell, I know several practitioners of the trade who fit both of those statements.

    Also, some developers are of the mindset that their data structures should be as convoluted and impossible-to-convert-to-something-else as possible as a "security feature" to protect "their data" from being converted to competitors' platforms. At one point I came across a database in a ticketing system in which every piece of data (even the column and table names) was encrypted.



  •  @Weng said:

    So just split over every not-valid-for-email character (including whitespace).

     No, this is bad.  Check this before you do that.

     http://www.remote.org/jochen/mail/info/chars.html

    "The SPACE character can be used in email addresses if quoted properly, but would be very confusing. Don't use it."

     You can skip the whitespace pretty safely (if you care, figure out the escaping), but almost all printable characters are valid in email addresses, so be careful!  There are some weird ones out there that are actually still in use.  Really, you can only split on control characters and DEL safely.

    Yes, I am pedantic.


  • Garbage Person

    @shepd said:

     @Weng said:

    So just split over every not-valid-for-email character (including whitespace).

     No, this is bad.  Check this before you do that.

     http://www.remote.org/jochen/mail/info/chars.html

    "The SPACE character can be used in email addresses if quoted properly, but would be very confusing. Don't use it."

     You can skip the whitespace pretty safely (if you care, figure out the escaping), but almost all printable characters are valid in email addresses, so be careful!  There are some weird ones out there that are actually still in use.  Really, you can only split on control characters and DEL safely.

    Yes, I am pedantic.

    Yeah, but the original address book wouldn't have worked anyfuckingway if they actually followed the specs.


  • @shepd said:

     @Weng said:

    So just split over every not-valid-for-email character (including whitespace).

     No, this is bad.  Check this before you do that.


    That's why I said it "sort of" worked -- I checked all the output by hand anyway (what can I say; I'm paid by the hour :P  fixing the occasional glitches was still faster than manually converting, though).

    @Weng said:

    Yeah, but the original address book wouldn't have worked anyfuckingway if they actually followed the specs.

    The problem was that there was no spec -- the instructions next to that text box just said "type your addresses here" or something like that. Can't really blame the users for once (except the ones that decided to change their format midway through their own file...)



  • @thetrivialstuff said:

    The problem was that there was no spec -- the instructions next to that text box just said "type your addresses here" or something like that. Can't really blame the users for once (except the ones that decided to change their format midway through their own file...)

    There is a special place in hell for people that make forms like this.

    I used to do inventory and asset management for a large Department of Defense contractor. The system of record used a data dump of HR's PeopleSoft data so I would know who was who. That dump had phone numbers in about 40 different formats due to phone numbers being entered in a free-form field with no validation:

    555-555-5555

    1-555-555-5555

    9-1-555-555-5555

    55/5-5:55555

    bob's cell phone: 555 555 5555  

    555-5555 but call my cell at 555-444-5555-12345

    This got dumped into my tool, which would only let me dispatch tickets in a 555-555-5555 format. I had no control over how the data was shoveled into my tool, the data could only be fixed through a tool that took 6 minutes to load, and all of my cries, threats and blackmail to get the format fixed were answered with "Can't you just deal with it? It's only a few hundred thousand records. And we only destroy all of your changes once a week."
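    For what it's worth, a cleanup pass over a dump like that can be sketched in a few lines of Python. This is a rough illustration, not what the tool actually did: it assumes North American 10-digit numbers, assumes a leading 9 is an outside-line prefix and a leading 1 is the country code, and returns None for anything ambiguous so a human can look at it. The function name `normalize_nanp` is made up:

```python
import re


def normalize_nanp(raw):
    """Return 'NNN-NNN-NNNN', or None if the input is not one clean number."""
    digits = re.sub(r"\D", "", raw)  # keep digits only
    # Assumption: '9' dials an outside line and '1' is the country code.
    if len(digits) == 12 and digits.startswith("91"):
        digits = digits[2:]
    elif len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    if len(digits) != 10:
        return None  # too few or too many digits: leave for manual review
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"


for raw in [
    "555-555-5555",
    "1-555-555-5555",
    "9-1-555-555-5555",
    "55/5-5:55555",
    "bob's cell phone: 555 555 5555",
    "555-5555 but call my cell at 555-444-5555-12345",
]:
    print(f"{raw!r} -> {normalize_nanp(raw)}")
```

    The fourth sample has only nine digits and the last one has far too many, so both come back as None instead of being guessed at.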

     

    But I guess I'm just getting bitter at my old age of 30.

     



  • @Sho_Asylumn said:

    This got dumped into my tool, which would only let me dispatch tickets in a 555-555-5555 format. I had no control over how the data was shoveled into my tool, the data could only be fixed through a tool that took 6 minutes to load, and all of my cries, threats and blackmail to get the format fixed were answered with "Can't you just deal with it? It's only a few hundred thousand records. And we only destroy all of your changes once a week."

    So the real WTF is your dispatch ticket system which can only handle North American phone numbers!

