NIST is really, really hip



  • Okay, so I'm trying to find some acronym soup known as FIPS codes from NIST. 

    Now, if you know anything about (The) National Institute of Standards and Technology, you'd figure they'd spend at least some of their $862mil FY2010 budget to make this data programmatically accessible. I mean, it's their goal to advance "measurement science, standards, and technology in ways that enhance economic security and improve quality of life" after all. 

    So how do you think that this data would be made accessible? Perhaps it exists as an XML file with a decade-old, over-engineered schema? Would it be in a MySQL database or, perhaps more enterprisey, an MS Access one? Or... does it live in an Excel spreadsheet?

    Okay. Well, the worst thing that could possibly happen is that there will be some need to scrape some HTML elements. Don't trip.

    Getting googly leads us here:  http://www.itl.nist.gov/fipspubs/fip6-4.htm

    Okay, so far it's just a plain old site. However, we see our first red flag at the bottom of the page:

    Last modified: May 10, 2002

    Well, whatever. Surely they had Excel back then! Let's continue to "teh codez":  http://www.itl.nist.gov/fipspubs/co-codes/states.htm

    Oh... oh god. This looks like it's out of some hellish 1990s "web development" craze. Looking at the source confirms this: it still uses the <center> element, deprecated in 1777. But whatever, we're just going to access this data via C# so we don't really need to do anything with the site, just get the link to "all the states and US territories in ASCII format *without* HTML tags" - wait, what?

    ...why is this a text file? Why is there no consistent column numbering? Why are there comments associated with tildes and hyphens?

    Why hasn't this been updated since nineteen fucking ninety?!?!



  • According to the census website, FIPS has been replaced. Here is a link to all the new information: http://www.census.gov/geo/www/ansi/ansi.html



  • Could be worse. You could be trying to get your encryption project FIPS 140-2 certified.



  • I would not expect any kind of modern webiness from the people who bring you the microcalorimeter.


  • Discourse touched me in a no-no place

    @fennec said:

    Could be worse. You could be trying to get your encryption project FIPS 140-2 certified.

    Did Sony try?



  • @Power Troll said:

    ...why is this a text file? Why is there no consistent column numbering? Why are there comments associated with tildes and hyphens?

    Why hasn't this been updated since nineteen fucking ninety?!?!

     

    Maybe there haven't been any counties / state subdivisions (parishes in Louisiana, etc.) added or dropped since then?

    What's wrong with plain text as a data format?  It's much more portable than some proprietary thing like M$ Excel.

     



  • @dtobias said:

    What's wrong with plain text as a data format?  It's much more portable than some proprietary thing like M$ Excel.

    Take a look at the file before you say that. They couldn't possibly have made it harder to parse. (Well, they could have used HL7 format I guess...)



  • Looked at the text file. I smell COBOL... better get an air freshener.



  • If you've been parsing "tables" out of RFCs (like 1345), I guess every text file looks "easy to parse". I haven't tried it on the given link yet, but I had a look, and I think it's easy to distinguish the line types and to parse the data. If the data is outdated, it's quite useless, though...

    If you don't need the plain text in between: when you see a line consisting of tildes, skip all lines up to (but excluding) the next line consisting of hyphens. If a line that follows a line of hyphens ends with digits in parens, it is a state line. A line starting with 3 digits + 2 spaces is a "table" line; split it before every run of 3 digits followed by 2 spaces (positive lookahead rulez), and every part should then (after trimming whitespace) consist of 3 digits, 2 spaces, and the name. Associate the digits with the state somehow. And so on...

    Dump all lines you cannot parse to stdout and have a look at them. If it is all garbage, you are done; if not, repeat :)
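A rough Python sketch of the steps above, in case it helps. The sample text is made up for illustration and only mimics the layout as described in this post (tilde rule, commentary, hyphen rule, state line with the code in parens, then "table" lines); it is not copied from the NIST file, and the names `parse_fips`, `is_rule`, and `SAMPLE` are hypothetical. Treat it as a starting point, not as a tested parser for the real thing.

```python
import re

# A "table" line starts with 3 digits + 2 spaces.
TABLE_LINE = re.compile(r"^\d{3}  ")
# Zero-width split point before every "3 digits + 2 spaces" (positive lookahead).
ENTRY_SPLIT = re.compile(r"(?=\d{3}  )")
ENTRY = re.compile(r"^(\d{3})  (.+)$")
# State line: some name, then digits in parens at the end of the line.
STATE_LINE = re.compile(r"^\s*(.+?)\s*\((\d+)\)\s*$")


def is_rule(line, char):
    """True if the line consists only of `char` (ignoring surrounding spaces)."""
    s = line.strip()
    return len(s) >= 3 and set(s) == {char}


def parse_fips(lines):
    states, counties, unparsed = {}, {}, []
    state = None           # code of the state we are currently inside
    skipping = False       # True between a tilde line and the next hyphen line
    after_hyphens = False  # True right after a hyphen line
    for raw in lines:
        line = raw.rstrip("\n")
        if skipping and not is_rule(line, "-"):
            continue       # commentary block, ignore
        skipping = False
        if is_rule(line, "~"):
            skipping = True
            continue
        if is_rule(line, "-"):
            after_hyphens = True
            continue
        if not line.strip():
            continue
        m = STATE_LINE.match(line)
        if after_hyphens and m:
            state = m.group(2)
            states[state] = m.group(1)
        elif state is not None and TABLE_LINE.match(line):
            for part in ENTRY_SPLIT.split(line):
                entry = ENTRY.match(part.strip())
                if entry:
                    counties[(state, entry.group(1))] = entry.group(2)
        else:
            unparsed.append(line)  # dump these and have a look
        after_hyphens = False
    return states, counties, unparsed


# Made-up sample in the assumed layout.
SAMPLE = """\
~~~~~~~~~~~~~~~~~~~~
Some free-form commentary to skip.
--------------------
EXAMPLE STATE (01)
--------------------
001  Alpha County     003  Beta County
005  Gamma County
this line matches nothing
"""

states, counties, unparsed = parse_fips(SAMPLE.splitlines())
for line in unparsed:
    print("UNPARSED:", line)
```

The `unparsed` list is the "dump and repeat" part: anything that is neither a rule, a state line, nor a table line ends up there, so you can eyeball it and tighten the patterns.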



  • @delta534 said:

    According to the census website FIPS has been replaced, here is a link to all the new information http://www.census.gov/geo/www/ansi/ansi.html
     

     

    More specifically, the file he wants is a CSV file located here: http://www.census.gov/geo/www/ansi/national.txt
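For comparison, reading a comma-separated file like that takes only a few lines. This is a minimal sketch that assumes each row holds a state abbreviation, a 2-digit state code, a 3-digit county code, a county name, and a class code, in that order and with no header row; that layout is an assumption, so check the actual file first. The field names and `load_counties` are made up for illustration, and the sample rows stand in for the download.

```python
import csv
import io

# Assumed column order; verify against the real file before relying on it.
FIELDS = ["state", "state_fips", "county_fips", "county_name", "class_code"]


def load_counties(text_file):
    """Map the 5-digit state+county code to the county name."""
    reader = csv.DictReader(text_file, fieldnames=FIELDS)
    return {
        row["state_fips"] + row["county_fips"]: row["county_name"]
        for row in reader
    }


# Two rows in the assumed layout, used here instead of fetching the file.
SAMPLE = "AL,01,001,Autauga County,H1\nAL,01,003,Baldwin County,H1\n"

codes = load_counties(io.StringIO(SAMPLE))
print(codes["01003"])
```

Keeping the codes as strings matters here: the leading zeros are part of the code and would be lost if they were converted to integers.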



  • Thanks for the replies, but the deliverable requirement is specifically for FIPS. The data might be the same, but whatever - I've already finished writing a parser. I don't make the rules, I just play by them. But yeah, the problem is do-able - that's not the issue, really. The issue is that this data is in an absolutely terrible format for doing anything programmatically despite coming from a "look ahead" organization.



  • @frits said:

    I would not expect any kind of modern webiness from the people who bring you the microcalorimeter.

    What is unusual about that??? Measuring energy is still commonly done in calories [the energy to heat one gram of water by one degree Celsius] rather than in joules. When dealing with transmission losses, we are talking about very small amounts of energy being lost over a short distance.

    It is also important to remember that although this was updated in 1993, it is largely late 1950s to early 1960s technology, making the use of the calorie over the joule even more common.



  • @TheCPUWizard said:

    @frits said:
    I would not expect any kind of modern webiness from the people who bring you the microcalorimeter.

    What is unusual about that???

    My guess is that they forgot the wooden table, but it's hard to tell for sure with all the fading.



  • @TheCPUWizard said:

    @frits said:

    I would not expect any kind of modern webiness from the people who bring you the microcalorimeter.

    What is unusual about that??? Measuring energy is still commonly done in calories [the energy to heat one gram of water by one degree Celsius] rather than in joules. When dealing with transmission losses, we are talking about very small amounts of energy being lost over a short distance.

    It is also important to remember that although this was updated in 1993, it is largely late 1950s to early 1960s technology, making the use of the calorie over the joule even more common.

    Nothing unusual about that. I only know it exists because I'm relatively familiar with what NIST's primary business is. It is not in cutting-edge Information Technology or web services. NIST is well-known for maintaining national measurement standards. Measurement science is by nature very slow to change.



  • @NIST said:

    The portion of Alaska that is not within 
    organized boroughs is covered by a single "unorganized 
    borough,"

     

    Nice, can anyone tell us if it is really unorganized? If so, let's all move there.

     

