SID playing: Looking stuff up from a flat file has never been easier



  • Probably not front-page material - picking on open source code.

    There's this file format called PSID that is used for Commodore 64 music. The largest collection of music in this format is the High Voltage SID Collection (HVSC). The file format is pretty old now, and even in the newer revisions, it hasn't gained the niceties afforded by some of the newer emulator-music file formats like NSF/SPC/GSF: specifically, information about the lengths of the songs stored in the file.

    HVSC folks solved this by bundling a file called Songlengths.txt that is basically a simple flat text file database of the lengths of tunes. Looking up song lengths couldn't be easier: Calculate MD5 sum for the file, find the line with the MD5 sum in question, and parse the song lengths following the MD5 sum.
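    The lookup as described can be sketched roughly as follows. This is a minimal sketch, not the HVSC or sidplay implementation: the function names are made up for illustration, and the file layout ("key=value" entries, with ';' comment lines and '[section]' headers skipped) is an assumption inferred from how the file is described in this thread.

```python
import hashlib


def load_song_lengths(text):
    """Map each fingerprint to its list of song-length strings.

    Assumes 'key=value' entries; ';' comment lines and '[section]'
    headers are skipped. The layout is an assumption, not a spec.
    """
    table = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(";") or line.startswith("["):
            continue
        key, sep, value = line.partition("=")
        if sep:
            table[key.strip().lower()] = value.split()
    return table


def lookup_by_whole_file_md5(table, file_bytes):
    """Naive lookup: hash the entire file and use the hex digest as key."""
    digest = hashlib.md5(file_bytes).hexdigest()
    return table.get(digest)
```

    Calling `lookup_by_whole_file_md5(load_song_lengths(text), data)` returns the list of length strings for a matching entry, or `None` when the digest is not in the table.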

    Okay - using MD5 as a hash key is a bit excessive, but I can live with that... So let's see.... I take an MD5 sum of any of the files that come with HVSC... and I can't find a match in the file.

    Ah, but I forgot that curious comment from the documentation: "As of HVSC 5.0, a modified fingerprint calculation is required."

    Modified fingerprint calculation? Um, that doesn't sound good - especially when the said "modified fingerprint calculation" isn't explained anywhere in this file.

    But it is explained - in source code.

    While I'm waiting for the headache to pass, let me babble and get this straight: I'm supposed to gulp up the file and parse the file header. Then I'm supposed to take an MD5 sum of the file data plus a slightly differently formatted header. And apparently, if you choose to use NTSC timer instead of PAL timer - a player setting, mind you - out comes a completely different MD5 sum!

    The Songlengths.txt maintainers obviously realise that "ChunkOfWorthlessUndecipherableLineNoise = Random song lengths..." is not exactly the most readable format in the world and gives little information for a casual reader of the file. Therefore, in their infinite wisdom (and I'm not sarcastic here or anything, honest), each song length entry is preceded by a completely innocuous comment text that tells the relative path to the SID file in question... just for the benefit of the reader, you know, if you need to look up the song lengths by hand... The player software is obviously expected to ignore all such rubbishy comment lines. I mean, they're comment lines? Surely calculating MD5 sums is faster, more efficient, and more reliable than relying on oft-changing relative file paths? (Okay, now I'm sarcastic.)

    So, I decided that the most efficient and clearly the most straightforward way of reading the file is to just a) search for the song file path, b) read the next line, and c) ignore the MD5 sum and just parse the song lengths. Breaks the standard, but gets the job done without twisting one's mind!
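    Steps a) to c) could look something like this. It is only a sketch: `lengths_by_path` is a hypothetical helper, and it assumes each entry is a "key=value" line directly below a "; /relative/path" comment. Unlike the plain "read the next line" rule, it also skips blank lines between the comment and the entry.

```python
def lengths_by_path(text, relative_path):
    """Return the song lengths from the entry that follows the path comment."""
    lines = text.splitlines()
    for index, raw in enumerate(lines):
        line = raw.strip()
        # a) find the comment line that names the file
        if line.startswith(";") and line[1:].strip() == relative_path:
            # b) read the next non-empty line
            for follow in lines[index + 1:]:
                entry = follow.strip()
                if not entry:
                    continue
                if entry.startswith(";"):
                    return None  # another comment came first: no entry here
                # c) ignore the key and keep only the lengths
                _key, sep, value = entry.partition("=")
                return value.split() if sep else None
            return None
    return None
```

    The function returns `None` when the path comment is missing or is not followed by an entry, so the caller can at least tell that the shortcut did not work.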



  • I'm fully with you. But can you be sure everyone is nice enough to put the path of the file in the comment exactly one line above the entry? What if someone just forgets the comment or puts it, let's say, below the entry?



  • The problem with free software has always been that any idiot can create it. Some of it just plain sucks. Obscure legacy stuff in particular does this a lot. Anybody who would spend considerable effort emulating a C64 is not particularly sane to begin with; it is unreasonable to expect them to produce sane software.

    Incidentally, the reason why these files don't have stored lengths is that the question is not meaningful. A SID file is actually a program. It does not have to have a single length, or even terminate at all.



  • @WWWWolf said:

    And apparently, if you choose to use NTSC timer instead of PAL timer - a player setting, mind you - out comes a completely different MD5 sum!

    It's a player setting in much the same way as 33 vs 45 rpm is a player setting.  Sure, you can choose either one, but it won't sound right.

    (That said, clearly sometimes it's just wrong.  Like Skate or Die.  Rob Hubbard, living and working in the UK, obviously made the song for 50 Hz playback.  Yet, last I checked (a long time ago), the default setting for that song in the HVSC was 60 Hz.  Perhaps it was released like that in the US, and the PAL versions adjusted accordingly, but all the cracked versions I ever saw played it back on a vblank interrupt, at 50 Hz.  Like it should be.)  🙂



  • @asuffield said:

    The problem with free software has always been that any idiot can create it.

    Doesn't this apply equally to proprietary software, as proven by this website? 😉 



  • @asuffield said:

    Incidentally, the reason why these files don't have stored lengths is that the question is not meaningful. A SID file is actually a program. It does not have to have a single length, or even terminate at all.

    Ah, but here's the thing: Many other comparable formats that use the exact same approach (file has a chunk of code and data extracted from the game, and player emulates the console hardware) do have song lengths in the headers. Likewise, most of these files don't have length

    The player can either use the song lengths as found in the header or, at the user's request, ignore them completely. My logic goes like this: If such interesting information can be optionally stored in Songlengths.txt, why couldn't it be stored somewhere a bit closer, like in, ah, I don't know, .sid header? 🙂

    @magetoo said:

    It's a player setting in much the same way as 33 vs 45 rpm is a player setting. Sure, you can choose either one, but it won't sound right.

    Yeah, you're right, it's definitely file-dependent, not really user choice. I really should stop posting stuff late at night and actually think what I say sometimes. 🙂



  • @Dark Shikari said:

    @asuffield said:

    The problem with free software has always been that any idiot can create it.

    Doesn't this apply equally to proprietary software, as proven by this website? 😉 

    Not quite. Proprietary software does have a lower bound in order to continue existing, while free software will admit anybody, no matter how insane or retarded.

    Stupidity can be found anywhere, but the very worst examples will always be free.



  • @asuffield said:

    @Dark Shikari said:
    @asuffield said:

    The problem with free software has always been that any idiot can create it.

    Doesn't this apply equally to proprietary software, as proven by this website? 😉 

    Not quite. Proprietary software does have a lower bound in order to continue existing, while free software will admit anybody, no matter how insane or retarded.

    Stupidity can be found anywhere, but the very worst examples will always be free.

    You're assuming that one of the basic concepts of commerce holds true. That is, that providing a poor quality product or service will drive buyers away.

    From what I've seen, this clearly isn't the case.

    In fact, in some ways there is greater motivation to create shitty proprietary software than shitty free software. If there's an opportunity to make shit and dupe customers into paying money for it, then there's room for the more morally flexible among us to profit from this activity. However, I don't see any rewards for using one's own time to manufacture worthless shit.
     



  • @drinkingbird said:

    You're assuming that one of the basic concepts of commerce holds true. That is, that providing a poor quality product or service will drive buyers away.

    I am assuming that somewhere there exists a threshold of crappiness, below which customers actually will be driven away.

    Admittedly, I find it difficult to imagine such a product. Electrified keyboards and an interface that flashes pink and yellow may be involved.

    In fact, in some ways there is greater motivation to create shitty proprietary software than shitty free software. If there's an opportunity to make shit and dupe customers into paying money for it, then there's room for the more morally flexible among us to profit from this activity.

    Every true salesman con artist knows that you don't actually need to make a product in order to succeed on this path. CD blanks with printed labels will suffice. Or binary installers that always present an error message about permissions and then exit. If you are willing to deliver something that doesn't actually work at all, then producing software would be a waste of time.



  • @asuffield said:

    @drinkingbird said:

    You're assuming that one of the basic concepts of commerce holds true. That is, that providing a poor quality product or service will drive buyers away.

    I am assuming that somewhere there exists a threshold of crappiness, below which customers actually will be driven away.

    Admittedly, I find it difficult to imagine such a product. Electrified keyboards and an interface that flashes pink and yellow may be involved.

    In fact, in some ways there is greater motivation to create shitty proprietary software than shitty free software. If there's an opportunity to make shit and dupe customers into paying money for it, then there's room for the more morally flexible among us to profit from this activity.

    Every true salesman con artist knows that you don't actually need to make a product in order to succeed on this path. CD blanks with printed labels will suffice. Or binary installers that always present an error message about permissions and then exit. If you are willing to deliver something that doesn't actually work at all, then producing software would be a waste of time.

    I believe the issue is that the youth of the software industry renders the consumer largely unable to judge what is "good", or even what "works". We all understand that a coffeemaker that takes beans is better than that Senseo junk, and we understand that it's going to cost more. Coffee's been with us for eons. But in the case of software, nobody but the geek seems to be really capable of telling good from bad.


     



  • Sorry for replying to an ancient thread, but I just stumbled into this while examining sidplayfp and libsidplayfp and its Songlengths.txt database support. There's a lot of misinformation in this thread, unfortunately.

    Looking up song lengths couldn't be easier: Calculate MD5 sum for the file, find the line with the MD5 sum in question, and parse the song lengths following the MD5 sum.

    So let's see.... I take an MD5 sum of any of the files that come with HVSC... and I can't find a match in the file.

    Wrong approach. You cannot take the MD5 fingerprint of the entire file for various reasons. First of all, in their regular updates the HVSC maintainers have altered the metadata in the header of .sid files while keeping the actual machine code and data fragment unchanged. Any such update would have invalidated the fingerprint of the file. Additionally, if they altered the speed of a tune via a flag in the file header, that should invalidate the fingerprint. Same if they shuffled the order of multiple songs in a file or if they altered the INIT and PLAY vectors of the machine code to fix a ripped sidtune so that it would initialise correctly and play differently -- yes, sometimes not just seconds but minutes have been missing due to wrong initialisation.
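    To illustrate the idea (and only the idea), here is a sketch of a fingerprint that covers the machine code and data plus a few playback-relevant header fields, so that editing the title or author leaves it unchanged while changing the speed flags or the INIT/PLAY vectors does not. The choice of fields, their order and their byte order are assumptions made for this example; this is not the actual calculation used by the sidplay sources or by HVSC. The header offsets assume the commonly documented PSID layout (big-endian fields, with `dataOffset` at byte 6).

```python
import hashlib
import struct


def playback_fingerprint(sid_bytes):
    """Hash the payload plus playback-relevant header fields only.

    Illustrative field selection; not the real Songlengths.txt algorithm.
    """
    (magic, _version, data_offset, _load, init, play,
     songs, _start, speed) = struct.unpack(">4sHHHHHHHI", sid_bytes[:22])
    if magic not in (b"PSID", b"RSID"):
        raise ValueError("not a PSID/RSID file")
    md5 = hashlib.md5()
    md5.update(sid_bytes[data_offset:])                 # machine code and data
    md5.update(struct.pack("<HHH", init, play, songs))  # INIT/PLAY vectors, song count
    md5.update(struct.pack("<I", speed))                # per-song speed flags
    return md5.hexdigest()
```

    Title, author and release strings live between the fixed fields and `dataOffset`, so they never reach the hash in this sketch.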

    Ah, but I forgot that curious comment from the documentation: "As of HVSC 5.0, a modified fingerprint calculation is required."

    At that time the database was experimental. Highly experimental even.

    Modified fingerprint calculation? Um, that doesn't sound good - especially when the said "modified fingerprint calculation" isn't explained anywhere in this file.

    But it is explained - in source code.

    Great! That would have been a perfect opportunity to join and contribute ideas or actual improvements even. It's been an experimental project.

    While I'm waiting for the headache to pass, let me babble and get this straight: I'm supposed to gulp up the file and parse the file header. Then I'm supposed to take an MD5 sum of the file data plus a slightly differently formatted header. And apparently, if you choose to use NTSC timer instead of PAL timer - a player setting, mind you - out comes a completely different MD5 sum!

    Of course! Would you have preferred if all player front-ends needed to compensate for changes in PAL vs. NTSC and VBI vs. CIA Timer IRQ? That would have been annoying, even for an experimental database.

    The Songlengths.txt maintainers obviously realise that "ChunkOfWorthlessUndecipherableLineNoise = Random song lengths..." is not exactly the most readable format in the world and gives little information for a casual reader of the file. Therefore, in their infinite wisdom (and I'm not sarcastic here or anything, honest),

    Wow! You're clueless and overly negative here at the same time. More below.

    each song length entry is preceded by a completely innocuous comment text that tells the relative path to the SID file in question... just for the benefit of the reader, you know, if you need to look up the song lengths by hand...

    It's great that the INI file is human-readable, too. The only alternative for a highly experimental project would have been to provide a full set of tools to read and edit the INI file for manual lookups. Being able to quickly search it manually and take a look at what songlengths (and other details) are stored for a sid file is a great feature. Also don't forget that the original Songlengths.txt file contains diagnostic output because the song-length detector has been experimental.

     

    The player software is obviously expected to ignore all such rubbishy comment lines. I mean, they're comment lines?

    Lines starting with ';' in INI files certainly are comments and are to be ignored. A file may move to a different path any time, but its MD5 fingerprint could stay unchanged.

    Surely calculating MD5 sums is faster, more efficient, and more reliable than relying on oft-changing relative file paths? (Okay, now I'm sarcastic.)

    Clueless once again. SID music fans like to copy files to their private collection. Then all that's left is a sidtune's MD5 fingerprint, because not even the relative file path can be used to identify a file. Ignoring the MD5 fingerprints (the actual key/value pairs in the INI file) and relying on the paths in the comments is just a really bad idea and much too fragile.

     

    My logic goes like this: If such interesting information can be optionally stored in Songlengths.txt, why couldn't it be stored somewhere a bit closer, like in, ah, I don't know, .sid header? 🙂

    Easy to answer. The PSID file format predates this era and is not an extensible header (unlike the SIDPLAY INFOFILE format). Existing players would not have been able to understand a new file format, if it had been published. The later PSID v2NG format may have taken the chance to add fields for song lengths. For quite some time, there has been activity related to developing a successor to PSID (even XML based suggestions), but with no outcome. PSID v2NG and RSID formats are just a compromise.

    @asuffield said:

    Incidentally, the reason why these files don't have stored lengths is that the question is not meaningful. A SID file is actually a program. It does not have to have a single length, or even terminate at all.

     

    True. As such, there is no known "duration" to begin with. It's not defined anywhere. However, while you are correct that many sidtunes contain a program (the music player) that runs forever (or is based on an interrupt handler that is called again and again), its output often stops with silence or with a restart to the beginning (sometimes even by reinitialising the player). Silence and loops are what the song-length detector has tried to determine. Highly experimental, of course. 😉

    @magetoo said:

    (That said, clearly sometimes it's just wrong.  Like Skate or Die.  Rob Hubbard, living and working in the UK, obviously made the song for 50 Hz playback.  Yet, last I checked (a long time ago), the default setting for that song in the HVSC was 60 Hz.  Perhaps it was released like that in the US, and the PAL versions adjusted accordingly, but all the cracked versions I ever saw played it back on a vblank interrupt, at 50 Hz.  Like it should be.)  🙂

    Questionable. The HVSC (with some core members being from the UK and with good contact to Rob Hubbard even) has explicitly corrected the speed of Skate or Die in Update #16, see the included Update files, pointing out that many crackers have played it back at the wrong speed. I would trust the HVSC here.

     

     



  • @misc said:

    Sorry for replying to an ancient thread,
     

    Apparently I wrote an insightful post six years ago and nothing has changed since then.

     

    Now,

    Release the krakens!



  • @misc said:

    Would you have preferred if all player front-ends needed to compensate for changes in PAL vs. NTSC and VBI vs. CIA Timer IRQ? That would have been annoying, even for an experimental database.

    Are you saying that a collection of songs should store the lengths for every possible setting alteration a player might use?
    'Cause that sounds kinda stupid.



  • Some SID music. Oh and that 12 minute song is encoded in like 2.1k worth of SID instructions, if even that many.

    Loved my C-64. Hated the "chompers" at the end of level 2 of Parallax. I only made it through those like 3 times in the years I played it.



  • Whoa, first morbiuswilters, then Gene Wirchenko, now WWWWolf and asuffield - it's like some crazy kind of reuni- oh, wait, it's just another necro. Never mind...

    @dhromed said:

    Apparently I wrote an insightful post six years ago and nothing has changed since then.


    Around here, one of the largest ice cream producers runs two different brands of ice cream. The products of one brand are sold in cafes and upper-middle class shopping malls, have all sorts of sophisticated-sounding names, are backed by an extensive advertising campaign, and their distinctly-shaped package is plastered with tantalizing photos of vanilla whirlpools and whole fruits which apparently had just survived a tropical rainstorm. Almost all sentences on the package contain one occurrence of "natural", "traditional recipe" or "carefully selected ingredients". Last time I checked, they cost about €7 per package.



    The other line is sold in lower-class stores, has no advertising at all and has a plain white package out of the cheapest plastic they could find. The only photo on it makes you almost smell the artificial flavours. Price (for the same amount) is €2.

    A few years ago, some magazine actually bothered to check the ingredient listings of both products. Turns out they were exactly the same. This hasn't kept the manufacturer from selling both brands up until now and making a huge profit off both of them.


    So yeah, I fear you already had been wrong six years ago.



  • Are you saying that a collection of songs should store the lengths for every possible setting alteration a player might use?
     

    No. The music ought to be played back as intended by its composer. That's what several composers have requested, btw, and it has led to updates to sidtune collections. If the composer made a piece of music that plays for 3:02 minutes, it ought not end at 2:31 because it is played back too fast. If the composer made it for 50 Hz (for a PAL VBI handler), it ought not be played at 60 Hz (NTSC VBI) or vice versa, and for example, samples ought not be pitched up either because of user settings. Offering a "force NTSC speed" option has been a mistake. The original PSID header could not describe the required play speed reliably, however.

    Similarly, users who had possessed a real C64 with a MOS-8580 chip would insist on turning on emulation of a MOS-8580 instead of the MOS-6581 the composer had used. Same for the differences in the emulation of filtering. Some chips have had very weak filters, so users would preferably turn off emulation of filters (also when it's far off "the real thing" anyway). That's why the software offers such settings. Even if it reproduced the original 100%, emulators tend towards offering options the original doesn't provide. That doesn't mean the .sid metadata should try to cover all possible settings with regard to playback speed differences. It should describe the needed environment. A duration value stored in the metadata would only match the specified emulation settings.

    For thousands of sidtunes, their duration could be expressed in terms of total number of "frames or PLAY method calls until end of song (= e.g. silence) is reached". That would be a value independent of settings such as NTSC vs. PAL. It would not be suitable for the more complicated sidtunes, which execute a full program (some even written in Basic) that somehow calls the music player without a clearly separated INIT/PLAY interface. Some even use delay-loops instead of interrupt requests.
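    As a worked example of the frame-count idea, the sketch below converts a number of PLAY calls into wall-clock time at a given call rate. The 50 and 60 Hz constants are nominal values (real machines deviate slightly), and the helper names are made up for illustration. A tune of 9100 frames lasts 3:02 at 50 Hz and about 2:31 at 60 Hz, which matches the figures mentioned above.

```python
PAL_HZ = 50    # nominal PAL vertical-blank rate
NTSC_HZ = 60   # nominal NTSC vertical-blank rate


def frames_to_seconds(frames, calls_per_second):
    """Convert a count of PLAY calls into seconds at the given call rate."""
    return frames / calls_per_second


def format_mmss(seconds):
    """Format a duration as m:ss, dropping fractional seconds."""
    whole = int(seconds)
    return f"{whole // 60}:{whole % 60:02d}"


frames = 9100
print(format_mmss(frames_to_seconds(frames, PAL_HZ)))   # 3:02
print(format_mmss(frames_to_seconds(frames, NTSC_HZ)))  # 2:31
```

    Storing the frame count rather than the time keeps the stored value independent of the playback rate; the player does the division with whatever rate it actually uses.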

    For the HVSC team, the primary question has been who will determine and maintain (=update) the many thousands of duration values for sidtunes and their subtunes? Is that feasible without automation? Mind you, some sidtunes contain dozens of short but funny sound effects and jingles. The song-lengths detector was just one attempt at automating this.

     

     



  • @PSWorx said:

    So yeah, I fear you already had been wrong six years ago.
     

    Bad coffee example. I'm still right, though.



  • Just do what everyone does: find a way to embed data anywhere in the file without breaking most players (some place that you're not supposed to read, some checksum that nobody actually checks...) and put your extra headers there.



  • @PSWorx said:

    A few years ago, some magazine actually bothered to check the ingredient listings of both products. Turns out they were exactly the same.
     




  • @spamcourt said:

    Just do what everyone does: find a way to embed data anywhere in the file without breaking most players (some place that you're not supposed to read, some checksum that nobody actually checks...) and put your extra headers there.

    Off-topic.

    In case you refer to the "dataOffset" field in the original PSID header (as introduced with PlaySID): relying on that field and on the numeric "version" field would have been too limiting, since not all players evaluated the version field. Players would not recognize included data they could not use (due to the emulator component not being capable enough), and they could not refuse such files. Therefore developers decided to deliberately break compatibility with existing players (since old players could not play back the sidtunes anyway): http://unusedino.de/ec64/technical/formats/sidplay.html

     

     

