Why does nobody use extended attributes?



  • For example, an HTTP downloading tool could store the ETag and Last-Modified headers as xattrs and then be able to update files iff they were changed. But nobody does that, as far as I can find. Why not?
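    A sketch of what such a tool could store and read back, assuming Linux's os.setxattr/os.getxattr (the user.http.* attribute names and the .httpmeta sidecar fallback are invented for illustration, since plenty of filesystems and OSes reject user xattrs):

```python
import os

def save_validators(path, etag, last_modified):
    """Stash the HTTP validators next to the downloaded file."""
    try:
        os.setxattr(path, "user.http.etag", etag.encode())
        os.setxattr(path, "user.http.last-modified", last_modified.encode())
    except (OSError, AttributeError):
        # No xattr support (wrong OS or filesystem): sidecar fallback.
        with open(path + ".httpmeta", "w") as f:
            f.write(etag + "\n" + last_modified + "\n")

def load_validators(path):
    """Return (etag, last_modified), or (None, None) if nothing stored."""
    try:
        return (os.getxattr(path, "user.http.etag").decode(),
                os.getxattr(path, "user.http.last-modified").decode())
    except (OSError, AttributeError):
        try:
            with open(path + ".httpmeta") as f:
                etag, last_modified = f.read().splitlines()[:2]
            return etag, last_modified
        except FileNotFoundError:
            return None, None
```

    On the next run the tool would feed those values into If-None-Match / If-Modified-Since request headers and skip the download on a 304.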



  • This is one of those "this is why we can't have nice things" things, because I also wish it were possible to use xattrs. The problem is they go away if you sneeze at the file in the wrong way, so you can't actually put remotely-important information into them, so no one cares about them, so the situation doesn't improve.

    (They are used for a couple of things; the markers on files from the internet and other untrusted sources that make Windows pop up the "are you sure you want to run this?" prompt are stored in alternate data streams. You could probably use them for your suggestion.)

    I blame Unix. 🙂



  • Basically, I would use the xattrs for things that describe the file, not its contents. HTTP caching-related headers are one use (as mentioned in the OP), another would be... uh.

    Wow, it's really hard to think of things that describe files and not file contents.



  • I'll refrain from mentioning Mac Classic.



  • @blakeyrat said:

    Mac Classic

    Too late.



  • @ben_lubar said:

    Wow, it's really hard to think of things that describe files and not file contents.

    My dream would be if history had developed so things like EXIF data, ID3 tags, etc. were stored in extended attributes instead of in the file. As it stands now, you've got a bunch of different formats for said metadata because they have to be attached to several different host formats. And that's if you even can put the metadata in the file; you can't really attach an "author" field to a plain text file or something like that in a way that won't muck a bunch of things up.

    (The way I see it, it's bringing some of the benefits you get with something like JSON or XML. Sure, they're both "metaformats" in the sense that to make complete sense of them you still have to know what the schema is, but there's still a fair bit of things you could do in a generic manner. You can also often attach extra information; for instance, if a program expects a JSON object with fields foo, bar, and baz and you add a blah key, there's a reasonably good chance the program will continue to operate correctly, especially if all it does is read. With xattrs, same deal: you could stick a blah xattr onto an existing file and now you've got some information that you can use, but that won't affect how other programs that don't know about blah interact with the file.)



  • emptyfile.txt.tar (10.0 KB)

    This contains an empty file with the blakeyrat attribute set to gibberish and the gibberish attribute set to blakeyrat.
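    For the curious, this is presumably roughly how the attributes survive inside the tar: GNU tar (following star) serializes xattrs as "SCHILY.xattr.<name>" keys in a PAX extended header. A sketch of building an equivalent archive with Python's tarfile:

```python
import io
import tarfile

# Build a tar whose single empty member carries xattrs as PAX headers,
# using the "SCHILY.xattr.<name>" convention from star/GNU tar.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tar:
    info = tarfile.TarInfo(name="emptyfile.txt")
    info.size = 0
    info.pax_headers = {
        "SCHILY.xattr.user.blakeyrat": "gibberish",
        "SCHILY.xattr.user.gibberish": "blakeyrat",
    }
    tar.addfile(info, io.BytesIO(b""))

# Reading it back, the per-member pax headers survive:
buf.seek(0)
with tarfile.open(fileobj=buf) as tar:
    member = tar.getmember("emptyfile.txt")
    print(member.pax_headers["SCHILY.xattr.user.blakeyrat"])  # gibberish
```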



  • @blakeyrat said:

    I'll refrain from mentioning Mac Classic.

    On days that don't end in 'y'?



  • @EvanED said:

    My dream would be if history had developed so things like EXIF data, ID3 tags, etc. were stored in extended attributes instead of in the file. As it stands now, you've got a bunch of different formats for said metadata because they have to be attached to several different host formats.

    That wouldn't help much - AIUI extended attributes are basically key-value pairs, and you still need to agree on the set of keys. So your OS would still need to interpret the tags in context of the format to enable things like localization.

    Also, you'd need to support them in every single filesystem. Also, you couldn't copy a file by simply reading it and writing it back. Also, all utilities like tar would need to somehow serialize the pairs. Also, a content-aware tag format probably enables more possibilities. Also, as ID3 for example supports things like potentially huge images in its attributes, you'd need to choose between either misrepresenting file size, misrepresenting free disk space, or having them mismatch.

    In short, the costs seem to far outweigh the benefits. Easier to just plop a header.



  • It seems like filesystems in general have a real hard time evolving. The fact that the Linux people automatically reject any new ideas doesn't help.

    Apart from the issue of them not being supported in too many places (which is the main deal breaker) there's also the problem that they open a whole new can of worms for interoperability. They're basically a whole new format attached to an arbitrary file, and it will require some standardization to actually work, or we'd end up with a dozen proprietary tags that do the same thing in slightly different ways, like vendor prefixes in HTML.


    Something else, but partially related, that I don't understand: why haven't we agreed on container formats yet?

    The vast majority of file formats internally consist of a bunch of separate binary chunks representing different parts of data (I recommend reading the PNG specification if you haven't) so it seems like having a standard way to put binary blocks together (and a library to read them) would be highly beneficial to everyone.

    OpenDocument for example (which is a bunch of XML documents and arbitrary files together) uses zip as a container. This seems like an ugly solution IMO; I don't know, maybe zip actually works great, but it was obviously not designed for this.
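    The zip-as-container idea is easy to play with from any language with a zip library; here's a minimal in-memory sketch (real OpenDocument requires more parts, e.g. a manifest, and mandates that mimetype be the first, uncompressed entry):

```python
import io
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z:
    # Stored uncompressed, as the ODF spec requires for type sniffing.
    z.writestr("mimetype", "application/vnd.oasis.opendocument.text",
               compress_type=zipfile.ZIP_STORED)
    z.writestr("content.xml", "<office:document-content/>")
    # Arbitrary extra metadata is just another member you shove in there.
    z.writestr("meta/custom-metadata.txt", "author=anonymous234")

# Any generic zip tool can now poke at the parts.
buf.seek(0)
with zipfile.ZipFile(buf) as z:
    print(z.namelist())
    print(z.read("mimetype").decode())
```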


  • SockDev

    @anonymous234 said:

    This seems like an ugly solution IMO

    well it's actually kind of nice, particularly when your "binary data blocks" are generally UTF-8 XML documents. zip compresses those like nobody's business and you don't have to worry about it.

    also lets you add arbitrary metadata to the container because it's just a new file you shove in there. 🚎



  • epub uses zip as well - I wrote a tool for it for my own use once. There's probably some generic zip-as-a-container format as well, as epub mandates the first element of the zip to be an uncompressed file with the name mimetype and a specific value.
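    The point of mandating an uncompressed mimetype as the very first entry is that its bytes land at a fixed offset, so file-type sniffers don't even need a zip reader: a zip local file header is 30 bytes, so the name and then the value sit right at byte 30. A sketch:

```python
import io
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    # Must be first and uncompressed so the raw bytes are predictable.
    z.writestr("mimetype", "application/epub+zip",
               compress_type=zipfile.ZIP_STORED)
    z.writestr("OEBPS/content.opf", "<package/>")

raw = buf.getvalue()
# 30-byte local file header, then the 8-char name, then the stored data.
print(raw[30:58])  # b'mimetypeapplication/epub+zip'
```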



  • @anonymous234 said:

    it seems like having a standard way to put binary blocks together (and a library to read them) would be highly beneficial to everyone.

    We have that. It's called "the filesystem".



  • @accalia said:

    also lets you add arbitrary metadata to the container because it's just a new file you shove in there. 🚎

    Nothing 🚎 about that, in fact it's exactly why I was proposing the idea. You place the metadata next to the data and the filesystem can remain a simple key-value storage. Simple! Not without problems though.

    @flabdablet said:

    We have that. It's called "the filesystem".

    And it would work great if I could upload a whole folder to a website, or send it by email. But the way things are, nothing is designed like that.

    This is actually a valid idea, in line with what I was saying: that files and folders are not that different, and the same object could be treated as either. But this idea is not implemented anywhere (and it's not trivial to do so).



  • Didn't Mac OS classic use something where an application bundle is actually a special kind of directory? Not sure on the details, paging @blakeyrat.



  • @anonymous234 said:

    it would work great if I could upload a whole folder to a website, or send it by email. But the way things are, nothing is designed like that.

    We have that. It's called archive formats.



  • @PleegWat said:

    Didn't Mac OS classic use something where an application bundle is actually a special kind of directory?

    No, that's OS X.



  • I thought OSX moved on to using disk images instead.



  • That's for installer packages. An installed OS X application is just a directory with a name ending in .app and contents in a standard format.

    https://en.wikipedia.org/wiki/Bundle_(OS_X)#OS_X_application_bundles



  • @PleegWat said:

    Didn't Mac OS classic use something where an application bundle is actually a special kind of directory?

    No. That's OS X.

    Mac Classic used a system where every file in the OS had:

    • A 4 (ANSI) character file type
    • A 4 (ANSI) character creator code
    • A Data Fork, unformatted, size only limited by filesystem
    • A Resource Fork where each resource had:
      • A 4 character resource type (matching the file type codes where appropriate -- for example "TEXT")
      • Up to 64k of data, with each resource type specifying the format of the data

    In the original Mac Classic design, a program's code was actually in a "CODE" resource, and the Data Fork was unused in applications. When they moved to PPC, they implemented PPC-native applications by putting their code in the Data Fork.

    Resource formats were determined by the resource type code, therefore if you opened up ResEdit (the Resource Fork editor application that shipped with the OS), and opened up an "ICON" resource, you didn't see just a blob of hex, you actually got a bitmap editor with an image of the icon. Same applied to "TEXT", "WIND" (window layout), "DLG" (dialog layout), etc, etc.

    Note that Mac Classic apps did store their window and dialog layouts in the Resource Fork, meaning they had a "Visual IDE" of sorts way back in 1984.

    Applications could define their own Resource Fork types, as long as they didn't conflict with any of the system default ones.

    The Resource Fork system removed the need for an application to have supporting files so, for the most part, Mac applications were literally just the one file and we pointed-and-laughed at those DOS morons who needed "installers". Suckers.

    TRIVIA: the custom database design used by Bethesda in their Gamebryo/Creation Engine is virtually identical to the old Mac Classic Resource Fork database.



  • @EvanED said:

    I blame Unix.

    Which has xattrs since... since I don't even remember when?



  • Fantastic for the 1980s. Apple doesn't seem to be much farther along in the 2010s.



  • @Maciejasjmj said:

    Also, you'd need to support them in every single filesystem.

    No, the OS needs to support an ad-hoc method of tracking them when the filesystem doesn't. How it does it is not particularly relevant, as long as the interface remains the same.

    @Maciejasjmj said:

    Also, you couldn't copy a file by simply reading it and writing it back.

    You can't do that now anyway: you lose created/modified times and access permissions.

    @Maciejasjmj said:

    Also, as ID3 for example supports things like potentially huge images in its attributes, you'd need to choose between either misrepresenting file size, misrepresenting free disk space, or having them mismatch.

    Or just display metadata size as well as the file size.



  • @Salamander said:

    No, the OS needs to support an ad-hoc method of tracking them when the filesystem doesn't. How it does it is not particularly relevant, as long as the interface remains the same.

    Okay, so you copy a file from NTFS HDD (which does have the attributes) to a FAT32 memory card (which doesn't). It all looks fine, because the OS transparently writes the data somewhere in its internal cache.

    Then you plug the card into your phone, and suddenly all your music is made by an Unknown Artist, because the phone has no idea about what the OS keeps cached.

    @Salamander said:

    You can't do that now anyway: you lose created/modified times and access permissions.

    Yeah, but those are tied to the physical file more than its content. IOW, nobody really cares if the creation time breaks when you recreate the file, but losing artist information is a bigger deal.

    @Salamander said:

    Or just display metadata size as well as the file size.

    So every application would need to take both into account? Eh, maybe it's doable, but seems more annoying than it's worth.



  • @Maciejasjmj said:

    Okay, so you copy a file from NTFS HDD (which does have the attributes) to a FAT32 memory card (which doesn't). It all looks fine, because the OS transparently writes the data somewhere in its internal cache.

    Then you plug the card into your phone, and suddenly all your music is made by an Unknown Artist, because the phone has no idea about what the OS keeps cached.


    The most likely implementation of such an ad hoc system would be to store the metadata itself as another file, one that is never shown in directory listings etc. and is accessed through its own API. The only hurdle to that approach is getting other OSes to agree on how those files are identified and the structure they use internally.
    Getting the major OSes to agree on that is still a much better option than having every single file format out there track metadata in its own way.
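    A minimal sketch of that sidecar idea (the .meta.json naming convention here is invented; the whole point would be getting OSes to standardize one):

```python
import json
import os

META_SUFFIX = ".meta.json"  # invented convention for this sketch

def set_meta(path, **pairs):
    """Merge key-value metadata into the file's hidden sidecar."""
    meta_path = path + META_SUFFIX
    meta = {}
    if os.path.exists(meta_path):
        with open(meta_path) as f:
            meta = json.load(f)
    meta.update(pairs)
    with open(meta_path, "w") as f:
        json.dump(meta, f)

def get_meta(path):
    """Return the metadata dict, empty if no sidecar exists."""
    try:
        with open(path + META_SUFFIX) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}
```

    Tools that don't know the convention would still see (and potentially fail to copy) the sidecar file, which is exactly the hurdle being argued about.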

    @Maciejasjmj said:

    So every application would need to take both into account? Eh, maybe it's doable, but seems more annoying than it's worth.

    Every application? I rarely see any application compare file size to free space. Even rarer are the ones that don't screw it up in some way. Things like Windows Explorer are outliers in that regard. Everything else just tries to write files and has some error handling if the OS says you can't write the file.



  • @Salamander said:

    The most likely implementation of such an ad hoc system would be to store the metadata itself as another file, one that is never shown in directory listings etc. and is accessed through its own API.

    Sooo... the SSDS approach on steroids? You'd need the support on every OS to avoid leaking this ugly mess to the user (didn't copy the randomly named hidden file? tough tits, you just lost all your metadata).

    Besides, if we're going to keep the metadata in files half the time anyway, why not just cat the metadata file and the content file... oh wait, that's what we've got now.

    You can argue that it would at least result in a standardized way of storing metadata, but... that has nothing to do with ADSes. Especially in that it's a decent idea - I wouldn't mind having a cross-format standard for basic key-value headers.
    But ADSes are a rather bad place to keep them in.



  • @Maciejasjmj said:

    Sooo... the SSDS approach on steroids? You'd need the support on every OS to avoid leaking this ugly mess to the user (didn't copy the randomly named hidden file? tough tits, you just lost all your metadata).

    Not every OS; just every major OS that regularly sees filesystems that don't support extended attributes. That's OSX, Windows, and Android. The other OSes would eventually add those features to remain compatible.
    That's still a much better solution than continually coming up with an in-file solution for storing metadata for every file format.
    And you can't just cat metadata onto any old file format because a number of them can and do store their own header information at the end of the file.



  • I still fail to see where any of the various extended-metadata schemes (xattrs, resource forks, what-have-you) have any real advantage over keeping your stuff in a general-purpose archive format. Zip archives are actually very well suited to this use case, because they allow for independent compression methods (including none) for each archived subfile; if you've got metadata you need quick easy access to, you can just store it uncompressed.

    Zip archives also keep their central directory information at the end of the file, so if you've got one privileged data stream that needs to appear exactly as-is from byte 0 (an executable, perhaps?) you just make that the first archived subfile and store it uncompressed. You can also put stuff at the start of a Zip archive that doesn't appear in the central directory at all. It's a nice flexible format.
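    Both tricks are easy to demonstrate; here the prepended data stands in for an executable stub (this is exactly how self-extracting archives work, and zip readers cope because they locate the central directory by scanning from the end of the file):

```python
import io
import zipfile

inner = io.BytesIO()
with zipfile.ZipFile(inner, "w") as z:
    z.writestr("metadata.txt", "author=flabdablet")

# Prepend arbitrary bytes (a stub executable, a format's own header, ...).
combined = b"#!/bin/sh\necho not really just a script\n" + inner.getvalue()

# Readers find the end-of-central-directory record by scanning backwards
# from the end, so the prefix is simply skipped:
with zipfile.ZipFile(io.BytesIO(combined)) as z:
    print(z.read("metadata.txt").decode())  # author=flabdablet
```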

    The only advantage of purpose-built metadata storage, as far as I can see, is that attributes will typically have less overhead than they would if each were represented as a whole subfile in its own right. In a world where media with <1GB capacity are increasingly hard to find, I don't see that as a particularly compelling advantage.



  • @Salamander said:

    Not every OS; Just every major OS that regularly see filesystems that don't support extended attributes.

    Sooo... every one of them?

    @Salamander said:

    That's still a much better solution than continually coming up with an in-file solution for storing metadata for every file format.

    Yeah, but an even better solution is to standardize the in-file method. You avoid the pitfalls of ADS and xattrs, while also having a well-specified standard implementable in the OS for any file format.

    I fail to see why having a standard would necessitate storing the attributes outside the file.

    @Salamander said:

    And you can't just cat metadata onto any old file format because a number of them can and do store their own header information at the end of the file.

    By God are you missing the point. It doesn't matter if they're at the beginning or at the end or in the middle somewhere - in this little thought experiment, we're trying to change that standard, remember?



  • @Maciejasjmj said:

    Sooo... every one of them?

    I use the word 'major' for a very specific reason.

    @Maciejasjmj said:

    I fail to see why having a standard would necessitate storing the attributes outside the file.
    ...
    By God are you missing the point. It doesn't matter if they're at the beginning or at the end or in the middle somewhere - in this little thought experiment, we're trying to change that standard, remember?

    And in this thought experiment, the part that I have thought about are file formats that you cannot change the contents of, anywhere - start, end, or middle. How exactly do you add metadata to those?



  • @Salamander said:

    file formats that you cannot change the contents of, anywhere - start, end, or middle. How exactly do you add metadata to those?

    Put metadata in other files alongside them, then wrap the whole lot up as a zip archive; name it with your original format's filename extension with an x appended. Done.



  • @Salamander said:

    I use the word 'major' for a very specific reason.

    Sooo... still every one of them that's not some obscure niche, and those which are would have to adapt anyway?

    @Salamander said:

    And in this thought experiment, the part that i have thought about are file formats that you cannot change the contents of, anywhere - start, end, or middle. How exactly do you add metadata to those?

    You need to persuade every single person who has a PNG-handling program to support your way of storing metadata already. Might as well go ahead and change the PNG standard - it would probably be even easier, because that's what you're effectively doing - in a somewhat underhanded way.


  • Discourse touched me in a no-no place

    @Salamander said:

    And in this thought experiment, the part that i have thought about are file formats that you cannot change the contents of, anywhere - start, end, or middle. How exactly do you add metadata to those?

    You have a container format that contains that file plus another file that contains the metadata. Plus some sort of standard metadata that describes how one file is metadata for the other. (The format of the metadata? Not standardised. Of course.)


  • @Maciejasjmj said:

    You need to persuade every single person who has a PNG-handling program to support your way of storing metadata already. Might as well go ahead and change the PNG standard - it would probably be even easier, because that's what you're effectively doing - in a somewhat underhanded way.

    PNG supports that sort of thing already (via non-critical chunks).



  • @Maciejasjmj said:

    AIUI extended attributes are basically key-value pairs, and you still need to agree on the set of keys.

    So first, I already said why I think this isn't a fundamental limitation, which is that you can do useful things with them even if you don't have a set of standardized keys. Second, I suspect that you would get at least a fair degree of agreement after a while on at least some keys; for example, an "author" field could apply to basically any kind of file.

    @Maciejasjmj said:

    Also, you'd need to support them in every single filesystem. Also, you couldn't copy a file by simply reading it and writing it back. Also, all utilities like tar would need to somehow serialize the pairs.

    You're right that it would be hard to do with current APIs and systems. That's... that's why it hasn't happened. 🙂 But I haven't come up with anything that I think is a fundamental stumbling block.

    @Maciejasjmj said:

    Also, a content-aware tag format probably enables more possibilities.

    Examples? I don't think I buy it...

    @Maciejasjmj said:

    Also, as ID3 for example supports things like potentially huge images in its attributes, you'd need to choose between either misrepresenting file size, misrepresenting free disk space, or having them mismatch.

    Why do you call it misrepresenting? ls by default should probably show the total size of the file including xattrs. If you want the sizes of the separate streams, just ask for them.



  • @flabdablet said:

    it seems like having a standard way to put binary blocks together (and a library to read them) would be highly beneficial to everyone.

    @flabdablet said:

    We have that. It's called "the filesystem".

    @flabdablet said:

    We have that. It's called archive formats.

    In case you're not deliberately misunderstanding to be argumentative, the point is to have a single unit that you don't have to do operations with before doing what you want. I want to be able to be given a file and open it without having to extract it first, or give a "file" I'm working on to someone else without having to compress it first. Imagine if images were distributed this way -- you had to download an archive (of image data + format metadata + EXIF data in separate files or something) and then open something in it every time you wanted to look at an image.

    It has other benefits too, like making it much harder to accidentally mess with the contents of part of a "file".



  • @wft said:

    Which has xattrs since... since I don't even remember when?

    Not all that long ago relative to the history of Unix, especially if you start looking at when they were enabled by default.

    But that's to my point anyway. My point is that the design decisions of Unix -- in particular, the "files are just bags of bytes" -- is a large part of why software today "can't" put important info into xattrs, because too many things are too easily lost. As a trivial example, cat foo.txt > bar.txt effectively has no way of copying xattrs with the primary stream. (For more realistic examples, imagine putting some other transformers into the pipeline.) If Unix were designed differently (perhaps with pipelines like are in z/OS), it could; but things throughout the system would have to change to make this a reasonably viable thing to do.
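    To make that concrete: copying the bytes and then explicitly re-attaching the xattrs is about the best a program can do today, and it only works because the copier touches both files directly rather than going through a pipe. This sketch uses the Linux-only os.listxattr/os.getxattr/os.setxattr calls, wrapped so it degrades to a plain byte copy elsewhere:

```python
import os
import shutil

def copy_with_xattrs(src, dst):
    # Copy the byte stream, which is all cat > would do...
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    # ...then explicitly re-attach the xattrs, which a pipeline cannot do
    # because they never travel through the pipe. Linux-only calls;
    # OSError covers filesystems without xattr support.
    try:
        for name in os.listxattr(src):
            os.setxattr(dst, name, os.getxattr(src, name))
    except (OSError, AttributeError):
        pass
```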


  • @EvanED said:

    I want to be able to be given a file and open it without having to extract it first

    How do you feel about Word documents, or Excel spreadsheets? Those are container formats (ZIP archives) with custom extensions on. The applications that work with them understand the container format so they hide the container-ness, but they're genuinely container formats.

    I've used that a few times to my advantage, to do things like extracting the text content so that I could recover critical data and get working again. All with generic tools. Saved my ass.



  • @EvanED said:

    As a trivial example, cat foo.txt > bar.txt effectively has no way of copying xattrs with the primary stream. (For more realistic examples, imagine putting some other transformers into the pipeline.)

    Determining how to cope with the situation when I explicitly don't want the metadata (and if I do transformations on a pipe, I most certainly don't, although in some rare cases I might, or I might want to transform the metadata as well, you name it) in a world where xattrs are god is left as an exercise for the reader.


  • @EvanED said:

    My point is that the design decisions of Unix -- in particular, the "files are just bags of bytes" -- is a large part of why software today "can't" put important info into xattrs, because too many things are too easily lost.

    The flip-side of that is that keeping things simple like that meant that the OS was simple enough to actually implement and get working reliably. Having to do lots of side-channel metadata handling would have made things much more complex. (And yes, OSes that preceded Unix had this sort of thing. The big thing about Unix in the early days was that you didn't actually need most of that shit and could focus on doing simple things well.)



  • @dkf said:

    How do you feel about Word documents, or Excel spreadsheets? Those are container formats (ZIP archives) with custom extensions on. The applications that work with them understand the container format so they hide the container-ness, but they're genuinely container formats.

    That's fine to an extent; I view it as basically an implementation of what I want. But -- and here's the catch -- most programs don't deal with that format. If I want to attach metadata to a .cpp file, I can't zip up that file with the metadata I want and then say gcc -c mything.zip and have it work.

    That's the difference, and what I think it would be really cool to see. (And... will never happen, because of the difficulties discussed throughout the thread, including my original posts. :-))

    @dkf said:

    The flip-side of that is that keeping things simple like that meant that the OS was simple enough to actually implement and get working reliably.

    I agree that it was probably the right decision at the time and helped get Unix up and running. But... I also think that we're living with the unfortunate consequences of that legacy now.



  • @EvanED said:

    The problem is they go away if you sneeze at the file in the wrong way, so you can't actually put remotely-important information into them

    Because it's stored in the local OS, and nothing cares about collecting that data and replicating it on the target system.



  • @wft said:

    Determining how to cope with the situation when I explicitly don't want the metadata (and if I do transformations on a pipe, I most certainly don't, although in some rare cases I might, or I might want to transform the metadata as well, you name it) in a world where xattrs are god is left as an exercise for the reader.

    Totally feasible. With z/OS pipes, commands can have multiple input and/or multiple output pipes that you can connect up arbitrarily. I think they might be numbered instead of named, so that would have to change, but it's a minor point. For example, the grep equivalent puts all matching lines on one pipe and all non-matching lines on another, so you could direct each to its own file. I don't remember the syntax, but the point is that you can connect up the different pipe ends on the command line. So if you cat file.txt, it could make one output pipe per stream and you could connect up the one you want. Or maybe | just operates on the primary stream only and you specify if you do want to copy streams; I dunno, you'd need to play around with it to determine what's best for usability.



  • @xaade said:

    @EvanED said:
    The problem is they go away if you sneeze at the file in the wrong way, so you can't actually put remotely-important information into them

    Because it's stored in the local OS, and nothing cares about collecting that data and replicating it on the target system.

    Nothing cares because nothing can care.



  • I bet getting the syntax right is one mother of BDSM.



  • @EvanED said:

    Nothing cares because nothing can care.

    I'm pretty sure if you can write to the extended attributes, you can collect them before compressing files or transferring them.

    The other problem is that the OS probably doesn't support an external interface to writing these attributes.

    I can't have computer A tell computer B to add attributes to its file.



  • @dkf said:

    I've used that a few times to my advantage, to do things like extracting the text content so that I could recover critical data and get working again. All with generic tools. Saved my ass.

    That's no small thing; it's actually one of the greatest advantages of having standard formats. Want to poke inside the file? Easy peasy, the tools are probably already installed on your computer. You can't do that with .doc files, or xcf, or psd, or pdf, or png... you get the point.

    If you have read "The Unix Philosophy", you'll notice half the book is basically "plain text is so great, you can open it anywhere". Well, all the same arguments apply to zipped formats.



  • @EvanED said:

    you can do useful things with them even if you don't have a set of standardized keys

    Then you're missing thumbnails, album art, and basically reduce the whole thing to a string-to-string key-value store. And a dumb one to boot - without any standard on the keys, all you can do is dump them for viewing and maybe edit them.

    @EvanED said:

    you would get at least a fair degree of agreement

    Across any conceivable content? Is the artist in an MP3 file an "author", or a less-standardized "artist"? Or is the composer the "author", which would make more sense for, say, classical, but be utterly pointless for pop?

    Not saying there's absolutely no way to do that, but... Unlikely.

    @EvanED said:

    Examples?

    Thumbnails, for example. Or chapter markers. Dunno really, that's why I said "potentially".

    As for total size - I'd expect to be able to read 1000 bytes from a 1000-byte file, and I bet lots of programs would too.

    And even if we totally disregard all those things, I still have yet to see one argument for storing those in ADS/xattrs instead of having them at the top of the file (in any format).



  • @wft said:

    explicitly don't want the metadata

    Just don't copy them (or rewrite as-is if you're copying with transformation). In my idea, the metadata are an explicitly separated header - trivial to skip all the lines until an end-of-header mark.
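    A sketch of that kind of format (the %%end-header%% marker and the key: value line syntax are invented for illustration):

```python
END_OF_HEADER = "%%end-header%%"  # invented marker for this sketch

def write_with_header(path, meta, body):
    """Write key: value metadata lines, a marker, then the real content."""
    with open(path, "w") as f:
        for key, value in meta.items():
            f.write(f"{key}: {value}\n")
        f.write(END_OF_HEADER + "\n")
        f.write(body)

def read_skipping_header(path):
    """Skip all lines until the end-of-header mark, return the content."""
    with open(path) as f:
        for line in f:
            if line.rstrip("\n") == END_OF_HEADER:
                break
        return f.read()
```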



  • @EvanED said:

    Imagine if images were distributed this way -- you had to download an archive (of image data + format metadata + EXIF data in separate files or something) and then open something in it every time you wanted to look at an image.

    They are distributed this way. JPEG files contain a standard magic-number header that marks them as such (format metadata), plus a chunk of image data, plus a chunk of EXIF data. The process of extracting the various parts is pretty much identical to that required to extract the parts of another format that also happens to identify its chunks the same way Zip does - which would in fact make it a Zip file, thereby enabling the use of standard archiving utilities to manipulate its various parts.
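    The chunked structure is easy to see if you walk the marker segments yourself; here's a simplified sketch over hand-built bytes (real JPEGs also contain standalone markers without a length field, entropy-coded scan data, etc., which this deliberately ignores):

```python
import struct

def app_segment(marker, payload):
    # Marker prefix 0xFF, marker byte, then a 2-byte big-endian length
    # that counts itself plus the payload.
    return bytes([0xFF, marker]) + struct.pack(">H", len(payload) + 2) + payload

fake_jpeg = (b"\xff\xd8"                                     # SOI
             + app_segment(0xE0, b"JFIF\x00")                # APP0
             + app_segment(0xE1, b"Exif\x00\x00MM\x00*"))    # APP1 (EXIF)

def list_segments(data):
    """Walk length-prefixed marker segments after the SOI marker."""
    segments, pos = [], 2  # skip the 2-byte SOI
    while pos + 4 <= len(data) and data[pos] == 0xFF:
        marker = data[pos + 1]
        length = struct.unpack(">H", data[pos + 2:pos + 4])[0]
        segments.append((marker, data[pos + 4:pos + 2 + length]))
        pos += 2 + length
    return segments

for marker, payload in list_segments(fake_jpeg):
    print(hex(marker), payload[:4])
```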

    I don't think there's enough overhead difference between locate-on-disk + read-xattrs and locate-on-disk + open + read-header + read-central-directory + read-whatever-chunks to justify avoiding the use of universally supported archive formats in favor of OS-specific, occasionally completely missing xattrs.

