@laoc said in Big list of software that cannot handle spaces or accents in paths:
In case you can't remember more than five posts back, you said pretty much every file system in existence already supported what you need to have this parallel ID/filename scheme.
That was me, if you're referring to this post. If it wasn't clear, that was more of a quick thought I threw out to see what could come of it, rather than a finalized design spec, so thanks for your feedback!
I wouldn't expect such an implementation to be particularly well-performing, but I suppose with some optimizations (indexing, caching, etc.) it could reach "usable" levels. Creating a dedicated file system which is actually optimized for this use case would of course be preferable, but it would take more work to get to a first functional prototype.
Concerning
How do things like for f in *.txt *.log; do ... work
First off, this has nothing to do with file names (as others have already pointed out): you want to query by type. If I were to set out to actually design such a hypothetical nirvana file system, I'd definitely try to get rid of these bizarre Reverse Hungarian Notation warts at the same time, because the whole point of the exercise is to move file metadata out of the file name to where it actually belongs.
Second off, globbing is IMHO another very problematic piece of functionality. Apart from the classic "who is responsible for it" Windows-vs-Linux problem, it mixes data and structure, and thus makes it impossible to treat file names as opaque binary blobs. As such it has the exact same issues (plus some more) that paths have, which are what triggered this whole discussion.
So probably the goal would be to get something like
for f in glob(cwd, type="text/plain") do ...
which does a look-up in some kind of metadata index which would ideally permit efficient filtering on any type of metadata (names, types, creation/modification dates, permissions etc). cwd would be an argument passed to the script by the shell, which contains the ID of the working directory.
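To make that a bit more concrete, here is a rough Python sketch of what such a look-up could look like. Everything in it is made up for illustration: the Entry record, the MetadataIndex class, and the numeric directory IDs are assumptions, not any existing file system API. The only point is that the query goes through metadata, while the name stays an opaque blob that is never parsed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Hypothetical file record: identity and metadata live outside the name."""
    file_id: int     # surrogate ID assigned by the (imaginary) file system
    parent_id: int   # ID of the containing directory
    name: bytes      # opaque display name, never interpreted by queries
    mime_type: str   # type stored as metadata, not as a name suffix


class MetadataIndex:
    """Toy in-memory index keyed on (directory ID, type)."""

    def __init__(self):
        self._by_parent_and_type = {}

    def add(self, entry):
        key = (entry.parent_id, entry.mime_type)
        self._by_parent_and_type.setdefault(key, []).append(entry)

    def glob(self, cwd, type):
        # "type" shadows the builtin here only to mirror the call in the post.
        return list(self._by_parent_and_type.get((cwd, type), []))


index = MetadataIndex()
index.add(Entry(1, 100, b"notes *.txt", "text/plain"))  # wildcard chars are just bytes
index.add(Entry(2, 100, b"photo", "image/png"))
index.add(Entry(3, 200, b"other notes", "text/plain"))

cwd = 100  # in the real thing, the shell would pass this ID to the script
for f in index.glob(cwd, type="text/plain"):
    print(f.file_id, f.name)
```

Note that a name containing "*" or ".txt" causes no trouble here, because nothing ever matches against the name.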
Yes, this would probably involve some indexing and parallel storing of data to get any kind of production-scale performance, which would probably be hard to get right. But AFAIK so does any kind of efficient lookup in SQL databases.
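For comparison, this is roughly how the same kind of query looks when an SQL database does the indexing, using Python's built-in sqlite3 module. The files table, its columns, and the index name are invented for this sketch; a real file system would obviously not be a single SQLite table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: one row per file, with the name stored as an opaque blob.
conn.execute(
    "CREATE TABLE files ("
    " file_id INTEGER PRIMARY KEY,"
    " parent_id INTEGER NOT NULL,"
    " name BLOB NOT NULL,"
    " mime_type TEXT NOT NULL)"
)

# The secondary index is the "parallel storing of data": it has to be kept
# in sync with the table on every insert, update, and delete.
conn.execute("CREATE INDEX files_by_parent_type ON files (parent_id, mime_type)")

conn.executemany(
    "INSERT INTO files VALUES (?, ?, ?, ?)",
    [
        (1, 100, b"notes", "text/plain"),
        (2, 100, b"photo", "image/png"),
        (3, 200, b"other", "text/plain"),
        (4, 100, b"log", "text/plain"),
    ],
)

rows = conn.execute(
    "SELECT file_id FROM files"
    " WHERE parent_id = ? AND mime_type = ?"
    " ORDER BY file_id",
    (100, "text/plain"),
).fetchall()
print(rows)
```

The database pays for the fast look-up with extra writes and extra storage, which is the same trade-off a metadata-indexed file system would have to make.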
I'd say the "file names vs. file IDs" debate is essentially the "natural key vs. surrogate key" issue, with many of the same arguments on both sides applying to either discussion.
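A tiny Python illustration of that analogy, with made-up data: one store is keyed by file name (natural key), the other by an arbitrary ID (surrogate key), and a saved reference to the file is held in each style. The dictionaries and the ID 42 are purely hypothetical.

```python
# Natural key: the name itself identifies the file.
files_by_name = {"report.txt": "contents"}
bookmark_by_name = "report.txt"

# Surrogate key: an arbitrary ID identifies the file; the name is just an attribute.
files_by_id = {42: {"name": "report.txt", "data": "contents"}}
bookmark_by_id = 42

# Rename the file in both stores.
files_by_name["final report.txt"] = files_by_name.pop("report.txt")
files_by_id[42]["name"] = "final report.txt"

# The name-based reference now dangles; the ID-based one still resolves.
name_ref_valid = bookmark_by_name in files_by_name
id_ref_valid = bookmark_by_id in files_by_id
print(name_ref_valid, id_ref_valid)
```

The flip side, as in the database debate, is that the ID means nothing to a human, so you need the look-up machinery to get from something readable to the ID in the first place.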