UNIX/WIN wildcards



  • @LaoC said in UNIX/WIN wildcards:

    Or perl -e'unlink <*.html>'

    You were supposed to improve on bash.



  • @blakeyrat said in UNIX/WIN wildcards:

    That slides firmly into "why would you do this?"
    The Windows model is that the application's functional logic is in a library separate from any interfaces. (Both user interfaces, like CLIs or GUIs, and machine interfaces, like Services.)
    If you're ever calling a user interface from another user interface in Windows, you done screwed something up. Just link to the library directly.

    Have things EVER worked out that way for any real non-Microsoft program?

    The linux way may be crappier, but at least it exists in reality.



  • @cartman82 said in UNIX/WIN wildcards:

    Have things EVER worked out that way for any real non-Microsoft program?

    Yes.

    It's true that 95% of software developers are morons who don't know what they're doing. And yet that doesn't change anything I typed to Ben L above.


  • Java Dev

    @blakeyrat said in UNIX/WIN wildcards:

    The Windows model is that the application's functional logic is in a library separate from any interfaces. (Both user interfaces, like CLIs or GUIs, and machine interfaces, like Services.)

    I can't speak for windows specifically, but in general separating things with process boundaries has the advantage of isolating risk. Like if a library is prone to crashing or leaking memory, you can isolate your main application from those effects. Then again I believe windows has more RPC infrastructure?



  • @ben_lubar said in UNIX/WIN wildcards:

    On Windows, you do some string manipulation that every program gets wrong in a different way and then the program's startup code parses the arguments to get the string array back.

    All C and Java programs will get it already split in any OS. I think it's the same with all of the most used languages.


  • Java Dev

    @wharrgarbl said in UNIX/WIN wildcards:

    All C and Java programs will get it already split in any OS

    Even in C, there is some code which runs before main() is called. That code sets this up.


  • Considered Harmful

    @cartman82 said in UNIX/WIN wildcards:

    @LaoC said in UNIX/WIN wildcards:

    Or perl -e'unlink <*.html>'

    You were supposed to improve on bash.

    It fixes exactly the same problem as the Python version with a fraction of the code.


  • Discourse touched me in a no-no place

    @PleegWat said in UNIX/WIN wildcards:

    Even in C, there is some code which runs before main() is called. That code sets this up.

    And the problem is that the code that sets it up varies between programs on Windows. For some, you have to quote things one way, and for others you need a different method (unless you happen to only ever pass the simplest of arguments, such as just words using ASCII alphanumerics). It's a complete PITA.


  • Java Dev

    @dkf Yeah, I think the actual underlying argument in windows is del *.txt while on linux it is del\0a.txt\0b.txt\0c.txt\0\0

    I was just mentioning the fact your programming language exposes positional arguments doesn't mean anything since, even in plain C, what gets passed to your main() function isn't controlled by the OS directly.
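    As a rough illustration of that difference, here is a minimal Python sketch that uses glob.glob to stand in for the expansion a Unix shell performs before the program starts. The helper name, the temporary directory, and the file names are all made up for the example, and a real shell applies more rules (dotfiles, quoting, unmatched patterns) than this shows.

```python
import glob
import os
import tempfile


def expand_like_a_shell(pattern, directory):
    """Return the sorted names matching pattern inside directory.

    Approximates what a Unix shell does before exec: the program
    receives the expanded names, never the pattern itself.
    """
    matches = glob.glob(os.path.join(directory, pattern))
    return sorted(os.path.basename(m) for m in matches)


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical files, created only for this demonstration.
    for name in ("a.txt", "b.txt", "c.txt", "notes.md"):
        open(os.path.join(tmp, name), "w").close()
    unix_argv = ["del"] + expand_like_a_shell("*.txt", tmp)

# On Windows the program gets the pattern and must expand it itself.
windows_style = ["del", "*.txt"]

print(unix_argv)
print(windows_style)
```

    The point is only that in the first case the pattern is gone by the time the program looks at its arguments, while in the second the program has to deal with it.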


  • Discourse touched me in a no-no place

    @PleegWat I believe it is actually a counted list on Linux as you can pass empty arguments just fine, which your encoding would prevent. (Also, the environment is also passed the same sort of way, which is why it is always an inherited copy.)
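    A quick way to check that empty arguments survive the trip is to start a child process and have it report what it received. This is a minimal sketch, assuming a Python interpreter is available as sys.executable; the roundtrip helper is a name invented for the example.

```python
import subprocess
import sys

# The child prints the arguments it received as a Python list literal.
child_code = "import sys; print(repr(sys.argv[1:]))"


def roundtrip(args):
    """Run a child Python process with args and return what it saw."""
    completed = subprocess.run(
        [sys.executable, "-c", child_code, *args],
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout.strip()


seen = roundtrip(["", "x", ""])
print(seen)
```

    Both empty strings arrive as separate arguments, which a plain NUL-separated encoding with a double NUL at the end could not represent.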


  • Java Dev

    @dkf Right. I'll admit I never actually looked into it, nor have I seen a need to.



  • @blakeyrat said in UNIX/WIN wildcards:

    You can get 99% there on all of them, but that last 100% is a bitch.

    I definitely agree that all the problems are in the last 100%.



  • @dkf said in UNIX/WIN wildcards:

    And the problem is that the code that sets it up varies between programs on Windows.

    No, it varies between C runtime implementations



  • @wharrgarbl said in UNIX/WIN wildcards:

    No, it varies between C runtime implementations

    Not all programs are written in C. (And indeed, not all programs have a runtime at all, but the number that don't is much smaller than the number that don't have a C runtime.)

    But :pendant:ry aside (and saying that it varies between programs is still correct, after all), what's the difference? Do you make a note of which runtime each of your programs uses, so that you know how to wrap up its command line arguments in a way that it will understand?



  • @Scarlet_Manuka I never had a program parse a command line in a different/funny way with Windows. If it ever affected you, you were doing something wrong.



  • @dkf said in UNIX/WIN wildcards:

    @PleegWat I believe it is actually a counted list on Linux as you can pass empty arguments just fine, which your encoding would prevent. (Also, the environment is also passed the same sort of way, which is why it is always an inherited copy.)

    From how the Go runtime handles it, it appears that:

    • argc and a pointer to the argv array are on the stack when the program starts.
    • each argument is null-terminated, and after the arguments are done, there is a null pointer.
    • after the null pointer, there is another null pointer terminated list of null-terminated strings containing the environment.


  • @PleegWat said in UNIX/WIN wildcards:

    I was just mentioning the fact your programming language exposes positional arguments doesn't mean anything since, even in plain C, what gets passed to your main() function isn't controlled by the OS directly.

    I'm not sure what you mean. I mean, obviously there's some code between telling the OS you want a program to run with a certain command line and your program receiving a split command line, and that code may be in the runtime and not in the process startup code proper, but the OS does establish how they have to be split. It's documented, different runtimes aren't allowed to do whatever they want, and if they differ from the documentation then they'd be :doing_it_wrong:

    The documentation for how to break the command line is here: https://msdn.microsoft.com/en-us/library/windows/desktop/17w5ykft(v=vs.140).aspx

    Can you offer an example of a command line that would be ambiguous enough for different runtimes to split it differently while still adhering to the documented spec?

    It's true that Windows preserves the original command line too, so in that sense programs don't get the array unless they ask for it (which C and C++ programs do by default with the argv parameter). And wildcards and the like have to be handled by the program; you don't get a unified behavior for that. On the other hand, that's less of a problem if you don't spend all your time writing programs that operate on complex command-line arguments, so I don't necessarily see that as a negative.
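    For a feel of the quoting side of those rules, Python's subprocess module ships a helper, list2cmdline, whose docstring says it uses the same rules as the MS C runtime. It is not part of the documented public API, so treat this as a sketch of the convention rather than a reference; the sample argument lists are made up.

```python
import subprocess

# Hypothetical argument lists, chosen to exercise the quoting rules.
samples = [
    ["del", "plain.txt"],
    ["del", "my file.txt", ""],  # spaces and empty strings need quotes
    ["echo", 'say "hi"'],        # embedded quotes are backslash-escaped
]

for argv in samples:
    # Builds the single string a Windows parent would hand to the child.
    print(subprocess.list2cmdline(argv))
```

    A runtime that follows the documented rules should split each printed string back into the original list.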


  • Java Dev

    @ben_lubar From some googling I did this morning, that does seem to be the case at least on the argc/argv front. Somehow yesterday I assumed the OS doesn't set it up with pointers, but I guess I was wrong.

    @Kian I'm not going to speculate on how windows approaches this. On linux there is a definite multistep process where the kernel sets up an initial state, calls a _start symbol almost always implemented in assembly, which in turn invokes your main function or equivalent for your language.

    That is of course for a language which compiles into an ELF binary. For scripting languages, there is an interpreter or JIT compiler involved which is specified after the initial #! magic symbol in the file. This interpreter will be invoked with the arguments specified on that initial line, plus the name of the file being executed.

    For other binary formats (including PE .exe files run by mono or wine) there is yet another process setup mechanism, and I won't even pretend to know how it's configured.

    I would not be surprised if it is legal for the interpreter of a shell script to itself be a shell script.
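    The #! step described above can be sketched in a few lines of Python. This is a simplified model, not the kernel's code: the shebang_argv helper and the example paths are made up, and it assumes the Linux behaviour of passing everything after the interpreter path as a single optional argument (other systems may split it differently).

```python
def shebang_argv(first_line, script_path, script_args):
    """Return the argv an interpreter would be started with, or None.

    Simplified model: the text after "#!" is split into the interpreter
    path and at most one optional argument, then the script path and
    the caller's own arguments are appended.
    """
    if not first_line.startswith("#!"):
        return None
    parts = first_line[2:].strip().split(None, 1)
    if not parts:
        return None
    interpreter, optional = parts[0], parts[1:]
    return [interpreter, *optional, script_path, *script_args]


print(shebang_argv("#!/bin/sh", "./script.sh", ["input.txt"]))
print(shebang_argv("#!/usr/bin/env python3", "./tool", ["-v"]))
```

    If the interpreter named on the first line is itself a script, the same rewrite would simply be applied again to the new argv.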


  • Java Dev

    @PleegWat I just found the following link which looks to be an interesting read on how the linux kernel loads binaries:

    http://www.linuxjournal.com/article/2568



  • @dcon said in UNIX/WIN wildcards:

    @marczellm said in UNIX/WIN wildcards:

    (For the *nix folks: In Windows you can't cd D:\stuff if the current directory is on the C: drive. You need cd /D D:\stuff)

    Sure you can. And the current directory on D: changes. You just happen to stay on C:. (Now I can do somecmd filesonC filesonD)

    Unless you're on PowerShell, in which case you switch to D: in the process.



  • @wharrgarbl said in UNIX/WIN wildcards:

    @Scarlet_Manuka I never had a program parse a command line in a different/funny way with Windows. If it ever affected you, you were doing something wrong.

    It doesn't affect most people, which is why the situation could develop in the first place. But if you have to do a lot of work with "odd" command line parameters across a bunch of different programs, you'll probably find it.


  • Considered Harmful

    @PleegWat said in UNIX/WIN wildcards:

    I would not be surprised if it is legal for the interpreter of a shell script to itself be a shell script.

    Sure :trollface:

    ~ $ cat yadda.sh 
    #!/bin/sh
    while read l; do
        case "$l" in
            珠*) perl -E "${l#珠}" ;;
            🐍*) python -c "${l#🐍}" ;;
            *) ;;
        esac
    done < "$1"
    ~ $ cat foo.yad 
    #!./yadda.sh
    珠say "Larry "x3;
    🐍for n in range(1,4): print "Eric",
    
    ~ $ ./foo.yad 
    Larry Larry Larry 
    Eric Eric Eric
    


  • @LaoC to explain it further, #! doesn't do anything special for binary files. It just means "put this before argv[0] before choosing the executable to run".

    I wonder what program would crash if you set the interpreter of a script to itself. The shell? The script you're trying to run? The kernel?


  • :belt_onion:

    @ben_lubar I would certainly hope it was the shell. I think it's too low-level to be the program, and I certainly hope it wouldn't be the kernel.



  • @ben_lubar

    $ cat test
    #!./test
    bla
    $ ./test
    -bash: ./test: ./test: bad interpreter: Too many levels of symbolic links
    $ tcsh
    > ./test
    ./test: bad interpreter: Too many levels of symbolic links
    >
    

  • BINNED

    @Grunnen
    needs more spanking



  • @ben_lubar said in UNIX/WIN wildcards:

    each argument is null-terminated, and after the arguments are done, there is a null pointer.

    What @dkf said -- how do you distinguish between eight (or 4 or whatever sizeof(pointer) is on your platform) empty arguments and the null pointer? Also, wouldn't the null pointer end up misaligned? (Although, what exactly makes that a pointer and not just some sentinel value?)


  • Java Dev

    @ben_lubar Infinite loop in the kernel, which could be caught by:

    • Loop detection, as @Grunnen's error suggests
    • Recursion limit
    • Maximum argument list length limit exceeded (as each additional interpreter increases the argument list length and that has a cap).

    @cvi said in UNIX/WIN wildcards:

    @ben_lubar said in UNIX/WIN wildcards:

    each argument is null-terminated, and after the arguments are done, there is a null pointer.

    What @dkf said -- how do you distinguish between eight (or 4 or whatever sizeof(pointer) is on your platform) empty arguments and the null pointer? Also, wouldn't the null pointer end up misaligned? (Although, what exactly makes that a pointer and not just some sentinel value?)

    It's an array of pointers to string, not a list of strings terminated with a pointer.
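    To make the distinction concrete, here is a small sketch using Python's ctypes to build a C-style pointer array. The argument values and the count_args helper are made up for the example; it only models the layout and does not read a real process's argv.

```python
import ctypes

# A C-style argv: an array of char pointers with a NULL pointer at the end.
# The empty string is a valid pointer to a single NUL byte, so it is
# distinct from the terminating NULL pointer.
args = [b"prog", b"", b"a.txt"]
argv = (ctypes.c_char_p * (len(args) + 1))(*args, None)


def count_args(array):
    """Walk the pointer array until the NULL pointer, like C startup code."""
    count = 0
    while array[count] is not None:
        count += 1
    return count


print(count_args(argv), argv[1] == b"")
```

    The empty second argument is counted like any other, because the walk stops on a null pointer and not on a null byte.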



  • @PleegWat said in UNIX/WIN wildcards:

    It's an array of pointers to string, not a list of strings terminated with a pointer.

    Yeah, that makes more sense.

    Filed under: missing morning coffee



  • What I've always found really WTF-y was that on Windows:

    del *.*
    

    removes all files in a directory, even those without a dot in the filename. On unix-y systems this works as expected.



  • @martijntje said in UNIX/WIN wildcards:

    What I've always found really WTF-y was that on Windows:

    del *.*
    

    removes all files in a directory, even those without a dot in the filename. On unix-y systems this works as expected.

    It makes sense on 8.3 filesystems, where the filename is the 8 part and the extension is the 3 part, so del *.* means “delete files with any filename and any extension”. Windows simply inherited this behaviour from MS-DOS, probably for continuity (if anyone even thought about it at all).



  • @PleegWat said in UNIX/WIN wildcards:

    And how common is non-gnu userland anyway? Tiny routers which run busybox? BSD? HP/UX? Does that stuff even still exist?

    It does, and it's annoying. (For ... reasons ... I use FreeBSD at work, for example.)

    See, the FreeBSD userland expects options to come before real arguments. So, type a nice command ls *.txt and forget to include the -l option (or, worse, the -d option because some clown created a directory directoryname.txt and I don't want to see its contents) ... On GNU userland, up-arrow, space, -ld, Enter, and it works. The FreeBSD userland complains that -ld doesn't exist.


  • Considered Harmful

    @Steve_The_Cynic said in UNIX/WIN wildcards:

    It does, and it's annoying. (For ... reasons ... I use FreeBSD at work, for example.)

    +1
    I have this OpenBSD VM for automated tests, and if I have to interact with it it always feels like a flashback to the early 90s.



  • Linux shell expansion is problematic when you have filenames starting with the "-" character.


  • Java Dev

    @wharrgarbl That's not a shell problem; that's an option parsing problem. The reason Windows does not have this problem is that its option character (/) is illegal in file names.

    The standard linux solution is making sure your filename arguments (including globs) always start with / or ./.



  • @PleegWat said in UNIX/WIN wildcards:

    The standard linux solution is making sure your filename arguments (including globs) always start with / or ./.

    solution != workaround


  • Considered Harmful

    @wharrgarbl said in UNIX/WIN wildcards:

    @PleegWat said in UNIX/WIN wildcards:

    The standard linux solution is making sure your filename arguments (including globs) always start with / or ./.

    solution != workaround

    It's still not specific to shell expansion. Programs that don't use options starting in '-' don't have that problem, but they tend to have a different one because it's a general problem of in-band signaling. Whenever you have a data stream that includes $CHARACTER as an escape that signals "whatever comes after this is special in some way", any occurrence of $CHARACTER in stuff that's not supposed to be special will be a problem.


  • Java Dev

    @LaoC Yeah. It'd be a full solution if it was mandatory for filenames to start with / or ./. However, you have to account for existing tools, so doing it on the calling side is good practice but verifying it is bound to lead to problems.



  • @Gurth said in UNIX/WIN wildcards:

    Windows simply inherited this behaviour from MS-DOS, probably for continuity (if anyone even thought about it at all).

    Yes, they did. Per The Old New Thing (emphasis added):

    But some quirks of the FCB matching algorithm persist into Win32 because they have become idiom.

    For example, if your pattern ends in .*, the .* is ignored. Without this rule, the pattern *.* would match only files that contained a dot, which would break probably 90% of all the batch files on the planet, as well as everybody's muscle memory, since everybody running Windows NT 3.1 grew up in a world where *.* meant all files.

    As another example, a pattern that ends in a dot doesn't actually match files which end in a dot; it matches files with no extension. And a question mark can match zero characters if it comes immediately before a dot.

    There may be other weird Win32 pattern matching quirks, but those are the two that come to mind right away, and they both exist to maintain batch file compatibility with the old 8.3 file pattern matching algorithm.
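    A rough way to see the difference from Unix-style globbing is to approximate two of the quirks quoted above in Python. The dos_style_match helper is hypothetical and is not the real Win32 algorithm: it skips case folding, short names, and the question-mark rule, and it treats a trailing .* as optional rather than literally ignored.

```python
from fnmatch import fnmatchcase


def dos_style_match(pattern, name):
    """Approximate two of the quirks quoted above (hypothetical helper)."""
    if pattern.endswith(".*"):
        # A trailing ".*" also matches names with no dot at all.
        return fnmatchcase(name, pattern) or fnmatchcase(name, pattern[:-2])
    if pattern.endswith("."):
        # A trailing dot selects names that have no extension.
        return "." not in name and fnmatchcase(name, pattern[:-1])
    return fnmatchcase(name, pattern)


for name in ("README", "notes.txt"):
    # Columns: file name, Unix-style result, DOS-style result for "*.*".
    print(name, fnmatchcase(name, "*.*"), dos_style_match("*.*", name))
```

    With plain fnmatch rules a name without a dot does not match *.*, which is the Unix behaviour discussed earlier in the thread.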



  • @PleegWat said in UNIX/WIN wildcards:

    And how common is non-gnu userland anyway? Tiny routers which run busybox? BSD? HP/UX? Does that stuff even still exist?

    macOS, that’s BSD-based and so the common commands come from there. Plenty of GNU stuff is there as well, but not the commands you (or at least I) would use for most daily work in the terminal.



  • @PleegWat said in UNIX/WIN wildcards:

    @wharrgarbl That's not a shell problem; that's an option parsing problem. The reason Windows does not have this problem is that its option character (/) is illegal in file names.

    Sort of. Forward slash is accepted by the file handling APIs (unless you want to use more than MAX_PATH total characters, but that's a separate story), but converted to backslash by the name parsing inside those APIs. And the use of forward slash as the option-introducer in Windows actually harks back to the days of CP/M, which didn't even have directories at first.

    The standard linux solution is making sure your filename arguments (including globs) always start with / or ./.

    Many GNU-style programs recognise the -- ("minus minus") option as an otherwise do-nothing option that says there aren't any more options.
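    Both approaches can be tried out with Python's argparse, which follows the same -- convention. This is a minimal sketch with a made-up tool name and a single made-up option; it is not how any particular GNU program parses its arguments.

```python
import argparse

# Hypothetical tool: one boolean option plus any number of file names.
parser = argparse.ArgumentParser(prog="rmdemo")
parser.add_argument("-f", action="store_true")
parser.add_argument("files", nargs="*")

# A file literally named "-f" is taken as the option...
as_option = parser.parse_args(["-f"])
# ...unless "--" ends option parsing, or the name is given as "./-f".
after_dashes = parser.parse_args(["--", "-f"])
with_prefix = parser.parse_args(["./-f"])

print(as_option)
print(after_dashes)
print(with_prefix)
```

    The ./ prefix works with any program because the argument no longer starts with the option character, while -- only helps with programs that recognise it.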



  • @Scarlet_Manuka said in UNIX/WIN wildcards:

    @Gurth said in UNIX/WIN wildcards:

    Windows simply inherited this behaviour from MS-DOS, probably for continuity (if anyone even thought about it at all).

    Yes, they did. Per The Old New Thing (emphasis added):

    But some quirks of the FCB matching algorithm persist into Win32 because they have become idiom.

    For example, if your pattern ends in .*, the .* is ignored. Without this rule, the pattern *.* would match only files that contained a dot, which would break probably 90% of all the batch files on the planet, as well as everybody's muscle memory, since everybody running Windows NT 3.1 grew up in a world where *.* meant all files.

    As another example, a pattern that ends in a dot doesn't actually match files which end in a dot; it matches files with no extension. And a question mark can match zero characters if it comes immediately before a dot.

    There may be other weird Win32 pattern matching quirks, but those are the two that come to mind right away, and they both exist to maintain batch file compatibility with the old 8.3 file pattern matching algorithm.

    What Raymond doesn't mention here is that said 8.3 file pattern matching algorithm is older than MS-DOS.



  • @Steve_The_Cynic

    64K CP/M Version 2.2 (SIMH ALTAIR 8800, BIOS V1.27, 2 HD, 02-May-2009)
    
    A>era *
    No file
    A>era *.*
    All (Y/N)?
    

    (ERA being CP/M’s erase command, for those who are wondering what it does.)



  • @Gurth said in UNIX/WIN wildcards:

    @Steve_The_Cynic

    64K CP/M Version 2.2 (SIMH ALTAIR 8800, BIOS V1.27, 2 HD, 02-May-2009)
    
    A>era *
    No file
    A>era *.*
    All (Y/N)?
    

    (ERA being CP/M’s erase command, for those who are wondering what it does.)

    That's exactly what I was thinking of.

