Is there a stupider solution?



  • For me the WTF is not that a sysadmin could do something daft, and turn a relatively simple job into something very complicated by doing it wrong, but that said sysadmin is writing columns about it for a fairly popular tech site (and presumably getting paid for it). That said, it's hard to think of a worse way of doing it than the way he eventually came up with - at least that doesn't involve printing files out, rolling the pages into a cylinder, and transporting them rectally to the new site. I particularly like the way he identifies that batch files are not the way to go, but never thinks that if a 20-year-old DOS scripting language isn't up to the job, perhaps he should try one of the modern Windows equivalents. Mind you, all of that's a lot more complicated than using the built-in tool for the job: ntbackup.



  • @davedavenotdavemaybedave said:

    it's hard to think of a worse way of doing it than the way he eventually came up with - at least that doesn't involve printing files out, rolling the pages into a cylinder, and transporting them rectally to the new site.
     

    You forgot the wooden table.



  •  @da Doctah said:

    @davedavenotdavemaybedave said:

    it's hard to think of a worse way of doing it than the way he eventually came up with - at least that doesn't involve printing files out, rolling the pages into a cylinder, and transporting them rectally to the new site.
     

    You forgot the wooden table.

    You want to transport the table with the cylinder?



  • From the article:

    I wanted to give several command-line tools a go as well. XCopy and Robocopy most likely would have been able to handle the file volume but - like Windows Explorer - they are bound by the fact that NTFS can store files with longer names and greater path than CMD can handle. I tried ever more complicated batch files, with various loops in them, in an attempt to deal with the path depth issues. I failed.

    What the hell are they doing that the file paths hit the Windows CMD length limit? Also, I know that in Windows XP at least the Windows Explorer shell has similar limits, so I'm sure he was hitting that too.



  • @da Doctah said:

    @davedavenotdavemaybedave said:

    it's hard to think of a worse way of doing it than the way he eventually came up with - at least that doesn't involve printing files out, rolling the pages into a cylinder, and transporting them rectally to the new site.
     

    You forgot the wooden table.

    Didn't fit in his rectum.



  • There's a 260-character (MAX_PATH) limit for conventional paths in the Windows API, which I imagine is what he was actually running up against. Use the extended-length format (the \\?\ prefix, or \\?\UNC\server\share for network paths) and your limit is around 32k characters, in the command prompt or not, IIRC.
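
    For anyone curious, a minimal sketch of that workaround in Python (Python 3 assumed; the source and destination paths are invented for illustration):

        # Copy a tree using Windows extended-length paths, so the usual
        # 260-character MAX_PATH ceiling doesn't apply. The \\?\ prefix
        # (or \\?\UNC\server\share for network paths) raises the limit
        # to roughly 32,767 characters. All paths below are hypothetical.
        import os
        import shutil

        def extend(path):
            """Turn an absolute path into its extended-length form."""
            path = os.path.abspath(path)
            if path.startswith('\\\\'):              # network path -> \\?\UNC\server\share\...
                return '\\\\?\\UNC' + path[1:]
            return '\\\\?\\' + path                  # local path   -> \\?\C:\...

        src = extend(r'D:\data\deeply\nested\tree')  # hypothetical source
        dst = extend(r'\\newserver\share\tree')      # hypothetical destination

        for root, dirs, files in os.walk(src):
            rel = os.path.relpath(root, src)
            target_root = dst if rel == '.' else os.path.join(dst, rel)
            os.makedirs(target_root, exist_ok=True)
            for name in files:
                shutil.copy2(os.path.join(root, name), os.path.join(target_root, name))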



  • @Daid said:

    @da Doctah said:

    @davedavenotdavemaybedave said:

    it's hard to think of a worse way of doing it than the way he eventually came up with - at least that doesn't involve printing files out, rolling the pages into a cylinder, and transporting them rectally to the new site.
     

    You forgot the wooden table.

    Wooden fit in his rectum.

    FTFY



  • IIRC, the 260th character is the null terminator.



  • @davedavenotdavemaybedave said:

    all of that's a lot more complicated than using the built-in tool for the job: ntbackup.

    Quite. And you would HOPE that an installation with THAT many files on its servers has a 'real' backup tool/system in place, so it SHOULD be simple to just 'redirect' a selective file restore to the correct destination … shouldn't it?



  • @Tacroy said:

    From the article:

    I wanted to give several command-line tools a go as well. XCopy and Robocopy most likely would have been able to handle the file volume but - like Windows Explorer - they are bound by the fact that NTFS can store files with longer names and greater path than CMD can handle. I tried ever more complicated batch files, with various loops in them, in an attempt to deal with the path depth issues. I failed.

    What the hell are they doing that the file paths hit the Windows CMD length limit? Also, I know that in Windows XP at least the Windows Explorer shell has similar limits, so I'm sure he was hitting that too.

    Considering RichCopy (which he said worked for this) is basically a GUI version of RoboCopy, I think he's just full of shit. From the article, it looks like he didn't even bother to try either XCOPY or RoboCopy, which is stupid because they're built into Windows and would have solved this problem without trying every single copy utility on Earth.
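
    For what it's worth, a minimal sketch of driving Robocopy from a script (the paths and log location are invented; /E, /R, /W and /LOG are standard Robocopy switches):

        # Kick off a large Robocopy job from Python on Windows. /E copies
        # subdirectories (including empty ones), /R and /W stop a bad file
        # from stalling the run, and /LOG keeps the console readable.
        # All paths below are hypothetical.
        import subprocess

        result = subprocess.run([
            'robocopy',
            r'D:\data',                  # hypothetical source
            r'\\newserver\share\data',   # hypothetical destination
            '/E', '/R:1', '/W:1',
            # '/MT:32',                  # multithreaded copy; needs Windows 7 / Server 2008 R2 or later
            r'/LOG:C:\temp\copy.log',
        ], check=False)                  # Robocopy exit codes below 8 mean no failures

        print('robocopy exit code:', result.returncode)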



  • @Cad Delworth said:

    @davedavenotdavemaybedave said:
    all of that's a lot more complicated than using the built-in tool for the job: ntbackup.

    Quite. And you would HOPE that an installation with THAT many files on its servers has a 'real' backup tool/system in place, so it SHOULD be simple to just 'redirect' a selective file restore to the correct destination … shouldn't it?

    On second thoughts, perhaps we should be impressed. He's not only created a completely needless problem that he can look good (and get paid) for solving, but then managed to get paid to write about it as well.


  • I really really just want to know what the 60 million files were and why they were files at all and not something else.



  • I always thought that copy operations were IO bound, not CPU bound. So why is it that RichCopy (or whatever it's called) speeds up the copy process by using multithreading?



  • @Juifeng said:

    I always thought that copy operations were IO bound, not CPU bound. So why is it that RichCopy (or whatever it's called) speeds up the copy process by using multithreading?

    For small files (which the 60 million probably were), checking metadata, seeing if the filename already exists, and other non-copy "stuff" takes more time than actually copying the file.



  • @superjer said:

    I really really just want to know what the 60 million files were and why they were files at all and not something else.

    Probably has a mis-configured roaming profile setup, and he's copying 500 people's browser caches.



  • @hoodaticus said:

    IIRC, the 260th character is the null terminator.

    No, the 257th character is '.'.  Then 258, 259, and 260 are the "+3" characters.  WTF did someone decide that they still needed "+3" characters when they were going to have over 200 normal filename characters?

    I believe there's also the issue that the API apparently gives the same limit for both filenames and path length, so it's fairly easy for programs that want to refer to things by absolute paths to have problems with deep paths.

    Of course, I'm not a Windows programmer, as I keep telling them at work, so it's possible that issue is something crazy in Cygwin rather than in Windows.  But it looks like a Windows thing to me.

    (Under most unix systems I've worked on, filenames are limited to 256 characters and paths are limited to 1024 characters.  That still conceivably lets one make a path so deep it cannot be referenced by a full path, but it's much harder to do by accident.  Everyone I know who's tried to do this by accident has gotten their knuckles rapped by the sysadmin long before they got that far, because they broke locate or mlocate: its update process dies if it encounters any path that is not identical, in the first (length - 255) characters, to the prior path after sorting.  By "broke", I mean they did something that would eventually have resulted in that outcome, had it not broken for other reasons.)
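
    For the unix side of that comparison, a quick sketch of checking what a given filesystem actually allows (POSIX-only calls, so this won't run on Windows; '/' is just an example mount point):

        # Query per-filesystem limits; the values vary by filesystem and OS.
        import os

        print('max filename length:', os.pathconf('/', 'PC_NAME_MAX'))  # typically 255
        print('max relative path:  ', os.pathconf('/', 'PC_PATH_MAX'))  # typically 1024 or 4096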



  • No, there are *many* stupider solutions:

    • He could have *rebooted* into Linux to copy the files.
    • He could have written his own file copy program for Windows.
    • He could have written his own file copy program for Linux, and run *that* in a VM.
    • He could've outsourced the problem to a third world country.
    • He could've hired an illegal immigrant to manually copy the files.

    I could probably come up with many more, but I prefer thinking of *better* solutions, rather than worse ones.

    For example, he could've installed Cygwin, and used the cp from that.  It's possible rsync might have worked also, but some aspects of how it works suggest to me it might use full relative paths, rather than chdiring into source and target directories to do its work.  (I do not know how to chdir into two separate directories myself, but I've seen programs written for Linux that claim to do it.  I assume that Windows, as a fully competitive alternative OS, also has this capability.)



  • @davedavenotdavemaybedave said:

    http://www.theregister.co.uk/2010/09/24/sysadmin_file_tools/ For me the WTF is not that a sysadmin could do something daft, and turn a relatively simple job into something very complicated by doing it wrong, but that said sysadmin is writing columns about it for a fairly popular tech site (and presumably getting paid for it). That said, it's hard to think of a worse way of doing it than the way he eventually came up with - at least that doesn't involve printing files out, rolling the pages into a cylinder, and transporting them rectally to the new site. I particularly like the way he identifies that batch files are not the way to go, but never thinks that if a 20-year-old DOS scripting language isn't up to the job, perhaps he should try one of the modern Windows equivalents. Mind you, all of that's a lot more complicated than using the built-in tool for the job: ntbackup.

    We have a server with 7 million files of 50k or smaller on it (200GB, so it's not that big).  It used to take over a day to back it up with NetBackup.  We had to resort to backing up the whole volume as a binary image to get it down to a reasonable time.  So, I don't think ntbackup would have been such a great idea.


  • @blakeyrat said:

    @Juifeng said:
    I always thought that copy operations were IO bound, not CPU bound. So why is it that RichCopy (or whatever it's called) speeds up the copy process by using multithreading?

    For small files (which the 60 million probably were), checking metadata, seeing if the filename already exists, and other non-copy "stuff" takes more time than actually copying the file.


    Also, on smarter drives and/or OSs, the system can re-order pending disk I/O operations so they complete faster (minimizing seeking back and forth). It can't do that if there's only ever one pending disk operation though.
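
    To make that concrete, here's a rough sketch of a multithreaded copier in that spirit (not RichCopy's actual implementation; the paths and worker count are invented for illustration):

        # While one worker is blocked on metadata lookups or an open/create
        # round-trip, the others keep the disk (or network share) queue full.
        import os
        import shutil
        from concurrent.futures import ThreadPoolExecutor

        SRC = r'D:\data'                   # hypothetical source tree
        DST = r'\\newserver\share\data'    # hypothetical destination

        def all_files(root):
            """Yield every file under root as a path relative to root."""
            for dirpath, _dirs, files in os.walk(root):
                for name in files:
                    yield os.path.relpath(os.path.join(dirpath, name), root)

        def copy_one(rel_path):
            target = os.path.join(DST, rel_path)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            shutil.copy2(os.path.join(SRC, rel_path), target)

        with ThreadPoolExecutor(max_workers=16) as pool:
            # list() drains the results so any exception from a worker is raised here
            list(pool.map(copy_one, all_files(SRC)))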



  • @Jaime said:

    We have a server with 7 million files of 50k or smaller on it (200GB, so it's not that big).  It used to take over a day to back it up with NetBackup.  We had to resort to backing up the whole volume as a binary image to get it down to a reasonable time.  So, I don't think ntbackup would have been such a great idea.
    You'd expect it to take all day, whatever you use - for suitable values of 'all day', anyway. But I agree, I wouldn't actually have used NTBackup. It seems to me that the simplest solution is to get around the restriction on path length using extended-length (\\?\) paths.



  • @tgape said:

    (I do not know how to chdir into two separate directories myself, but I've seen programs written for Linux that claim to do it.  I assume that Windows, as a fully competitive alternative OS, also has this capability.)

    You can't chdir into two directories at once. However, you can do the following:

    1. chdir to path1
    2. open file1
    3. chdir to path2
    4. open file2

    You now have two files open, both of which were opened using relative filenames in different directories.
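
    In Python, that sequence looks something like this (a minimal sketch; the paths and filenames are made up):

        # os.chdir is process-wide, so the directory only needs to be current
        # at the moment each file is opened; the handles stay valid afterwards.
        import os

        os.chdir('/path/to/source')        # step 1: chdir to path1
        src = open('file1', 'rb')          # step 2: open file1 by its relative name
        os.chdir('/path/to/destination')   # step 3: chdir to path2
        dst = open('file2', 'wb')          # step 4: open file2 by its relative name

        dst.write(src.read())              # both handles work even though the cwd has moved on
        src.close()
        dst.close()

    (On modern systems you could skip the chdir calls entirely and pass dir_fd to os.open, which is what lets a program effectively work "in" two directories at once.)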



  • @blakeyrat said:

    @Juifeng said:
    I always thought that copy operations were IO bound, not CPU bound. So why is it that RichCopy (or whatever it's called) speeds up the copy process by using multithreading?

    For small files (which the 60 million probably were), checking metadata, seeing if the filename already exists, and other non-copy "stuff" takes more time than actually copying the file.

    "For small files, IO, IO, and other non-copy IO takes more time than actually doing one type of IO"?



  • @Iago said:

    "For small files, IO, IO, and other non-copy IO takes more time than actually doing one type of IO"?
     

    You can't just pithily replace all those things with IO like you did. For small files, anything non-IO takes more time than a fast IO stream of the file.

    It means that doing stuff to 1,000 1kb files commonly takes far longer than doing it to a single 1MB file. Do you disagree?
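
    A rough, unscientific sketch for anyone who wants to check (directory and file names are made up; the absolute numbers depend heavily on the filesystem and cache state):

        # Copy 1,000 one-kilobyte files and one 1 MB file, and compare wall-clock time.
        import os
        import shutil
        import time

        os.makedirs('small_src', exist_ok=True)
        os.makedirs('small_dst', exist_ok=True)
        for i in range(1000):
            with open(f'small_src/{i}.dat', 'wb') as f:
                f.write(b'x' * 1024)
        with open('big.dat', 'wb') as f:
            f.write(b'x' * 1024 * 1024)

        start = time.perf_counter()
        for name in os.listdir('small_src'):
            shutil.copy2(os.path.join('small_src', name), os.path.join('small_dst', name))
        print('1,000 x 1 kb files:', time.perf_counter() - start, 'seconds')

        start = time.perf_counter()
        shutil.copy2('big.dat', 'big_copy.dat')
        print('1 x 1 MB file:     ', time.perf_counter() - start, 'seconds')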



  • @Iago said:

    @blakeyrat said:
    @Juifeng said:
    I always thought that copy operations were IO bound, not CPU bound. So why is it that RichCopy (or whatever it's called) speeds up the copy process by using multithreading?

    For small files (which the 60 million probably were), checking metadata, seeing if the filename already exists, and other non-copy "stuff" takes more time than actually copying the file.

    "For small files, IO, IO, and other non-copy IO takes more time than actually doing one type of IO"?

    Well, if you're going to be that way, then every single thing the computer does, including shoving an image on your screen and following your mouse with a cursor, is IO. So, yes: IO.

    My computer also IOed the IO with the IO when it IOed IO IO IO IO IO IO IO. If you're defining the term that way, that's *all computers ever do*.



  • And on that farm there was a duck...E-IO-IO



  • @Sutherlands said:

    And on that farm there was a duck...E-IO-IO

    Are you referring to EIE-I/O?

