Linux: du shows different sizes from stat/ls



  • Slightly curious as to why du reports file sizes that differ from ls and stat. It appears that du is reporting how much disk space is actually used by the VM's OS rather than the fixed size I set for the VDI (Virtual Disk Image). I'm using ext3 as the file system. For easier reading I spaced out each file's du/ls report.

    Also, du reports more disk space usage than ls does if you look at the snapshot file.

    [Root@Portable ~]# find .VirtualBox/ -type f -iname "*.vdi" -exec du -h {} \; -exec ls -lh {} \;
    1.5G	.VirtualBox/VDI/Server.vdi
    -rw------- 1 Root root 11G 2008-02-11 08:52 .VirtualBox/VDI/Server.vdi
    
    1.9G	.VirtualBox/VDI/LFS.vdi
    -rw------- 1 Root root 8.1G 2008-01-21 15:25 .VirtualBox/VDI/LFS.vdi
    
    1.5G	.VirtualBox/VDI/Serverxx.vdi
    -rw------- 1 Root root 21G 2008-02-28 08:15 .VirtualBox/VDI/Serverxx.vdi
    
    2.3G	.VirtualBox/VDI/Ubuntu.vdi
    -rw-r--r-- 1 Root root 11G 2008-02-26 13:14 .VirtualBox/VDI/Ubuntu.vdi
    
    2.8G	.VirtualBox/VDI/diablo1.vdi
    -rw------- 1 Root root 11G 2008-01-16 13:38 .VirtualBox/VDI/diablo1.vdi
    
    2.8G	.VirtualBox/VDI/Serveryy.vdi
    -rw------- 1 Root root 21G 2008-02-28 08:15 .VirtualBox/VDI/Serveryy.vdi
    
    36K	.VirtualBox/Machines/LFS/Snapshots/{9ed75212-3273-4a8b-f2a0-2ce10206cab6}.vdi
    -rw------- 1 Root root 33K 2008-01-21 15:25 .VirtualBox/Machines/LFS/Snapshots/{9ed75212-3273-4a8b-f2a0-2ce10206cab6}.vdi
    
    2.5G	.VirtualBox/XPBase.vdi
    -rw------- 1 Root root 11G 2008-02-01 13:51 .VirtualBox/XPBase.vdi
    


  • du shows you the disk usage (how much space the file takes up on disk) while ls shows you the size of the file (how many bytes you can read from it).

    This gives interesting results, especially on compressed file systems.



  • Alright, but on a default setup (for ArchLinux, which is uncompressed afaik) why is the disk usage something different than what the file size actually is?



  • The disk images are presumably sparse files, for which disk usage and the number of readable bytes differ. As Daid already pointed out, du and ls each report one of those two properties.

    Edit: This happens even on non-compressed filesystems. Try the command given in the Wikipedia article (dd if=/dev/zero of=sparse-file bs=1 count=1 seek=1M), and then try du and ls on the resulting file. You should see different results there as well.
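
    To check which files in a directory (such as the .vdi images) are sparse, something like the following might help. It is only a sketch and assumes GNU find, whose -printf supports %s (size in bytes), %b (allocated 512-byte blocks) and %S (sparseness, roughly allocated space divided by size). The function name report_sizes is made up for illustration.

```shell
#!/bin/sh
# Sketch, assuming GNU find. report_sizes is a hypothetical helper name.
# For every regular file under the given directory, print tab-separated:
#   apparent size in bytes, allocated 512-byte blocks, sparseness ratio, path
report_sizes() {
    find "$1" -type f -printf '%s\t%b\t%S\t%p\n'
}

# Example call (prints nothing if the directory has no regular files):
#   report_sizes .VirtualBox/
```

    A sparseness value well below 1 suggests a sparse file; a value slightly above 1 is ordinary block rounding.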


  • @Lingerance said:

    why is the disk usage something different than what the file size actually is?

    Disk slack? Or is the difference too great?



  • The following was run on an ext3 filesystem. 

    [code]
    [bear:src] 191) dd if=/dev/zero of=sparse-file bs=1 count=1 seek=1M
    1+0 records in
    1+0 records out
    1 byte (1 B) copied, 0.0235373 s, 0.0 kB/s

    [bear:src] 194) dd if=/dev/zero of=full-file bs=1 count=1M
    1048576+0 records in
    1048576+0 records out
    1048576 bytes (1.0 MB) copied, 1.5504 s, 676 kB/s

    [bear:src] 195) ls -l *file
    -rw-r--r-- 1 1048576 2008-02-28 16:13 full-file
    -rw-r--r-- 1 1048577 2008-02-28 16:13 sparse-file

    [bear:src] 196) du *file
    1028 full-file
    8 sparse-file
    [/code]
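
    The same two numbers can also be read directly from the inode. The following is a rough sketch, assuming GNU coreutils (stat -c and du --apparent-size); the temporary directory and variable names are only for illustration.

```shell
#!/bin/sh
# Sketch, assuming GNU coreutils: compare apparent size with allocated size.
workdir=$(mktemp -d)
f="$workdir/sparse-file"

# Same idea as the dd command above: seek 1 MiB, then write a single byte.
dd if=/dev/zero of="$f" bs=1 count=1 seek=1M 2>/dev/null

apparent=$(stat -c %s "$f")     # st_size: readable bytes (what ls -l shows)
blocks=$(stat -c %b "$f")       # st_blocks: number of allocated blocks
unit=$(stat -c %B "$f")         # size in bytes of each block counted by %b
allocated=$((blocks * unit))    # bytes actually allocated (what du counts)

echo "apparent:  $apparent bytes"
echo "allocated: $allocated bytes"

du --apparent-size --block-size=1 "$f"   # du asked to report the ls figure
du --block-size=1 "$f"                   # du's default: allocated space

rm -r "$workdir"
```

    On a filesystem that supports sparse files, the allocated figure should come out far smaller than the apparent one.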



  • @Lingerance said:

    why is the disk usage something different than what the file size actually is?

    Because 'du' is reporting actual on-disk usage in terms of "blocks used times size of block". A file might be 1200 bytes long, but on a file system with 512-byte blocks it will use 3x512=1536 bytes of actual disk space. ls always shows the real size of the file in bytes, ignoring blocks entirely.

    Some file systems can split those blocks into smaller fragments to reduce the waste. I know we talked about it extensively in my OS class when we were analysing the BSD FFS system. An fs with 512-byte blocks could have each block split further into 4x128-byte fragments, in which case our 1200-byte file would use 2 blocks + 2 fragments, or 2.5 blocks (1280 bytes used on disk).
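
    The arithmetic above can be checked with a few lines of shell. This is just an illustration of the rounding, not how any particular file system does its accounting; round_up is a made-up helper, and the sizes are the ones from the example.

```shell
#!/bin/sh
# round_up is a hypothetical helper: round $1 up to a multiple of $2
# using integer ceiling division.
round_up() {
    echo $(( ( ($1 + $2 - 1) / $2 ) * $2 ))
}

size=1200    # file length in bytes
block=512    # block size in bytes
frag=128     # fragment size in bytes (4 fragments per block)

# Whole blocks only: the last block is mostly slack.
whole_blocks=$(round_up "$size" "$block")

# Full blocks for the bulk of the file, fragments for the tail.
full=$(( (size / block) * block ))
rest=$(( size - full ))
with_frags=$(( full + $(round_up "$rest" "$frag") ))

echo "blocks only:        $whole_blocks bytes"
echo "blocks + fragments: $with_frags bytes"
```

    With these inputs the two figures are 1536 and 1280 bytes, matching the numbers in the post.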



  • @MarcB said:

    Because 'du' is reporting actual on-disk usage in terms of "blocks used times size of block". A file might be 1200 bytes long, but on a file system with 512-byte blocks it will use 3x512=1536 bytes of actual disk space. ls always shows the real size of the file in bytes, ignoring blocks entirely.

    Windows also makes this distinction in the file properties.  In Explorer, try right-clicking on a file, choosing properties, and you'll see the following fields: Size, Size on Disk.  The former is the actual file size, and the latter is the allocated space on disk (which is different from the actual size due to slack, and if applicable, compression and sparseness.)

        

