LINUX FILESYSTEMS, PART 2 – REMINDER ABOUT INODE, DATA, METADATA, HARDLINK AND SYMLINK

This article aims to give a clear picture of the relation between filenames and inodes, to explain what an inode is, and to show the differences between a hard link and a symbolic link.

2.1 Filename as Hardlink to Inode

On ext3 and ext4 (and some other filesystems) a file is stored internally as an inode and a filename is just a pointer to that inode, called a hardlink.
The bytes stored in a file are called the data, while the file metadata is the information the filesystem keeps about that file, like timestamps, ownerships and permissions (and lower-level properties like the list of allocated blocks). The inode holds the metadata and references the blocks that hold the data. The filename itself is not stored in the inode but in the directory that contains it. The command ‘stat’ shows some of the metadata of a file:

echo "1234" >file1 #create a 5-byte file ("1234" plus a newline)


cp -v file1 file2 #copy the file to a new inode
  `file1' -> `file2'
  
  
cp -lv file1 file1h #create a new hardlink to the file
  `file1' -> `file1h'
  
  
cp -sv file1 file1s #create a symlink to the file
  `file1' -> `file1s'
  
  
stat file* #show information about created files
  File: `file1'
  Size: 5               Blocks: 8          IO Block: 4096   regular file
  Device: 802h/2050d      Inode: 18129       Links: 2
  Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
  Access: 2013-07-19 18:29:57.000000000 +0200
  Modify: 2013-07-19 18:29:57.000000000 +0200
  Change: 2013-07-19 18:29:57.000000000 +0200
  Birth: -
  File: `file1h'
  Size: 5               Blocks: 8          IO Block: 4096   regular file
  Device: 802h/2050d      Inode: 18129       Links: 2
  Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
  Access: 2013-07-19 18:29:57.000000000 +0200
  Modify: 2013-07-19 18:29:57.000000000 +0200
  Change: 2013-07-19 18:29:57.000000000 +0200
  Birth: -
  File: `file1s' -> `file1'
  Size: 5               Blocks: 0          IO Block: 4096   symbolic link
  Device: 802h/2050d      Inode: 18131       Links: 1
  Access: (0777/lrwxrwxrwx)  Uid: (    0/    root)   Gid: (    0/    root)
  Access: 2013-07-19 18:29:57.000000000 +0200
  Modify: 2013-07-19 18:29:57.000000000 +0200
  Change: 2013-07-19 18:29:57.000000000 +0200
  Birth: -
  File: `file2'
  Size: 5               Blocks: 8          IO Block: 4096   regular file
  Device: 802h/2050d      Inode: 18130       Links: 1
  Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
  Access: 2013-07-19 18:29:57.000000000 +0200
  Modify: 2013-07-19 18:29:57.000000000 +0200
  Change: 2013-07-19 18:29:57.000000000 +0200
  Birth: -

As you can see, ‘cp -lv file1 file1h’ just creates a new hardlink to the same inode 18129: file1 and file1h are two filenames for a single file, with a single set of timestamps, ownerships and permissions (cf. File Ownerships, Permissions and Timestamps), which is why ‘Links’ shows 2 for both.
‘cp -v file1 file2’, on the other hand, allocates a new inode (18130), duplicates all data blocks into it and creates the filename file2 as a hardlink to it, i.e. a new file.
And ‘cp -sv file1 file1s’ creates a symlink: a separate inode (18131) of type ‘symbolic link’ that stores the filename ‘file1’ as its target, hence its size of 5 bytes (the length of ‘file1’) and its 0 allocated blocks.
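The same observations can be checked with a shorter ‘stat’ output. A minimal sketch, assuming GNU coreutils (‘stat -c’ with the %i, %h, %F, %s and %n sequences: inode number, link count, file type, size and name) and a throwaway directory created with ‘mktemp -d’; the inode numbers will differ from the ones shown above:

```shell
#!/bin/sh
# Recreate the example in a temporary directory (removed on exit)
dir=$(mktemp -d) && cd "$dir" || exit 1
trap 'rm -rf "$dir"' EXIT

echo "1234" >file1        # 5 bytes: "1234" plus a newline
cp file1 file2            # new inode, data blocks copied
cp -l file1 file1h        # new name for the same inode
cp -s file1 file1s        # new inode of type symbolic link, target "file1"

# inode number, link count, file type, size, name
stat -c '%i %h %F %s %n' file1 file1h file1s file2
```

file1 and file1h print the same inode number with a link count of 2, while file2 and file1s each have their own inode.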

To detect inodes with multiple hardlinks, you can for example use ‘fdupes -Hr /bin’ (-H makes fdupes report hardlinks to the same inode as duplicates, which it ignores by default):

#find inodes with several hardlinks ('ls -i' prints the inode number; the 'awk' part, which needs GNU awk 4+ for arrays of arrays, keeps only inode numbers seen more than once):
fdupes -Hrq /bin/ | xargs -L1 ls -i | awk '{L[$1][++I[$1]]=$0};END{for(i in I)if(I[i]>1)for(l in L[i])print L[i][l]}'
  233900 /bin/bzip2
  233900 /bin/bunzip2
  233900 /bin/bzcat
  229391 /bin/uncompress
  229391 /bin/gunzip
  229394 /bin/nisdomainname
  229394 /bin/ypdomainname
  229394 /bin/dnsdomainname
  229394 /bin/domainname

You can also use ‘find’:

find /bin/ -inum $(ls -i /bin/bzip2 | awk '{print $1}')
  /bin/bzip2
  /bin/bunzip2
  /bin/bzcat

  
find /bin/ -samefile /bin/bzip2
  /bin/bzip2
  /bin/bunzip2
  /bin/bzcat
  
  
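#find duplicate hardlinks: group the 'find -ls' lines by inode number (field 1)
#and print the groups containing more than one name
#('sub' replaces the 'ls' columns in front of the path with './')
find /bin/ -xdev -ls 2>/dev/null | awk '{
    i = $1; sub(/[^/]*\.?\//, "./")
    inum[i] = inum[i] ? inum[i] SUBSEP $0 : $0
  }
  END {
    for (I in inum) {
      if ((n = split(inum[I], files, SUBSEP)) > 1) {
        print "hardlinks to inode", I ":"
        for (i = 1; i <= n; i++)
          print files[i]
      }
    }
  }'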
  hardlinks to inode 229394:
  ./bin/ypdomainname
  ./bin/dnsdomainname
  ./bin/domainname
  ./bin/nisdomainname
  hardlinks to inode 233900:
  ./bin/bzip2
  ./bin/bunzip2
  ./bin/bzcat
  hardlinks to inode 229391:
  ./bin/uncompress
  ./bin/gunzip
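Only files whose link count is greater than 1 can share their inode with another name, so GNU find's ‘-links +1’ test can do the filtering directly. A minimal sketch; ‘list_hardlinked’ is a hypothetical helper name, and the ‘inode path’ output layout is an arbitrary choice:

```shell
#!/bin/sh
# Hypothetical helper: list regular files below a directory that have more
# than one hardlink, as "inode path" lines sorted so that the names of the
# same inode are adjacent. -xdev stays on one filesystem, because inode
# numbers are only unique within a filesystem.
list_hardlinked() {
  find "${1:-.}" -xdev -type f -links +1 -printf '%i %p\n' | sort -n
}

list_hardlinked /bin/
```

An inode listed with a single name has its other hardlinks outside the searched directory.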

  
#find symlinks:
find /bin/ -type l -ls
  70855    0 lrwxrwxrwx   1 root     root            4 Dec 30  2012 /bin/rbash -> bash
  229429    0 lrwxrwxrwx   1 root     root           20 May 25  2012 /bin/mt -> /etc/alternatives/mt
  234771    0 lrwxrwxrwx   1 root     root           24 Jul 17 09:58 /bin/netcat -> /etc/alternatives/netcat
  234837    0 lrwxrwxrwx   1 root     root            6 Jul 17 13:50 /bin/bzcmp -> bzdiff
  234984    0 lrwxrwxrwx   1 root     root            6 Jul 17 13:56 /bin/open -> openvt
  234253    0 lrwxrwxrwx   1 root     root            4 Jul 17 13:58 /bin/lsmod -> kmod
  229406    0 lrwxrwxrwx   1 root     root           14 Jul 17 09:50 /bin/pidof -> /sbin/killall5
  233711    0 lrwxrwxrwx   1 root     root            8 Jun 10  2012 /bin/lessfile -> lesspipe
  234843    0 lrwxrwxrwx   1 root     root            6 Jul 17 13:50 /bin/bzless -> bzmore
  234839    0 lrwxrwxrwx   1 root     root            6 Jul 17 13:50 /bin/bzegrep -> bzgrep
  558132    0 lrwxrwxrwx   1 root     root            4 Jun 22  2012 /bin/rnano -> nano
  234769    0 lrwxrwxrwx   1 root     root           20 Jul 17 09:58 /bin/nc -> /etc/alternatives/nc
  229379    0 lrwxrwxrwx   1 root     root            4 Jul 17 09:50 /bin/sh -> dash
  234841    0 lrwxrwxrwx   1 root     root            6 Jul 17 13:50 /bin/bzfgrep -> bzgrep

2.2 Differences between hardlinks and symlinks

When you delete a file, you actually remove the filename from its directory and thereby remove one link to the inode (reminder: a filename is a pointer/hardlink to an inode).
The inode and its data blocks are only freed when no hardlink is left and no process still has the file open. Open file descriptors are not counted in the ‘Links’ value shown by ‘stat’, but they keep the inode alive; they can be inspected in /proc/<pid>/fd/ (e.g. /proc/$$/fd/ for the current shell).
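The effect of an open file descriptor can be observed directly. A minimal sketch, assuming Linux (/proc) and GNU coreutils; the file name ‘victim’ and descriptor 3 are arbitrary choices:

```shell
#!/bin/sh
dir=$(mktemp -d) && cd "$dir" || exit 1
trap 'rm -rf "$dir"' EXIT

echo "still here" >victim
exec 3<victim                          # keep the file open on descriptor 3
stat -c 'Links: %h' victim             # Links: 1
rm victim                              # remove the only name
ls -l /proc/$$/fd/3                    # target shown as ".../victim (deleted)"
stat -L -c 'Links: %h' /proc/$$/fd/3   # Links: 0, yet the inode still exists
cat <&3                                # the data is still readable
exec 3<&-                              # closing the last reference frees the inode
```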
Modifying the file through any of its names affects all the other names, since they all refer to the same inode.
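A minimal sketch of this behaviour in a throwaway directory, reusing the file names of the example above:

```shell
#!/bin/sh
dir=$(mktemp -d) && cd "$dir" || exit 1
trap 'rm -rf "$dir"' EXIT

echo "1234" >file1
cp file1 file2          # independent copy (new inode)
cp -l file1 file1h      # second name for the same inode
echo "5678" >>file1h    # modify the file in place through one of its names
cat file1               # 1234 and 5678: the change is visible via file1
cat file2               # only 1234: the copy is a different inode
```

This holds for programs that write into the existing file. A program that saves by writing a new file and renaming it over the old name, as GNU ‘sed -i’ does, gives that name a new inode and detaches it from the other hardlinks.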
A hardlink is similar to a symlink (a symbolic link, i.e. a pointer to a filename given as a relative or absolute path), but it is completely transparent to applications: several hardlinks to the same inode are indistinguishable, and none of them is the "original". Moving, renaming or deleting a file does not affect another hardlink to its inode, but it breaks a symlink pointing to its filename. In the example above, the new hardlink ‘file1h’ is identical to ‘file1’: deleting ‘file1’ would not affect ‘file1h’, but it would leave the symlink ‘file1s’ dangling.
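The difference shows up as soon as the original name is deleted. A minimal sketch in a throwaway directory; ‘find -xtype l’ is a GNU find idiom that lists symlinks whose target cannot be resolved:

```shell
#!/bin/sh
dir=$(mktemp -d) && cd "$dir" || exit 1
trap 'rm -rf "$dir"' EXIT

echo "1234" >file1
cp -l file1 file1h       # hardlink: another name for the inode
cp -s file1 file1s       # symlink: points to the name "file1"
rm file1                 # delete the original name
cat file1h               # still prints 1234: the inode keeps its other name
[ -e file1s ] || echo "file1s is dangling: its target 'file1' no longer exists"
find . -xtype l          # lists the dangling symlink ./file1s
```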
While symlinks may cross filesystem boundaries, a hardlink cannot point to an inode on another filesystem, since every filesystem has its own inode number space (cf. http://linuxgazette.net/105/pitcher.html).
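Because of this, a file is only identified unambiguously by the pair (device number, inode number), which ‘stat’ prints as ‘Device’ and ‘Inode’. A minimal sketch with a hypothetical helper ‘same_inode’, assuming GNU ‘stat’ (%d: device number, %i: inode number):

```shell
#!/bin/sh
# Hypothetical helper: succeed if both paths are names of the same inode
# on the same filesystem (symlinks themselves are compared, not followed).
same_inode() {
  [ "$(stat -c '%d:%i' -- "$1")" = "$(stat -c '%d:%i' -- "$2")" ]
}

dir=$(mktemp -d) && cd "$dir" || exit 1
trap 'rm -rf "$dir"' EXIT
echo "1234" >file1
cp -l file1 file1h
cp file1 file2

same_inode file1 file1h && echo "file1 and file1h: same file"
same_inode file1 file2 || echo "file1 and file2: different files"
```

‘[ file1 -ef file1h ]’ performs a similar check in bash and GNU ‘test’, but it follows symlinks, so it would also report a symlink and its target as the same file.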
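Only symlinks can point to directories: apart from the ‘.’ and ‘..’ entries managed by the filesystem itself, Linux does not allow hardlinks to directories.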
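A minimal sketch of both cases in a throwaway directory (the directory names are arbitrary); GNU ‘ln’ refuses to create a hardlink to a directory, while a symlink to it is accepted:

```shell
#!/bin/sh
dir=$(mktemp -d) && cd "$dir" || exit 1
trap 'rm -rf "$dir"' EXIT

mkdir folder
if ln folder folder.hard 2>/dev/null; then
  echo "unexpected: directory hardlink created"
else
  echo "hardlink to a directory refused"
fi
ln -s folder folder.sym      # a symlink to a directory is fine
[ -d folder.sym ] && echo "folder.sym resolves to a directory"
```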

Hardlinks are very handy for snapshot backups: files that did not change since the previous snapshot are hardlinked instead of copied, so they use no new inodes or disk space. (cf. HOWTO – LOCAL AND REMOTE SNAPSHOT BACKUP USING RSYNC WITH HARD LINKS)
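The idea behind such snapshots can be sketched with GNU ‘cp -al’, which recreates the directory tree but hardlinks the files instead of copying them. A minimal sketch with made-up directory names; the important detail is that a changed file must be replaced by a new file rather than edited in place, otherwise the older snapshot sharing its inode would change too. rsync, as used in the cited HOWTO, also writes updated files to a temporary file and renames it unless ‘--inplace’ is given.

```shell
#!/bin/sh
# Assumes GNU cp (-a: archive, -l: hardlink files instead of copying)
dir=$(mktemp -d) && cd "$dir" || exit 1
trap 'rm -rf "$dir"' EXIT

mkdir data && echo "unchanged" >data/a && echo "old" >data/b
cp -a data snap.0            # first snapshot: a real copy
cp -al snap.0 snap.1         # next snapshot: new directories, files hardlinked
echo "new" >snap.1/b.tmp     # file b changed: write a new file ...
mv snap.1/b.tmp snap.1/b     # ... and rename it over the hardlink
stat -c '%i %h %n' snap.0/a snap.1/a snap.0/b snap.1/b
cat snap.0/b                 # still "old": the first snapshot is untouched
```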

Useful commands:

#show duplicates from folder1/ and folder2/ only if a copy is also present in folder duplic/
fdupes -r folder1/ folder2/ duplic/ | grep -E -B1 "^duplic/" | grep -Ev "^(--|duplic/)" | while IFS= read -r i; do [ -n "$i" ] && ls -l "$i"; done
 
#remove the last file of each set of duplicates (remove 'echo' to actually delete)
fdupes -r folder/ | grep -E -B1 "^$" | grep -Ev "^(--|)$" | while IFS= read -r i; do [ -n "$i" ] && echo rm -v "$i"; done
 
#show the 10 sets of duplicates using the most space, sorted by total size (total bytes, number of copies, file names):
fdupes -r1 folder/ >/tmp/fdupes.txt
cat /tmp/fdupes.txt | awk '{print length,$0}' | sort -n | cut -d" " -f2- | while read a; do echo $(du -cbs $a | tail -1; echo "$a" | sed -e 's% %\n%g' | wc -l; echo $a); done | sort -n | tail
 
#remove file1 and all its duplicates (remove 'echo' to actually delete them)
fgrep 'folder/path/to/file1' /tmp/fdupes.txt | while read -d " " a; do echo rm -f "$a"; done | less -S
 
#remove empty folders (-depth handles subfolders first, so nested empty folders are removed too)
find folder/ -depth -type d -empty -exec rmdir -v \{\} \; | less -S
 
 
#write a sorted recursive list of the files in a folder (path, size in bytes, modification time) to folder.list
(cd /path/to/folder/ && find . -type f -printf "%p %s %T+\n" | sort) >folder.list
 
 
#show the recursive size (in bytes) of each entry of the current folder, sorted by size
for i in *; do echo -n "$i "; find "$i" -xdev -type f -ls | awk 'BEGIN {sum=0}; {sum+=$7}; END {printf ("%.20g\n", sum)}'; done | sort -nk2 | column -t
 
 
#compare two folders: show only the files whose path, modification time or size differ
sdiff -sdbB <(cd folder1/ && find . -type f -printf "%p %T+ %s\n" | sort) <(cd folder2 && find . -type f -printf "%p %T+ %s\n" | sort) | less -
