I already documented how to use tar and ssh to copy a directory tree to another host. This time let's copy a filtered subset of a directory to another (local) directory.
Let's start in the middle:
find . \( -type l -exec sh -c 't="$(readlink "{}" 2> /dev/null)" && [ -n "$t" -a -e "$t" ]' \; -o -type f \) -size -2097152 -print0 2>> /tmp/error.log | tar -cf - --ignore-failed-read --ignore-command-error --null -T - 2>> /tmp/error.log | tar -xpf - --ignore-zeros -C /mnt/target 2>> /tmp/error.log
The above command finds all regular files and valid symbolic links smaller than one gigabyte and copies them from the current working directory to /mnt/target. (find's -size counts 512-byte blocks by default, so -size -2097152 means "less than 1 GiB".) It also appends to a logfile (/tmp/error.log) every entry that it could not read or had any problem with. I used this particular command line to back up the contents of a read-only mounted filesystem that had become corrupted and contained a few invalid entries (e.g. files with huge, multi-gigabyte sizes). Obviously I didn't want to copy these "bogus files", so I built a list of the probably "good" files that are worth saving. The readlink check is there to skip bogus symbolic links as well.
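A minimal sketch of a more defensive variant of the symlink check, not the command that was actually run. It rests on two assumptions that go beyond the original: the path is handed to sh as a positional argument instead of being substituted into the script text (so names containing quotes or $ are not interpreted by the shell), and existence is tested through the link itself, so that relative targets are resolved against the link's own directory rather than the current one. The function name find_valid_symlinks is hypothetical.

```shell
# Hypothetical helper (not part of the original command): print the symlinks
# below directory $1 whose readlink output is non-empty and whose target exists.
find_valid_symlinks() {
    find "$1" -type l -exec sh -c '
        # $1 is the symlink path, passed in as a positional argument
        t=$(readlink "$1" 2>/dev/null) && [ -n "$t" ] && [ -e "$1" ]
    ' sh {} \; -print
}
```

Calling find_valid_symlinks . lists the links that pass; the same -exec clause could replace the one in the pipeline above, with -print0 instead of -print.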
The filesize check is of course far from perfect, but choosing a suitable size limit for find might get you close enough to distinguish the effectively valid directory entries from the corrupted ones.
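To experiment with different limits before running the full copy, the size filter can be tried on its own. This is only a sketch: the helper name list_small_files and its two parameters are made up for illustration, and the limit is given in find's default unit of 512-byte blocks (so 2097152 corresponds to 1 GiB).

```shell
# Hypothetical helper: list (NUL-separated) the regular files below directory $1
# that are smaller than $2 blocks of 512 bytes.
list_small_files() {
    dir=$1
    limit_blocks=$2
    find "$dir" -type f -size "-$limit_blocks" -print0
}
```

For a human-readable preview, something like list_small_files . 2097152 | tr '\0' '\n' | less should do.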
The -print0 switch of find and the --null switch of tar make sure that even the most exotic file and directory names (e.g. ones containing whitespace or a newline) are handled properly. The --ignore-failed-read switch tells tar not to exit with a nonzero status when it hits an unreadable file. The -p switch of the second tar command makes sure the permissions recorded in the archive are preserved on extraction.
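Stripped of the filtering and error handling, the NUL-separated hand-off between find and the two tar processes looks roughly like this. It is a sketch that assumes GNU tar; the function name copy_files_null_safe is hypothetical, and the target directory must already exist and be given as an absolute path because the function changes directory in a subshell.

```shell
# Hypothetical helper: copy every regular file below $1 into the existing
# directory $2 (absolute path), passing file names NUL-separated.
copy_files_null_safe() (
    cd "$1" || exit 1
    find . -type f -print0 |
        tar -cf - --null -T - |
        tar -xpf - -C "$2"
)
```

Note that --null has to come before -T so that tar already knows the separator when it starts reading the list.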
Note that I do all the heavy lifting to weed out "bad" filesystem entries because tar is quite sensitive when it comes to invalid files. E.g. if it runs into a "bad" symbolic link, it simply exits (or segfaults) and you'll end up wondering why it did not copy over the whole directory tree. To debug issues like this, you can create a verbose log of all the files it reads by supplying the -v and --index-file switches (the latter specifies the filename for the verbose log).
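A small sketch of such a debugging run, assuming GNU tar. The helper name archive_with_index and the log path passed to it are placeholders. Since the archive itself goes to stdout here, --index-file keeps the verbose listing in a separate file.

```shell
# Hypothetical helper: archive directory $1 to stdout and write the verbose
# listing (one line per member tar has processed) to the file $2.
archive_with_index() {
    tar -cvf - --index-file="$2" -C "$1" .
}
```

If tar dies halfway through, the last line of the index file should point at (or right before) the entry that caused the trouble.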
P.S.: the seemingly complex validity check on symbolic links is there because I've experienced "symlinks" (in failing filesystems) that readlink accepted (it returned a zero exit value, meaning the symlink is OK), yet the resolved path (printed to stdout by readlink) seemed to be an empty string (and I say "seemed" ... because a test -e $(readlink filepath) returned zero as well!).
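A likely explanation for that last observation, offered as an assumption rather than a confirmed diagnosis: when the command substitution is unquoted and expands to nothing, the shell runs plain test -e, and test with a single argument merely checks that the argument (here the string "-e") is non-empty. So a zero exit status is exactly what an empty readlink output would produce. The two hypothetical helpers below mimic both ways of writing the test.

```shell
# Hypothetical helpers that mimic the two ways of writing the test.
exists_unquoted() {
    # With an empty $1 this expands to `test -e`, a one-argument test
    # that only checks that the string "-e" is non-empty, so it succeeds.
    test -e $1
}

exists_quoted() {
    # The quotes keep the empty string as an argument, so this asks
    # whether a file named "" exists, which fails.
    test -e "$1"
}
```

This is why the check in the pipeline quotes "$t" and tests -n "$t" explicitly.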