How to make filenames NTFS compatible

Thu, 2012.08.30 - 11:21 — müzso

Let's assume you have a bunch of files (in a directory tree) on a Linux/Unix system and you'd like to copy them over to a Windows NTFS filesystem. NTFS allows far fewer characters in filenames (and directory names) than Linux/Unix does. The following code goes through the entire tree (starting with the current working directory) and removes all invalid characters from the names of regular files; because of the -f test, directories themselves are not renamed. Note that it relies on a few non-standard extensions (e.g. not all find implementations have a -print0 option).

find . -depth -mindepth 1 -print0 | while IFS="" read -r -d "" entry; do if [ -f "${entry}" ]; then b="$(basename "${entry}")"; n="$(echo "${b}" | tr -d '\001-\037/\\:*?"<>|')"; if [ "${b}" != "${n}" ]; then d="$(dirname "${entry}")"; [ -f "${d}/${b}" ] && mv "${entry}" "${d}/${n}"; fi; fi; done

P.S.: I used David's writeup on how to process directory entries correctly and the Wikipedia article on NTFS for the list of valid characters.

P.S.2: Beware that simply removing invalid characters might result in data loss, since several filenames can be converted to the same string this way. E.g. both the filename "my test?file.txt" and the filename "my test:file.txt" will be converted to "my testfile.txt", and only one will be kept because the second mv overwrites the result of the first. If you really need to cover such special cases, you could replace invalid characters with a number (instead of simply removing them) and increment this number after each processed file (i.e. directory entry). This way you could be sure that no file is lost during the process.
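A minimal sketch of a collision-safe variant follows. It is not the exact scheme described above: instead of substituting a number for each invalid character, it removes the characters as before and appends a counter only when the cleaned name already exists. The helper names ntfs_clean_name and ntfs_safe_rename, the "_N" suffix format and the fallback name "renamed" are assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Sketch only: ntfs_clean_name and ntfs_safe_rename are hypothetical helpers.

# Print the given name with NTFS-invalid characters removed
# (same tr character set as the one-liner in the post).
ntfs_clean_name() {
    printf '%s' "$1" | tr -d '\001-\037/\\:*?"<>|'
}

# Rename one file. If the cleaned name is already taken, add a counter
# before the extension instead of overwriting the existing entry.
ntfs_safe_rename() {
    local entry="$1" dir base clean target stem ext i=1
    dir="$(dirname "${entry}")"
    base="$(basename "${entry}")"
    clean="$(ntfs_clean_name "${base}")"
    [ "${base}" = "${clean}" ] && return 0   # name is already fine
    [ -n "${clean}" ] || clean="renamed"     # assumption: fallback for all-invalid names
    target="${dir}/${clean}"
    # Split "name.ext" so the counter lands before the extension.
    case "${clean}" in
        ?*.*) stem="${clean%.*}"; ext=".${clean##*.}" ;;
        *)    stem="${clean}";    ext="" ;;
    esac
    while [ -e "${target}" ]; do
        target="${dir}/${stem}_${i}${ext}"
        i=$((i + 1))
    done
    mv -- "${entry}" "${target}"
}

# Usage, from the top of the tree:
#   find . -depth -mindepth 1 -type f -print0 |
#       while IFS="" read -r -d "" entry; do ntfs_safe_rename "${entry}"; done
```

With the two example files from above, one ends up as "my testfile.txt" and the other as "my testfile_1.txt"; which one gets the suffix depends on the order in which find lists them.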

Comments

Mon, 2014.01.06 - 17:33 — rjc (not verified)

A couple of suggestions

Good write-up. I do have a couple of suggestions, however:

1. Please format the above command with newlines so that it is more readable.
2. Please include solutions for removal of initial and trailing spaces as well as trailing dots (".").
3. '-d' option for 'read' is a bashism - only '-r' is defined by POSIX.
4. Using 'mv -i' will prompt before overwriting existing file or directory.

Cheers,

rjc

Mon, 2014.01.06 - 18:09 — müzso

Re: A couple of suggestions

Thanks for your feedback.

1. I did write it as a one-liner on purpose: this way it's easier to use (less prone to copy&paste errors).
2. That was not my goal. The script is supposed to remove only characters that are not NTFS-compatible. But here you are (the added sed search&replace should take care of whitespace at the start and end of filenames, and of trailing dots as well):

find . -depth -mindepth 1 -print0 | while IFS="" read -r -d "" entry; do if [ -f "${entry}" ]; then b="$(basename "${entry}")"; n="$(echo "${b}" | tr -d '\001-\037/\\:*?"<>|' | sed -e 's/^[ \t]\+//g' -e 's/[ \t.]\+$//g')"; if [ "${b}" != "${n}" ]; then d="$(dirname "${entry}")"; [ -f "${d}/${b}" ] && mv "${entry}" "${d}/${n}"; fi; fi; done

3. That's why I wrote: "note that it relies on a few non-standard extensions". If you have a viable suggestion on how to avoid the bashism, I'd be grateful to read about it.
4. True. But if you have hundreds of files with non-NTFS-compliant filenames, you probably wouldn't want to confirm the renaming of each of them.
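
On the bashism raised in rjc's point 3, one possible way around read -d (and -print0) is to let find pass the paths to an inline sh script through -exec ... {} +, so that names never travel through a pipe. The following is only a sketch under assumptions: the function name ntfs_rename_tree and the variable NTFS_INVALID are made up for illustration, it handles regular files only, and it skips a file when the cleaned name already exists instead of overwriting it.

```sh
#!/bin/sh
# Sketch only: ntfs_rename_tree and NTFS_INVALID are hypothetical names.

# Characters to delete, in tr syntax (same set as in the post).
# Exported so the inline script started by find can read it.
NTFS_INVALID='\001-\037/\\:*?"<>|'
export NTFS_INVALID

ntfs_rename_tree() {
    # -exec ... {} + hands batches of paths to the inline script as "$@".
    find "${1:-.}" -depth -type f -exec sh -c '
        for entry in "$@"; do
            d=$(dirname "$entry")
            b=$(basename "$entry")
            n=$(printf "%s" "$b" | tr -d "$NTFS_INVALID")
            # Skip unchanged names, names that would become empty,
            # and targets that already exist (assumption: skip, never overwrite).
            if [ "$b" != "$n" ] && [ -n "$n" ] && [ ! -e "$d/$n" ]; then
                mv -- "$entry" "$d/$n"
            fi
        done
    ' sh {} +
}

# Usage: ntfs_rename_tree .
```

Keeping the character set in an environment variable avoids nesting quotes inside the single-quoted inline script.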