Let's assume you have a bunch of files (in a directory tree) on a Linux/Unix system and you'd like to copy them over to a Windows NTFS filesystem. The latter allows far fewer characters in file names (and directory names) than Linux/Unix. The following code walks the entire tree (starting from the current working directory) and removes all invalid characters from directory entries. Note that it relies on a few non-standard extensions (e.g. not all find implementations have a -print0 option, and read -d is a bash extension).
find . -depth -mindepth 1 -print0 | while IFS="" read -r -d "" entry; do
    if [ -f "${entry}" ]; then
        b="$(basename "${entry}")"
        n="$(printf '%s' "${b}" | tr -d '\001-\037/\\:*?"<>|')"
        if [ "${b}" != "${n}" ]; then
            d="$(dirname "${entry}")"
            # only rename if the cleaned-up name is not taken yet
            [ ! -e "${d}/${n}" ] && mv "${entry}" "${d}/${n}"
        fi
    fi
done
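To see what the tr filter does on its own, you can run it on a sample name (the file name below is made up for illustration):

```shell
# Delete control characters (octal 001-037) and the characters
# NTFS forbids in names: / \ : * ? " < > |
b='my test?file:v2*.txt'
n="$(printf '%s' "${b}" | tr -d '\001-\037/\\:*?"<>|')"
printf '%s\n' "${n}"   # → my testfilev2.txt
```

Note that printf is used instead of echo, since echo may mangle names containing backslashes or a leading dash.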
P.S.: I used David's writeup on how to process directory entries correctly and the Wikipedia article on NTFS for the list of valid characters.
P.S.2: Beware that simply removing invalid characters might result in data loss, since several file names can be converted to the same string this way. E.g. both "my test?file.txt" and "my test:file.txt" become "my testfile.txt", and only one of them will be kept. If you really need to cover such cases, you could replace invalid characters with a number (instead of simply removing them) and increment this number after each processed file (i.e. directory entry). This way you can be sure that no file is lost during the process.
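A minimal sketch of that idea, assuming a counter prefix rather than a per-character substitution (the function name sanitize_unique is mine, not part of the original one-liner):

```shell
# A counter shared across calls makes each sanitized name unique,
# so two different source names can never collapse into one target name.
i=0
sanitize_unique() {
    b="$1"
    n="$(printf '%s' "${b}" | tr -d '\001-\037/\\:*?"<>|')"
    if [ "${b}" != "${n}" ]; then
        i=$((i + 1))
        n="${i}_${n}"    # prefix the counter instead of silently dropping characters
    fi
    printf '%s\n' "${n}"
}
sanitize_unique 'my test?file.txt'   # → 1_my testfile.txt
sanitize_unique 'my test:file.txt'   # → 2_my testfile.txt
```

Since the counter is a shell variable, this only works as long as the calls run in the same shell (not in separate command substitutions), so in the find pipeline above the whole loop body would have to stay inside the single while subshell.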