How to grep through (a.k.a. find) binary files looking for a byte sequence (a.k.a. pattern matching)

Here's my approach using more or less standard Unix/Linux tools:
find . -type f -exec grep -Faqs -- $'\x3c\x3f\x70' '{}' \; -print
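The find expression works because -exec doubles as a test: it is true only when grep exits with status 0, so -print fires only for files that contain the pattern (fgrep is the older name of grep -F). Below is a minimal, self-contained demo on a throwaway directory. The file names are made up, and the pattern is built with printf octal escapes so the demo also runs in shells without $'\xHH' support:

```shell
#!/bin/sh
# Demo of the find + grep approach on a throwaway directory.
set -eu
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"

# hit.bin contains 3c 3f 70 ("<?p") between NUL and 0xff bytes;
# miss.bin contains 3c 3f followed by 71 instead.
printf '\000\377\074\077\160\000' > hit.bin
printf '\000\074\077\161\377' > miss.bin

patt=$(printf '\074\077\160')

# -exec acts as a test: true only when grep exits with status 0,
# so -print runs only for files that contain the pattern.
#   -F fixed string, -a read binary data as text,
#   -q no output (exit status only), -s no messages about unreadable files
result=$(find . -type f -exec grep -Faqs -- "$patt" '{}' \; -print)
printf '%s\n' "$result"    # prints ./hit.bin
```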

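If your shell does not support the $'\xHH' notation (bash, zsh and ksh93 do, but older or minimal POSIX shells such as dash on many systems may not), you can fall back to octal escapes with printf. This is a bit less comfortable, since most hex editors and viewers show bytes in hexadecimal, but printf's \ooo escapes are specified by POSIX, whereas echo's handling of backslashes varies between systems (GNU /bin/echo, for example, prints '\074\077\160' literally unless given -e):
patt="$(printf '\074\077\160')"
find . -type f -exec grep -Faqs -- "$patt" '{}' \; -print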
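Converting hex to octal by hand is the uncomfortable part. Here is a minimal sketch of a hypothetical helper, hex2bytes, that accepts the hex pairs a hex editor shows. It relies only on printf, which accepts numeric arguments such as 0x3c and octal escapes such as \074 in its format string:

```shell
#!/bin/sh
# hex2bytes (hypothetical helper): print the bytes given as hex pairs.
# The inner printf converts 0xHH to a three-digit octal number; the outer
# printf gets the format "\ooo" and emits the corresponding raw byte.
hex2bytes() {
  for h in "$@"; do
    printf "\\$(printf '%03o' "0x$h")"
  done
}

patt=$(hex2bytes 3c 3f 70)
printf '%s\n' "$patt"    # prints <?p
```

The same limits apply as with any command substitution: a 00 byte cannot be stored in the variable, and a trailing 0a byte is stripped.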

Because echo and the $'...' notation are shell- and system-dependent, on some systems you may have no convenient way to create an arbitrary byte sequence to use as an argument or to put into an environment variable; a NUL byte can never be part of either, since both are C strings. Furthermore, POSIX grep does not support searching binary files: it is only specified for text files and matches line by line, so a binary pattern that includes a newline byte is out of the question (grep treats a newline in the pattern as a separator between several patterns). On most current systems, though, some combination of the above will work; you may just have to experiment a little.
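Both limitations are easy to observe. The short demo below uses made-up sample strings:

```shell
#!/bin/sh
# 1. NUL bytes: command substitution cannot keep them (bash and dash drop
#    them; bash also prints a warning, silenced here), so "a<NUL>b" ends up
#    as the two-character string "ab".
{ with_nul=$(printf 'a\000b'); } 2>/dev/null
echo "length after command substitution: ${#with_nul}"

# 2. Newlines: grep reads a pattern containing a newline as a list of
#    patterns, so "a<LF>b" matches any line containing "a" or "b".
if printf 'xa\n' | grep -Fq "$(printf 'a\nb')"; then
  newline_split=yes
else
  newline_split=no
fi
echo "'a<LF>b' matched input without that sequence: $newline_split"
```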

If your grep does not support binary files (the -a switch), but you do have od, then you can try this:
find . -type f -exec sh -c 'od -v -t x1 "$1" | awk "{ for (i = 2; i <= NF; i++) printf \"%s \", \$i }" | grep -Fqs "3c 3f 70"' sh '{}' \; -print
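To see why the pattern is written as 3c 3f 70, here is what the od + awk stage produces for the input "<?php" plus a newline. The -v flag matters: without it, od replaces repeated output lines with a single "*", and the collapsed bytes would be missing from the dump.

```shell
#!/bin/sh
# od -v -t x1 prints an offset followed by up to 16 bytes per line as
# two-digit hex. The awk program drops the offset (field 1) and joins all
# bytes onto one line, each followed by a space, so a byte sequence is
# found even when it straddles two od output lines.
hexline=$(printf '<?php\n' | od -v -t x1 |
  awk '{ for (i = 2; i <= NF; i++) printf "%s ", $i }')
printf '%s\n' "$hexline"    # prints 3c 3f 70 68 70 0a (plus a trailing space)
```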

Or this:
find . -type f -exec sh -c 'od -v -t x1 "$1" | cut -s -d" " -f2- | tr "\012" " " | tr -s " " | grep -Fqs "3c 3f 70"' sh '{}' \; -print
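In this pipeline the newlines between od's output lines have to become spaces (tr '\012' ' ') rather than simply being deleted. Otherwise the last byte of one line and the first byte of the next are glued together, and a pattern that straddles od's 16-byte line boundary is missed. A quick check of exactly that case (to_hex is just a name for this demo):

```shell
#!/bin/sh
# 14 filler bytes ("0" x 14) push "<?p" to offsets 14-16, so 3c 3f end
# the first od line and 70 starts the second one.
to_hex() {
  od -v -t x1 | cut -s -d' ' -f2- | tr '\012' ' ' | tr -s ' '
}
if printf '%014d<?p' 0 | to_hex | grep -Fqs '3c 3f 70'; then
  straddle=found
else
  straddle=missed
fi
echo "$straddle"    # prints found
```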

Of course, the latter two are much slower, since they convert every file into a textual hexadecimal representation of its bytes (roughly three characters per byte) and start several extra processes per file.
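Part of the cost in all of these commands is that -exec ... \; starts at least one new process per file. When grep supports -a (not POSIX, but GNU and BSD grep have it), the first approach can be batched: with -exec ... {} + find hands many file names to a single grep, and grep -l prints the names of the files that match. A sketch on made-up sample files:

```shell
#!/bin/sh
set -eu
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"

printf '\000\074\077\160\377' > a.bin    # contains 3c 3f 70
printf '\000\074\077\161\377' > b.bin    # 3c 3f 71 only
printf 'xx<?php\n' > c.txt               # contains "<?p" as text

patt=$(printf '\074\077\160')
# One grep process for many files; -l lists the names of matching files.
result=$(find . -type f -exec grep -Flas -- "$patt" '{}' + | sort)
printf '%s\n' "$result"
```

GNU and BSD grep can also walk the directory tree themselves, which avoids find entirely: grep -rlaF -- "$patt" .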

If you don't have od either, check whether you have hexdump, which can produce the same kind of output.
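A sketch of the hexdump equivalent of the od stage, assuming a hexdump that accepts -e format strings (the BSD and util-linux versions do). The format '/1 "%02x "' means: take one byte at a time and print it as two lowercase hex digits followed by a space, which is the same "3c 3f 70 ..." layout as above.

```shell
#!/bin/sh
# -v disables the "*" used for repeated data, as with od.
if command -v hexdump >/dev/null 2>&1; then
  sample=$(printf '<?php\n' | hexdump -v -e '/1 "%02x "')
  printf '%s\n' "$sample"
else
  sample=
  echo "hexdump not found" >&2
fi
```

Plugged into the same find pattern:
find . -type f -exec sh -c 'hexdump -v -e "/1 \"%02x \"" "$1" | grep -Fqs "3c 3f 70"' sh '{}' \; -print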

In the examples above, the \x3c\x3f\x70 (or 3c 3f 70) string is the byte pattern, in hexadecimal, that we're looking for. You can easily adapt this to more complex byte patterns by using regular expressions (e.g. grep -E) instead of fgrep/grep -F and a fixed string; keep in mind that in a raw binary pattern some bytes, such as 0x2a (*), 0x2e (.) or 0x3f (?), are regex metacharacters and have to be escaped. Please let me know if you have a more efficient solution to the problem. For me, the first one (find + fgrep/grep -F) seemed to be the fastest of the solutions that work out of the box.
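On the hex dump produced by od, regular expressions sidestep the metacharacter problem, because every byte is just "HH ". A sketch with grep -E: "any single byte" becomes [0-9a-f]{2}, and alternatives can be written as (70|50). The pattern and sample strings are made up for illustration.

```shell
#!/bin/sh
to_hex() {
  od -v -t x1 | awk '{ for (i = 2; i <= NF; i++) printf "%s ", $i }'
}

# "<?" followed by any one byte, then "h" (0x68).
hex_regex='3c 3f [0-9a-f]{2} 68'

for sample in '<?php' '<?xhx' '<?p'; do
  if printf '%s' "$sample" | to_hex | grep -Eqs "$hex_regex"; then
    echo "match:    $sample"
  else
    echo "no match: $sample"
  fi
done
```

Since each byte is exactly two hex digits followed by a space, a match can only start at a byte boundary, so the regex cannot accidentally match half a byte.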

There are a few utilities for finding a specific byte array in a file (or stream), e.g. bgrep and hexgrep (the latter seems to be more polished), but they are custom tools generally not found in Linux distribution or other OS repositories, and I was looking for a solution that I can use on almost any Unix-like system without installing or compiling anything.

P.S.: The first (and fastest) of the examples above was tested on Debian 6.*, Ubuntu 14.* and Mac OS X 10.6.*, but I expect it to work out of the box on pretty much any recent (and most older) Linux/Unix distribution.

P.S.2: Apparently Perl can be used to write quite compact solutions to binary grepping problems. I had no doubt about that even before I found the referenced post ;) It's just that I'm not much of a Perl scripter: I can count the Perl scripts I've written so far on one hand.
