Sed in Darwin (Leopard) is crippled in many ways

Today I had to face the fact that the sed in Darwin 9.4.0 (Mac OS X Leopard 10.5.4) lacks a lot of functionality.

E.g. backslash escapes such as \n are not interpreted in the replacement expression of an s function.
The following will output "abcnef" and not "abc" + newline + "ef":
  echo 'abcdef' | sed 's/d/\n/'
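A portable way to get the intended result, as far as I know, is the POSIX form: a backslash followed by an actual newline inside the replacement. The sketch below wraps it in a helper called split_at_d (a name made up for this illustration). It should behave the same with the Darwin sed and with GNU sed.

```shell
# Newline in the replacement, written as backslash + real newline.
# split_at_d is a hypothetical helper name used only for this example.
# The line after the backslash must start at column 0, otherwise the
# leading spaces would become part of the replacement text.
split_at_d() {
  sed 's/d/\
/'
}

# Prints "abc" and "ef" on two separate lines.
printf 'abcdef\n' | split_at_d
```

The price is that the script no longer fits on one physical line.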

Backslash-escaped sequences are not recognized in bracket expressions either, so you cannot match a tab character with [\t] or a newline with [\n]. This is "normal" behaviour, at least for the Darwin sed, but it is quite unfortunate and makes life a lot more difficult. E.g. this regexp (which should match a single line together with the newline preceding it) cannot be written as is without support for backslash escapes in brackets:
  \n[^\n]*

You can use the [:cntrl:] character class as a substitute, but of course it's just a workaround and functionally not equivalent, because it excludes every control character (tabs included), not just the newline:
  \n[^[:cntrl:]]*
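For the tab case mentioned above, one workaround that does not depend on sed understanding \t is to let the shell produce a literal tab and splice it into the bracket expression. This is only a sketch: the variable tab and the helper tabs_to_underscores are names I made up for the example.

```shell
# Let printf produce a real tab character; sed then sees a literal tab
# inside the brackets and never has to interpret "\t" itself.
# tab and tabs_to_underscores are hypothetical names for this sketch.
tab=$(printf '\t')

tabs_to_underscores() {
  # Double quotes so that the shell expands $tab before sed runs.
  sed "s/[$tab]/_/g"
}

# Prints: a_b_c
printf 'a\tb\tc\n' | tabs_to_underscores
```

Because the script is in double quotes, any other $ or backslash in it would need extra care.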

However, there are problems even with documented functionality. E.g. this sed does not support everything that the re_format manpage describes for extended regular expressions, even though the sed manpage points to re_format for their description.
E.g. the following will output an error instead of "Xc" (the regexp is taken from the re_format manpage):
  echo 'chchcc' | sed -Ee 's/[[.ch.]]*c/X/'

So multi-character sequences are not accepted as collating elements and you'll get an error message like "RE error: invalid collating element".

Another weirdness: Darwin's sed is quite sensitive about separating functions/commands with newlines. You'll find that in many cases you simply must put in a newline here and there or your code won't be accepted. E.g. if you create a function-list, the terminating } must be preceded by a newline; a semi-colon won't do. Labels are also likely to cause you headaches if you don't put them on their own line. In GNU sed you can create nice (gigantic) one-liners, because a semi-colon can replace a newline virtually everywhere in the script.
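If you still want a one-liner, a common approach is to pass every command that needs its own line as a separate -e argument, since consecutive -e fragments are joined with newlines. The sketch below joins all input lines with commas; join_lines is a hypothetical helper name, and I'm assuming here that this is the form the Darwin sed expects (GNU sed accepts it as well).

```shell
# The label, the opening of the function-list and the closing } each get
# their own -e, which is equivalent to putting them on separate lines.
# join_lines is a hypothetical helper name for this sketch.
join_lines() {
  sed -e ':a' -e '$!{' -e 'N' -e 'ba' -e '}' -e 's/\n/,/g'
}

# Prints: one,two,three
printf 'one\ntwo\nthree\n' | join_lines
```

Note that \n outside of brackets, in the pattern part of s, is fine: it matches a newline embedded in the pattern space.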

PS: most probably the problems with the Mac OS X sed are not specific to Darwin, but come from its FreeBSD roots. Maybe FreeBSD's sed has always worked like this. Maybe it's the GNU sed that is "out of line". Imho the way GNU sed interprets backslashed escape sequences makes it a lot more "user friendly".

Comments

I agree - damn Apple

Yeah that crippled sed thing was a really khunty thing to do. Why can't they just keep *nix goodness as is?

Try: echo 'abcdef' | sed

Try:

echo 'abcdef' | sed $'s/d/\\\n/'

See also:

Multi-line pattern matching with sed

http://codesnippets.joyent.com/posts/show/2111

Re: Try: echo 'abcdef' | sed

Thanks for the tip. This is a viable workaround for embedding a newline character in a sed expression (a way to add a newline to a string without actually hitting Enter). It is still only a workaround, though: sed in Mac OS X itself does not interpret backslashed escape sequences.
(Btw. meanwhile I've upgraded to Snow Leopard and sed is still crippled ... at least compared to the GNU sed.)

Mac OS X uses FreeBSD sed

Mac OS X uses FreeBSD sed, not GNU sed. However, you may get a GNU sed binary for Mac OS X here:

http://rudix.org/packages.html

And there is even minised, which recognizes backslash-escaped characters as well:

http://freshmeat.net/projects/minised

echo 'abcdef' | minised 's/d/\n/'

(minised needs to be compiled from source though)

cheers

Re: Mac OS X uses FreeBSD sed

Thanks for your repeated feedback. I already knew that Darwin is based on FreeBSD ... I just added this note as a P.S. to the original post a few minutes ago. I've got Fink installed, so getting GNU sed takes only a short fink install sed. I wrote the post to call attention to the differences between the sed in Mac OS X (i.e. FreeBSD) and the GNU sed, and to how a lot of things are "broken" in the former. It's not that I couldn't have installed GNU sed on my Mac anytime. I'm just a bit annoyed by the fact that the builtin sed is missing so many features that GNU sed users have got used to.
