How to generate pseudo-random text using an Awk script

Sat, 2012.07.21 - 11:24 — müzso

Here are a few variants, depending on what you want to use as a data source.

The first version assumes you use a dictionary file that contains one word per line:

awk -v lineno=10 -v linelen=80 'BEGIN { i = 0 } { i++; words[i] = $0 } END { line1 = ""; line2 = ""; srand(); n = 0; w = ""; nw = 0; for (j = 0; j < lineno; j++) { while (n <= linelen) { w = words[int(rand() * i) + 1]; nw = length(w); line1 = line2; if (n > 0) { line2 = line2 " " w; n += 1 + nw } else { line2 = w; n += nw } }; print line1; line2 = w; n = nw } }' /usr/share/dict/words
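The one-liner is dense, so here is a rough multi-line sketch of the same logic with comments, wrapped in a shell function. The function name random_lines and the tiny stand-in word list are made up for illustration (handy if /usr/share/dict/words isn't installed); any one-word-per-line file should do.

```shell
# Hypothetical stand-in word list; any one-word-per-line file works.
wordlist=$(mktemp)
printf '%s\n' alpha beta gamma delta epsilon zeta eta theta > "$wordlist"

random_lines() {
    # $1 = word list file, $2 = number of lines, $3 = maximum line length
    awk -v lineno="$2" -v linelen="$3" '
        { words[++i] = $0 }              # load every word into an array
        END {
            srand()                      # seed from the current time
            for (j = 0; j < lineno; j++) {
                # Keep appending words until the line overflows ...
                while (n <= linelen) {
                    w = words[int(rand() * i) + 1]
                    nw = length(w)
                    line1 = line2        # remember the line before this word
                    if (n > 0) { line2 = line2 " " w; n += 1 + nw }
                    else       { line2 = w;           n += nw }
                }
                print line1              # ... then print it without that word
                line2 = w; n = nw        # the overflowing word starts the next line
            }
        }' "$1"
}

random_lines "$wordlist" 3 30
rm -f "$wordlist"
```

The line1/line2 pair is a one-word lookahead: line2 always holds the line including the latest word, line1 the line without it, so the printed line never exceeds linelen (as long as no single word is longer than linelen).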

The second version takes pseudo-random bytes from /dev/urandom:

cat /dev/urandom | tr -dc 'a-zA-Z\n' | awk -v lineno=10 -v linelen=80 -v wordmin=5 -v wordmax=10 'BEGIN { j = 0; n = 0; line1 = ""; line2 = ""; wordlen = wordmax - wordmin + 1; srand(); nwlen = wordmin + int(rand() * wordlen); w = ""; nw = 1 } { len = length($0); for (i = 1; i <= len; i++) { w = w substr($0, i, 1); if (nw == nwlen) { line1 = line2; if (n > 0) { line2 = line2 " " w; n += 1 + nw } else { line2 = w; n += nw }; if (n > linelen) { print line1; line2 = w; n = nw; j++; if (j >= lineno) { exit } }; w = ""; nw = 0; nwlen = wordmin + int(rand() * wordlen) }; nw++ } }'

The third version is pure Awk and generates pseudo-random letters from a selected set of characters:

awk -v lineno=10 -v linelen=80 -v wordmin=5 -v wordmax=10 -v chars=abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ 'BEGIN { charslen = length(chars); n = 0; j = 0; line1 = ""; line2 = ""; wordlen = wordmax - wordmin + 1; srand(); nwlen = wordmin + int(rand() * wordlen); w = ""; nw = 1; while (j < lineno) { w = w substr(chars, int(rand() * charslen) + 1, 1); if (nw == nwlen) { line1 = line2; if (n > 0) { line2 = line2 " " w; n += 1 + nw } else { line2 = w; n += nw }; if (n > linelen) { print line1; line2 = w; n = nw; j++ }; w = ""; nw = 0; nwlen = wordmin + int(rand() * wordlen) }; nw++ } }'

The variables are quite self-explanatory, but I'll describe them anyway.

lineno: the number of lines in the output
linelen: the maximum length of a line in the output
wordmin: the minimum length of a word in the output (second and third versions only)
wordmax: the maximum length of a word in the output (second and third versions only)
chars: the characters to use in the output (third version only)
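To show how the variables play together, here is a minimal sketch that wraps the pure-Awk approach in a shell function and calls it with different settings. The function name randtext and its argument order are my own invention, and the sketch assumes wordmax is meant to be inclusive and no larger than linelen.

```shell
randtext() {
    # Usage: randtext LINES MAXLEN WORDMIN WORDMAX CHARS
    awk -v lineno="$1" -v linelen="$2" -v wordmin="$3" -v wordmax="$4" -v chars="$5" '
    BEGIN {
        srand()
        charslen = length(chars)
        span = wordmax - wordmin + 1          # assumption: wordmax is inclusive
        nwlen = wordmin + int(rand() * span)  # target length of the current word
        while (j < lineno) {
            # Append one random character from chars to the current word.
            w = w substr(chars, int(rand() * charslen) + 1, 1)
            if (++nw == nwlen) {              # word complete
                line1 = line2
                if (n > 0) { line2 = line2 " " w; n += 1 + nw }
                else       { line2 = w;           n += nw }
                if (n > linelen) { print line1; line2 = w; n = nw; j++ }
                w = ""; nw = 0
                nwlen = wordmin + int(rand() * span)
            }
        }
    }'
}

# Five lines of at most 40 characters, words of 3 to 6 letters from "acgt".
randtext 5 40 3 6 'acgt'

# Two lines of at most 60 characters, fixed-width 8-character hex words.
randtext 2 60 8 8 '0123456789abcdef'
```

Since srand() seeds from the clock, two calls within the same second will typically produce the same output.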

Of course, there's also a really simple command line that is almost as good (feature-wise) as the second variant and doesn't involve Awk at all. It's most probably a lot faster too, but I haven't timed it:

cat /dev/urandom | tr '0-3' ' ' | tr -dc 'a-zA-Z ' | fold -w 80 | head -n 10

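The tr '0-3' ' ' part turns the digits 0 to 3 into spaces, so five byte values end up as a space instead of just one. That cuts the average word length in the generated output from roughly 50 letters to roughly 10.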
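If you want to see the effect for yourself, a quick check along these lines should do. The avg_word_len helper is made up for this sketch, and it assumes your head supports -c (GNU and BSD both do, though it isn't in POSIX). A finite sample is taken first so that both pipelines see the same bytes and terminate on their own.

```shell
avg_word_len() {
    # Reads text on stdin; prints the mean length of its space-separated words.
    awk '{ for (k = 1; k <= NF; k++) { total += length($k); count++ } }
         END { if (count) printf "%.1f\n", total / count }'
}

# Take a finite sample of random bytes.
sample=$(mktemp)
head -c 50000 /dev/urandom > "$sample"

# Without the digit-to-space step: spaces are rare, so words are long.
LC_ALL=C tr -dc 'a-zA-Z ' < "$sample" | avg_word_len

# With it: the bytes 0-3 also become spaces, so words come out much shorter.
LC_ALL=C tr '0-3' ' ' < "$sample" | LC_ALL=C tr -dc 'a-zA-Z ' | avg_word_len

rm -f "$sample"
```

The exact figures vary from run to run, but the second number should come out several times smaller than the first.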

Comments

Fri, 2016.12.09 - 20:00 — Anonymous (not verified)

Third Version awk script

Hi,
and let's say I wanted the script not to repeat any character, or, let's say, to repeat some characters and use others only a single time?

Nice scripts btw.

Best regards.

Dante.
