How to generate pseudo-random text using an Awk script

Here are a few variants, depending on what you want to use as a data source.

The first version assumes you use a dictionary file that contains one word per line:
awk -v lineno=10 -v linelen=80 'BEGIN { i = 0 } { i++; words[i] = $0 } END { line1 = ""; line2 = ""; srand(); n = 0; w = ""; nw = 0; for (j = 0; j < lineno; j++) { while (n <= linelen) { w = words[int(rand() * i) + 1]; nw = length(w); line1 = line2; if (n > 0) { line2 = line2 " " w; n += 1 + nw } else { line2 = w; n += nw } }; print line1; line2 = w; n = nw } }' /usr/share/dict/words
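
For readability, the same idea can be laid out over several lines. This is only a sketch, not a drop-in copy of the one-liner above: the function name random_text_from_words and its positional arguments (word file, number of lines, maximum line width) are made up for this example, and it skips blank lines in the word file. It keeps appending random words until the next word would push the line past the limit, then carries that word over to the next line.

```shell
# Hypothetical wrapper: random_text_from_words FILE [LINES] [WIDTH]
random_text_from_words() {
    awk -v lineno="${2:-10}" -v linelen="${3:-80}" '
        NF { words[++count] = $0 }           # load every non-blank line as one word
        END {
            if (count == 0) exit 1           # nothing to pick from
            srand()                          # seed from the current time
            for (j = 0; j < lineno; j++) {
                line = carry                 # word that did not fit on the previous line
                while (1) {
                    w = words[int(rand() * count) + 1]
                    candidate = (line == "") ? w : line " " w
                    if (length(candidate) > linelen && line != "") {
                        carry = w            # this word opens the next line
                        break
                    }
                    line = candidate         # a single overlong word is kept as is
                }
                print line
            }
        }
    ' "$1"
}
```

Called as random_text_from_words /usr/share/dict/words 10 80, it should behave much like the one-liner, provided that word file exists on your system.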

The second version takes pseudo-random bytes from /dev/urandom:
cat /dev/urandom | tr -dc 'a-zA-Z\n' | awk -v lineno=10 -v linelen=80 -v wordmin=5 -v wordmax=10 'BEGIN { j = 0; n = 0; line1 = ""; line2 = ""; wordlen = wordmax - wordmin + 1; srand(); nwlen = wordmin + int(rand() * wordlen); w = ""; nw = 1 } { len = length($0); for (i = 1; i <= len; i++) { w = w substr($0, i, 1); if (nw == nwlen) { line1 = line2; if (n > 0) { line2 = line2 " " w; n += 1 + nw } else { line2 = w; n += nw }; if (n > linelen) { print line1; line2 = w; n = nw; j++; if (j >= lineno) { exit } }; w = ""; nw = 0; nwlen = wordmin + int(rand() * wordlen) }; nw++ } }'

The third version is pure Awk and generates pseudo-random letters from a selected set of characters:
awk -v lineno=10 -v linelen=80 -v wordmin=5 -v wordmax=10 -v chars=abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ 'BEGIN { charslen = length(chars); n = 0; j = 0; line1 = ""; line2 = ""; wordlen = wordmax - wordmin + 1; srand(); nwlen = wordmin + int(rand() * wordlen); nw = 1; while (j < lineno) { w = w substr(chars, int(rand() * charslen) + 1, 1); if (nw == nwlen) { line1 = line2; if (n > 0) { line2 = line2 " " w; n += 1 + nw } else { line2 = w; n += nw }; if (n > linelen) { print line1; line2 = w; n = nw; j++ }; w = ""; nw = 0; nwlen = wordmin + int(rand() * wordlen) }; nw++ } }'

The variables are quite self-explanatory, but I'll describe them anyway.
  • lineno: the number of lines in the output
  • linelen: the maximum length of a line in the output
  • wordmin: the minimum length of a word in the output
  • wordmax: the maximum length of a word in the output
  • chars: the characters to use in the output
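
To show how these variables interact, here is a sketch of the pure Awk approach wrapped in a shell function. The function name random_letters, its positional arguments and the optional seed argument are assumptions made for this example; they are not part of the scripts above. The sketch treats wordmin and wordmax as an inclusive range and expects wordmin to be at least 1. Passing a seed makes the output repeatable within one Awk implementation, which is handy for testing; without it, srand() seeds from the clock, so two runs within the same second can produce identical text.

```shell
# Hypothetical helper: random_letters LINES WIDTH WORDMIN WORDMAX [SEED]
random_letters() {
    awk -v lineno="$1" -v linelen="$2" -v wordmin="$3" -v wordmax="$4" \
        -v seed="$5" -v chars=abcdefghijklmnopqrstuvwxyz '
        BEGIN {
            if (seed == "")
                srand()                      # no seed given: use the clock
            else
                srand(seed)                  # fixed seed: repeatable output
            charslen = length(chars)
            span = wordmax - wordmin + 1     # lengths wordmin..wordmax inclusive
            line = ""
            while (printed < lineno) {
                # build one word of random length from the character set
                wlen = wordmin + int(rand() * span)
                w = ""
                for (k = 0; k < wlen; k++)
                    w = w substr(chars, int(rand() * charslen) + 1, 1)
                candidate = (line == "") ? w : line " " w
                if (length(candidate) > linelen && line != "") {
                    print line               # the line is full, emit it
                    printed++
                    line = w                 # the word opens the next line
                } else {
                    line = candidate
                }
            }
        }'
}
```

For example, random_letters 10 80 5 10 prints ten lines of at most 80 characters, built from words of 5 to 10 letters.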
Of course, there's also this really simple command line that is almost as good (feature-wise) as the second variant and doesn't involve Awk at all. It is most probably a lot faster, but I've not timed it.
cat /dev/urandom | tr '0-3' ' ' | tr -dc 'a-zA-Z ' | fold -w 80 | head -n 10

The point of the tr '0-3' ' ' part is to decrease the average word length in the generated output: it turns the digits 0 to 3 into extra spaces before the letter filter runs.
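
To put rough numbers on that, assume /dev/urandom yields every byte value equally often and tr works on single bytes. After the final filter only 52 letters and the space survive. If k byte values end up as a space, the average run of letters between two spaces is 52 / k. The helper name avg_run_length below is made up for this sketch.

```shell
# Hypothetical helper: avg_run_length K
# K is the number of byte values that end up as a space.
avg_run_length() {
    awk -v k="$1" 'BEGIN {
        letters = 52                  # a-z and A-Z pass the letter filter
        printf "%.1f\n", letters / k  # mean letters between two spaces
    }'
}

avg_run_length 1    # only the space itself: prints 52.0
avg_run_length 5    # space plus the digits 0, 1, 2, 3: prints 10.4
```

So the extra tr brings the average from about 52 letters down to about 10. Note that fold still cuts every line at 80 columns regardless of where a word ends.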

Comments

Third Version awk script

Hi,
What if I wanted the script not to repeat any character, or, say, to repeat some characters and use others only a single time?

Nice scripts btw.

Best regards.

Dante.
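
One possible way to approach this, as a sketch only: draw characters without replacement, so that each position in the character set can be used at most once per word. The helper name unique_word and its arguments are hypothetical, and the partial shuffle below (a Fisher-Yates style swap) is my own choice of technique, not something taken from the scripts above.

```shell
# Hypothetical helper: unique_word LENGTH [SEED]
unique_word() {
    awk -v wlen="$1" -v seed="$2" -v chars=abcdefghijklmnopqrstuvwxyz '
        BEGIN {
            if (seed == "")
                srand()                      # no seed given: use the clock
            else
                srand(seed)                  # fixed seed: repeatable output
            n = length(chars)
            for (i = 1; i <= n; i++)
                pool[i] = substr(chars, i, 1)
            if (wlen > n)
                wlen = n                     # cannot draw more than the pool holds
            w = ""
            for (i = 1; i <= wlen; i++) {
                # pick a random entry from the unused part and move it to slot i
                k = i + int(rand() * (n - i + 1))
                tmp = pool[i]; pool[i] = pool[k]; pool[k] = tmp
                w = w pool[i]
            }
            print w
        }'
}
```

With this approach, a character that should be allowed to repeat can simply be listed more than once in chars: it can then appear as many times as it is listed, while every other character appears at most once.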