The task is trivial; the solution is not. Fortunately, there's a Perl module written for exactly what we aim to do. It's called Email::Find, and it can be installed through the libemail-find-perl package on Debian-based systems.
This module implements a method named find() that finds all RFC 822 compliant email addresses in the string passed to it (by reference) and executes a callback function for each address. We only have to do a unique sort on the result.
Here's some example code that does that:
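#!/usr/bin/perl
# requires the libemail-find-perl package on Debian/Ubuntu systems
use strict;
use warnings;
use Email::Find;

my %addresses;

# the callback gets a Mail::Address object and the matched text; its
# return value replaces the match, so hand the text back unchanged
my $finder = Email::Find->new(sub {
    my ($address, $original) = @_;
    # convert to lowercase; hash keys keep only the unique values
    $addresses{lc($address->address)} = 1;
    return $original;
});

# find email addresses in every line of input
while (my $line = <>) {
    $finder->find(\$line);
}

# print the sorted values to standard output
print "$_\n" foreach sort keys %addresses;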
Save the code into a file (e.g. find_emails.pl), add execute permission to it (chmod u+x find_emails.pl), and call it either by specifying the input file's name as the first parameter or by supplying the file on its standard input:
./find_emails.pl inputfile.txt
or
cat inputfile.txt | ./find_emails.pl
The script prints one address per line to standard output. I've also added a conversion to lowercase; you can skip it by removing the call to the lc() function.
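If lowercasing the whole address feels too aggressive, a middle ground is to lowercase only the domain part, since domain names are case-insensitive while the part before the "@" is, strictly speaking, up to the receiving server. The following is only a sketch of that idea; normalize_address is a hypothetical helper name, not something Email::Find provides, and it assumes the address has already been extracted as a plain string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of Email::Find): lowercase only the
# domain, leaving the local part (everything up to the last "@") as is.
sub normalize_address {
    my ($address) = @_;
    my $at = rindex($address, '@');
    return $address if $at < 0;    # no "@" found: return the input unchanged
    return substr($address, 0, $at + 1) . lc(substr($address, $at + 1));
}

print normalize_address('John.Smith@EXAMPLE.com'), "\n";    # prints John.Smith@example.com
```

To try it in the script, pass the address through normalize_address() in the callback instead of lc(). Note that addresses differing only in the case of the local part will then be listed separately.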
Comments
E-mail address case
Re: E-mail address case
- 99% of the mail delivery agents handle it case-insensitively anyway
- a lot of users write their email addresses with capitals (e.g. if a user's name is John Smith and the email address is john.smith@example.com, they tend to write it as John.Smith@example.com ... however they should not do so)
So I figured that converting the email addresses to lowercase has more pros than cons. I just tested with Gmail, and it handles addresses case-insensitively (I get the mail regardless of the case of the letters in my address).