[nycphp-talk] Filtering input to be appended inside email

Mikko Rantalainen mikko.rantalainen at
Tue Sep 13 10:19:18 EDT 2005

Daniel Convissor wrote:
> Hi Michael:
> On Mon, Sep 12, 2005 at 12:41:12PM -0400, Michael Southwell wrote:
>>The point is simply to identify which scripts have sent emails to the 
>>known-bad addresses; those are the vulnerable ones.
> I'm afraid that will lead people into both a false sense of security and 
> using email address blacklists.  Folks should audit their email scripts, 
> period.

I agree. Broken code is broken code. If you aren't sure if your 
email script works correctly, take it offline immediately.

>>There were other problems as well, which I noted in my polished 
>>version.  We need an officially sanctioned version of the function 
>>before we can post anything.
> Agreed.  Here's what I think is a good starting point for discussion...
> <?php
> // untested!!!!
> // MUST do isset() checks on all of these first!
> // left out for brevity.
> if (eregi('^[a-z0-9_.=+-]+@([a-z0-9-]+\.)+([a-z]{2,6})$', $_POST['address'])) {
>     $address = $_POST['address'];
> } else {
>     echo 'bad email';
>     exit;
> }

That looks pretty simple but it doesn't allow even nearly all valid 
email addresses.

I'd rather create two functions like

	// takes string $input_email and returns an RFC 2822 section 3.4
	// compatible address, or an empty string if the input cannot be
	// made safe
function getSafeEmail($input_email) { ... return $safe_email; }


	// takes string $input_header and encodes it as a single header
	// to be used for mailing
function getSafeHeader($input_header) { ... return $safe_header; }

and I'd put all input through these functions, for example
$from = getSafeEmail($_POST["FROM"]) or so.


Of these, the first one is much harder to implement correctly. A 
simple implementation could accept only the limited addr-spec format 
	dot-atom "@" dot-atom
where dot-atom is defined in RFC 2822, section 3.2.4. Note that this 
is much simpler than the full address specification defined in 
RFC 2822, section 3.4.

Note that this "simple" format wouldn't allow all valid email 
addresses but at least it would allow stuff like
mikko.rantalainen+nyphp at
unlike many complex regexes that are meant to filter email addresses.

A simple, untested implementation would look like

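function getSafeEmail($input_email)
{
	# atext characters of RFC 2822 section 3.2.4, for use inside [...]
	# ("^" must not come first and "-" must come last)
	$atext = "a-z0-9!#\$%&'*/=?^_`{|}~+-";
	# filter extra characters off, keeping "." and "@"
	$safe_email = preg_replace("@[^.\@{$atext}]@i","",$input_email);
	# accept only dot-atom "@" dot-atom with a dot in the domain part
	if (preg_match("@^[{$atext}]+(\.[{$atext}]+)*\@[{$atext}]+(\.[{$atext}]+)+\$@i",$safe_email))
		return $safe_email;
	else
		return ""; # error
}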
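
To show what the restricted dot-atom format buys us compared to the 
stricter pattern quoted earlier, here is a small sketch. The function 
names isDotAtomAddress() and isQuotedPatternAddress() and the 
example.com addresses are made up for illustration, and the second 
function is only my preg_match() translation of the quoted eregi() 
pattern, so treat it as an approximation. Unlike getSafeEmail(), these 
only validate and never modify the input.

```php
<?php
// Hypothetical helper: true if $email has the restricted form
// dot-atom "@" dot-atom (RFC 2822 section 3.2.4) and nothing else.
function isDotAtomAddress($email)
{
    // atext characters for use inside a character class; "/" is
    // escaped because it is the delimiter, "-" is last so it is literal
    $atext = 'a-z0-9!#$%&\'*\/=?^_`{|}~+-';
    $atom = '[' . $atext . ']+';
    $dot_atom = $atom . '(?:\.' . $atom . ')*';
    // \z anchors at the very end, so a trailing newline is rejected
    return preg_match('/^' . $dot_atom . '@' . $dot_atom . '\z/i', $email) === 1;
}

// Approximate preg_match() translation of the quoted eregi() pattern.
function isQuotedPatternAddress($email)
{
    $pattern = '/^[a-z0-9_.=+-]+@([a-z0-9-]+\.)+([a-z]{2,6})\z/i';
    return preg_match($pattern, $email) === 1;
}

$samples = array(
    "o'brien@example.com",                          // apostrophe is valid atext
    "mikko+list@example.com",                       // plus addressing
    "user@example.com\r\nBcc: victim@example.com",  // injection attempt
);
foreach ($samples as $email) {
    printf("%-50s dot-atom: %s  quoted pattern: %s\n",
        json_encode($email),
        isDotAtomAddress($email) ? "ok" : "rejected",
        isQuotedPatternAddress($email) ? "ok" : "rejected");
}
```

Both checks reject the injection attempt, but only the dot-atom check 
accepts the address with the apostrophe.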

For the second function we have two possible ways to make sure that 
$input_header indeed contains exactly one valid header: either 
remove all line feeds from the input, or append a space after every 
line feed, which makes the whole input a single header folded onto 
multiple lines (RFC 2822, section 2.2.3). I'll choose the latter 
method for this implementation. Again, this is untested.

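function getSafeHeader($input_header)
{
	# split into name and value as defined in RFC 2822 section 2.2
	$parts = explode(":",$input_header,2);
	if (count($parts) != 2)
		return ""; # no colon, so not a header
	list($name,$value) = $parts;

	# verify header name: printable US-ASCII only; the D modifier
	# keeps $ from matching before a trailing line feed
	if (!preg_match("@^[".chr(33)."-".chr(126)."]+$@D",$name))
		return "";

	# header value cannot contain a bare CRLF
	# our implementation strips out CRs, makes sure all LFs
	# are followed by a space (folding) and reinserts CRs
	$value = preg_replace("@\r@","",trim($value));
	$value = preg_replace("@\n@","\n ",$value);
	$value = preg_replace("@\n@","\r\n",$value);
	$safe_header = $name.": ".$value."\r\n";
	return $safe_header;
}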
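
For comparison, here is a sketch of the other option mentioned above: 
removing all line breaks instead of folding. The function name 
getSafeHeaderSingleLine() and the example.com address are hypothetical, 
and this is a sketch under the same assumptions as the function above, 
not a finished implementation.

```php
<?php
// Hypothetical variant of getSafeHeader(): instead of folding, drop
// every CR and LF so the value cannot start a new header line.
function getSafeHeaderSingleLine($input_header)
{
    $parts = explode(":", $input_header, 2);
    if (count($parts) != 2) {
        return ""; // no colon, so this is not "name: value"
    }
    list($name, $value) = $parts;

    // header name: printable US-ASCII only; \z (not $) so that a
    // trailing newline in the name is rejected as well
    if (!preg_match('/^[!-~]+\z/', $name)) {
        return "";
    }

    // remove all CRs and LFs, then add the single terminating CRLF
    $value = str_replace(array("\r", "\n"), "", trim($value));
    return $name . ": " . $value . "\r\n";
}

// An injection attempt: the attacker tries to smuggle in a Bcc header.
$attack = "Subject: hello\r\nBcc: victim@example.com";
echo json_encode(getSafeHeaderSingleLine($attack)), "\n";
```

With this variant the smuggled "Bcc:" text simply becomes part of the 
Subject value on one line, whereas the folding variant keeps the line 
break but turns the second line into a continuation of the first.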

The body doesn't need to be handled unless you send HTML mail (shame 
on you), in which case all the usual XSS issues are there waiting.
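
If you do send HTML mail, the minimum is to escape user input before 
it goes into the body. A minimal sketch, assuming the input is UTF-8 
text that should be shown literally; the helper name getSafeHtmlText() 
is hypothetical:

```php
<?php
// Hypothetical helper: escape user-supplied text before placing it
// inside an HTML mail body, so markup in it is displayed, not rendered.
// Assumes UTF-8 input; htmlspecialchars() returns an empty string for
// invalid UTF-8 unless told otherwise.
function getSafeHtmlText($input_text)
{
    return htmlspecialchars($input_text, ENT_QUOTES, 'UTF-8');
}

$comment = '<script>alert("x")</script>';
$body = "<p>Comment: " . getSafeHtmlText($comment) . "</p>";
echo $body, "\n";
```

This only covers text placed between tags or inside quoted attribute 
values; input used as a URL or inside a script or style block needs 
more than this.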

