NYCPHP Meetup

NYPHP.org

[nycphp-talk] Trapping Errors with simplexml for Not Well-Formed XML

Emmanuel M. Décarie emm at scriptdigital.com
Wed Feb 7 10:21:15 EST 2007


Hello there,

I posted the following on my blog and wanted to check with the crowd
whether I missed anything obvious.
<http://lettre13.com/2007/02/07/trapping-errors-with-simplexml-for-not-well-formed-xml/>

Cheers
-Emmanuel


I discovered the hard way that in PHP5 there is no obvious way to
detect whether a piece of XML is well-formed, especially if you want
to deploy on both Unix and Windows and don't want to call out to the
shell.

Adding to this problem, I also discovered that the DOM and simplexml
extensions can't use PHP5 exception handling to trap the errors when
the XML is not well-formed. When simplexml (simplexml_load_file(),
simplexml_load_string()) or DOM (DOMDocument::load(),
DOMDocument::loadXML()) parses XML that is not well-formed, the
errors are raised as PHP warnings rather than exceptions: try/catch
does not trap them, and they are displayed immediately (when
display_errors is on).
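To make the point concrete, here is a minimal sketch (assuming PHP 5.1 or later with the simplexml extension) of wrapping simplexml_load_string() in try/catch. The parse failure shows up as an E_WARNING plus a false return value, so the catch block is never entered.

```php
<?php
// Not well-formed: the opening "<" of the root tag is missing.
$bad = "tag>hello world</tag>";
$caught = false;

try {
    // simplexml_load_string() reports the parse errors as E_WARNING
    // messages and returns false; it does not throw an exception.
    $xml = simplexml_load_string($bad);
} catch (Exception $e) {
    $caught = true; // not reached for this input
}

var_dump($caught); // bool(false)
var_dump($xml);    // bool(false)
```

The only signal the script gets is the false return value; the warning text itself goes straight to the output.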

With the DOM or Tidy extensions, it's possible to load XML that is
not well-formed and repair it on the fly. But what if you need to
detect XML that is not well-formed and report what the error is?
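For the "repair on the fly" route, a minimal sketch using only the DOM side (Tidy is not shown) could rely on DOMDocument's non-standard recover property, which asks libxml to build a tree even from broken input. The input string is an assumption for illustration. Note that the errors libxml recovered from are still queued, which is exactly the information you would lose if you only looked at the repaired tree.

```php
<?php
// Missing closing tag: not well-formed.
$broken = "<tag>hello world";

// Collect libxml diagnostics internally instead of printing warnings.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
// Non-standard DOMDocument property: ask libxml to recover from
// well-formedness errors instead of rejecting the document.
$doc->recover = true;
$ok = $doc->loadXML($broken);

// Serialize the repaired tree back to (now well-formed) XML.
echo $doc->saveXML($doc->documentElement), "\n";

// The errors libxml recovered from are still available for reporting.
$errors = libxml_get_errors();
libxml_clear_errors();
```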

Fortunately, after some research, I found that you can use the
libxml functions (PHP 5.1 and later) to test whether XML is
well-formed and to trap XML errors. So I whipped up this little
function called get_xml_object (see (1) for the inspiration) that
lets me trap errors when simplexml is used to parse XML. The function
is quite simple. By default, you pass it the path to an XML file. If
you want to parse a string instead, pass a second argument: any value
other than "file" works, but I chose "string" for clarity's sake. You
can also replace the simplexml extension with the DOM extension if
you prefer it for parsing XML.
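The libxml functions involved behave as follows; this is a minimal sketch (PHP 5.1 or later assumed) with made-up input strings. libxml_use_internal_errors(true) returns the previous setting so it can be restored, and collected errors pile up across parses until libxml_clear_errors() is called. That is why a helper like get_xml_object should clear the queue before and after parsing.

```php
<?php
// Switch libxml to collecting errors internally; keep the old setting.
$previous = libxml_use_internal_errors(true);

simplexml_load_string("tag>hello world</tag>"); // missing "<"
$afterFirst = count(libxml_get_errors());

simplexml_load_string("<tag>hello world");      // missing closing tag
$afterSecond = count(libxml_get_errors());

// Errors from both parses are still queued: nothing clears them automatically.
libxml_clear_errors();
$afterClear = count(libxml_get_errors());

// Restore whatever error mode was active before.
libxml_use_internal_errors($previous);

printf("%d %d %d\n", $afterFirst, $afterSecond, $afterClear);
```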

The function get_xml_object returns an array with two keys, errors
and xml. For example, after $result = get_xml_object($s, "string"),
$result is such an array. If there are no errors, $result['errors']
is null and $result['xml'] contains a SimpleXMLElement object that
you can then manipulate with the simplexml extension. If parsing
fails, $result['errors'] holds the error messages and $result['xml']
stays null.

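<?php
$s = "tag>hello world</tag>";      // not well-formed: missing "<"
// $s = "<tag>hello world</tag>";  // well-formed

function get_xml_object ($xml, $xmlFormat = "file") {

   $xml_object = null;
   $result = array ("errors" => null, "xml" => null);

   // Collect libxml errors internally instead of emitting warnings,
   // remembering the previous setting so it can be restored later.
   $previous = libxml_use_internal_errors (true);
   libxml_clear_errors ();   // drop errors left over from earlier parses

   if ($xmlFormat == "file") {
      $xml_object = simplexml_load_file ($xml);
   } else {
      $xml_object = simplexml_load_string ($xml);
   }

   // Compare strictly with false: an empty element such as <tag/>
   // is a valid SimpleXMLElement that still casts to boolean false.
   if ($xml_object === false) {
      $error_msg = "";
      foreach (libxml_get_errors () as $error) {
          // Append (not overwrite) so every error is reported.
          $error_msg .= "Error: line: " . $error->line
                     . ": column: " . $error->column . ": "
                     . trim ($error->message) . "\n";
      }
      $result["errors"] = $error_msg;
   } else {
      $result["xml"] = $xml_object;
   }

   libxml_clear_errors ();
   libxml_use_internal_errors ($previous);
   return $result;
}

$result = get_xml_object ($s, "string");

if ($result['errors']) {
   var_dump ($result['errors']);
} else {
   var_dump ($result['xml']);
}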
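Since the DOM extension can stand in for simplexml, here is a minimal sketch of what that swap might look like. The name get_dom_object is hypothetical; it keeps the same return array, but "xml" holds a DOMDocument, and it uses DOMDocument::load() for file paths and DOMDocument::loadXML() for strings, both of which return false on failure.

```php
<?php
// Hypothetical DOM counterpart of get_xml_object: same return shape,
// but "xml" holds a DOMDocument instead of a SimpleXMLElement.
function get_dom_object($xml, $xmlFormat = "file")
{
    $result = array("errors" => null, "xml" => null);

    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $doc = new DOMDocument();
    // load() reads a file path, loadXML() parses a string.
    $ok = ($xmlFormat == "file") ? $doc->load($xml) : $doc->loadXML($xml);

    if ($ok === false) {
        $msg = "";
        foreach (libxml_get_errors() as $error) {
            $msg .= "Error: line: " . $error->line
                  . ": column: " . $error->column . ": "
                  . trim($error->message) . "\n";
        }
        $result["errors"] = $msg;
    } else {
        $result["xml"] = $doc;
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $result;
}

$bad  = get_dom_object("tag>hello world</tag>", "string");
$good = get_dom_object("<tag>hello world</tag>", "string");
```

Callers can test $result['errors'] exactly as with the simplexml version; only the type of $result['xml'] changes.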

(1) <http://ca3.php.net/manual/en/function.libxml-use-internal-errors.php>

