html - Ignore malformed XML with Perl-XML


I'm using the Perl command-line utility xpath to extract data from HTML code as follows:

    #!/bin/bash
    echo "$html" | xpath -q -e "//h2[1]"

The HTML is malformed, which causes xpath to throw the error below:

not well-formed (invalid token) @ line x, column y, byte z: 

I can't fix the HTML since it's provided by an external source, which means that every time the HTML changes I would have to fix it manually again.

I looked at the xpath man page, but it's pretty empty: http://www.linuxcertif.com/man/1/xpath.1p/

I was wondering whether there is a way to tell xpath to ignore malformed HTML. To give you an idea of how malformed it is, here are a few lines of the source code:

    <div id="header-background" style="top: 42px; >&nbsp;</div>   <---- missing closing "
    <div id-"page-inner">   <---- - instead of =

Thanks

Try out HTML::TreeBuilder::XPath, which uses an HTML parser to build a document that can then be queried using XPath expressions. An HTML parser, unlike a strict XML parser, should tolerate the malformed markup.
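As a rough illustration of that suggestion, here is a minimal sketch that reads HTML from standard input and prints the text of the same `//h2[1]` expression used in the question. It assumes HTML::TreeBuilder::XPath is installed from CPAN; the script name `extract_h2.pl` used below is made up for the example, and the sketch has not been run against the asker's actual page.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Slurp the whole HTML document from standard input.
my $html = do { local $/; <STDIN> };

# Build a tree with a forgiving HTML parser instead of a strict XML parser.
my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse($html);
$tree->eof;

# findvalue returns the string value of the node(s) matched by the expression.
print $tree->findvalue('//h2[1]'), "\n";

# Free the tree explicitly once we are done with it.
$tree->delete;
```

It could be dropped into the original pipeline in place of the xpath call, for example `echo "$html" | perl extract_h2.pl`. Note that it prints only the text content of the match rather than the serialized element that the xpath utility prints.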

Also see the article on HTML scraping with XPath.

