html - ignore malformed XML with Perl-XML -


I'm using the Perl command-line utility xpath to extract data from HTML code, as follows:

#!/bin/bash
echo "$html" | xpath -q -e "//h2[1]"

The HTML is malformed, which causes xpath to throw the error below:

not well-formed (invalid token) @ line x, column y, byte z: 

I can't fix the HTML, since it's provided by an external source, which means that every time the HTML changes I would have to fix it manually again.

I looked at the xpath man page, but it's pretty empty: http://www.linuxcertif.com/man/1/xpath.1p/

I was wondering whether there is a way to tell xpath to ignore malformed HTML. To give an idea of how malformed it is, here are a few lines of the source code:

<div id="header-background" style="top: 42px; >&nbsp;</div>   <---- missing closing "
<div id-"page-inner">   <---- - instead of =

Thanks.

Try out HTML::TreeBuilder::XPath, which uses an HTML parser to build a document that can be queried using XPath expressions. The HTML parser should cope with malformed markup that an XML parser rejects.

Also see the article on HTML scraping with XPath.
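As a rough illustration of that suggestion, here is a minimal sketch that runs the same //h2[1] query through HTML::TreeBuilder::XPath instead of the xpath utility. It assumes the module is installed from CPAN (it is not part of core Perl). The helper name xpath_value and the sample page are made up for the example; only the two broken div lines come from the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;   # CPAN module, assumed to be installed

# Hypothetical helper: parse the input with a lenient HTML parser
# (rather than a strict XML parser) and return the string value
# of the given XPath expression.
sub xpath_value {
    my ($html, $expr) = @_;
    my $tree = HTML::TreeBuilder::XPath->new;
    $tree->parse($html);
    $tree->eof;                         # signal end of input to the parser
    my $value = $tree->findvalue($expr);
    $tree->delete;                      # release the tree's memory
    return "$value";
}

# Made-up sample page containing the two malformed lines from the question.
my $html = <<'HTML';
<html><body>
<div id="header-background" style="top: 42px; >&nbsp;</div>
<div id-"page-inner">
<h2>First heading</h2>
<p>Some text
<h2>Second heading</h2>
</body></html>
HTML

print xpath_value($html, '//h2[1]'), "\n";
```

To use this in the original pipeline, the sample string could be replaced by the piped input, for example by reading all of STDIN with `my $html = do { local $/; <STDIN> };`. Because the parser guesses at the intended structure, the resulting tree for badly broken markup may not match what a browser shows, so it is worth checking the extracted values against the real page.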

