html - How to match a keyword on a web page that is NOT within an <a> and its href, using JavaScript?
I'm searching a page to find a specific keyword. That's easy enough. The added complication is that I don't want to match the keyword if it is part of an <a> tag.
E.g.:
<p>Here is some example content that has a keyword in it. I want to match the keyword here, but I don't want to match the <a href="http://www.keyword.com">keyword</a> here.</p>
If you look at the above example content, the word 'keyword' appears 4 times. I want to match the first 2 times it appears in the paragraph, but I do not want to match it when it appears as part of the href, or as part of the <a> content.
So far I've managed to use the below:
var tester = new RegExp("((?!<a.*?>)("+keyword+")(?!</a>))", 'ig');
The problem is that the above still matches the keyword if it is part of the href.
Any ideas? Thanks.
You can't do this reliably with JavaScript regexes. It's hard enough with the .NET regex engine, which is one of the few that support infinite-length lookbehind assertions, but JavaScript (before ES2018) doesn't know lookbehind assertions at all, so you can't see what came before the text you want to match.
So you should either use a DOM parser (I'm sure someone fluent in JavaScript can suggest a practical approach here), or read the text, remove the <a> tags (which you can sort of do with a regex, if you're the brave type), and then search for the keyword in the rest of the text.
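One possible shape for the DOM route is sketched below. It is not from the original answer: the function name `countKeywordOutsideLinks` is made up for illustration, and the code assumes it is handed an element node such as `document.body`. It relies only on the standard DOM node properties `nodeType`, `nodeName`, `childNodes` and `nodeValue`, and it never looks at attributes, so an `href` cannot produce a match.

```javascript
// Sketch: count keyword occurrences in text nodes that are not inside an <a>.
// countKeywordOutsideLinks is a hypothetical helper name, not a built-in.
function countKeywordOutsideLinks(node, keyword) {
  var ELEMENT_NODE = 1;
  var TEXT_NODE = 3;

  if (node.nodeType === TEXT_NODE) {
    // Escape regex metacharacters so the keyword is matched literally.
    var escaped = keyword.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    var hits = node.nodeValue.match(new RegExp(escaped, 'gi'));
    return hits ? hits.length : 0;
  }

  // Ignore comments and any other non-element nodes.
  if (node.nodeType !== ELEMENT_NODE) {
    return 0;
  }

  // Skip the whole subtree of links, and of script/style blocks.
  var name = node.nodeName.toUpperCase();
  if (name === 'A' || name === 'SCRIPT' || name === 'STYLE') {
    return 0;
  }

  // Recurse into the children and add up their counts.
  var total = 0;
  for (var i = 0; i < node.childNodes.length; i++) {
    total += countKeywordOutsideLinks(node.childNodes[i], keyword);
  }
  return total;
}
```

In a browser this would presumably be called as `countKeywordOutsideLinks(document.body, 'keyword')`. Because the browser has already parsed the markup, nested or unbalanced tags are not the regex's problem any more.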
Edit:
Well, there is a dirty hack you could use. It's not pretty, and if you look at Alan Moore's comment on your question, you'll be able to imagine the multitude of ways in which this regex can fail, but it does work on your example:
/keyword(?!(?:(?!<a).)*<\/a)/
How does this "work"?
keyword       # Match "keyword"
(?!           # only if it is not possible to match the following regex in the text ahead:
  (?:         # - Match...
    (?!<a)    # -- unless it's the start of an <a> tag...
    .         # -- any character
  )*          # - any number of times.
  <\/a        # Then match a closing </a> tag.
)             # End of lookahead assertion.
This is quite cryptic, even with the explanation. In plain terms:
- Match "keyword".
- Look ahead to make sure there is no closing </a> in the following text, unless an opening <a> tag comes first.
So if your <a> tags are correctly balanced, not nested, and not found inside comments or script blocks, you might get away with it.
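As a rough check of the lookahead hack, the sketch below runs it against a paragraph like the one in the question. The `sample` string and the `findKeywordOffsets` helper are made up for illustration. The slash in the closing tag is escaped because the pattern is written as a JavaScript regex literal, and the `g` and `i` flags are added here so that every hit is collected.

```javascript
// Hypothetical helper: return the start offset of every "keyword" that is
// not followed by a closing </a> before the next opening <a.
function findKeywordOffsets(html) {
  var pattern = /keyword(?!(?:(?!<a).)*<\/a)/gi;
  var offsets = [];
  var match;
  // exec() with the g flag resumes after the previous match on each call.
  while ((match = pattern.exec(html)) !== null) {
    offsets.push(match.index);
  }
  return offsets;
}

var sample = '<p>Here is some example content that has a keyword in it. ' +
  'I want to match the keyword here, but I don\'t want to match the ' +
  '<a href="http://www.keyword.com">keyword</a> here.</p>';

// The two occurrences in the paragraph text match; the href and the
// link text do not.
console.log(findKeywordOffsets(sample).length); // 2
```

Note that `.` does not match line breaks in JavaScript unless the `s` flag is used, so a link whose closing tag sits on a later line than the keyword would slip past this check. That is one more of the ways the hack can fail.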