How to match a keyword on a web page that is NOT within an <a> and its href, using JavaScript?


I'm searching a page to find a specific keyword. That's easy enough. The added complication is that I don't want to match the keyword if it is part of an <a> tag.

E.g.

<p>Here is an example of some content that has keyword in it. I want to match keyword here, but I don't want to match <a href="http://www.keyword.com">keyword</a> here.</p>

If you look at the above example content, the word 'keyword' appears 4 times. I want to match the first 2 times it appears in the paragraph, but I do not want to match it when it appears as part of the href, or as part of the <a> content.

So far I've managed to come up with the following:

var tester = new RegExp("((?!<a.*?>)("+keyword+")(?!</a>))", 'ig');

The problem with the above is that it still matches the keyword if it is part of the href.
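To see why the href occurrence slips through, here is a small sketch that runs the question's regex (with the constructor spelled `RegExp`, since JavaScript is case-sensitive) against the sample paragraph. The variable names and sample string are illustrative. The leading `(?!<a.*?>)` is tested at the very position where `keyword` starts, and since `<a` can never begin where `k` begins, it never rejects anything; only the trailing `(?!</a>)` has any effect, and it only catches a keyword immediately followed by `</a>`.

```javascript
// Reproduces the regex from the question to show which occurrences
// of "keyword" it actually matches in the sample paragraph.
const html =
  '<p>Here is an example of some content that has keyword in it. ' +
  "I want to match keyword here, but I don't want to match " +
  '<a href="http://www.keyword.com">keyword</a> here.</p>';

const keyword = "keyword";
const tester = new RegExp("((?!<a.*?>)(" + keyword + ")(?!</a>))", "ig");

// For each match, keep its position and the 4 characters that follow it,
// so we can see where in the markup it landed.
const matches = [...html.matchAll(tester)].map((m) => ({
  index: m.index,
  after: html.slice(m.index + m[0].length, m.index + m[0].length + 4),
}));

console.log(matches.length); // 3
console.log(matches.map((m) => m.after)); // [ ' in ', ' her', '.com' ]
```

Only the link text is excluded; the occurrence inside `www.keyword.com` is followed by `.com`, so the trailing lookahead lets it through.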

Any ideas? Thanks.

You can't do this reliably with JavaScript regexes. It's hard enough even with the .NET regex engine, which is one of the few that support infinite-length lookbehind assertions, and JavaScript regexes (before ES2018) don't support lookbehind assertions at all, so they can't see what came before the text you want to match.

So you should either use a DOM parser (I'm sure someone more fluent in JavaScript can suggest a practical approach here), or read the text, remove the <a> tags (which you can sort of do with a regex, if you're the brave type), and search for the keyword in the rest of the text.
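A minimal sketch of both options, assuming a browser-style DOM (nodes exposing `nodeType`, `nodeName`, `childNodes` and `nodeValue`) for the first, and balanced, non-nested links for the second. The helper names (`escapeRegExp`, `findOutsideLinks`, `countOutsideLinksInMarkup`) and the choice to also skip `<script>` and `<style>` are hypothetical, chosen for this illustration only.

```javascript
// Escape regex metacharacters so an arbitrary keyword is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const ELEMENT_NODE = 1;
const TEXT_NODE = 3;
// Elements whose text should never count (assumption: links, plus script and
// style, whose contents are not visible page text).
const SKIP = new Set(["A", "SCRIPT", "STYLE"]);

// Option 1: walk the DOM tree and search only text nodes outside skipped
// elements. Attributes such as href are not child nodes, so they are never
// searched; comment nodes (nodeType 8) fall through both branches.
function findOutsideLinks(root, keyword) {
  const re = new RegExp(escapeRegExp(keyword), "gi");
  const hits = [];
  (function walk(node) {
    if (node.nodeType === ELEMENT_NODE) {
      if (SKIP.has(node.nodeName.toUpperCase())) return;
      for (const child of node.childNodes) walk(child);
    } else if (node.nodeType === TEXT_NODE) {
      for (const m of node.nodeValue.matchAll(re)) {
        hits.push({ node, index: m.index });
      }
    }
  })(root);
  return hits;
}

// Option 2: delete every <a ...>...</a> element from the markup string, then
// count what is left. Assumes links are balanced, not nested, and that no
// attribute value contains ">".
function countOutsideLinksInMarkup(html, keyword) {
  const withoutLinks = html.replace(/<a\b[^>]*>[\s\S]*?<\/a\s*>/gi, "");
  const re = new RegExp(escapeRegExp(keyword), "gi");
  return (withoutLinks.match(re) || []).length;
}
```

In a browser you would call `findOutsideLinks(document.body, "keyword")`; each hit carries the text node and offset, which helps if you later want to wrap the match in a `<span>`. The markup-stripping version is cruder: it still counts the keyword inside other attributes (an `<img alt="keyword">`, say) or inside comments.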

Edit:

Well, there is a dirty hack you could use. It's not pretty, and if you look at Alan Moore's comment on the question, you'll be able to imagine a multitude of ways in which this regex will fail, but it does work on your example:

/keyword(?!(?:(?!<a).)*<\/a)/

How does it "work"?

keyword    # Match "keyword"
(?!        # only if it's not possible to match the following regex in the text ahead:
 (?:       # - Match...
  (?!<a)   # -- (unless it's the start of an <a> tag...)
  .        # -- any character
 )*        # - any number of times,
 </a       # then match the start of a closing </a> tag.
)          # End of lookahead assertion.

This is quite cryptic, even with the explanation. What it does is:

  • Match "keyword"
  • Look ahead to make sure there is no closing </a> in the following text
  • unless an opening <a> tag comes first.
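The steps above can be wrapped in a small helper so the keyword can come from a variable, as in the question's `new RegExp(...)` call. This is a sketch: `escapeRegExp` and `keywordOutsideLinks` are hypothetical names, the `g` and `i` flags are assumed, and `[\s\S]` replaces `.` (an adjustment to the regex above) so that a link whose text spans a line break is still recognized. Inside a string passed to `RegExp`, the `/` in `</a` needs no escaping.

```javascript
// Escape regex metacharacters so the keyword is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Match the keyword unless a closing "</a" follows before any opening "<a".
// [\s\S] stands in for "." so the look-ahead scan can cross line breaks.
function keywordOutsideLinks(keyword) {
  return new RegExp(
    escapeRegExp(keyword) + "(?!(?:(?!<a)[\\s\\S])*</a)",
    "gi"
  );
}

const html =
  '<p>Here is an example of some content that has keyword in it. ' +
  "I want to match keyword here, but I don't want to match " +
  '<a href="http://www.keyword.com">keyword</a> here.</p>';

// Start offsets of every accepted occurrence.
const positions = [...html.matchAll(keywordOutsideLinks("keyword"))].map(
  (m) => m.index
);
console.log(positions.length); // 2
```

On the sample paragraph this keeps the two occurrences in the running text and skips both the one in the `href` and the link text.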

So if your <a> tags are correctly balanced, not nested, and not found inside comments or script blocks, you might get away with it.
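To make those caveats concrete, here is a sketch (the same hypothetical `keywordOutsideLinks` helper, repeated so the snippet runs on its own) with two inputs that break the hack: a keyword inside an HTML comment is accepted although it is not page content, and a keyword directly before `</abbr>` is rejected because `</a` is also the start of `</abbr>` (the same would happen before `</address>`, `</article>` or `</aside>`).

```javascript
// Escape regex metacharacters so the keyword is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// The look-ahead hack: reject the keyword if "</a" comes before any "<a".
function keywordOutsideLinks(keyword) {
  return new RegExp(
    escapeRegExp(keyword) + "(?!(?:(?!<a)[\\s\\S])*</a)",
    "gi"
  );
}

// Number of occurrences the hack accepts in a piece of markup.
function countAccepted(html, keyword) {
  return (html.match(keywordOutsideLinks(keyword)) || []).length;
}

// False positive: the regex knows nothing about comments, so a keyword in an
// HTML comment is accepted.
const inComment = "<!-- keyword --><p>no links here</p>";
console.log(countAccepted(inComment, "keyword")); // 1

// False negative: "</a" is also a prefix of "</abbr>", so this keyword is
// rejected even though it is not inside a link.
const beforeAbbr = '<p><abbr title="example">keyword</abbr></p>';
console.log(countAccepted(beforeAbbr, "keyword")); // 0
```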

