java - JSoup not translating ampersand in links in HTML
In jsoup, the following test case should pass, but it does not.
    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;
    import org.junit.Assert;
    import org.junit.Test;

    @Test
    public void shouldPrintHrefCorrectly() {
        // Note the bare '&' characters in the href values.
        String content = "<li><a href=\"#\">good</a><ul><li><a href=\"article.php?boid=1865&sid=53&mid=1\">"
                + "boss</a></li><li><a href=\"article.php?boid=186&sid=53&mid=1\">"
                + "heavent</a></li><li><a href=\"article.php?boid=167&sid=53&mid=1\">"
                + "hellos</a></li><li><a href=\"article.php?boid=181&sid=53&mid=1\">"
                + "mr.jackson!</a></li>";
        Document document = Jsoup.parse(content, "http://www.google.co.in/");
        // Select every anchor whose href starts with "article".
        Elements links = document.select("a[href^=article]");
        Iterator<Element> iterator = links.iterator();
        List<String> urls = new ArrayList<String>();
        while (iterator.hasNext()) {
            urls.add(iterator.next().attr("href"));
        }
        Assert.assertTrue(urls.contains("article.php?boid=181&sid=53&mid=1"));
    }
Could anyone please give me the reason why it is failing?
There are 3 problems:
1. You're asserting that there's a bovikatanid parameter present, while it's called boid.
2. The HTML source is using & instead of &amp; in the source. This is technically invalid.
3. Jsoup is parsing &mid as | somehow. It should have scanned until ;.
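The third point can be illustrated without jsoup. The sketch below is not jsoup's implementation: StrictEntityDecoder, its decode method, and the small entity table are hypothetical names made up for this illustration. It only shows the behaviour the answer expects, namely that a named entity is decoded when it is terminated by a semicolon, so a bare &mid inside a query string is left alone.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of jsoup): decodes a named entity
// only when it is closed by ';'.
final class StrictEntityDecoder {
    // Tiny illustrative table; a real decoder knows far more names.
    private static final Map<String, String> ENTITIES = new HashMap<String, String>();
    static {
        ENTITIES.put("amp", "&");
        ENTITIES.put("lt", "<");
        ENTITIES.put("gt", ">");
        ENTITIES.put("quot", "\"");
        // &mid; names U+2223 (divides), which renders much like '|'.
        ENTITIES.put("mid", "\u2223");
    }

    // '&', then a name, then a mandatory terminating ';'.
    private static final Pattern ENTITY = Pattern.compile("&([A-Za-z][A-Za-z0-9]*);");

    static String decode(String raw) {
        Matcher m = ENTITY.matcher(raw);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = ENTITIES.get(m.group(1));
            // Names missing from the table are copied through unchanged.
            String text = (replacement != null) ? replacement : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(text));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // No ';' after "&sid" or "&mid", so nothing is decoded.
        System.out.println(decode("article.php?boid=181&sid=53&mid=1"));
        // Properly escaped input decodes to the same URL.
        System.out.println(decode("article.php?boid=181&amp;sid=53&amp;mid=1"));
    }
}
```

Both calls in main print article.php?boid=181&sid=53&mid=1, which is the value the test case asserts on.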
To fix #1, you have to do it yourself. To fix #2, you have to report the issue to the server admin in question (it's their fault; however, since the average browser is forgiving on this, I'd imagine that Google is doing this to save bandwidth). To fix #3, I've reported an issue to the jsoup guy to see what he thinks about this.
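For whoever generates the markup, fix #2 amounts to escaping the ampersands before writing the href. A minimal sketch, assuming the HTML is built by hand in Java; AttributeEscaper, escape, and link are hypothetical names for this illustration and are not taken from jsoup or from the site in question:

```java
// Hypothetical helper: escapes characters that should not appear raw
// inside a double-quoted HTML attribute value or in element text.
final class AttributeEscaper {
    static String escape(String value) {
        StringBuilder out = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '"': out.append("&quot;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // Builds an anchor element with an escaped href and escaped link text.
    static String link(String href, String text) {
        return "<a href=\"" + escape(href) + "\">" + escape(text) + "</a>";
    }

    public static void main(String[] args) {
        System.out.println(link("article.php?boid=181&sid=53&mid=1", "mr.jackson!"));
    }
}
```

With the href written as article.php?boid=181&amp;sid=53&amp;mid=1, an HTML parser has an unambiguous entity to decode and the attribute value comes back as the original URL.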
Update: see, Jonathan (the jsoup guy) has fixed it. It'll be there in the next release.