xml - Extremely slow XSLT transformation in Java
I am trying to transform an XML document using XSLT. The input is the XHTML source code of www.wordpress.org, and the XSLT is a dummy example that retrieves the site's title (it could actually do nothing; it doesn't change anything).
With every single API or library I use, the transformation takes about 2 minutes! If you take a look at the wordpress.org source, you will notice that it is only 183 lines of code. From what I googled, this is probably due to DOM tree building. No matter how simple the XSLT is, it always takes 2 minutes, which seems to confirm that it is related to DOM building, but in my opinion it should not take 2 minutes anyway.
Here is the example code (nothing special):
    TransformerFactory tFactory = TransformerFactory.newInstance();
    Transformer transformer = null;
    try {
        transformer = tFactory.newTransformer(
                new StreamSource("/home/pd/xslt/transf.xslt"));
    } catch (TransformerConfigurationException e) {
        e.printStackTrace();
    }

    ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
    System.out.println("start");
    try {
        transformer.transform(
                new SAXSource(new InputSource(
                        new FileInputStream("/home/pd/xslt/wordpress.xml"))),
                new StreamResult(outputStream));
    } catch (TransformerException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
    System.out.println("stop");
    System.out.println(new String(outputStream.toByteArray()));
It is between "start" and "stop" that Java "pauses" for about 2 minutes. If I look at the processor or memory usage, nothing increases. It looks as if the JVM has simply stopped...
Do you have any experience in transforming XML documents that are longer than 50 (this is a random number ;)) lines? As far as I have read, XSLT always needs to build a DOM tree in order to do its work. Fast transformation is crucial for me.
Thanks in advance, Piotr
Does the sample HTML file use namespaces? If so, your XML parser may be attempting to retrieve contents (a schema, perhaps) from the namespace URIs. This is likely if each run takes exactly 2 minutes: it is probably one or more TCP timeouts.
You can verify this by timing the step that actually reads the InputSource (where the WordPress XML is parsed; in your code that happens inside transform()), as this is likely the part that is causing the delay. After reviewing the sample file you posted, it does include a declared namespace (xmlns="http://www.w3.org/1999/xhtml").
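One way to run that check is to time the parse on its own, separately from the transformation. The following is only a minimal sketch: the class name ParseTiming, the helper timeParseMillis and the inline stand-in document are assumptions made for illustration, not part of the original code.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper: measures only the XML parse, with no XSLT involved.
class ParseTiming {

    /** Parses the stream into a DOM and returns the elapsed time in milliseconds. */
    static long timeParseMillis(InputStream in) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();

        long start = System.nanoTime();
        Document doc = builder.parse(in);
        long elapsedNanos = System.nanoTime() - start;

        System.out.println("root element: " + doc.getDocumentElement().getNodeName());
        return elapsedNanos / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in document (an assumption); to test the real input, pass
        // new FileInputStream("/home/pd/xslt/wordpress.xml") instead.
        String xml = "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
                + "<head><title>t</title></head><body/></html>";
        long ms = timeParseMillis(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        System.out.println("parse took " + ms + " ms");
    }
}
```

If the parse alone accounts for the 2 minutes, the XSLT is not the problem. It may also be worth checking whether the real file starts with a DOCTYPE that points at an external DTD, since JAXP parsers load external DTDs by default and that is another place where a network fetch can happen.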
To work around this, you can implement your own EntityResolver that disables URL-based resolution. You may need to use a DOM for this; see DocumentBuilder's setEntityResolver method.
Here's a sample using DOM and disabling resolution (note: untested):
    try {
        DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = dbFactory.newDocumentBuilder();
        db.setEntityResolver(new EntityResolver() {
            @Override
            public InputSource resolveEntity(String publicId, String systemId)
                    throws SAXException, IOException {
                // Return empty content so the parser never opens a URL.
                // (Returning null would fall back to the default lookup.)
                return new InputSource(new StringReader(""));
            }
        });

        System.out.println("Building DOM");
        Document doc = db.parse(new FileInputStream("/home/pd/xslt/wordpress.xml"));

        ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
        TransformerFactory tFactory = TransformerFactory.newInstance();
        Transformer transformer = tFactory.newTransformer(
                new StreamSource("/home/pd/xslt/transf.xslt"));

        System.out.println("Running transform");
        transformer.transform(
                new DOMSource(doc.getDocumentElement()),
                new StreamResult(outputStream));

        System.out.println("Transformed contents below");
        System.out.println(outputStream.toString());
    } catch (Exception e) {
        e.printStackTrace();
    }
If you want to use SAX, you would have to use a SAXSource with an XMLReader that uses your custom resolver.
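A rough sketch of that SAX variant might look like the following. The class name SaxNoNetworkTransform, the transform helper, the inline stylesheet and the sample document (including its unreachable example.invalid DTD URL) are all assumptions for illustration; only the JAXP and SAX calls themselves are standard.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical helper: runs an XSLT over SAX input without any network lookups.
class SaxNoNetworkTransform {

    static String transform(InputSource xml, StreamSource xslt) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true); // XSLT needs namespace-aware input
        XMLReader reader = spf.newSAXParser().getXMLReader();

        // Answer every external entity request (such as a DTD) with empty
        // content, so the parser never opens a URL connection.
        reader.setEntityResolver(
                (publicId, systemId) -> new InputSource(new StringReader("")));

        Transformer transformer = TransformerFactory.newInstance().newTransformer(xslt);
        StringWriter out = new StringWriter();
        transformer.transform(new SAXSource(reader, xml), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Sample input (an assumption): the DTD URL is deliberately unreachable.
        String xml = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" "
                + "\"http://example.invalid/xhtml1-strict.dtd\">"
                + "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
                + "<head><title>WordPress</title></head><body/></html>";
        String xslt = "<xsl:stylesheet version=\"1.0\" "
                + "xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\" "
                + "xmlns:x=\"http://www.w3.org/1999/xhtml\">"
                + "<xsl:output method=\"text\"/>"
                + "<xsl:template match=\"/\">"
                + "<xsl:value-of select=\"/x:html/x:head/x:title\"/>"
                + "</xsl:template></xsl:stylesheet>";
        System.out.println(transform(
                new InputSource(new StringReader(xml)),
                new StreamSource(new StringReader(xslt))));
    }
}
```

One trade-off of answering with an empty DTD: named entities that the real DTD would define (for example &nbsp; in XHTML) become undefined, so a document that uses them will fail to parse. In that case the resolver could return a local copy of the DTD instead of empty content.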