Python: Programmatically converting/parsing LaTeX code to plain text
I have a couple of code projects in C++/Python in which LaTeX-format descriptions and labels are used to generate PDF documentation or graphs made using LaTeX+PSTricks. However, we also have some plain text outputs, such as an HTML version of the documentation (I already have code to write minimal markup for that) and a non-TeX-enabled plot renderer.
For these I would like to eliminate the TeX markup that is necessary for, e.g., representing physical units. This includes non-breaking (thin) spaces, \text, \mathrm etc. It would also be nice to parse down things like \frac{#1}{#2} into #1/#2 for the plain text output (and use MathJax for the HTML). Due to the system we've got at the moment, I need to be able to do this from Python, i.e. ideally I'm looking for a Python package, but a non-Python executable which I can call from Python and catch the output string would also be fine.
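For the flat cases listed above (thin spaces, \text, \mathrm, simple \frac), a regex loop already goes a long way. The following is a minimal sketch, not an existing package: the name `strip_units` and the macro list are hypothetical, and it assumes the input contains no escaped (literal) dollar signs. Because `[^{}]*` refuses to match across braces, each pass rewrites only innermost macros, and looping until nothing changes peels nested ones from the inside out.

```python
import re

# Hypothetical substitution rules for "text-irrelevant" markup.
_SPACING = [
    (re.compile(r"\\[,;:]"), " "),  # \, \; \: thin/medium/thick spaces
    (re.compile(r"~"), " "),        # non-breaking space
]
# Macros whose single argument is kept verbatim. [^{}]* means only
# arguments without nested braces match, i.e. innermost macros first.
_KEEP_ARG = re.compile(r"\\(?:text|mathrm|textrm|mbox)\{([^{}]*)\}")
_FRAC = re.compile(r"\\frac\{([^{}]*)\}\{([^{}]*)\}")


def strip_units(tex: str) -> str:
    """Regex-only TeX-to-text conversion for simple unit strings."""
    for pattern, repl in _SPACING:
        tex = pattern.sub(repl, tex)
    # Repeat until a fixed point so nested macros unwrap inside-out.
    while True:
        new = _KEEP_ARG.sub(r"\1", tex)
        new = _FRAC.sub(r"\1/\2", new)
        if new == tex:
            break
        tex = new
    # Drop math-mode dollars (assumes no escaped \$) and tidy whitespace.
    tex = tex.replace("$", "")
    return re.sub(r"\s+", " ", tex).strip()
```

For example, `strip_units(r"\frac{\text{m}}{\text{s}^2}")` first unwraps the two `\text` macros and then matches the now brace-free `\frac`, giving `m/s^2`.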
I'm aware of the similar question on the TeX StackExchange site, but there weren't any really programmatic solutions there: I've looked at detex, plasTeX and pytex, which all seem a bit dead and don't really do what I need: programmatic conversion of a TeX string to a representative plain text string.
I could try writing a basic TeX parser using, e.g., pyparsing, but a) that might be pitfall-laden and help would be appreciated, and b) surely someone has tried this before, or knows of a way to hook into TeX itself to get a better result?
Update: Thanks for all the answers... it does indeed seem to be a bit of an awkward request! I can make do with less than fully general parsing of LaTeX, but the reason for considering a parser rather than a load of regexes in a loop is that I want to be able to handle nested macros and multi-argument macros nicely, and to get the brace matching to work properly. Then I could, e.g., reduce text-irrelevant macros like \text and \mathrm first, and handle text-relevant ones like \frac last... maybe even with appropriate parentheses! Well, I can dream... For now the regexes are not doing such a terrible job.
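Brace matching, nesting and multi-argument macros are what a small recursive-descent converter handles naturally, without any pyparsing dependency. The sketch below swaps in that technique in place of the "reduce first, \frac last" ordering: each macro simply converts its own arguments recursively. The names `tex_to_text`, `UNWRAP`, `SYMBOLS` and the parenthesisation rule are hypothetical choices, and it only covers a small LaTeX-like subset (no `\def`, no optional `[...]` arguments).

```python
import re

# Hypothetical macro tables: tune these for your own documents.
UNWRAP = {"text", "mathrm", "textrm", "mbox", "emph"}  # keep the argument only
SYMBOLS = {",": " ", ";": " ", ":": " ", "!": "", "times": " x ", "cdot": " * ", "mu": "u"}

# Token kinds: control word (eats trailing spaces, as TeX does),
# control symbol such as \, or \$, a brace, or a run of ordinary text.
_TOKEN = re.compile(r"\\([A-Za-z]+)\s*|\\(.)|([{}])|([^\\{}]+)", re.DOTALL)


def _paren(s):
    """Parenthesise a fraction part unless it is a single word or number."""
    s = s.strip()
    return s if re.fullmatch(r"[\w.]+", s) else f"({s})"


class TexToText:
    def __init__(self, tex):
        self.tokens = []
        for word, symbol, brace, text in _TOKEN.findall(tex):
            if word:
                self.tokens.append(("word", word))
            elif symbol:
                self.tokens.append(("symbol", symbol))
            elif brace:
                self.tokens.append((brace, brace))
            else:
                self.tokens.append(("text", text))
        self.pos = 0

    def convert(self):
        return re.sub(r"\s+", " ", self._sequence(inside_group=False)).strip()

    def _sequence(self, inside_group):
        """Convert tokens up to the matching '}' (or the end of input)."""
        out = []
        while self.pos < len(self.tokens):
            kind, value = self.tokens[self.pos]
            self.pos += 1
            if kind == "}":
                if inside_group:
                    return "".join(out)
                raise ValueError("unbalanced '}'")
            out.append(self._token(kind, value))
        if inside_group:
            raise ValueError("missing '}'")
        return "".join(out)

    def _token(self, kind, value):
        if kind == "{":
            return self._sequence(inside_group=True)  # bare group: drop braces
        if kind == "text":
            return value.replace("$", "").replace("~", " ")
        if kind == "symbol":
            return SYMBOLS.get(value, value)  # \, -> space, \$ -> $
        if value in UNWRAP:
            return self._argument()
        if value == "frac":
            num, den = self._argument(), self._argument()
            return f"{_paren(num)}/{_paren(den)}"
        return SYMBOLS.get(value, value)  # unknown macro: keep its bare name

    def _argument(self):
        """Read one undelimited argument: a braced group or a single token."""
        # Like TeX, skip spaces before an undelimited argument.
        while (self.pos < len(self.tokens) and self.tokens[self.pos][0] == "text"
               and not self.tokens[self.pos][1].strip()):
            self.pos += 1
        if self.pos >= len(self.tokens):
            raise ValueError("missing macro argument")
        kind, value = self.tokens[self.pos]
        self.pos += 1
        if kind == "}":
            raise ValueError("unexpected '}'")
        if kind == "text":
            value = value.lstrip()
            if len(value) > 1:
                # An unbraced argument is one character; push the rest back.
                self.tokens.insert(self.pos, ("text", value[1:]))
            value = value[0]
        return self._token(kind, value)


def tex_to_text(tex):
    return TexToText(tex).convert()
```

With these tables, `tex_to_text(r"\frac{\text{kg}\,\mathrm{m}}{\mathrm{s}^2}")` gives `(kg m)/(s^2)`, and unbalanced input raises `ValueError` instead of silently producing garbage, which is the main practical gain over a regex loop.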
A word of caution: it is much more difficult to write a complete parser for plain TeX than you might think. The TeX-level (not LaTeX) \def command extends TeX's syntax. For example, after \def\foo #1.{{\bf #1}}, the input \foo goo. expands to {\bf goo}, i.e. a bold "goo". Notice that the dot became a delimiter for the \foo macro! Therefore, if you have to deal with any form of TeX, without restrictions on which packages may be used, it is not recommended to rely on simple parsing. You need TeX rendering. catdvi is what I use, although it is not perfect.
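Rendering with TeX and converting the DVI output also meets the question's "call an executable from Python and catch the output string" requirement. The sketch below assumes `latex` and `catdvi` are on the PATH and that `catdvi FILE.dvi` writes the text to standard output; the wrapper template and the helper names `run_and_capture` and `tex_to_text_via_catdvi` are hypothetical, so check catdvi's manual page for the options (e.g. output encoding) your version supports.

```python
import pathlib
import subprocess
import tempfile

# Hypothetical minimal wrapper; \pagestyle{empty} keeps the page number
# out of the rendered text. Add \usepackage lines as your snippets need.
TEMPLATE = r"""\documentclass{article}
\pagestyle{empty}
\begin{document}
%s
\end{document}
"""


def run_and_capture(cmd, cwd=None):
    """Run an external command and return its standard output as a string."""
    done = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True, check=True)
    return done.stdout


def tex_to_text_via_catdvi(snippet, latex="latex", catdvi="catdvi"):
    """Render a LaTeX snippet with TeX itself, then turn the DVI into text."""
    with tempfile.TemporaryDirectory() as tmp:
        src = pathlib.Path(tmp) / "snippet.tex"
        src.write_text(TEMPLATE % snippet, encoding="utf-8")
        # nonstopmode stops latex from waiting for keyboard input on errors.
        run_and_capture([latex, "-interaction=nonstopmode", src.name], cwd=tmp)
        return run_and_capture([catdvi, "snippet.dvi"], cwd=tmp).strip()
```

Because `check=True` is set, a LaTeX error surfaces as `subprocess.CalledProcessError` rather than as partial output. Since TeX itself does the expansion, delimited macros like the `\foo` example are handled correctly, at the cost of a process launch per snippet.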