AutoIndex is rather simplistic in its handling of XML:
-
When indexing a document, all block content at the paragraph level gets
collapsed into a single string for matching against the regular expressions
representing each index term. In other words, for the most part, you can
assume that you're indexing plain text when writing regular expressions.
-
Named XML entities for &, ", ', < or > (that is, &amp;, &quot;, &apos;,
&lt; and &gt;) are converted to their corresponding characters before a
section of text is indexed. However, decimal and hexadecimal character
references (such as &#38; or &#x26;) are not currently converted.
-
Index terms are assumed to be plain text (whether they originate from the
script file or from scanning source files), and the characters &, ",
< and > are escaped to &amp;, &quot;, &lt; and &gt;
respectively.
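
The behaviour described in the list above can be illustrated with a short Python sketch. This is not AutoIndex's own code: the function names unescape_for_matching and escape_index_term are hypothetical, and the sketch only assumes the rules stated above (five named entities converted before matching, numeric references left alone, and &, ", < and > escaped in index terms).

```python
import re

# The five predefined XML entities and the characters they stand for.
NAMED_ENTITIES = {
    "&amp;": "&",
    "&quot;": '"',
    "&apos;": "'",
    "&lt;": "<",
    "&gt;": ">",
}

# One alternation over the entity names, so that text is decoded in a
# single pass and "&amp;lt;" becomes "&lt;" rather than "<".
_ENTITY_RE = re.compile("|".join(re.escape(e) for e in NAMED_ENTITIES))


def unescape_for_matching(text):
    """Convert the five named entities; numeric references stay as they are."""
    return _ENTITY_RE.sub(lambda m: NAMED_ENTITIES[m.group(0)], text)


def escape_index_term(term):
    """Escape &, ", < and > in a plain-text index term (hypothetical helper)."""
    # '&' is replaced first so the entities produced below are not re-escaped.
    return (
        term.replace("&", "&amp;")
        .replace('"', "&quot;")
        .replace("<", "&lt;")
        .replace(">", "&gt;")
    )


# Paragraph text as it might appear in the XML source (made-up example).
paragraph = "operator&lt;&lt; &amp; std::vector&lt;T&gt; &#38; more"
plain = unescape_for_matching(paragraph)

# The regular expression is written against plain text, not against entities.
found = re.search(r"operator<<", plain) is not None
```

In this sketch the decimal reference &#38; passes through unchanged, so a regular expression expecting a literal & at that position would not match it. That is the practical consequence of numeric escape sequences not being converted.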