The XML parser will check if an XML document is well-formed, and optionally validate it against a DTD. The parser will construct an object tree which can be accessed via a DOM interface or operate serially via a SAX interface.
You may post questions, comments, or bug reports to the XML Forum on the Oracle Technology Network. Oracle customers may also call Oracle Worldwide Support for assistance.
The parser conforms to the following standards:
| license.html | Licensing agreement |
| readme.html | This file |
| bin/ | Standalone parser "xml" |
| demo/ | Example usage of the XML parser |
| doc/ | API documentation |
| include/ | Header files |
| lib/ | XML Parser/XSL Processor & support libraries |
| mesg/ | Error message files |
| libxml9.a | XML Parser/XSL Processor |
| libcore9.a | CORE functions |
| libnls9.a | National Language Support |
| -c | Conformance check only, no validation |
| -e encoding | Specify input file encoding |
| -h | Help - show this usage help |
| -n | Number - DOM traverse and report number of elements |
| -p | Print document and DTD structures after parse |
| -r | Do not ignore <xsl:output> instruction in XSLT processing |
| -x | Exercise SAX interface and print document |
| -v | Version - display parser version then exit |
| -w | Whitespace - preserve all whitespace |
Error message files are provided in the mesg/ subdirectory. Starting with Oracle 8.1.7, the message files also exist in the $ORACLE_HOME/oracore/mesg directory. You may set the environment variable ORA_XML_MESG to the absolute path of the mesg/ subdirectory, although this is not required.
The parser currently supports the following encodings: UTF-8, UTF-16, US-ASCII, ISO-10646-UCS-2, ISO-8859-1, ISO-8859-2, ISO-8859-3, ISO-8859-4, ISO-8859-5, ISO-8859-6, ISO-8859-7, ISO-8859-8, ISO-8859-9, EUC-JP, SHIFT_JIS, BIG5, GB2312, KOI8-R, EBCDIC-CP-US, EBCDIC-CP-CA, EBCDIC-CP-NL, EBCDIC-CP-WT, EBCDIC-CP-DK, EBCDIC-CP-NO, EBCDIC-CP-FI, EBCDIC-CP-SE, EBCDIC-CP-IT, EBCDIC-CP-ES, EBCDIC-CP-GB, EBCDIC-CP-FR, EBCDIC-CP-HE, EBCDIC-CP-BE, EBCDIC-CP-CH, EBCDIC-CP-ROECE, EBCDIC-CP-YU, and EBCDIC-CP-IS. In addition, any character set specified in Appendix A, Character Sets, of the Oracle National Language Support Guide may be used.
In order to be able to use these encodings, you must have the ORACLE_HOME environment variable set and pointing to the location of your Oracle installation. In addition, the environment variable ORA_NLS33 must be set to point to the location of the NLS data files. On Unix systems, this is usually $ORACLE_HOME/ocommon/nls/admin/data. On Windows NT, this is usually $ORACLE_HOME/nlsrtl/admin/nlsdata.
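The environment requirement above can be checked at program start-up, before the parser is initialized. The following is a minimal sketch that uses only the standard C `getenv` call; the helper names `env_is_set` and `nls_env_ready` are hypothetical and are not part of the parser API.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not part of the parser API): returns 1 when the
 * named environment variable is set to a non-empty value, 0 otherwise. */
static int env_is_set(const char *name)
{
    const char *value = getenv(name);
    return value != NULL && value[0] != '\0';
}

/* Returns 1 when both variables needed for the additional encodings are
 * set; otherwise names each missing variable on stderr and returns 0. */
int nls_env_ready(void)
{
    static const char *const required[] = { "ORACLE_HOME", "ORA_NLS33" };
    int ok = 1;
    size_t i;

    for (i = 0; i < sizeof required / sizeof required[0]; i++) {
        if (!env_is_set(required[i])) {
            fprintf(stderr, "%s is not set\n", required[i]);
            ok = 0;
        }
    }
    return ok;
}
```

The sketch only checks that the variables are present; it does not verify that the directories they name exist or contain NLS data files.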
The default encoding is UTF-8. If your documents use only single-byte character sets (such as US-ASCII or any of the ISO-8859 character sets), set the default encoding explicitly: parsing single-byte input can be up to twice as fast as parsing multibyte character sets such as UTF-8.
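One general reason single-byte encodings are cheaper to process is that a character's position equals its byte offset, whereas UTF-8 text has to be scanned byte by byte. The sketch below illustrates that difference only; it is not taken from the parser, the function names are hypothetical, and it makes no claim about where the parser's own speedup comes from.

```c
#include <stddef.h>

/* Single-byte encoding: the n-th character is at byte offset n.
 * The caller must ensure n is within the string. */
const char *nth_char_single_byte(const char *s, size_t n)
{
    return s + n;
}

/* UTF-8: characters occupy 1 to 4 bytes, so the n-th character can only
 * be found by scanning. Continuation bytes have the bit pattern 10xxxxxx;
 * every other byte starts a character. Assumes well-formed UTF-8.
 * Returns NULL when the string has fewer than n + 1 characters. */
const char *nth_char_utf8(const char *s, size_t n)
{
    while (*s != '\0') {
        if (((unsigned char)*s & 0xC0) != 0x80) {   /* start of a character */
            if (n == 0)
                return s;
            n--;
        }
        s++;
    }
    return NULL;
}
```

For the UTF-8 string `"h\xC3\xA9llo"` (h, é, l, l, o), character index 2 is the first `l`, which sits at byte offset 3 because `é` takes two bytes.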
Unless you are sure that your document contains no characters outside the encoding of the document entity, specify the output/DOM data encoding explicitly. Otherwise you will get LPX-00217: invalid character when an external entity uses a different encoding or when character references are used. Prefer lpxinitenc() to lpxinit() for this reason: by specifying the output/DOM data encoding you control which characters you have to handle, and the data you manipulate through the parser API is in the encoding of your choice.
The following features of the XSLT recommendation are not currently supported but may be available in future releases: extension elements, extension functions, xsl:fallback, and xsl:apply-imports.
No changes.
This release contains bug fixes.
Support is now available for all input and output character encodings, including varying width multibyte character sets and fixed width Unicode. This release contains bug fixes.
This release contains bug fixes.
The only change between 9.0.0.0.0 and 2.0.7.0.0 is a change in version numbering.
XSLT improvements: This release further improves the processing speed and memory requirements of XSLT. With the -r directive, the effect of the <xsl:output> instruction and "disable-escape" attributes can be taken into account during XSLT processing.
This is a bugfix release with a few new features.
XSLT features: Support for xsl:output was added. Also, support for the following XSLT-specific additions to the core XPath library was added: document(), element-available(), and function-available().
XSLT performance: Performance and memory usage were greatly improved.
For the XSLT processor, the following bugs were fixed:
| 1225546 | USELESS ERROR MESSAGE NEEDS DETAIL |
| 1361834 | XSL:IMPORT FOR /ORADB IS NOT HANDLED PROPERLY |
| 1355757 | LPX-00002: OUT OF MEMORY FROM XSL:SORT WHEN NO SORT ELEMENTS |
| 1351220 | PARAMETER VALUES ARE GETTING DROPPED |
| 1345602 | PERFORMANCE OF C-BASED XSL PROCESSOR IS NOT ACCEPTABLE |
This is a bugfix release with a few new features.
File parsing: An API (xmlparsefile) has been added to support explicit parsing from a file (as opposed to a URL).
Streams parsing: An API (xmlparsestream) has been added to support parsing from a user-defined stream object.
Encoding information: Two new APIs have been added to provide information about the input document's character encoding: isSingleChar is a boolean flag that specifies whether the encoding is single-byte or multibyte, and getEncoding() returns the actual name of the encoding ("ASCII", "UTF8", etc.).
Element by ID: A new API (getElementByID) has been added that will return the element with the given ID.
The following parser bugs have been fixed:
| 1370311 | NEED GETELEMENTBYID()FOR DARWIN DATA MINING DEVELOPMENT |
| 1367903 | XML PARSER: PARSER FAILS ON FILENAME WITH SPACES |
| 1358187 | PARSER WILL FAIL IF FILENAME CONTAINS A BACKSLASH |
| 1354920 | ENHANCEMENTS IN C XML PARSER TO SUPPORT STREAMS |
Patch release that fixes a bug in the getAttrSpecified() function. No other changes from 2.0.4.0.
This is the first production V2 release. The changes in this release were mainly bug fixes.
For the XML parser, the following bugs were fixed:
| 1352943 | XMLPARSE() SOMETIMES CHOKES ON FILENAMES |
| 1302311 | PROBLEM WITH PARAMETER ENTITY PROCESSING |
| 1323674 | INCONSISTENT ERROR HANDLING IN THE C XML PARSER |
| 1328871 | LPXPRINTBUFFER UNCONDITIONALLY PREPENDS XML COMMENT TO OUTPUT |
| 1349962 | USING FREED MEMORY LOCATION CAUSES TLPXVNSA31.DIF |
For the XSLT processor, the following bugs were fixed:
| 1225546 | USELESS ERROR MESSAGE NEEDS DETAIL |
| 1267616 | TLPXST14.DIF: REPLACE DBL_MAX WITH SBIG_ORAMAXVAL IN LPXXP.C:LPXXPSUBSTRING() |
| 1289228 | ERROR CONTEXT REQUIRED FOR DEBUGGING: FILE NAME, LINE#, FUNCTION, ETC |
| 1289214 | XSL:CHOOSE DOESN'T WORK |
| 1298028 | XPATH CONSTRUCT NOT(POSITION()=LAST()) NOT WORKING |
| 1298193 | XPATH FUNCTIONS DON'T PROVIDE IMPLICIT TYPE CONVERSION OF PARAMS |
| 1323665 | C XML PARSER CANNOT SET BASE DIRECTORY OR URI FOR STYLESHEET PARSING |
| 1325452 | SEVERE MEMORY CONSUMPTION / LEAK IN XSLPROCESS |
| 1333693 | CHAINED TRANSFORMS WITH C XSL PROCESSOR DON'T WORK: LPX-00002 |
SAX memory usage: SAX memory usage is now much smaller, and flat for any input size and multiple parses (memory leaks plugged).
XSLT memory usage: XSLT memory usage is improved.
Validation warnings: Validity Constraint (VC) errors have been changed to warnings and do not terminate parsing. For compatibility with the old behavior (halt on warnings as well as errors), a new flag XML_FLAG_STOP_ON_WARNING (or '-W' to the xml program) has been added.
Performance improvements: Switch to finite automata VC structure validation yields 10% performance gain.
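To illustrate what finite-automaton structure validation means in general, the sketch below checks a sequence of child elements against a content model with a table-driven automaton. The content model `(head, item*, tail)`, the element names, and all identifiers are hypothetical; this is not the parser's internal implementation.

```c
#include <stddef.h>

/* Child element kinds for a hypothetical content model (head, item*, tail). */
enum child { CHILD_HEAD, CHILD_ITEM, CHILD_TAIL, CHILD_KINDS };

enum { STATE_REJECT = -1, STATE_ACCEPT = 2 };

/* transition[state][child] gives the next state, or STATE_REJECT. */
static const int transition[3][CHILD_KINDS] = {
    /* state 0: expecting head         */ { 1, STATE_REJECT, STATE_REJECT },
    /* state 1: expecting item or tail */ { STATE_REJECT, 1, 2 },
    /* state 2: content model complete */ { STATE_REJECT, STATE_REJECT, STATE_REJECT }
};

/* Returns 1 when the child sequence matches (head, item*, tail), else 0.
 * Each child costs one table lookup, so validation is linear in n. */
int content_is_valid(const enum child *children, size_t n)
{
    int state = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        state = transition[state][children[i]];
        if (state == STATE_REJECT)
            return 0;
    }
    return state == STATE_ACCEPT;
}
```

Because each child element is handled with a single table lookup, the cost of checking a content model does not depend on how complex the model is, only on the number of children.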
HTTP support: HTTP URIs are now supported; look for FTP in the next release. For other access methods, the user may define their own callbacks with the new xmlaccess() API.
XSLT improvements: Various bugs fixed in the XSLT processor; error messages are improved; xsl:number, xsl:sort, xsl:namespace-alias, xsl:decimal-format, forwards-compatible processing with xsl:version, and literal result element as stylesheet are now available; the following XSLT-specific additions to the core XPath library are now available: current(), format-number(), generate-id(), and system-property().
Bug fixes: Some problems with validation and matching of start and end tags with SAX were fixed (1227096). Also, a bug with parameter entity processing in external entities was fixed (1225219).
Performance improvements: This version of the parser is a major performance improvement over the last, about two and a half times faster for UTF-8 parsing and about four times faster for ASCII parsing. Comparison timing against previous version for parsing (DOM) and validating various standalone files (SPARC Ultra 1 CPU time):
| File size | Old UTF-8 | New UTF-8 | Speedup | Old ASCII | New ASCII | Speedup |
|---|---|---|---|---|---|---|
| 42K | 180ms | 70ms | 2.6 | 120ms | 40ms | 3.0 |
| 134K | 510ms | 210ms | 2.4 | 450ms | 100ms | 4.5 |
| 247K | 980ms | 400ms | 2.5 | 690ms | 180ms | 3.8 |
| 1M | 2860ms | 1130ms | 2.5 | 1820ms | 380ms | 4.8 |
| 2.7M | 10550ms | 4100ms | 2.6 | 7450ms | 1930ms | 3.9 |
| 10.5M | 42250ms | 16400ms | 2.6 | 29900ms | 7800ms | 3.8 |
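Each Speedup column in the table is the old time divided by the new time. The sketch below recomputes the factors from the figures in the table; the struct and function names are hypothetical and exist only for this illustration.

```c
#include <stddef.h>

/* One row of the timing table (CPU time in milliseconds). */
struct timing {
    const char *file_size;
    double old_utf8_ms, new_utf8_ms;
    double old_ascii_ms, new_ascii_ms;
};

/* Figures copied from the table above. */
const struct timing timings[] = {
    { "42K",     180.0,    70.0,   120.0,   40.0 },
    { "134K",    510.0,   210.0,   450.0,  100.0 },
    { "247K",    980.0,   400.0,   690.0,  180.0 },
    { "1M",     2860.0,  1130.0,  1820.0,  380.0 },
    { "2.7M",  10550.0,  4100.0,  7450.0, 1930.0 },
    { "10.5M", 42250.0, 16400.0, 29900.0, 7800.0 }
};

/* Speedup factor: how many times faster the new parser is. */
double utf8_speedup(const struct timing *t)
{
    return t->old_utf8_ms / t->new_utf8_ms;
}

double ascii_speedup(const struct timing *t)
{
    return t->old_ascii_ms / t->new_ascii_ms;
}
```

For the 42K file, for example, 180 ms / 70 ms is about 2.57, which the table rounds to 2.6.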
Lists, not arrays: Internal parser data structures are now uniformly lists; arrays have been dropped. Therefore, access is now better suited to a firstChild/nextSibling style loop instead of numChildNodes/getChildNode.
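The difference between the two access styles can be sketched with a stand-in node type. The `struct node` below and both function names are hypothetical and for illustration only; the parser's real node type and accessor functions are declared in its header files.

```c
#include <stddef.h>

/* Hypothetical node type: children are kept as a singly linked list. */
struct node {
    const char  *name;
    struct node *first_child;
    struct node *next_sibling;
};

/* firstChild/nextSibling style: a single pass over the list, O(n). */
size_t count_children(const struct node *parent)
{
    size_t n = 0;
    const struct node *child;

    for (child = parent->first_child; child != NULL; child = child->next_sibling)
        n++;
    return n;
}

/* Index-style access on a list has to walk from the head every time,
 * so calling it for each index 0..n-1 costs O(n^2) overall.
 * Returns NULL when index is out of range. */
const struct node *child_at(const struct node *parent, size_t index)
{
    const struct node *child = parent->first_child;

    while (child != NULL && index > 0) {
        child = child->next_sibling;
        index--;
    }
    return child;
}
```

With list-based storage, a loop written in the first style visits each child once, while a loop that asks for child 0, child 1, and so on repeats the walk from the head for every index.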
DTD parsing: A new API call, xmlparsedtd(), has been added which parses an external DTD directly, without needing an enclosing document. It is used mainly by the Class Generator.
Error reporting: Error messages are improved and more specific, with nearly twice as many as before. Error location is now described by a stack of line number/entity pairs, showing the final location of the error and intermediate inclusions (e.g. line X of file, line Y of entity). NOTE: You must use the new error message file (lpxus.msb) provided with this release; the error message file provided with earlier releases is incompatible. See below.
XSL improvements: Various bugs fixed in the XSLT processor; xsl:call-template is now fully supported.
The Oracle XML v2 parser is a beta release and is written in C. The main difference from the Oracle XML v1 parser is the ability to format the XML document according to a stylesheet via an integrated XSLT processor.