Show HN: SupXML, modern memory-safe XML parser replacement for libxml2

jrpt | 1 pts | 0 comments


SupXML

A memory-safe, fast, spec-compliant XML library for Rust, with a drop-in C ABI replacement for libxml2.

Why SupXML

Memory-safe
Pure Rust, memory-safe by construction. Roughly 70% of CVEs in large C and C++ codebases come from memory-safety bugs (the share Microsoft and the Chromium project have reported for their own code), a whole class that safe Rust rules out, so they can’t happen in SupXML.

Fast and Efficient
~2× faster than libxml2 on full-validation DOM parse (median across 21 fixtures), and ~2.4× faster on the W3C XSD 1.0 test suite. Bumpalo-backed arena DOM.

Full bench numbers →

Spec-compliant
Zero failures on the W3C XML Conformance Test Suite: all 2274 deterministic cases pass; the other 21 of 2295 are implementation-defined by the spec. 98.9% schema / 98.8% instance on the W3C XSD 1.0 suite (libxml2: 92.9% / 98.3%).

Cross-parser comparison →

Stream big files
XmlByteStreamReader pulls from any io::Read through a rolling buffer, processing files larger than memory in bounded RAM. The in-memory XmlBytesReader is a zero-copy SAX reader, median ~1.04× faster than quick-xml at the matched-contract comparison.
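To make the rolling-buffer idea concrete, here is a minimal std-only sketch. It is not SupXML's implementation and does not use its API: `count_in_stream` is a hypothetical helper that scans any `io::Read` for a byte pattern through a fixed-size buffer, carrying the unscanned tail across reads so that a match straddling two chunks is still found.

```rust
use std::io::{self, Cursor, Read};

/// Hypothetical helper (not part of SupXML): count non-overlapping
/// occurrences of `needle` in `src` while holding at most `cap` bytes.
fn count_in_stream<R: Read>(mut src: R, needle: &[u8], cap: usize) -> io::Result<usize> {
    assert!(!needle.is_empty() && cap >= needle.len());
    let mut buf = vec![0u8; cap];
    let mut filled = 0; // valid bytes currently live in buf[..filled]
    let mut count = 0;
    loop {
        // Top up the buffer behind whatever tail was carried over.
        let n = match src.read(&mut buf[filled..]) {
            Ok(0) => break, // end of input
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        filled += n;
        // Scan every position where a full needle fits.
        let mut i = 0;
        while i + needle.len() <= filled {
            if &buf[i..i + needle.len()] == needle {
                count += 1;
                i += needle.len();
            } else {
                i += 1;
            }
        }
        // Roll the unscanned tail (shorter than the needle) to the front.
        buf.copy_within(i..filled, 0);
        filled -= i;
    }
    Ok(count)
}

fn main() -> io::Result<()> {
    let xml = b"<catalog><book/><book/></catalog>";
    // An 8-byte buffer forces "<book" to straddle read boundaries.
    let books = count_in_stream(Cursor::new(&xml[..]), b"<book", 8)?;
    println!("found {books} <book start tags");
    Ok(())
}
```

Memory use depends on `cap`, not on the input size, which is the property the streaming reader above is described as providing. A real XML reader has to carry parser state across refills as well, not just a byte tail.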

Drop-in for libxml2
libsupxml2.so exposes a C ABI that is byte-compatible with libxml2. If a consumer links libxml2 dynamically, swapping the load command points it at SupXML.

Per-binding setup →

Full-featured
XPath 1.0 + 2.0, XSD 1.0 / 1.1, XSLT 1.0 + 2.0 (3.0 partial), Schematron, Canonical XML / Exc-C14N, HTML5, EXSLT, recovery mode… all in one library.

W3C XML Conformance Test Suite

The W3C XML Conformance Test Suite (revision xmlts20130923) is the canonical test catalog for XML 1.0 parsers. It defines 2295 tests across submissions from James Clark, Sun, IBM, OASIS, and others. 2274 of them have a deterministic expected outcome (well-formed / not-well-formed / invalid), and SupXML matches every one with zero failures. The remaining 21 are tagged error in the catalog itself, meaning XML 1.0 explicitly leaves their handling implementation-defined: both accepting and rejecting them satisfies the spec, so whatever SupXML does is conformant by definition.
Full breakdown →
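The scoring rule described above can be sketched in a few lines. This is an illustrative model of how a harness for a validating parser might judge one test, not SupXML's actual test runner; the `Expected`, `Outcome`, and `conforms` names are assumptions made for the example. The only figures used are the 2295 / 2274 / 21 counts quoted in the text.

```rust
/// Expected result recorded in the catalog for one test.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Expected {
    Valid,
    Invalid,
    NotWf,
    /// Tagged `error`: the spec leaves handling implementation-defined.
    Error,
}

/// What a validating parser actually did with the input.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Outcome {
    Accepted,
    ValidityError,
    WellFormednessError,
}

/// A deterministic test has exactly one acceptable outcome;
/// an `error` test accepts any outcome.
fn conforms(expected: Expected, got: Outcome) -> bool {
    match expected {
        Expected::Valid => got == Outcome::Accepted,
        Expected::Invalid => got == Outcome::ValidityError,
        Expected::NotWf => got == Outcome::WellFormednessError,
        Expected::Error => true,
    }
}

const TOTAL: u32 = 2295;
const DETERMINISTIC: u32 = 2274;
const IMPLEMENTATION_DEFINED: u32 = 21;

fn main() {
    // The two groups partition the catalog.
    assert_eq!(DETERMINISTIC + IMPLEMENTATION_DEFINED, TOTAL);
    for got in [Outcome::Accepted, Outcome::ValidityError, Outcome::WellFormednessError] {
        println!(
            "not-wf test, parser reported {:?}: conforms = {}",
            got,
            conforms(Expected::NotWf, got)
        );
        println!(
            "error test, parser reported {:?}: conforms = {}",
            got,
            conforms(Expected::Error, got)
        );
    }
}
```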

How parsers compare on malformed input

For a like-for-like comparison across parsers we use the catalog’s not-wf corpus, files engineered to violate one specific XML 1.0 well-formedness rule, so a conforming parser must reject them. (We focus on not-wf because xml-rs and quick-xml don’t load external DTDs, making fair scoring on the valid/invalid corpora difficult.) Score = percentage correctly rejected:

| Corpus | Files | SupXML | libxml2 | xml-rs | quick-xml |
|---|---|---|---|---|---|
| xmltest (James Clark) | 200 | 99.0% | 97.0% | 58.5% | 10.0% |
| Sun Microsystems | 57 | 100% | 98.2% | 33.3% | 8.8% |
| IBM (incl. XML 1.1) | 890 | 94.5% | 59.6% | 42.7% | 5.2% |
| All vendors | 1147 | 95.6% | 68.0% | 45.0% | 6.2% |

This table walks every .xml file on disk under a not-wf/ directory, which is a superset of what the official catalog scores (it also includes files the catalog marks error or scopes to a specific XML 1.0 edition). That’s why SupXML reads clean on the catalog (2274/2274 deterministic) but scores 94.5% on IBM here: the bench asks a broader question and counts implementation-defined fixtures against the parser even though the catalog allows either outcome.

quick-xml’s score is low because it does not attempt full well-formedness checking. It’s a fast tokenizer, so it should not be relied on to reject malformed input.
SupXML beats libxml2 by ~27.6 points overall (95.6% vs 68.0%).
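As a sanity check on the comparison table, the "All vendors" row is the file-count-weighted average of the three per-corpus rows. The sketch below recomputes it from the published (already rounded) percentages, so results agree with the table only to within rounding; the `ROWS` layout and `overall` function are names chosen for this example.

```rust
/// (corpus, file count, percent rejected by [SupXML, libxml2, xml-rs, quick-xml])
const ROWS: [(&str, u32, [f64; 4]); 3] = [
    ("xmltest (James Clark)", 200, [99.0, 97.0, 58.5, 10.0]),
    ("Sun Microsystems", 57, [100.0, 98.2, 33.3, 8.8]),
    ("IBM (incl. XML 1.1)", 890, [94.5, 59.6, 42.7, 5.2]),
];

/// Returns the total file count and the weighted score per parser.
fn overall() -> (u32, [f64; 4]) {
    let total: u32 = ROWS.iter().map(|row| row.1).sum();
    let mut scores = [0.0; 4];
    for (_, files, per_parser) in ROWS.iter() {
        for (acc, pct) in scores.iter_mut().zip(per_parser.iter()) {
            // Weight each corpus score by how many files it covers.
            *acc += pct * f64::from(*files);
        }
    }
    for acc in scores.iter_mut() {
        *acc /= f64::from(total);
    }
    (total, scores)
}

fn main() {
    let (total, scores) = overall();
    let names = ["SupXML", "libxml2", "xml-rs", "quick-xml"];
    println!("files: {total}");
    for (name, score) in names.iter().zip(scores.iter()) {
        println!("{name}: {score:.1}%");
    }
}
```

Because IBM contributes 890 of the 1147 files, it dominates every overall score, which is why libxml2's 59.6% on that corpus pulls its total down to 68.0% despite its 97% and 98% results on the two smaller corpora.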

Quick example

use sup_xml::{parse_str, ParseOptions, XPathContext};

// Parse with namespace processing enabled; all other options keep their defaults.
let opts = ParseOptions { namespace_aware: true, ..Default::default() };
let doc = parse_str(
    // Illustrative input: two <book> elements, matching the count asserted below.
    "<catalog><book/><book/></catalog>",
    &opts,
)?;

// Evaluate an XPath expression against the parsed document.
let ctx = XPathContext::new(&doc);
assert_eq!(ctx.eval_count("/catalog/book")?, 2);

[dependencies]
sup-xml = { version = "*", features = ["xsd", "xslt", "html"] }

Feature matrix

| Feature | Cargo feature | Entry point |
|---|---|---|
| XML 1.0 parse / serialize | (default) | parse_str, parse_bytes, serialize_to_string |
| XPath 1.0 (default) + XPath 2.0 (opt-in) | (default) | XPathContext, XPathOptions { xpath_2_0: true } |
| HTML5 parse | html | parse_html_str |
| XSD 1.0 / 1.1 validation | xsd | sup_xml::xsd::Schema |
| XSLT 1.0 + 2.0 (3.0 partial) | xslt | sup_xml::xslt::Stylesheet |
| Schematron validation | xslt | sup_xml::xslt::schematron::Schematron |
| Canonical XML / Exc-C14N | (default) | canonicalize_to_bytes |
| Typed-struct deserialize | serde | sup_xml::de::* |
| HTTPS-fetched DTDs / entities | network-resolver | NetworkResolver |
| Async I/O entry points | tokio | sup_xml::async_io::parse_async |
