Show HN: Hardwood 1.0: A Fast, Lightweight Apache Parquet Reader for the JVM

gunnarmorling | 1 point | 0 comments

Hardwood

A lightweight Java reader for the Apache Parquet file format. Available as a Java library and a command-line tool.

Hardwood 1.0 is out!

Hardwood 1.0 is released — read the announcement blog post for the story behind the project and what it can do.

Why Hardwood

Hardwood gives applications Parquet read support without pulling in Hadoop, Avro, or the wider parquet-java dependency tree:

Light-weight — zero transitive dependencies beyond optional compression libraries (Snappy, ZSTD, LZ4, Brotli).

Fast — matches or exceeds parquet-java's read throughput; competitive in native-image builds and short-lived JVMs.

Concurrent — multi-threaded at the core: pages decode in parallel on a shared thread pool, with cross-file prefetching for multi-file reads.

Compatible — reads every file that parquet-java reads, with documented divergences where Hardwood applies stricter semantics (e.g. SQL three-valued notEq).

Embeddable — usable from native CLIs, S3-only pipelines (without hadoop-aws), and Avro / Spark consumers via thin shim modules, including a drop-in parquet-java replacement.
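The "SQL three-valued notEq" semantics mentioned under Compatible can be illustrated with a small standalone sketch. It does not use Hardwood's filter API: the `Truth` enum and the `notEq` and `filterNotEq` helpers are hypothetical names introduced only for this example. It assumes the usual SQL rule that a comparison against NULL evaluates to UNKNOWN and that a filter keeps a row only when the predicate is TRUE.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ThreeValuedNotEq {

    /** SQL truth values: a comparison involving NULL is UNKNOWN, not TRUE or FALSE. */
    enum Truth { TRUE, FALSE, UNKNOWN }

    /** Evaluates "column <> literal" under SQL three-valued logic (hypothetical helper). */
    static Truth notEq(Long columnValue, long literal) {
        if (columnValue == null) {
            return Truth.UNKNOWN;
        }
        return columnValue != literal ? Truth.TRUE : Truth.FALSE;
    }

    /** A WHERE-style filter keeps a row only when the predicate is TRUE. */
    static List<Long> filterNotEq(List<Long> column, long literal) {
        List<Long> kept = new ArrayList<>();
        for (Long value : column) {
            if (notEq(value, literal) == Truth.TRUE) {
                kept.add(value);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Long> ids = Arrays.asList(1L, 2L, null, 3L);
        // The null row is dropped: "NULL <> 2" is UNKNOWN, not TRUE.
        System.out.println(filterNotEq(ids, 2L)); // prints [1, 3]
    }
}
```

Under these semantics a `notEq` filter never returns rows whose column value is null, which is the kind of behavior difference worth checking when porting filters from another reader.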

Quick Example

```java
import java.time.Instant;
import java.time.LocalDate;

import dev.hardwood.InputFile;
import dev.hardwood.reader.ParquetFileReader;
import dev.hardwood.reader.RowReader;

// `path` points at the Parquet file to read; both readers are closed automatically.
try (ParquetFileReader fileReader = ParquetFileReader.open(InputFile.of(path));
        RowReader rowReader = fileReader.rowReader()) {

    while (rowReader.hasNext()) {
        // Advance to the next row, then read its columns by name.
        rowReader.next();

        long id = rowReader.getLong("id");
        String name = rowReader.getString("name");
        LocalDate birthDate = rowReader.getDate("birth_date");
        Instant createdAt = rowReader.getTimestamp("created_at");
    }
}
```

Ready? Install Hardwood, then read your first file end-to-end.

Prefer to learn by running code? The hardwood-examples repository collects small, self-contained examples — one per concept — that you can clone and run with a single command.

Status and Limitations

Hardwood 1.0 is released and ready for production use.

The Hardwood library supports reading arbitrarily large Parquet files, provided individual column chunks are not larger than 2 GB (see Parquet file layout).

The interactive dive TUI currently caps S3 files at 2 GB.

Roadmap

Forward-looking items tracked for post-1.0. None are committed to a specific release.

Finalize ColumnReader API — stabilize the API for columnar access and move it out of "Experimental" state. (#522)

Writer support — write Parquet files in addition to reading; today Hardwood is reader-only. (#9)

Bloom filter predicate pushdown — use per-chunk bloom filters for equality-predicate skipping on high-cardinality columns, where min/max statistics can't help. (#105)

Parquet Modular Encryption — read files encrypted under the Parquet Modular Encryption spec: encrypted footer, per-column keys, AES-GCM and AES-GCM-CTR. (#128)

Apache Arrow interop — ColumnReader output as Arrow FieldVector / VectorSchemaRoot for zero-copy handoff to DuckDB, DataFusion, Pandas-via-JNI, and other Arrow-native consumers. (#153)
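To make the Bloom filter roadmap item more concrete, here is a minimal sketch of why a Bloom filter can skip a column chunk for an equality predicate when min/max statistics cannot. It is not Hardwood code, and the feature is listed above as not yet implemented. `ToyBloomFilter` and `minMaxCanSkip` are hypothetical names, and the toy hashing scheme is an assumption chosen for brevity; the Parquet format itself specifies a split-block Bloom filter with xxHash.

```java
import java.util.BitSet;
import java.util.List;

public class BloomSkipSketch {

    /** A toy Bloom filter for illustration; it does not follow Parquet's on-disk format. */
    static final class ToyBloomFilter {
        private final BitSet bits;
        private final int size;
        private final int probes;

        ToyBloomFilter(int size, int probes) {
            this.bits = new BitSet(size);
            this.size = size;
            this.probes = probes;
        }

        void add(String value) {
            for (int i = 0; i < probes; i++) {
                bits.set(index(value, i));
            }
        }

        /** false means "definitely absent"; true means "possibly present". */
        boolean mightContain(String value) {
            for (int i = 0; i < probes; i++) {
                if (!bits.get(index(value, i))) {
                    return false;
                }
            }
            return true;
        }

        /** Double hashing: derive the i-th probe position from two base hashes. */
        private int index(String value, int i) {
            int h1 = value.hashCode();
            int h2 = Integer.rotateLeft(h1 * 0x9E3779B9, 15) | 1; // force an odd step
            return Math.floorMod(h1 + i * h2, size);
        }
    }

    /** Min/max pruning can skip a chunk only when the literal lies outside [min, max]. */
    static boolean minMaxCanSkip(String min, String max, String literal) {
        return literal.compareTo(min) < 0 || literal.compareTo(max) > 0;
    }

    public static void main(String[] args) {
        List<String> chunkValues = List.of("user-0002", "user-0400", "user-0998");
        ToyBloomFilter filter = new ToyBloomFilter(1024, 3);
        chunkValues.forEach(filter::add);

        String literal = "user-0500"; // inside [min, max] but not stored in the chunk
        System.out.println("min/max can skip: " + minMaxCanSkip("user-0002", "user-0998", literal));
        System.out.println("bloom can skip:   " + !filter.mightContain(literal));
    }
}
```

The min/max check reports that the chunk cannot be skipped because the literal falls inside the value range. The Bloom filter check allows skipping unless the toy filter happens to return a false positive. A Bloom filter never produces false negatives, so skipping on "definitely absent" is always safe.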

Getting help

Questions, ideas, design discussion — GitHub Discussions. The best first stop for "how do I…", "is X possible…", or "what's the right way to…".

Bug reports and feature requests — the GitHub issue tracker. Please check whether a similar issue already exists.

Talks & posts

Hardwood: A New Parser for Apache Parquet — project announcement.

Open Source Friday with Gunnar Morling — GitHub Open Source Friday.

Chasing Efficient Java Development: From 1BRC to Developing Hardwood AI Natively — InfoQ podcast on building Hardwood.
