Show HN: Hardwood 1.0: A Fast, Lightweight Apache Parquet Reader for the JVM


Hardwood

A lightweight Java reader for the Apache Parquet file format. Available as a Java library and a command-line tool.

Hardwood 1.0 is out!

Hardwood 1.0 is released — read the announcement blog post for the story behind the project and what it can do.

Why Hardwood

Hardwood gives applications Parquet read support without pulling in Hadoop, Avro, or the wider parquet-java dependency tree:

Light-weight — zero transitive dependencies beyond optional compression libraries (Snappy, ZSTD, LZ4, Brotli).

Fast — matches or exceeds parquet-java's read throughput; competitive in native-image builds and short-lived JVMs.

Concurrent — multi-threaded at the core: pages decode in parallel on a shared thread pool, with cross-file prefetching for multi-file reads.

Compatible — reads every file that parquet-java reads, with documented divergences where Hardwood applies stricter semantics (e.g. SQL three-valued notEq).

Embeddable — usable from native CLIs, S3-only pipelines (without hadoop-aws), and Avro / Spark consumers via thin shim modules, including a drop-in parquet-java replacement.
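The notEq divergence mentioned in the list above can be illustrated without any Hardwood API. What follows is a minimal, stand-alone sketch of SQL three-valued logic, not Hardwood's implementation: the names ThreeValuedNotEq, Tri, notEq, and filterNotEq are hypothetical and exist only for this example. Under three-valued semantics, comparing a null with a literal yields UNKNOWN, and a filter keeps a row only when the predicate is TRUE, so null rows do not pass a "not equal" filter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ThreeValuedNotEq {

    /** SQL truth values: a comparison involving NULL is UNKNOWN, not TRUE or FALSE. */
    enum Tri { TRUE, FALSE, UNKNOWN }

    /** Three-valued "value <> literal": UNKNOWN when the column value is null. */
    static Tri notEq(Long value, long literal) {
        if (value == null) {
            return Tri.UNKNOWN;
        }
        return value != literal ? Tri.TRUE : Tri.FALSE;
    }

    /** A WHERE-style filter keeps a row only when the predicate evaluates to TRUE. */
    static List<Long> filterNotEq(List<Long> column, long literal) {
        List<Long> kept = new ArrayList<>();
        for (Long value : column) {
            if (notEq(value, literal) == Tri.TRUE) {
                kept.add(value);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical column with one null entry.
        List<Long> column = Arrays.asList(1L, null, 5L, 7L);
        System.out.println(filterNotEq(column, 5L)); // prints [1, 7]; the null row is dropped
    }
}
```

A two-valued interpretation would treat the null as "not equal to 5" and keep that row, which is the kind of difference worth checking when migrating existing filters.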

Quick Example

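    import java.time.Instant;
    import java.time.LocalDate;

    import dev.hardwood.InputFile;
    import dev.hardwood.reader.ParquetFileReader;
    import dev.hardwood.reader.RowReader;

    // `path` is assumed to be defined by the caller and to point at a Parquet file.
    // try-with-resources closes both the file reader and the row reader.
    try (ParquetFileReader fileReader = ParquetFileReader.open(InputFile.of(path));
         RowReader rowReader = fileReader.rowReader()) {

        while (rowReader.hasNext()) {
            rowReader.next(); // advance to the next row before reading its columns

            long id = rowReader.getLong("id");
            String name = rowReader.getString("name");
            LocalDate birthDate = rowReader.getDate("birth_date");
            Instant createdAt = rowReader.getTimestamp("created_at");
        }
    }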

Ready? Install Hardwood, then read your first file end-to-end.

Prefer to learn by running code? The hardwood-examples repository collects small, self-contained examples — one per concept — that you can clone and run with a single command.

Status and Limitations

Hardwood 1.0 is released and ready for production use.

The Hardwood library supports reading arbitrarily large Parquet files, provided individual column chunks are not larger than 2 GB (see Parquet file layout). The interactive dive TUI currently caps S3 files at 2 GB.

Roadmap

Forward-looking items tracked for post-1.0. None are committed to a specific release.

Finalize ColumnReader API — stabilize the API for columnar access and move it out of "Experimental" state. (#522)

Writer support — write Parquet files in addition to reading; today Hardwood is reader-only. (#9)

Bloom filter predicate pushdown — use per-chunk bloom filters for equality-predicate skipping on high-cardinality columns, where min/max statistics can't help. (#105)

Parquet Modular Encryption — read files encrypted under the Parquet Modular Encryption spec: encrypted footer, per-column keys, AES-GCM and AES-GCM-CTR. (#128)

Apache Arrow interop — ColumnReader output as Arrow FieldVector / VectorSchemaRoot for zero-copy handoff to DuckDB, DataFusion, Pandas-via-JNI, and other Arrow-native consumers. (#153)
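To make the Bloom filter roadmap item above more concrete, here is a toy sketch of why a Bloom filter can skip a chunk for an equality predicate when min/max statistics cannot. It is an illustration under loud assumptions, not Hardwood code and not the Parquet on-disk format: the class BloomSkipSketch, the nested ToyBloom, and both hash functions are hypothetical and deliberately simplistic (the Parquet specification defines split-block Bloom filters with a much stronger hash).

```java
import java.util.BitSet;

public class BloomSkipSketch {

    /** Toy Bloom filter: 64 bits and two deliberately simple hash functions. */
    static final class ToyBloom {
        private static final int BITS = 64;
        private final BitSet bits = new BitSet(BITS);

        private static int h1(long v) { return (int) Math.floorMod(v, (long) BITS); }
        private static int h2(long v) { return (int) Math.floorMod(v / BITS, (long) BITS); }

        /** Records a value by setting the bit chosen by each hash function. */
        void add(long v) {
            bits.set(h1(v));
            bits.set(h2(v));
        }

        /** false means "definitely absent"; true only means "possibly present". */
        boolean mightContain(long v) {
            return bits.get(h1(v)) && bits.get(h2(v));
        }
    }

    /** Min/max statistics can rule out a value only when it lies outside [min, max]. */
    static boolean minMaxCanSkip(long min, long max, long probe) {
        return probe < min || probe > max;
    }

    public static void main(String[] args) {
        long[] chunkValues = {10L, 1_000_000L}; // hypothetical column chunk contents
        ToyBloom bloom = new ToyBloom();
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (long v : chunkValues) {
            bloom.add(v);
            min = Math.min(min, v);
            max = Math.max(max, v);
        }

        long probe = 500L; // inside [min, max], but not stored in the chunk
        System.out.println("min/max can skip: " + minMaxCanSkip(min, max, probe)); // false
        System.out.println("bloom can skip:   " + !bloom.mightContain(probe));     // true
    }
}
```

The wide value range stands in for a high-cardinality column: almost any probe falls between min and max, so range statistics rarely help, while a negative Bloom filter answer proves the value is absent. A positive answer may be a false positive, so it only means the chunk still has to be read.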

Getting help

Questions, ideas, design discussion — GitHub Discussions. The best first stop for "how do I…", "is X possible…", or "what's the right way to…".

Bug reports and feature requests — the GitHub issue tracker. Please check whether a similar issue already exists.

Talks & posts

Hardwood: A New Parser for Apache Parquet — project announcement.

Open Source Friday with Gunnar Morling — GitHub Open Source Friday.

Chasing Efficient Java Development: From 1BRC to Developing Hardwood AI Natively — InfoQ podcast on building Hardwood.
