Doc-to-Atom: Learning to Compile and Compose Memory Atoms

arXiv:2606.12400 (cs)

[Submitted on 10 Jun 2026]

Title: Doc-to-Atom: Learning to Compile and Compose Memory Atoms

Authors: Xingjian Diao, Wenbo Li, Yashas Malur Saidutta, Avinash Amballa, Lazar Valkov, Srinivas Chappidi

Abstract: Long input sequences are central to document understanding and multi-step reasoning in Large Language Models, yet the quadratic cost of attention makes inference both memory-intensive and slow. Context distillation mitigates this by compressing contextual information into model parameters, and recent work such as Doc-to-LoRA amortizes context distillation into a single forward pass that generates one LoRA adapter per document. However, producing a single monolithic adapter for all queries leads to irrelevant-query interference, limited compositional recall, and poor scalability to long-document reasoning. To address these challenges, we propose Doc-to-Atom (Doc2Atom), a compositional parametric memory framework that decomposes each document into semantically typed knowledge atoms. Each atom is compiled into an independent micro-LoRA adapter and a provenance retrieval key. At inference time, a lightweight query router selects and assembles only the relevant atoms into a query-specific adapter, which is then injected into a frozen base model. The entire system is trained end-to-end through a multi-objective distillation framework. Experiments on six diverse QA benchmarks demonstrate that Doc2Atom outperforms Doc-to-LoRA baselines while reducing the memory cost of document internalization.
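The select-and-assemble step described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not say how the router scores atoms or how adapters are merged, so the sketch assumes cosine similarity between a query embedding and each atom's retrieval key, top-k selection, and summation of the selected low-rank updates. The names `Atom`, `route`, `compose_delta`, and `forward` are hypothetical, and the matrices are plain Python lists so the example runs without third-party packages.

```python
import math
from dataclasses import dataclass
from typing import List

Matrix = List[List[float]]


@dataclass
class Atom:
    """Hypothetical memory atom: a retrieval key plus LoRA-style factors."""
    key: List[float]  # retrieval key embedding
    A: Matrix         # shape (r, d_in)
    B: Matrix         # shape (d_out, r)


def matmul(X: Matrix, Y: Matrix) -> Matrix:
    """Plain-Python matrix product X @ Y."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def route(query: List[float], atoms: List[Atom], top_k: int) -> List[Atom]:
    """Assumed router: keep the top_k atoms whose keys best match the query."""
    ranked = sorted(atoms, key=lambda a: cosine(query, a.key), reverse=True)
    return ranked[:top_k]


def compose_delta(atoms: List[Atom], d_out: int, d_in: int) -> Matrix:
    """Assumed composition rule: sum the low-rank updates B_i @ A_i."""
    delta = [[0.0] * d_in for _ in range(d_out)]
    for atom in atoms:
        update = matmul(atom.B, atom.A)
        for i in range(d_out):
            for j in range(d_in):
                delta[i][j] += update[i][j]
    return delta


def forward(W: Matrix, delta: Matrix, x: List[float]) -> List[float]:
    """Apply the frozen weight W plus the query-specific delta to input x."""
    return [
        sum((w + d) * xi for w, d, xi in zip(w_row, d_row, x))
        for w_row, d_row in zip(W, delta)
    ]


# Toy example: two rank-1 atoms attached to a frozen 2x2 identity layer.
W = [[1.0, 0.0], [0.0, 1.0]]
atoms = [
    Atom(key=[1.0, 0.0], A=[[1.0, 0.0]], B=[[0.5], [0.0]]),
    Atom(key=[0.0, 1.0], A=[[0.0, 1.0]], B=[[0.0], [2.0]]),
]
selected = route([0.9, 0.1], atoms, top_k=1)  # query is closest to the first key
delta = compose_delta(selected, d_out=2, d_in=2)
y = forward(W, delta, [1.0, 1.0])
print(y)  # [1.5, 1.0]
```

In this toy setup only the first atom is selected, so the second atom's update never touches the output. That is the intuition behind avoiding irrelevant-query interference, although the paper's actual router and composition mechanism may differ.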

Comments: 20 pages

Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

Cite as: arXiv:2606.12400 [cs.CL]

(or arXiv:2606.12400v1 [cs.CL] for this version)

https://doi.org/10.48550/arXiv.2606.12400

arXiv-issued DOI via DataCite (pending registration)

Submission history: From Xingjian Diao. [v1] Wed, 10 Jun 2026 17:58:20 UTC (491 KB)


