Decoding OneNote's File Format Secrets

msiemens

Decoding OneNote’s File Format Secrets — m-siemens.de

May 16, 2026

I hate losing stuff.

I’m not talking about things like keys or wallets (although I don’t particularly enjoy misplacing these either). I mean stuff that has emotional and sentimental value to me. And most of it is digital.

Take, for instance, my first serious programming project: As a teenager, I decided to write a hangman game in Java. I got it all sorts of wrong and only got it working with my dad’s help. (I didn’t really understand how classes work and made everything public static just so my two classes could access each other’s data. But it worked and I was proud of it.) But in the end, it worked! I proudly shared it with my dad and with a friend of mine who also was into computers. I even uploaded it to Sourceforge!

Then I lost the source code. I don’t know when or why. Maybe I switched to a new computer and didn’t transfer all my data properly. Maybe I managed to screw up my OS once again and didn’t back everything up when wiping the disk for the reinstall. And at some point I also deleted the Sourceforge project. Who would even care about a small silly game like that, right? Turns out, I would, if only to remember how beautifully awful the code was.

This and other occurrences of valuable digital memories being lost taught me to be a little obsessive when it comes to my personal data. I store my stuff on an external RAID that is backed up online. As for data stored in various online services, I have made a habit (or rather a reminder that still refuses to settle into habit) of downloading a copy of my data every month (thanks GDPR!) so it can be backed up properly.

For most of my online data, that’s not very complicated. Most services provide your data as some form of JSON, some hand you a collection of HTML files, and a few have their data stored as CSV.

And then there is Microsoft OneNote.

Don’t get me wrong. I am a heavy OneNote user, with multiple thousands of notes, a subset of which I use almost every day. It’s where I keep my journal, collect quotes from books I read, keep a local copy of articles and blog posts I find interesting. It’s where my wife and I planned our wedding, where I make lists of gifts for the Christmas season, and where I keep a ton of archived notes from my time at university.

In other words, of all the services I use, OneNote is where losing data would be most painful to me. After all, I don’t think it’s unreasonable to assume that Microsoft might lose interest in supporting OneNote at some point in the next 40 to 50 years. But the notes will be valuable to me even then (or arguably more so!).

A couple of years ago I sat down to research how to back up my digital notebooks in order to keep them accessible even if OneNote is shut down. What I found was both good and bad news. The good news: You can download your OneNote notebooks quite easily by selecting the notebook files in OneDrive web and downloading them. The bad news: The file format they use is (or rather was, at the time) virtually unsupported by the rest of the world. There were no open source file viewers or converters. There were reports that Evernote on Windows could import from OneNote. But that was just another vendor that could go out of business at any point in the future. (Now, Evernote is still in business, but with the free plan limited to 50 notes since late 2023, it might as well not be, at least for me.)

Then I found two documents, [MS-ONESTORE] and [MS-ONE]. Together they specify how OneNote files work. The first one is a specification for the OneNote Revision Store File Format which describes a revision store that is able to store arbitrary data in a list of revisions that represents the object’s history. The second one describes the data model OneNote uses and how this data is stored using the OneNote revision store. In other words, as long as I keep these documents around, I can build a parser for OneNote files even if Microsoft should decide to discontinue it. I downloaded a copy of the specifications and moved on.
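To make the idea of a revision store a bit more concrete, here is a minimal sketch of the concept in Python. It is a toy in-memory model, not the on-disk layout from [MS-ONESTORE]: the names `ToyRevisionStore`, `Revision`, `commit`, `state_at`, and `history` are made up for illustration, and the assumption that each revision simply records the objects that changed is a simplification.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Revision:
    """One step in the history: the objects that changed in this revision."""
    changes: dict[str, bytes]


@dataclass
class ToyRevisionStore:
    """Hypothetical in-memory model; the real format is far more involved."""
    revisions: list[Revision] = field(default_factory=list)

    def commit(self, changes: dict[str, bytes]) -> int:
        """Append a new revision and return its index."""
        self.revisions.append(Revision(dict(changes)))
        return len(self.revisions) - 1

    def state_at(self, index: int) -> dict[str, bytes]:
        """Replay revisions 0..index to rebuild the data as of that revision."""
        state: dict[str, bytes] = {}
        for revision in self.revisions[: index + 1]:
            state.update(revision.changes)
        return state

    def history(self, object_id: str) -> list[bytes]:
        """Return every stored version of one object, oldest first."""
        return [
            revision.changes[object_id]
            for revision in self.revisions
            if object_id in revision.changes
        ]
```

Committing `{"page-1": b"draft"}` and then `{"page-1": b"final"}` leaves both versions retrievable through `history("page-1")`, while `state_at` rebuilds what the data looked like at either point. The payloads are opaque bytes on purpose: in this model the store knows nothing about notes, which is the job of the data model layered on top.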

A journey into the woods

I couldn’t hold back for too long though: In the fall of 2020 my curiosity got the better of me and I started working on a OneNote file parser based on the specifications I found earlier. I started, as you’d expect, by reading the specs end to end. Both are around 100 pages long (due to the number of tables involved in explaining all byte offsets), plus another good 100 or so pages of related documents. They are dense, formal, and surprisingly thorough. With patience and a willingness to keep flipping back to earlier sections, you can implement a parser for OneNote files using nothing but the spec.
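For a feel of what turning one of those byte offset tables into code looks like, here is a small Python sketch. The layout is hypothetical and only stands in for the kind of table the specs contain; it is not the actual OneNote header, and `ToyHeader` and `parse_toy_header` are names invented for this example.

```python
import struct
import uuid
from typing import NamedTuple

# Hypothetical layout, standing in for a table of byte offsets in a spec:
#   offset  0, 16 bytes: file type GUID (little-endian field layout)
#   offset 16,  4 bytes: unsigned version number
#   offset 20,  8 bytes: unsigned offset of the first data block
# "<" means little-endian with no padding, so the struct is 28 bytes long.
HEADER = struct.Struct("<16sIQ")


class ToyHeader(NamedTuple):
    file_type: uuid.UUID
    version: int
    first_block_offset: int


def parse_toy_header(data: bytes) -> ToyHeader:
    """Decode the fixed-size header at the start of `data`."""
    if len(data) < HEADER.size:
        raise ValueError(
            f"need {HEADER.size} bytes for the header, got {len(data)}"
        )
    raw_guid, version, first_block_offset = HEADER.unpack_from(data, 0)
    # bytes_le matches the mixed-endian byte order Windows uses for GUIDs.
    return ToyHeader(uuid.UUID(bytes_le=raw_guid), version, first_block_offset)
```

Each row of the table becomes one field in the format string, which makes it easy to check the code against the document line by line.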

So that’s what I did. Sort of. The spec tells you what the bytes mean, but no one quite walks you through how to go from raw bytes to a revision store, and from a revision store to the actual object model. This is where libmson,...
