FiberFS Technical Overview

nyc_pizzadev1 pts0 comments

What is FiberFS?

FiberFS is a POSIX-compatible networked filesystem that uses S3-compatible storage as its backend. It has a custom distributed filesystem protocol built for S3, and it supports a virtually unlimited number of concurrent readers and writers across any number of hosts.

How do you use FiberFS?

All you need is the FiberFS binary (fiberfs) and a config (fiberfs.conf). The FiberFS config just needs to define your S3 endpoint. Starting FiberFS will then automatically mount your S3 endpoint using FUSE.

FiberFS can also be used directly as an API in your application, allowing you to bypass both the kernel and FUSE while still retaining full filesystem functionality. There is currently no stable API for this; we plan to formally release and support it in an upcoming version.

How is FiberFS designed internally?

FiberFS was designed from the ground up around S3, HTTP, and caching. These principles permeate all aspects of FiberFS’s design and architecture. FiberFS has an in-memory, multi-state, multi-version core. All FiberFS states are versioned and have no inter-dependencies. A state cannot encompass more than a single directory or file, and many different versions of the same state can co-exist and be operated on concurrently. Collectively, these versioned states are combined into a consistent and cohesive view of the mounted filesystem. FiberFS was designed this way because both clients and S3 can be slow and error prone. Allowing operations to happen across many different versioned states on many different hosts, without any overarching IO locks or dependencies, lets every operation run smoothly, consistently, and potentially latency free.
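To make the idea of independent, co-existing versions concrete, here is a minimal sketch. It is not FiberFS code: the `State` and `StateStore` names, the integer version counter, and the tuple of directory entries are all assumptions made for illustration. It only shows that committing a new version of one state leaves older versions readable and does not touch any other state.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class State:
    """Hypothetical immutable snapshot of one directory or file."""
    path: str
    version: int
    entries: Tuple[str, ...]


class StateStore:
    """Hypothetical store: each path has its own independent version list."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[State]] = {}

    def commit(self, path: str, entries: Tuple[str, ...]) -> State:
        # A new version is appended; earlier versions are never modified.
        versions = self._versions.setdefault(path, [])
        state = State(path, len(versions) + 1, tuple(entries))
        versions.append(state)
        return state

    def get(self, path: str, version: int) -> State:
        return self._versions[path][version - 1]

    def latest(self, path: str) -> State:
        return self._versions[path][-1]


store = StateStore()
v1 = store.commit("/photos", ("a.jpg",))
v2 = store.commit("/photos", ("a.jpg", "b.jpg"))
other = store.commit("/docs", ("readme.txt",))
```

A holder of `v1` keeps a stable view of `/photos` even after `v2` exists, and `/docs` versions independently of `/photos`.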

FiberFS has 3 levels of distributed synchronization: file, host, and S3. Each level allows clients to operate concurrently, consistently, and most importantly, correctly.

Inodes within FiberFS map to files and are purely synthetic. They are primarily used to track and invalidate system cache (page cache and directory cache). Inodes are unique per mount and cannot be correlated across mounts or systems. When an external update comes in, those changes are isolated and made visible as a new inode. This means readers and their page cache will stay consistent and intact on whichever inode they are reading from regardless of external updates. New readers will pick up the new inodes and consume fresh content and attributes in a consistent manner based on global flush order.
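The reader isolation described above can be sketched as follows. This is an illustrative model only: `InodeTable` and its methods are hypothetical names, inode numbers come from a simple counter, and file content is held in a dictionary rather than a page cache.

```python
import itertools
from typing import Dict


class InodeTable:
    """Hypothetical per-mount table of synthetic inodes."""

    def __init__(self) -> None:
        self._next = itertools.count(1)
        self._current: Dict[str, int] = {}   # path -> newest inode number
        self._content: Dict[int, bytes] = {}  # inode number -> content

    def apply_external_update(self, path: str, content: bytes) -> int:
        # An external update never mutates an existing inode;
        # it becomes visible as a brand new one.
        ino = next(self._next)
        self._content[ino] = content
        self._current[path] = ino
        return ino

    def open(self, path: str) -> int:
        # New readers pick up the newest inode for the path.
        return self._current[path]

    def read(self, ino: int) -> bytes:
        return self._content[ino]


table = InodeTable()
table.apply_external_update("/log.txt", b"old")
reader_ino = table.open("/log.txt")           # existing reader
table.apply_external_update("/log.txt", b"new")
new_reader_ino = table.open("/log.txt")       # reader arriving later
```

The existing reader still reads `b"old"` from its inode, while the later reader gets a different inode number holding `b"new"`.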

The only exception to this rule is when an inode has a local writer. In this scenario, FiberFS will match writeback page cache behavior and make updates available on the existing inode. For example, if an inode has both local readers and local writers concurrently, those writes will be seen by the local readers immediately, based on local write order. Remote readers will continue to see updates isolated to newer inodes based on global flush order, as described in the paragraph above.

If remote writes are also happening concurrently with local writes, they will trigger a page cache invalidation and merge back into the local writer’s inode based on global flush order, where they can be read by local readers on the same inode. This process is designed to allow all writers to correctly merge their writes with global writes in global flush order.

In the future, FiberFS will introduce a special “isolation mode” configuration flag which, when used with O_WRONLY, will isolate local readers from local writers. The merge process becomes private to the local writers, and readers return to seeing a consistent view on their own inode. This would be similar to “open to close consistency” semantics without having to invalidate any page cache. Be aware that this merge process only happens when you have concurrent readers and writers on the same file and the same inode, which is a scenario that most applications explicitly avoid.
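As a rough illustration of merging in flush order, the sketch below replays a set of writes onto a base buffer, ordered by a sequence number. Everything here is an assumption: the text does not say how global flush order is determined or represented, so `flush_seq` is a hypothetical integer and `merge_writes` is a hypothetical helper, not part of FiberFS.

```python
from typing import Iterable, Tuple

# (flush_seq, offset, data); flush_seq is a hypothetical global ordering key.
Write = Tuple[int, int, bytes]


def merge_writes(base: bytes, writes: Iterable[Write]) -> bytes:
    """Apply writes to base in ascending flush_seq order; later flushes win."""
    buf = bytearray(base)
    for _, offset, data in sorted(writes, key=lambda w: w[0]):
        end = offset + len(data)
        if end > len(buf):
            # Writing past the current end zero-fills the gap, like a sparse write.
            buf.extend(b"\0" * (end - len(buf)))
        buf[offset:end] = data
    return bytes(buf)


# A local write flushed second and a remote write flushed first,
# supplied out of order on purpose.
merged = merge_writes(b"aaaaaa", [(2, 0, b"XX"), (1, 0, b"YYYY")])
```

Because the writes are sorted before being applied, every host that replays the same set of writes ends up with the same bytes regardless of the order in which it learned about them.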

How does FiberFS store itself on S3?

While FiberFS was built from the ground up around S3, its core logic has no knowledge of S3. FiberFS simply flushes its state after an operation, and that state gets translated and written to S3 in a consistent manner that remains correct across distributed hosts. FiberFS can also operate in reverse, where a state is read from S3 and loaded into the local system, again consistently, correctly, and internally versioned. This process is isolated to a single directory, so operations happening in different directories have no relation to or dependency on each other. If 100 clients are operating on 100 different directories concurrently, including subdirectories, they will all operate independently with zero impact on one another.

FiberFS stores 4 types of files on S3: chunks, indexes, roots, and full files.

Chunks are file content: specifically, whatever is given to FiberFS at the end of a write, close, or flush operation. FiberFS will split these buffers up into logical chunk sizes...
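Splitting a flushed buffer into chunks might look roughly like the sketch below. The text does not state the actual chunk size or how chunk boundaries are chosen, so the fixed `chunk_size` parameter and the `split_into_chunks` helper are purely hypothetical.

```python
from typing import List


def split_into_chunks(buf: bytes, chunk_size: int) -> List[bytes]:
    """Split buf into consecutive pieces of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [buf[i:i + chunk_size] for i in range(0, len(buf), chunk_size)]


chunks = split_into_chunks(b"abcdefghij", 4)
```

With a 10-byte buffer and a hypothetical chunk size of 4, this yields two full chunks and one shorter trailing chunk.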
