How VictoriaLogs Stores Your Logs in a Columnar Layout
If you run VictoriaLogs, your day-to-day comes down to three things: sending logs, querying them, and setting retention so the disk does not fill up. Everything else happens quietly on disk.<br>This post follows a single log line from the moment it arrives to where it finally rests on disk, so you can picture what VictoriaLogs is doing under the hood and explain what you’re seeing: why your queries come back fast, why you sometimes see many files on disk, and which flags and metrics matter when something looks off. This article is for everyone: no programming background is needed, and there is no Go code to read. If you do want to go deeper, the VictoriaLogs source is always the reference.<br>1. A log line arrives<br>VictoriaLogs accepts logs over many protocols: JSON Lines, Elasticsearch bulk, Loki push, OpenTelemetry, syslog, and more (see the data ingestion docs for the full list).<br>Whichever one you use, the first thing VictoriaLogs does is translate that record into a single internal shape that the rest of the system understands: a timestamp, a set of named fields, and a “stream identity”.
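To make that internal shape concrete, here is a minimal Python sketch of the translation step. It is an illustration only, not VictoriaLogs code: the `LogRecord` class, the `normalize` function, and the sample field names (`ts`, `message`, `pod`, `container`) are hypothetical, and the real processors handle far more cases, such as timestamp parsing and nested values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    """Hypothetical stand-in for the internal shape."""
    timestamp: str
    fields: tuple  # sorted (name, value) pairs
    stream: tuple  # sorted (name, value) pairs that identify the stream


def normalize(raw, time_field, msg_field, stream_fields):
    """Turn one decoded record (a dict) into the common shape."""
    fields = dict(raw)
    timestamp = fields.pop(time_field)
    # The main message is stored under the _msg field name.
    fields["_msg"] = fields.pop(msg_field)
    # The stream identity is built from the chosen stream fields only.
    stream = tuple(sorted((k, fields[k]) for k in stream_fields if k in fields))
    return LogRecord(timestamp, tuple(sorted(fields.items())), stream)


rec = normalize(
    {"ts": "2025-01-01T00:00:00Z", "message": "user logged in",
     "pod": "api-1", "container": "web"},
    time_field="ts",
    msg_field="message",
    stream_fields=["pod", "container"],
)
print(rec.stream)  # (('container', 'web'), ('pod', 'api-1'))
```

Whatever the source protocol looked like, the rest of the sketch only ever sees a `LogRecord`, which is the point of having one internal shape.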
Every protocol maps to one internal shape.<br>Each protocol has its own small processor that does this translation, and you can influence it either with query arguments or with headers in the request itself:<br>Drop fields you do not want to store with the ignore_fields query argument or the VL-Ignore-Fields header.<br>Strip terminal color codes from values with the decolorize_fields query argument or the VL-Decolorize-Fields header.<br>Attach extra fields to every record with the extra_fields query argument or the VL-Extra-Fields header.<br>Point VictoriaLogs at the main message field (_msg) with the _msg_field query argument or the VL-Msg-Field header.<br>Tell it which field holds the timestamp with the _time_field query argument or the VL-Time-Field header.<br>Choose which fields define the stream identity with the _stream_fields query argument or the VL-Stream-Fields header.<br>Stream identity is the most important idea in this whole post. Logs that share the same stream fields are treated as a single stream, and you are the one who decides what that stream looks like. For example, set _stream_fields=pod,container, and all logs with the same pod and container form one stream.<br>VictoriaLogs keeps each stream’s logs together on disk, and that grouping is what makes them compress so well and lets a query touch only the streams it needs instead of scanning everything.
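The options listed above can be passed either way, and a short Python sketch may help show what that looks like on the wire. It only builds the URL and the headers and sends nothing. The address `http://localhost:9428/insert/jsonline` is an assumption about a local single-node setup, and the helper names and sample field names are hypothetical; check the data ingestion docs for the endpoint of the protocol you actually use.

```python
from urllib.parse import urlencode

# Assumed local address of a JSON Lines ingestion endpoint; adjust to your setup.
BASE_URL = "http://localhost:9428/insert/jsonline"


def build_ingest_url(stream_fields, time_field, msg_field, ignore_fields=()):
    """Express the ingestion options as query arguments."""
    params = {
        "_stream_fields": ",".join(stream_fields),
        "_time_field": time_field,
        "_msg_field": msg_field,
    }
    if ignore_fields:
        params["ignore_fields"] = ",".join(ignore_fields)
    # safe="," keeps the comma-separated lists readable in the URL.
    return BASE_URL + "?" + urlencode(params, safe=",")


def build_ingest_headers(stream_fields, time_field, msg_field):
    """Express the same options as request headers instead."""
    return {
        "VL-Stream-Fields": ",".join(stream_fields),
        "VL-Time-Field": time_field,
        "VL-Msg-Field": msg_field,
    }


url = build_ingest_url(["pod", "container"], "ts", "message", ["debug_info"])
headers = build_ingest_headers(["pod", "container"], "ts", "message")
print(url)
print(headers)
```

Query arguments are handy when you control the URL in a shipper's config, while headers are useful when a proxy or agent in the middle can add them without touching the URL.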
Logs with the same stream fields form one stream.<br>The practical rule for you as an operator: keep stream fields stable and low-cardinality, meaning they should have only a handful of distinct values, such as host, app, pod, or container. Keep high-cardinality values, ones with very many unique entries like trace_id or user_id, as normal fields, not stream fields.<br>After receiving and normalizing the incoming records, VictoriaLogs does not handle them one at a time. It accumulates them in an in-memory buffer and, about once a second (or sooner if the buffer fills up), turns the whole batch into a small searchable chunk that still lives in RAM (an in-memory part).
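The "flush on a timer, or earlier if full" behavior can be sketched in a few lines of Python. This is a simplified model, not the actual implementation: the `IngestBuffer` class is hypothetical, it counts rows rather than bytes, and the clock is passed in by hand so the example is deterministic.

```python
class IngestBuffer:
    """Hypothetical buffer that flushes on a timer or when it fills up."""

    def __init__(self, max_rows=3, flush_interval=1.0):
        self.max_rows = max_rows              # assumed limit; the real one is size-based
        self.flush_interval = flush_interval  # seconds between timed flushes
        self.rows = []
        self.last_flush = 0.0
        self.parts = []  # each flushed batch stands in for one in-memory part

    def add(self, row, now):
        self.rows.append(row)
        if len(self.rows) >= self.max_rows:
            self.flush(now)  # flush early because the buffer is full

    def tick(self, now):
        """Called periodically; flushes if the interval has elapsed."""
        if self.rows and now - self.last_flush >= self.flush_interval:
            self.flush(now)

    def flush(self, now):
        self.parts.append(list(self.rows))
        self.rows.clear()
        self.last_flush = now


buf = IngestBuffer(max_rows=3)
buf.add("a", now=0.1)
buf.add("b", now=0.2)
buf.tick(now=0.5)   # too early, nothing happens
buf.tick(now=1.0)   # one second elapsed: timed flush
for i, row in enumerate(["c", "d", "e"]):
    buf.add(row, now=1.1 + i * 0.1)  # third row fills the buffer: early flush
print(buf.parts)  # [['a', 'b'], ['c', 'd', 'e']]
```

Batching like this is why a record becomes searchable within roughly a second of arriving rather than instantly.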
The buffered batch flushes into one in-memory part.<br>That in-memory buffer is not a single shared queue. If every incoming batch had to line up for the same buffer, they would waste time waiting on each other, so VictoriaLogs splits the buffer into shards, one per CPU core, and spreads incoming batches across them in turn.<br>So on a 3-CPU machine there are 3 buffer shards filling in parallel, and each shard flushes on its own, writing its batch out as a new in-memory part about once a second.
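A small Python sketch shows the "in turn" distribution across shards. The `ShardedBuffer` class is hypothetical and single-threaded; a real implementation would use one shard per CPU core with proper synchronization, which this model leaves out.

```python
import itertools


class ShardedBuffer:
    """Hypothetical buffer split into shards that are filled round-robin."""

    def __init__(self, num_shards):
        self.shards = [[] for _ in range(num_shards)]
        self._next_shard = itertools.cycle(range(num_shards))

    def add_batch(self, batch):
        """Append a batch to the next shard in turn and return its index."""
        idx = next(self._next_shard)
        self.shards[idx].extend(batch)
        return idx

    def flush_shard(self, idx):
        """Each shard flushes independently into its own in-memory part."""
        part = list(self.shards[idx])
        self.shards[idx].clear()
        return part


buf = ShardedBuffer(num_shards=3)  # e.g. a 3-CPU machine
placed = [buf.add_batch(b) for b in (["a1", "a2"], ["b1"], ["c1"], ["d1"])]
print(placed)              # [0, 1, 2, 0]
print(buf.flush_shard(0))  # ['a1', 'a2', 'd1']
print(buf.flush_shard(1))  # ['b1']
```

Because each shard flushes separately, a busy machine produces several small in-memory parts per second instead of one, which is one reason you can see many parts at once.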
The buffer is split into per-CPU shards, each flushing its own in-memory part.<br>A part is one of the core data structures across VictoriaLogs (and the other VictoriaMetrics products): a self-contained, searchable bundle of data.<br>Most of the time, the buffered batch is flushed into an in-memory part, but in some rare cases, if a batch is large enough to exceed the in-memory size limit, the part...