Understanding B-Tree Indexes in PostgreSQL: A Comprehensive Guide- Part 1 | MediumSitemapOpen in appSign up<br>Sign in
Medium Logo
Get app<br>Write
Search
Sign up<br>Sign in
Understanding B-Tree Indexes in PostgreSQL: A Comprehensive Guide— Part 1
Devli
11 min read·<br>Jun 1, 2024
Listen
Share
Press enter or click to view image in full size
Generated by DALL·ETable of content:<br>Types of indexes in PostgreSQL<br>Logical schema of B-Tree index<br>[Recap] How PostgreSQL stores data ?<br>Logical representation of BTree index in PostgreSQL<br>How B-Tree index is implemented in PostgreSQL<br>We’re all familiar with optimizing queries on large tables, especially when filtering operations begin to slow down. However, to truly leverage index effectively, it’s beneficial to understand its internal mechanics. This series of articles delves into the inner workings of the B-Tree index in PostgreSQL and discusses strategies for using this type of index more efficiently.<br>Types of indexes in PostgreSQL<br>First of all, we need to mention about all types of indexes in PostgreSQL. It provides following ones, each using a different algorithm:<br>B-tree<br>Hash<br>GiST<br>SP-GiST<br>GIN<br>BRIN<br>Every index has its own purpose, so in every situation you should choose the one that fits best.<br>Logical schema of B-Tree index<br>To start speaking about B-Tree index in Posgtres we need firstly remember what is B-Tree ?<br>The term “B-Tree” is short for “balanced tree” indicating that the distance between each node and the root is consistent across all levels. Furthermore, the root and its parent nodes can have more than two children, which effectively minimizes the depth of the tree, enhancing search efficiency. Let’s see main aspects about B-Trees:<br>Node Flexibility : Nodes can have more than two children, typically varying from a minimum of two to a maximum defined by the tree order (M).<br>Height Management : Automatically adjusts to maintain a height of logM N, optimizing search operations.<br>Sorting Order : Maintains data in sorted order, ensuring the smallest values are on the left and the largest on the right.<br>Uniformity and Efficiency : All leaf nodes are at the same level to ensure efficiency and consistency, and there are no empty subtrees above the leaf nodes.<br>B-trees are optimized for balance and retrieval efficiency, making them suitable for systems that require frequent data insertions and retrievals, like database indices and filesystems.
B-TreeThis type of index is based on the Lehman and Yao Algorithm , which was developed in 1981. When a simple search is conducted on a table without any indexes, PostgreSQL defaults to a sequential scan. This means it iterates over every data entry in the table, compares them, and if the conditions match, it returns the relevant data. This is an excellent opportunity to illustrate the differences in O complexity with a comparative graph.<br>Press enter or click to view image in full size
ComplexityWe observe that using an index results in quicker information retrieval for larger reads.<br>[Recap] How PostgreSQL stores data ?<br>When we create table, we expect that it will be stored in some sort of an ordered data on a disk, but in reality our information can be distributed across different pages and places in that pages.<br>CREATE TABLE t(<br>A INTEGER,<br>B INTEGER,<br>C VARCHAR<br>);For this table physically we will have something like this:<br>Press enter or click to view image in full size
How PostgreSQL stores dataLet’s try to know more about our “t” table.<br>SELECT class.oid AS "OID", relname AS "Relation Name",<br>schema.nspname AS "Schema", usr.rolname AS "Object Owner",<br>coalesce(tblsp.spcname, 'pg_default') AS "Tablespace",<br>relpages AS "Amount Pages", reltuples AS "Amount Tuples",<br>reltoastrelid AS "TOAST table",<br>CASE<br>WHEN relpersistence = 'p' THEN 'Permanent'<br>WHEN relpersistence = 't' THEN 'Temporary'<br>ELSE 'Unlogged'<br>END AS "Type",<br>pg_relation_filepath(class.oid) AS "File Path",<br>pg_size_pretty(pg_relation_size(class.oid)) AS "Relation Size"<br>FROM pg_class class INNER JOIN pg_namespace schema<br>ON schema.oid = class.relnamespace<br>INNER JOIN pg_authid usr<br>ON usr.oid = class.relowner<br>LEFT JOIN pg_tablespace tblsp<br>ON tblsp.oid = class.reltablespace<br>WHERE relname = 't';<br>Table metadataOID (Object Identifier) is an identifier created within PostgreSQL, based on a system sequence. This sequence ensures that each new object we create — whether it be a column, function, trigger, virtual table, or something else — receives a unique identifier. This unique identifier allows us to reference the object unambiguously.<br>What else should be noted in a query?<br>Firstly, there is the default tablespace, which in PostgreSQL is pg_default.<br>The schema is public by default.<br>The owner of the table is the individual who created it and is responsible for managing access to it.<br>There are columns that indicate the number of pages and rows — tuples.<br>The concept of TOAST tables or auxiliary tables also exists. These tables support the main table and allow you to split long rows if they do...