If you are planning a career in software development as a backend or a full-stack engineer, understanding how databases work and how to write efficient SQL queries will give you a good headstart in developing systems that will scale. In this database engineering series written by Sawali & me, we will delve into some concepts and general guidelines that can be followed to architect efficient data querying.
Here are links to all the articles in this series:
- Understanding database storage
- Database Indexes
- Understanding EXPLAIN & ANALYZE
- Partitioning
- Sharding
- Transactional vs Analytical Databases
We will cover the above topics with example code. Setting up PostgreSQL is beyond the scope of these articles. However, if you have docker, the following command can quickly spin up a Postgres instance.
docker run -e POSTGRES_PASSWORD=password -d -p 5432:5432 --name postgres postgres
docker exec -it postgres psql -U postgres
Table of contents
Open Table of contents
Terminology
Once you have understood how data is stored, indexed and organized by PostgreSQL, it becomes easier to figure out why certain SQL queries return results faster than others. Here are some building blocks of Postgres - some terms to know of to understand the upcoming articles.
Items
In a relational database system, we always prepare entities with certain attributes.
An item is a single attribute of the row. For example, an employee John Doe can have a salary
of 10_000.
The salary is an attribute or item. A set of attributes or items are combined to make a row
or tuple
.
The sequence in which items are saved within a row is dependent on the schema of the table.
Rows/Tuples
A relational database system contains tables. A table has attributes (columns) and rows. A tuple also known as a row is a group of attribute values of the table.
For example, a table that has (name, age, salary)
as attributes can have (John Doe, 30, 10_000)
as a tuple or row.
(Mithun, 30, INT_MAX)
is another tuple.
Each row or tuple is assigned a row_id
by the DBMS.
Pages
PostgreSQL stores data in multiple blocks
. Each block has a size of 8KB.
These blocks are otherwise called pages
. The pages contain multiple tuples/rows.
The amount of tuples that a page can hold depends on the size of the tuple.
A page consists of a header and data region.
The tuples are written from bottom-right towards left in a page’s data region.
The header holds the information of identifiers that point to each tuple.
Heap
PostgreSQL organizes data in files called heaps.
A heap or heap file
contains pages as their building blocks.
There is no hierarchy or ordering of pages within the heap. A tuple is written to the next free space or freshly allocated page.
Finally, a table will have data stored in heaps. Optionally the table can have indexes that point to rows in the heap. Postgres will look at the index if present and the heap where rows are present in pages to fetch results when a SQL query runs.
The relationship between tuples, heaps, and pages is important for understanding how PostgreSQL stores data. By understanding this relationship, you can better understand how PostgreSQL queries work and how to optimize your queries for performance.
DB Storage Quiz
Next, we will learn about database indexes and how they improve query performance. Click here to read.