Ironcore Backup Architecture - Polystack Documentation

Overview

Ironcore Backup Solution (IBS) is a layered architecture optimised for incremental, deduplicated, encrypted backups across virtual machines, system containers, and physical hosts. For virtual machines and system containers, its core mechanism is Changed Block Tracking (CBT): the hypervisor identifies precisely which disk blocks have changed since the previous backup, so only modified blocks are read, transmitted, and stored. CBT is what makes incremental backups fast and storage-efficient at scale; physical hosts use an equivalent file-level change detection. This page describes each component, the data flow between them, and the design decisions behind the storage model.

Prerequisites

Administrator role on the Polystack platform
Understanding of TCP networking, storage backends, and TLS

Deployment Model

IBS splits the backup pipeline between the host that owns the workload and the backup server. The split is fixed: the host always performs change tracking, chunking, compression, and encryption; the backup server always performs deduplication, storage, replication, and verification. The location of the “host side” depends on the workload type:

Workload Type	Pipeline Runs On	Software in the Guest
Virtual machine on Ironcore	Ironcore hypervisor host	None — fully agentless
System container on Ironcore	Ironcore hypervisor host	None — fully agentless
Physical Linux host	The host itself	`ironcore-backup-client` package
File-archive from any Linux machine	The machine itself	`ironcore-backup-client` package

Agentless for hypervisor-managed workloads — block-level change tracking is performed by the hypervisor against the VM’s virtual disks. The guest OS is not touched, does not need a backup driver, and cannot detect the backup. This applies to all VMs and system containers running on Ironcore.

Agent-based for physical workloads — bare-metal hosts, standalone Linux servers, and machines outside the virtualisation layer require the lightweight ironcore-backup-client package. The client provides the same pipeline (chunk, compress, encrypt) and pushes to the backup datastore.

High-Level Component Diagram

Changed Block Tracking (CBT) is the foundational mechanism of every incremental VM and container backup. The hypervisor maintains a per-disk bitmap that records every block written since the previous backup. Only blocks marked dirty are read, hashed, compressed, encrypted, and transmitted; every other block is skipped entirely. CBT is what allows multi-terabyte VMs to back up in seconds when only a small fraction of disk has changed.

Service Inventory

Service	Process	Purpose
Backup API	`ironcore-backup-api`	REST API for all backup operations
Scheduler	`ironcore-backup-scheduler`	Triggers jobs based on configured schedule
Worker	`ironcore-backup-worker`	Executes backup, restore, and sync tasks
Verification Worker	`ironcore-backup-verifier`	Re-reads chunks and validates SHA-256
Garbage Collector	`ironcore-backup-gc`	Reclaims orphaned chunks
Replication Worker	`ironcore-backup-sync`	Mirrors snapshots between sites
Tape Worker	`ironcore-backup-tape`	Reads / writes LTO tape libraries
Notification Engine	`ironcore-backup-notify`	Routes alerts to SMTP, webhooks, and metric servers

Host-Side Pipeline

The host owns the most expensive part of the pipeline: identifying changes, chunking, compressing, and encrypting. As described in the Deployment Model above, the “host” is the hypervisor host for VM and container backups (agentless from the guest’s perspective), or the physical host itself for bare-metal backups via the ironcore-backup-client package.

Changed Block Tracking (CBT)

Changed Block Tracking (CBT) is the mechanism IBS uses to identify exactly which blocks of a disk have been modified since the previous backup. It is the single most important determinant of backup performance and storage efficiency. The hypervisor maintains a dirty bitmap per disk: a compact in-memory data structure where each bit represents a block on the disk. The bitmap is updated every time the guest writes to a block. At the start of each incremental backup:

The hypervisor exposes the current dirty bitmap.
The host-side pipeline iterates the dirty bits — and only the dirty bits.
For each dirty block, the data is read, chunked, compressed, encrypted, and transmitted to the datastore.
After the snapshot is committed, the bitmap is reset so the next incremental captures only post-backup writes.

Clean blocks (those not marked dirty) are never read. This is the core performance contract of CBT.
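As a rough illustration of the four steps above, the following Python sketch models a dirty bitmap in memory. The `DirtyBitmap` class, the `send` callback, and the flat byte-array layout are hypothetical; the real bitmap lives inside the hypervisor and its format is not described on this page. The sketch assumes the typical 64 KiB tracking granularity and treats the snapshot commit as successful once every dirty block has been sent.

```python
from typing import Callable, Iterator

BLOCK_SIZE = 64 * 1024  # assumed tracking granularity: one bit per 64 KiB block


class DirtyBitmap:
    """Hypothetical in-memory model of a per-disk dirty bitmap (one bit per block)."""

    def __init__(self, disk_size: int, block_size: int = BLOCK_SIZE) -> None:
        self.block_size = block_size
        self.num_blocks = -(-disk_size // block_size)  # ceiling division
        self.bits = bytearray(-(-self.num_blocks // 8))

    def mark_write(self, offset: int, length: int) -> None:
        """Called on every guest write: set the bit of each block the write touches."""
        if length <= 0:
            return
        first = offset // self.block_size
        last = (offset + length - 1) // self.block_size
        for block in range(first, last + 1):
            self.bits[block // 8] |= 1 << (block % 8)

    def dirty_blocks(self) -> Iterator[int]:
        """Step 2: iterate only the dirty bits, in block order."""
        for byte_index, byte in enumerate(self.bits):
            if byte == 0:
                continue  # eight clean blocks skipped without any disk read
            for bit in range(8):
                if byte & (1 << bit):
                    yield byte_index * 8 + bit

    def reset(self) -> None:
        """Step 4: clear every bit so the next incremental sees only new writes."""
        self.bits = bytearray(len(self.bits))


def incremental_backup(disk: bytes, bitmap: DirtyBitmap,
                       send: Callable[[int, bytes], None]) -> int:
    """Steps 1-4 for one disk; returns the number of bytes read."""
    bytes_read = 0
    for block in bitmap.dirty_blocks():
        start = block * bitmap.block_size
        data = disk[start:start + bitmap.block_size]  # step 3: read the dirty block only
        send(block, data)  # placeholder for chunk, compress, encrypt, transmit
        bytes_read += len(data)
    bitmap.reset()  # assumes the snapshot commit succeeded
    return bytes_read


if __name__ == "__main__":
    disk = bytes(1024 * 1024)  # 1 MiB toy disk = 16 blocks
    bitmap = DirtyBitmap(len(disk))
    bitmap.mark_write(65_530, 10)  # straddles blocks 0 and 1
    sent = []
    print(incremental_backup(disk, bitmap, lambda i, d: sent.append(i)), sent)
```

Iterating the bitmap a byte at a time means long runs of clean blocks cost one comparison per eight blocks and never touch the disk.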

Property	Behaviour
Tracking granularity	Per-block (typically 64 KiB per bit)
Persistence	Bitmap survives guest reboot
Reset trigger	Successful backup commit
Bitmap size	~2 MiB per 1 TiB of disk capacity at 64 KiB per bit (negligible memory cost)
Disk coverage	All virtual disks attached to the VM at backup time
Failure handling	A corrupted or lost bitmap falls back to a full read; the next backup is implicitly a full
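The bitmap's memory footprint follows directly from the tracking granularity: one bit per block. The helper below (a hypothetical name, using binary units) works it out. At 64 KiB per bit, a 1 TiB disk needs a 2 MiB bitmap; a 64 KiB bitmap per TiB would instead correspond to a granularity of 2 MiB per bit. Either way the cost is negligible next to the disk being tracked.

```python
KIB = 1024
MIB = 1024 * KIB
TIB = 1024 * 1024 * MIB


def bitmap_bytes(disk_bytes: int, block_bytes: int = 64 * KIB) -> int:
    """Size of a dirty bitmap with one bit per tracked block, rounded up to whole bytes."""
    blocks = -(-disk_bytes // block_bytes)  # a partial final block still needs a bit
    return -(-blocks // 8)


print(bitmap_bytes(TIB) // MIB, "MiB per TiB at 64 KiB per bit")
print(bitmap_bytes(TIB, 2 * MIB) // KIB, "KiB per TiB at 2 MiB per bit")
```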

Why CBT Matters

Scenario	Without CBT	With CBT
1 TB VM, 1% daily change	Read 1 TB every day	Read 10 GB
5 TB VM, 0.5% daily change	Read 5 TB every day	Read 25 GB
10 TB database VM, 2% daily change	Read 10 TB every day	Read 200 GB

For the production reference fleet (200 VMs × 100 GB × 2% daily change), CBT reduces daily backup IO from 20 TB to ~400 GB — a 50x reduction that makes the standard nightly backup window practical.
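The figures in the table and the fleet estimate are straight multiplication, reproduced below as a small check. The sketch assumes decimal units (1 TB = 1,000 GB) and treats the change rate as the fraction of the disk dirtied per day. It ignores rounding up to whole tracked blocks, which can make real CBT reads larger when writes are small and scattered.

```python
def daily_read_gb(disk_gb: float, change_rate: float, cbt: bool = True) -> float:
    """Data read per day for one disk: all of it without CBT, only the changed share with it."""
    return disk_gb * change_rate if cbt else disk_gb


# Rows of the "Why CBT Matters" table (decimal units: 1 TB = 1,000 GB).
scenarios = [
    ("1 TB VM, 1% daily change", 1_000, 0.01),
    ("5 TB VM, 0.5% daily change", 5_000, 0.005),
    ("10 TB database VM, 2% daily change", 10_000, 0.02),
]
for name, size_gb, rate in scenarios:
    without = daily_read_gb(size_gb, rate, cbt=False)
    with_cbt = daily_read_gb(size_gb, rate)
    print(f"{name}: {without:,.0f} GB without CBT, {with_cbt:,.0f} GB with CBT")

# Reference fleet: 200 VMs x 100 GB each, 2% daily change.
fleet_without = 200 * daily_read_gb(100, 0.02, cbt=False)
fleet_with = 200 * daily_read_gb(100, 0.02)
print(f"Fleet: {fleet_without:,.0f} GB -> {fleet_with:,.0f} GB "
      f"({fleet_without / fleet_with:.0f}x reduction)")
```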

The first backup of any new VM has no bitmap history — it reads every block once to establish a baseline (effectively a full backup). All subsequent backups use CBT and read only changed blocks.

CBT applies to VM and container backups where the hypervisor owns the storage. For physical host backups via the ironcore-backup-client package, IBS uses an equivalent file-level change detection approach — comparing inode metadata and content fingerprints against the previous backup’s manifest. The net effect is the same: only changed data is transmitted.
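A minimal sketch of that file-level comparison, assuming a manifest that maps each relative path to its size, modification time, inode number, and SHA-256 content fingerprint. The `Entry` record and the `scan` and `fingerprint` functions are hypothetical; the actual manifest format used by `ironcore-backup-client` is not described on this page. Files whose metadata matches the previous manifest are not read at all; anything else is re-fingerprinted, and only files whose content actually differs are reported as changed.

```python
import hashlib
import os
import stat
import tempfile
from typing import Dict, List, NamedTuple, Tuple


class Entry(NamedTuple):
    """One manifest record (hypothetical layout)."""
    size: int
    mtime_ns: int
    inode: int
    sha256: str


def fingerprint(path: str) -> str:
    """SHA-256 of the file content, read in 1 MiB pieces."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for piece in iter(lambda: f.read(1 << 20), b""):
            h.update(piece)
    return h.hexdigest()


def scan(root: str, previous: Dict[str, Entry]) -> Tuple[Dict[str, Entry], List[str]]:
    """Build a new manifest for regular files under root and list the changed paths."""
    manifest: Dict[str, Entry] = {}
    changed: List[str] = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if not stat.S_ISREG(st.st_mode):
                continue  # symlinks, sockets, etc. are out of scope for this sketch
            rel = os.path.relpath(path, root)
            old = previous.get(rel)
            if old and (old.size, old.mtime_ns, old.inode) == (st.st_size, st.st_mtime_ns, st.st_ino):
                manifest[rel] = old  # inode metadata unchanged: skip reading the file
                continue
            digest = fingerprint(path)
            manifest[rel] = Entry(st.st_size, st.st_mtime_ns, st.st_ino, digest)
            if old is None or old.sha256 != digest:
                changed.append(rel)  # only these files would be chunked and transmitted
    return manifest, changed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "a.txt"), "w") as f:
            f.write("hello")
        first, changed = scan(root, {})
        print(changed)
        print(scan(root, first)[1])
```

A metadata-only change (for example a `touch`) costs one re-read of the file but is not reported as changed, because the content fingerprint still matches.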

Content-Defined Chunking

Backup data is split into variable-sized chunks using a rolling hash. Chunk boundaries are determined by content fingerprints, not fixed offsets — this means inserting a byte in the middle of a file does not shift every subsequent chunk boundary, so deduplication remains effective.

Chunk parameter	Default
Minimum size	512 KiB
Average size	4 MiB
Maximum size	16 MiB
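This page does not name the rolling hash IBS uses. As an illustration only, the sketch below implements a simple gear-hash chunker in the spirit of FastCDC. The hash is updated one byte at a time, its top bits depend only on the last 64 bytes, and a boundary is declared wherever those bits are all zero, but never before the minimum size and always at the maximum. The gear-table seed and the function names are hypothetical. With this plain boundary test the expected chunk length is roughly the minimum plus the average parameter (before the maximum cap); FastCDC's normalised chunking tightens the distribution around the target.

```python
import random
from typing import Iterator, List

MASK64 = (1 << 64) - 1
_rng = random.Random(0x1BC)  # fixed seed: every run must use the same gear table
GEAR = [_rng.getrandbits(64) for _ in range(256)]  # one random 64-bit value per byte value


def chunk_ends(data: bytes, min_size: int, avg_size: int, max_size: int) -> Iterator[int]:
    """Yield the end offset of each chunk. Assumes avg_size is a power of two, min_size >= 64."""
    bits = avg_size.bit_length() - 1
    # Test the top `bits` bits: they depend only on the last 64 bytes, so the
    # boundary decision is a function of local content, not of the file offset.
    boundary_mask = ((1 << bits) - 1) << (64 - bits)
    start = 0
    while start < len(data):
        limit = min(start + max_size, len(data))
        cut = limit  # forced cut at max_size (or at the end of the data)
        h = 0
        for i in range(start, limit):
            h = ((h << 1) + GEAR[data[i]]) & MASK64  # gear rolling hash
            if i + 1 - start >= min_size and (h & boundary_mask) == 0:
                cut = i + 1
                break
        yield cut
        start = cut


def split(data: bytes, min_size: int = 512 * 1024, avg_size: int = 4 * 1024 * 1024,
          max_size: int = 16 * 1024 * 1024) -> List[bytes]:
    """Split data into chunks; defaults follow the chunk parameter table."""
    chunks, prev = [], 0
    for end in chunk_ends(data, min_size, avg_size, max_size):
        chunks.append(data[prev:end])
        prev = end
    return chunks
```

Inserting one byte only changes the chunk that contains it (and occasionally a neighbour); later boundaries reappear at the same content positions, so the remaining chunks deduplicate against the previous backup.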

Zstandard Compression

Each chunk is compressed with Zstandard before it is encrypted; the order matters because ciphertext is effectively random and would not compress. Typical compression ratios on mixed workloads are between 2x and 4x. Throughput on modern x86_64 hardware exceeds 1 GiB/s per core.

AES-256-GCM Encryption

Each chunk is encrypted with AES-256 in Galois/Counter Mode using the project encryption key. GCM provides authenticated encryption: any tampering with the ciphertext fails the GCM authentication tag during decryption.

Property	Value
Cipher	AES-256
Mode	GCM (authenticated)
IV	96-bit per-chunk random
Auth tag	128-bit
Key handling	Client-side only — server never sees the key

See Security and Encryption for key lifecycle, master keys, and paperkey recovery.

Server-Side Storage Model

The backup server stores data in a chunk-addressed datastore. Every chunk is named after its post-encryption SHA-256 fingerprint. Identical chunks always produce the same fingerprint, so writing a duplicate is detected and skipped.

Component	Description
Manifest	Per-snapshot signed metadata: owner, timestamp, source, key fingerprint
Chunk Index	Ordered list of chunk SHA-256s that reconstruct the source
Chunk	Encrypted, compressed, content-addressed data blob
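To make the relationship between the three components concrete, the following sketch models a datastore in memory. The `Manifest` and `ChunkStore` names and fields are hypothetical, manifest signing is omitted, and each blob stands for a chunk that has already been compressed and encrypted; the fingerprint is taken over those stored bytes, as described above. The `verify` method mirrors what the verification worker does: re-read every chunk and check its SHA-256.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Manifest:
    """Per-snapshot metadata (hypothetical fields; signature omitted)."""
    owner: str
    timestamp: str
    source: str
    key_fingerprint: str
    chunk_index: List[str] = field(default_factory=list)  # ordered SHA-256 digests


class ChunkStore:
    """Chunk-addressed store: each blob is filed under the SHA-256 of its bytes."""

    def __init__(self) -> None:
        self.chunks: Dict[str, bytes] = {}

    def put(self, blob: bytes) -> str:
        digest = hashlib.sha256(blob).hexdigest()
        if digest not in self.chunks:  # a duplicate write is detected and skipped
            self.chunks[digest] = blob
        return digest

    def restore(self, manifest: Manifest) -> bytes:
        """Reassemble the source by walking the chunk index in order."""
        return b"".join(self.chunks[d] for d in manifest.chunk_index)

    def verify(self) -> List[str]:
        """Re-hash every stored chunk and return the digests that no longer match."""
        return [d for d, blob in self.chunks.items()
                if hashlib.sha256(blob).hexdigest() != d]
```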

Deduplication

When the client offers a chunk, the server checks whether the fingerprint already exists in the datastore. If so, the client uploads nothing — the snapshot manifest simply references the existing chunk. This applies across all sources sharing the datastore.
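The exchange can be sketched as a two-step protocol: the client sends the fingerprints of a snapshot's chunks, the server replies with the ones it lacks, and only those are uploaded. The `DatastoreServer`, `missing`, `upload`, and `backup` names are hypothetical and do not describe the actual IBS wire protocol. In the usage at the bottom, the second backup comes from a different source and reuses a chunk the first one already stored.

```python
import hashlib
from typing import Dict, Iterable, List, Set, Tuple


class DatastoreServer:
    """Hypothetical server side of the exchange."""

    def __init__(self) -> None:
        self.chunks: Dict[str, bytes] = {}

    def missing(self, digests: Iterable[str]) -> Set[str]:
        """Answer a batch of offered fingerprints with the ones not yet stored."""
        return {d for d in digests if d not in self.chunks}

    def upload(self, digest: str, blob: bytes) -> None:
        """Accept a chunk, refusing it if the bytes do not match the claimed fingerprint."""
        if hashlib.sha256(blob).hexdigest() != digest:
            raise ValueError(f"fingerprint mismatch for {digest}")
        self.chunks.setdefault(digest, blob)


def backup(server: DatastoreServer, blobs: List[bytes]) -> Tuple[List[str], int]:
    """Offer every chunk by fingerprint and upload only what the server lacks.
    Returns the snapshot's chunk index and the number of bytes uploaded."""
    index = [hashlib.sha256(b).hexdigest() for b in blobs]
    needed = server.missing(index)
    uploaded = 0
    for digest, blob in zip(index, blobs):
        if digest in needed:
            server.upload(digest, blob)
            needed.discard(digest)  # a chunk repeated inside one snapshot is sent once
            uploaded += len(blob)
    return index, uploaded


if __name__ == "__main__":
    server = DatastoreServer()
    _, first = backup(server, [b"base-os", b"app-v1", b"base-os"])  # source 1
    _, second = backup(server, [b"base-os", b"app-v2"])             # source 2
    print(first, second)
```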

Append-Only Semantics

Existing chunks are never modified. New snapshots add new chunks; pruning removes snapshot manifests; garbage collection reclaims chunks no manifest references. This append-only model is critical for ransomware protection — see Security and Encryption.
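The split between pruning and garbage collection can be sketched as a mark-and-sweep pass over in-memory dictionaries. The function names and data shapes are hypothetical. Pruning deletes only a manifest; the collector then marks every digest still referenced by a surviving manifest and sweeps the rest. A production collector must also protect chunks uploaded by backups whose manifests are not yet committed, which this sketch leaves out.

```python
from typing import Dict, List, Set


def prune(manifests: Dict[str, List[str]], snapshot_id: str) -> None:
    """Pruning removes a snapshot manifest; no chunk is modified or deleted."""
    del manifests[snapshot_id]


def garbage_collect(manifests: Dict[str, List[str]], chunks: Dict[str, bytes]) -> Set[str]:
    """Mark every digest referenced by a surviving manifest, then sweep the rest."""
    referenced: Set[str] = set()
    for chunk_index in manifests.values():  # mark phase
        referenced.update(chunk_index)
    orphaned = set(chunks) - referenced
    for digest in orphaned:  # sweep phase: only unreferenced chunks are reclaimed
        del chunks[digest]
    return orphaned


if __name__ == "__main__":
    chunks = {"c1": b"...", "c2": b"...", "c3": b"..."}
    manifests = {"snap-a": ["c1", "c2"], "snap-b": ["c2", "c3"]}
    prune(manifests, "snap-a")
    print(sorted(garbage_collect(manifests, chunks)))
```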

Datastore Backends

Backend	Use Case	Notes
Local filesystem	Primary DC fast-access datastore	XFS or ZFS on flash or hybrid array
Replicated filesystem	Two-node clustered datastore	ZFS replication or shared filesystem
S3-compatible object storage	Archive or cloud backend	Native S3 API; reduces hardware footprint
Tape library	Long-term compliance archival	LTO-5 and newer; barcoded catalog

See Datastores for provisioning detail and capacity planning.

Replication Topology

Replication mirrors snapshots from one datastore to another over an encrypted TLS 1.3 channel. The default topology pairs a Primary DC datastore with a Backup site datastore.

Property	Behaviour
Transport	TLS 1.3
Direction	Push (default) or pull
Frequency	Configurable; typically weekly for archival
Bandwidth	Throttled per-job via `--bwlimit`
Chunk transfer	Only chunks not present on the destination
Integrity verification	SHA-256 after each sync
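A minimal sketch of one push sync, assuming both datastores are exposed as digest-to-blob mappings: only chunks absent on the destination are copied, and every chunk the snapshot references is then re-hashed on the destination. The function name and data shapes are hypothetical; the TLS 1.3 transport and `--bwlimit` throttling are not modelled.

```python
import hashlib
from typing import Dict, List


def sync_snapshot(src: Dict[str, bytes], dst: Dict[str, bytes], chunk_index: List[str]) -> int:
    """Push one snapshot's chunks from src to dst; return how many chunks were copied."""
    copied = 0
    for digest in dict.fromkeys(chunk_index):  # each referenced digest once, in order
        if digest not in dst:  # chunks already on the destination are never resent
            dst[digest] = src[digest]  # stands in for the TLS 1.3 transfer
            copied += 1
    # Integrity verification: re-hash every referenced chunk on the destination.
    for digest in dict.fromkeys(chunk_index):
        if hashlib.sha256(dst[digest]).hexdigest() != digest:
            raise RuntimeError(f"chunk {digest} failed SHA-256 verification")
    return copied
```

Because chunks are content-addressed, a repeated sync of the same snapshot copies nothing and only re-runs the verification pass.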

See Replication and Sync.

Compliance Mapping

The following table maps internal architecture features to compliance requirements.

Requirement	Architectural Mechanism
Changed Block Tracking (CBT)	Hypervisor dirty bitmaps; per-block tracking with persistent state across guest reboots
Full + incremental backup	Snapshot model with chunk references
File-level restore	File archive format supports random access
VM-level restore + live-restore	Chunk index streams blocks on demand
Replication encryption	TLS 1.3 transport with per-session keys
At-rest encryption	AES-256-GCM at the chunk level
Integrity verification	SHA-256 per chunk with GCM auth tags
Deduplication	Content-defined chunking + chunk-addressed store
Compression	Zstandard per chunk
Ransomware protection	Append-only chunks, role-restricted prune
Mock recovery drill	Restore-test job from the Backup site datastore

Next Steps

Datastores

Provision local, replicated, and object storage datastores

Retention Policies

Configure daily, weekly, monthly, and yearly retention windows

Security and Encryption

Encryption key lifecycle and ransomware protection model

Infrastructure Sizing

Plan capacity for Primary DC and Backup site
