# Ironcore Backup Architecture

> Component-level architecture of Ironcore Backup Solution — block-level change tracking, content-defined deduplication, Zstandard compression, and AES-256-GCM encryption.

## Overview

Ironcore Backup Solution (IBS) uses a layered architecture optimised for
incremental, deduplicated, encrypted backups across virtual machines, system
containers, and physical hosts. At its core is **Changed Block Tracking (CBT)**:
the hypervisor identifies precisely which disk blocks have changed since the
previous backup, so only modified blocks are read, transmitted, and stored.
CBT is what makes incremental backups fast and storage-efficient at scale.

This page describes each component, the data flow between them, and the design
decisions behind the storage model.

<Note>
  **Prerequisites**

  * Administrator role on the Polystack platform
  * Understanding of TCP networking, storage backends, and TLS
</Note>

***

## Deployment Model

IBS splits the backup pipeline between **the host
that owns the workload** and **the backup server**. The split is fixed: the
host always performs change tracking, chunking, compression, and encryption;
the backup server always performs deduplication, storage, replication, and
verification.

The location of the "host side" depends on the workload type:

| Workload Type                       | Pipeline Runs On         | Software in the Guest            |
| ----------------------------------- | ------------------------ | -------------------------------- |
| Virtual machine on Ironcore         | Ironcore hypervisor host | **None** — fully agentless       |
| System container on Ironcore        | Ironcore hypervisor host | **None** — fully agentless       |
| Physical Linux host                 | The host itself          | `ironcore-backup-client` package |
| File-archive from any Linux machine | The machine itself       | `ironcore-backup-client` package |

<Info>
  **Agentless for hypervisor-managed workloads** — block-level change tracking
  is performed by the hypervisor against the VM's virtual disks. The guest OS
  is not touched, does not need a backup driver, and cannot detect the backup.
  This applies to all VMs and system containers running on Ironcore.
</Info>

<Note>
  **Agent-based for physical workloads** — bare-metal hosts, standalone Linux
  servers, and machines outside the virtualisation layer require the
  lightweight `ironcore-backup-client` package. The client provides the same
  pipeline (chunk, compress, encrypt) and pushes to the backup datastore.
</Note>

***

## High-Level Component Diagram

```mermaid theme={null}
graph TD
    subgraph "Host-Side Pipeline"
        direction TB
        H1[Hypervisor host<br/>for VMs and containers<br/>— agentless from guest]
        H2[Physical host<br/>with ironcore-backup-client<br/>— agent on the host]
        CBT["<b>Changed Block Tracking</b><br/>(CBT)<br/>identify modified blocks"]
        CHUNK[Content-Defined Chunker]
        COMP[Zstandard Compressor]
        ENC[AES-256-GCM Encryptor]
        H1 --> CBT
        H2 --> CBT
    end

    subgraph "Backup Server"
        API[REST API]
        SCHED[Scheduler]
        WORK[Workers]
        DEDUP[Deduplication Layer]
        DS[(Chunk Datastore)]
        VERIFY[Verification Worker]
        SYNC[Replication Worker]
        GC[Garbage Collector]
    end

    subgraph "Long-Term Archival"
        REMOTE[(Backup Site Datastore)]
        TAPE[(Tape Library)]
        S3[(Object Storage)]
    end

    CBT --> CHUNK
    CHUNK --> COMP
    COMP --> ENC
    ENC -->|Encrypted chunks| API
    API --> DEDUP
    DEDUP --> DS
    SCHED --> WORK
    DS --> VERIFY
    DS --> SYNC
    SYNC --> REMOTE
    DS --> GC
    REMOTE --> TAPE
    REMOTE --> S3
```

<Info>
  **Changed Block Tracking (CBT)** is the foundational mechanism of every
  incremental backup. The hypervisor maintains a per-disk bitmap that records
  every block written since the previous backup. Only blocks marked dirty
  are read, hashed, compressed, encrypted, and transmitted; every other block
  is skipped entirely. CBT is what allows multi-terabyte VMs to back up in
  minutes rather than hours when only a small fraction of the disk has changed.
</Info>

***

## Service Inventory

| Service                 | Process                     | Purpose                                             |
| ----------------------- | --------------------------- | --------------------------------------------------- |
| **Backup API**          | `ironcore-backup-api`       | REST API for all backup operations                  |
| **Scheduler**           | `ironcore-backup-scheduler` | Triggers jobs based on configured schedule          |
| **Worker**              | `ironcore-backup-worker`    | Executes backup, restore, and sync tasks            |
| **Verification Worker** | `ironcore-backup-verifier`  | Re-reads chunks and validates SHA-256               |
| **Garbage Collector**   | `ironcore-backup-gc`        | Reclaims orphaned chunks                            |
| **Replication Worker**  | `ironcore-backup-sync`      | Mirrors snapshots between sites                     |
| **Tape Worker**         | `ironcore-backup-tape`      | Reads / writes LTO tape libraries                   |
| **Notification Engine** | `ironcore-backup-notify`    | Routes alerts to SMTP, webhooks, and metric servers |

***

## Host-Side Pipeline

The host owns the most expensive part of the pipeline: identifying changes,
chunking, compressing, and encrypting. As described in the Deployment Model
above, the "host" is the **hypervisor host** for VM and container backups
(agentless from the guest's perspective), or the **physical host itself** for
bare-metal backups via the `ironcore-backup-client` package.

```mermaid theme={null}
flowchart LR
    A[Hypervisor / Filesystem] -->|Dirty block map| B[CBT Tracker]
    B --> C{Block changed?}
    C -->|No| Z[Skip]
    C -->|Yes| D[Read block]
    D --> E[Content-defined chunker]
    E --> F[Per-chunk SHA-256]
    F --> G{Chunk exists on server?}
    G -->|Yes| H[Send reference only]
    G -->|No| I[Zstd compress]
    I --> J[AES-256-GCM encrypt]
    J --> K[Upload chunk]
```

### Changed Block Tracking (CBT)

**Changed Block Tracking (CBT)** is the mechanism IBS uses to identify exactly
which blocks of a disk have been modified since the previous backup. It is the
single most important determinant of backup performance and storage efficiency.

The hypervisor maintains a **dirty bitmap per disk**: a compact in-memory data
structure where each bit represents a block on the disk. The bitmap is updated
every time the guest writes to a block. At the start of each incremental
backup:

1. The hypervisor exposes the current dirty bitmap.
2. The host-side pipeline iterates the dirty bits — and **only the dirty bits**.
3. For each dirty block, the data is read, chunked, compressed, encrypted, and
   transmitted to the datastore.
4. After the snapshot is committed, the bitmap is reset so the next incremental
   captures only post-backup writes.

Clean blocks (those not marked dirty) are **never read**. This is the core
performance contract of CBT.
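
The four steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the hypervisor's implementation: `DirtyBitmap`, `mark_write`, and `incremental_read` are hypothetical names, the disk is modelled as an in-memory byte string, and the 64 KiB granularity is taken as a fixed constant.

```python
BLOCK_SIZE = 64 * 1024  # assumed tracking granularity: 64 KiB per bit


class DirtyBitmap:
    """Hypothetical in-memory dirty bitmap: one bit per disk block."""

    def __init__(self, disk_size: int, block_size: int = BLOCK_SIZE) -> None:
        self.block_size = block_size
        self.num_blocks = -(-disk_size // block_size)  # ceiling division
        self.bits = bytearray(-(-self.num_blocks // 8))

    def mark_write(self, offset: int, length: int) -> None:
        """Mark every block touched by a guest write as dirty."""
        if length <= 0:
            return
        first = offset // self.block_size
        last = (offset + length - 1) // self.block_size
        for block in range(first, last + 1):
            self.bits[block // 8] |= 1 << (block % 8)

    def dirty_blocks(self):
        """Yield the indices of dirty blocks in ascending order."""
        for block in range(self.num_blocks):
            if self.bits[block // 8] & (1 << (block % 8)):
                yield block

    def reset(self) -> None:
        """Clear all bits; called only after a successful backup commit."""
        self.bits = bytearray(len(self.bits))


def incremental_read(disk: bytes, bitmap: DirtyBitmap) -> list:
    """Return (block_index, data) for dirty blocks only; clean blocks are never read."""
    result = []
    for block in bitmap.dirty_blocks():
        start = block * bitmap.block_size
        result.append((block, disk[start:start + bitmap.block_size]))
    return result
```

A caller would pass the output of `incremental_read` to the chunking, compression, and encryption stages, and call `reset()` only once the snapshot has been committed.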

| Property             | Behaviour                                                                                  |
| -------------------- | ------------------------------------------------------------------------------------------ |
| Tracking granularity | Per-block (typically 64 KiB per bit)                                                       |
| Persistence          | Bitmap survives guest reboot                                                               |
| Reset trigger        | Successful backup commit                                                                   |
| Bitmap size          | \~2 MiB per 1 TiB of disk capacity at 64 KiB per bit (negligible memory cost)             |
| Disk coverage        | All virtual disks attached to the VM at backup time                                        |
| Failure handling     | A corrupted or lost bitmap forces a full read; the next backup is implicitly a full backup |

#### Why CBT Matters

| Scenario                           | Without CBT          | With CBT    |
| ---------------------------------- | -------------------- | ----------- |
| 1 TB VM, 1% daily change           | Read 1 TB every day  | Read 10 GB  |
| 5 TB VM, 0.5% daily change         | Read 5 TB every day  | Read 25 GB  |
| 10 TB database VM, 2% daily change | Read 10 TB every day | Read 200 GB |

For the production reference fleet (200 VMs × 100 GB × 2% daily change), CBT
reduces daily backup IO from 20 TB to \~400 GB — a 50x reduction that makes
the standard nightly backup window practical.
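
The figures in the table and the fleet estimate follow from simple arithmetic, reproduced here as a short Python sketch. The function names are illustrative only, and decimal units (1 TB = 1000 GB) are assumed.

```python
def cbt_daily_read_gb(disk_gb: float, change_percent: float, vm_count: int = 1) -> float:
    """Data read per day with CBT: only the changed share of each disk."""
    return vm_count * disk_gb * change_percent / 100


def io_reduction(change_percent: float) -> float:
    """Ratio of full-read IO to CBT IO; it depends only on the change rate."""
    return 100 / change_percent


scenarios = [
    ("1 TB VM, 1% daily change", 1000, 1, 1),
    ("5 TB VM, 0.5% daily change", 5000, 0.5, 1),
    ("10 TB database VM, 2% daily change", 10000, 2, 1),
    ("Reference fleet: 200 VMs x 100 GB, 2% daily change", 100, 2, 200),
]

for label, disk_gb, percent, vms in scenarios:
    full_gb = disk_gb * vms
    cbt_gb = cbt_daily_read_gb(disk_gb, percent, vms)
    print(f"{label}: full read {full_gb} GB, CBT read {cbt_gb} GB, "
          f"reduction {io_reduction(percent)}x")
```

Note that the reduction factor is simply the inverse of the change rate, so a fleet with a higher daily change rate benefits proportionally less.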

<Info>
  The first backup of any new VM has no bitmap history — it reads every block
  once to establish a baseline (effectively a full backup). All subsequent
  backups use CBT and read only changed blocks.
</Info>

<Note>
  CBT applies to **VM and container backups** where the hypervisor owns the
  storage. For physical host backups via the `ironcore-backup-client` package,
  IBS uses an equivalent **file-level change detection** approach — comparing
  inode metadata and content fingerprints against the previous backup's
  manifest. The net effect is the same: only changed data is transmitted.
</Note>
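
As a rough illustration of file-level change detection, the sketch below compares inode metadata first and falls back to a content fingerprint only when the metadata differs. The manifest layout (a dictionary keyed by relative path) and the `scan` and `fingerprint` helpers are assumptions made for this example; the actual `ironcore-backup-client` manifest format is not described on this page.

```python
import hashlib
import os


def fingerprint(path: str) -> str:
    """SHA-256 of a file's content, read in 1 MiB blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest()


def scan(root: str, previous: dict):
    """Return (new_manifest, changed_paths) relative to the previous manifest."""
    manifest, changed = {}, []
    for dirpath, _dirs, files in os.walk(root):
        for name in sorted(files):
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            st = os.stat(path)
            meta = (st.st_ino, st.st_size, st.st_mtime_ns)
            old = previous.get(rel)
            if old is not None and old["meta"] == meta:
                # Metadata unchanged: reuse the old entry without reading the file.
                manifest[rel] = old
                continue
            digest = fingerprint(path)
            manifest[rel] = {"meta": meta, "sha256": digest}
            # A touched file with identical content is not reported as changed.
            if old is None or old["sha256"] != digest:
                changed.append(rel)
    return manifest, changed
```

Checking metadata before hashing keeps the common case cheap: unchanged files are never opened, which mirrors the "clean blocks are never read" property of CBT.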

### Content-Defined Chunking

Backup data is split into variable-sized chunks using a rolling hash. Chunk
boundaries are determined by content fingerprints, not fixed offsets — this
means inserting a byte in the middle of a file does not shift every subsequent
chunk boundary, so deduplication remains effective.

| Chunk parameter | Default |
| --------------- | ------- |
| Minimum size    | 512 KiB |
| Average size    | 4 MiB   |
| Maximum size    | 16 MiB  |
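
The following Python sketch shows one common way to implement content-defined chunking, using a gear-style rolling hash. It is an illustration only: this page does not specify which rolling hash IBS uses, so the gear table, the boundary test, and the `chunk_boundaries` function are assumptions. The default sizes match the table above, and the average size is only a target.

```python
import random

# Gear table: 256 pseudo-random 64-bit values, seeded so boundaries are reproducible.
_rng = random.Random(0)
GEAR = [_rng.getrandbits(64) for _ in range(256)]
MASK64 = (1 << 64) - 1

MIN_SIZE = 512 * 1024
AVG_SIZE = 4 * 1024 * 1024   # must be a power of two in this sketch
MAX_SIZE = 16 * 1024 * 1024


def chunk_boundaries(data: bytes, min_size: int = MIN_SIZE,
                     avg_size: int = AVG_SIZE, max_size: int = MAX_SIZE):
    """Yield (start, end) offsets of content-defined chunks.

    A boundary is declared when the low log2(avg_size) bits of the rolling
    hash are zero, provided the chunk has reached min_size. A chunk is cut
    unconditionally at max_size.
    """
    mask = avg_size - 1
    start, total = 0, len(data)
    while start < total:
        limit = min(start + max_size, total)
        rolling = 0
        cut = limit
        for i in range(start, limit):
            # Each shift pushes older bytes out of the low bits, so the
            # boundary test depends only on the most recent bytes.
            rolling = ((rolling << 1) + GEAR[data[i]]) & MASK64
            if i + 1 - start >= min_size and (rolling & mask) == 0:
                cut = i + 1
                break
        yield start, cut
        start = cut
```

Because the boundary test looks only at recent content, an insertion early in the stream changes the chunk around the edit, after which boundaries typically realign with the original ones and the remaining chunks deduplicate as before.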

### Zstandard Compression

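Per-chunk Zstandard compression precedes encryption, because encrypted data is
effectively incompressible. Typical compression ratios on mixed workloads are
between 2x and 4x. Zstandard decompression on modern x86\_64 hardware typically
exceeds 1 GiB/s per core; compression throughput is lower and depends on the
configured level.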

### AES-256-GCM Encryption

Each chunk is encrypted with AES-256 in Galois/Counter Mode using the project
encryption key. GCM provides authenticated encryption: any tampering with the
ciphertext fails the GCM authentication tag during decryption.

| Property     | Value                                        |
| ------------ | -------------------------------------------- |
| Cipher       | AES-256                                      |
| Mode         | GCM (authenticated)                          |
| IV           | 96-bit per-chunk random                      |
| Auth tag     | 128-bit                                      |
| Key handling | Client-side only — server never sees the key |

See [Security and Encryption](/services/ironcore-backup/admin-guide/security-encryption)
for key lifecycle, master keys, and paperkey recovery.

***

## Server-Side Storage Model

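The backup server stores data in a **chunk-addressed datastore**. Every chunk
is named after the SHA-256 fingerprint that the host computes over the chunk
before compression and encryption, as shown in the Host-Side Pipeline
flowchart. Because this fingerprint does not depend on the random per-chunk
IV, identical chunks always produce the same fingerprint, so writing a
duplicate is detected and skipped.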

```mermaid theme={null}
graph TD
    subgraph "Snapshot"
        M[Manifest]
        I[Chunk Index]
    end
    subgraph "Datastore"
        C1[Chunk a1b2...]
        C2[Chunk c3d4...]
        C3[Chunk e5f6...]
    end
    M --> I
    I --> C1
    I --> C2
    I --> C3
```

| Component       | Description                                                             |
| --------------- | ----------------------------------------------------------------------- |
| **Manifest**    | Per-snapshot signed metadata: owner, timestamp, source, key fingerprint |
| **Chunk Index** | Ordered list of chunk SHA-256s that reconstruct the source              |
| **Chunk**       | Encrypted, compressed, content-addressed data blob                      |

### Deduplication

When the client offers a chunk, the server checks whether the fingerprint already
exists in the datastore. If so, the client uploads nothing — the snapshot
manifest simply references the existing chunk. This applies across all
sources sharing the datastore.
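
A minimal sketch of this offer-then-upload exchange is shown below. `ChunkStore`, `backup`, and `restore` are hypothetical names for illustration, the store is an in-memory dictionary, and compression and encryption are omitted so that the deduplication logic stands out.

```python
import hashlib


class ChunkStore:
    """Hypothetical in-memory chunk-addressed store keyed by SHA-256 hex digest."""

    def __init__(self) -> None:
        self._chunks = {}
        self.uploads = 0  # number of chunk payloads actually written

    def has(self, fp: str) -> bool:
        return fp in self._chunks

    def put(self, fp: str, payload: bytes) -> None:
        # Write-once: an existing chunk is never overwritten.
        if fp not in self._chunks:
            self._chunks[fp] = payload
            self.uploads += 1

    def get(self, fp: str) -> bytes:
        return self._chunks[fp]


def backup(store: ChunkStore, chunks: list) -> list:
    """Offer each chunk by fingerprint; upload only those the store lacks.

    Returns the chunk index: the ordered fingerprints that rebuild the source.
    """
    index = []
    for chunk in chunks:
        fp = hashlib.sha256(chunk).hexdigest()
        if not store.has(fp):
            # A real client would compress and encrypt the chunk before upload.
            store.put(fp, chunk)
        index.append(fp)
    return index


def restore(store: ChunkStore, index: list) -> bytes:
    """Rebuild the source by concatenating chunks in index order."""
    return b"".join(store.get(fp) for fp in index)
```

For example, backing up two VMs that share an operating system image uploads the shared chunks once; the second snapshot's index simply references them.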

### Append-Only Semantics

Existing chunks are never modified. New snapshots add new chunks; pruning
removes snapshot manifests; garbage collection reclaims chunks no manifest
references. This append-only model is critical for ransomware protection — see
[Security and Encryption](/services/ironcore-backup/admin-guide/security-encryption).
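
The relationship between pruning and garbage collection can be sketched as a mark-and-sweep pass. This is an assumed illustration, not the `ironcore-backup-gc` implementation: manifests are modelled as a dictionary mapping a snapshot name to its chunk index, and chunks as a dictionary keyed by fingerprint.

```python
def garbage_collect(chunks: dict, manifests: dict) -> list:
    """Delete chunks that no remaining manifest references.

    Mark phase: collect every fingerprint listed in any manifest.
    Sweep phase: remove chunks outside that set. Returns the removed fingerprints.
    """
    live = set()
    for index in manifests.values():
        live.update(index)
    orphaned = sorted(fp for fp in chunks if fp not in live)
    for fp in orphaned:
        del chunks[fp]
    return orphaned


def prune(manifests: dict, snapshot: str) -> None:
    """Pruning removes only the manifest; chunk data is left for the collector."""
    del manifests[snapshot]
```

Pruning never touches chunk data directly, so a chunk shared with a retained snapshot survives until the last manifest that references it is gone.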

***

## Datastore Backends

| Backend                          | Use Case                         | Notes                                     |
| -------------------------------- | -------------------------------- | ----------------------------------------- |
| **Local filesystem**             | Primary DC fast-access datastore | XFS or ZFS on flash or hybrid array       |
| **Replicated filesystem**        | Two-node clustered datastore     | ZFS replication or shared filesystem      |
| **S3-compatible object storage** | Archive or cloud backend         | Native S3 API; reduces hardware footprint |
| **Tape library**                 | Long-term compliance archival    | LTO-5 and newer; barcoded catalog         |

See [Datastores](/services/ironcore-backup/admin-guide/datastores) for provisioning
detail and capacity planning.

***

## Replication Topology

Replication mirrors snapshots from one datastore to another over an encrypted
TLS 1.3 channel. The default topology pairs a Primary DC datastore with a
Backup site datastore.

```mermaid theme={null}
graph LR
    subgraph Primary DC Site
        P[Primary Datastore]
    end
    subgraph Backup Site
        B[Backup Datastore]
    end
    P -->|Encrypted push sync| B
    B -.->|Pull validation| P
```

| Property               | Behaviour                                   |
| ---------------------- | ------------------------------------------- |
| Transport              | TLS 1.3                                     |
| Direction              | Push (default) or pull                      |
| Frequency              | Configurable; typically weekly for archival |
| Bandwidth              | Throttled per-job via `--bwlimit`           |
| Chunk transfer         | Only chunks not present on the destination  |
| Integrity verification | SHA-256 after each sync                     |

See [Replication and Sync](/services/ironcore-backup/admin-guide/replication-sync).
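
The "only chunks not present on the destination" rule and the post-sync integrity check can be sketched as follows. The `replicate` function is hypothetical, both datastores are modelled as dictionaries keyed by fingerprint, and a plain byte copy stands in for the TLS 1.3 transfer.

```python
import hashlib


def replicate(source: dict, destination: dict) -> list:
    """Copy chunks missing on the destination, then verify them with SHA-256.

    Returns the fingerprints that were transferred.
    """
    missing = sorted(fp for fp in source if fp not in destination)
    for fp in missing:
        destination[fp] = bytes(source[fp])  # stands in for the network transfer
    # Integrity check: stored bytes must hash identically on both sides.
    for fp in missing:
        sent = hashlib.sha256(source[fp]).digest()
        received = hashlib.sha256(destination[fp]).digest()
        if sent != received:
            raise ValueError(f"chunk {fp} failed verification after sync")
    return missing
```

Running the function a second time against the same pair of datastores transfers nothing, which is why repeated sync jobs are cheap once the sites have converged.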

***

## Compliance Mapping

The following table maps internal architecture features to compliance
requirements.

| Requirement                     | Architectural Mechanism                                                                 |
| ------------------------------- | --------------------------------------------------------------------------------------- |
| Changed Block Tracking (CBT)    | Hypervisor dirty bitmaps; per-block tracking with persistent state across guest reboots |
| Full + incremental backup       | Snapshot model with chunk references                                                    |
| File-level restore              | File archive format supports random access                                              |
| VM-level restore + live-restore | Chunk index streams blocks on demand                                                    |
| Replication encryption          | TLS 1.3 transport with per-session keys                                                 |
| At-rest encryption              | AES-256-GCM at the chunk level                                                          |
| Integrity verification          | SHA-256 per chunk with GCM auth tags                                                    |
| Deduplication                   | Content-defined chunking + chunk-addressed store                                        |
| Compression                     | Zstandard per chunk                                                                     |
| Ransomware protection           | Append-only chunks, role-restricted prune                                               |
| Mock recovery drill             | Restore-test job from the Backup site datastore                                         |

***

## Next Steps

<CardGroup cols={2}>
  <Card title="Datastores" icon="database" href="/services/ironcore-backup/admin-guide/datastores" color="#bf9667">
    Provision local, replicated, and object storage datastores
  </Card>

  <Card title="Retention Policies" icon="calendar-days" href="/services/ironcore-backup/admin-guide/retention-policies" color="#bf9667">
    Configure daily, weekly, monthly, and yearly retention windows
  </Card>

  <Card title="Security and Encryption" icon="lock" href="/services/ironcore-backup/admin-guide/security-encryption" color="#bf9667">
    Encryption key lifecycle and ransomware protection model
  </Card>

  <Card title="Infrastructure Sizing" icon="ruler" href="/services/ironcore-backup/admin-guide/infrastructure-sizing" color="#bf9667">
    Plan capacity for Primary DC and Backup site
  </Card>
</CardGroup>
