Overview

A backup that cannot be restored is no backup at all. Ironcore Backup Solution (IBS) operates two complementary safeguards:
  • Verification — periodic SHA-256 integrity checks on every chunk in the datastore to detect bit rot, media failure, or tampering.
  • Mock Recovery Drills — bi-annual end-to-end restore exercises from the Backup site to confirm full recoverability of the data and the playbook.
This page covers the schedule, automation, and operational procedure for both.

Prerequisites

  • Administrator role on the Polystack platform
  • At least one datastore with active backups
  • For mock drills: dedicated compute and storage resources at the Backup site

Verification

What Verification Checks

For every snapshot in scope:
  1. Chunk presence — every chunk referenced by the manifest exists in the datastore.
  2. Chunk SHA-256 — the recomputed hash matches the manifest reference.
  3. GCM auth tag — the AES-256-GCM tag validates the ciphertext is unmodified.
  4. Manifest signature — the manifest signature is valid.
Any failure marks the snapshot as CORRUPT in the verification report and emits a notification.
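
The first two checks can be illustrated with a minimal Python sketch. It assumes a manifest that is simply a list of expected SHA-256 hex digests and a datastore modelled as an in-memory dictionary; both shapes, and the function name `verify_snapshot`, are hypothetical and do not describe the IBS implementation. The GCM tag and manifest signature checks are left out. Returning MISSING for an absent chunk follows the Result values listed in the verification report and is an assumption.

```python
import hashlib


def verify_snapshot(manifest_chunks, datastore):
    """Check chunk presence and SHA-256 for one snapshot.

    manifest_chunks: expected SHA-256 hex digests (hypothetical manifest shape).
    datastore: dict mapping digest -> chunk bytes (stand-in for chunk storage).
    Returns (result, first_failure), where first_failure is None on success.
    """
    for expected in manifest_chunks:
        data = datastore.get(expected)
        if data is None:
            # Check 1: the chunk referenced by the manifest does not exist.
            return "MISSING", (expected, "chunk not found")
        if hashlib.sha256(data).hexdigest() != expected:
            # Check 2: the recomputed hash differs from the manifest reference.
            return "CORRUPT", (expected, "SHA-256 mismatch")
    return "OK", None


chunk = b"chunk-a"
digest = hashlib.sha256(chunk).hexdigest()
print(verify_snapshot([digest], {digest: chunk}))       # intact chunk
print(verify_snapshot([digest], {digest: b"bit-rot"}))  # altered chunk
print(verify_snapshot([digest], {}))                    # absent chunk
```

Stopping at the first failure mirrors the "First failure" field of the report; a real verifier may well continue and collect every failing chunk.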

Schedule a Verification Job

Open Verification

Navigate to Backup Solution > Datastores > (select datastore) > Verification.

Configure the schedule

Set:
  • Schedule: Sun 06:00 (weekly is the default)
  • Scope: All snapshots, or filter by namespace / age
  • Skip recently verified: Skip snapshots verified within the last keep-verified window (default: 30 days) to limit IO load
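
The skip rule can be pictured as a simple age filter. The sketch below is an illustration only: the function name `needs_verification` and the idea of a per-snapshot "last verified" timestamp are assumptions, not the product's data model.

```python
from datetime import datetime, timedelta, timezone


def needs_verification(last_verified, now, window_days=30):
    """Return True if a snapshot falls outside the keep-verified window.

    last_verified: datetime of the last successful verification, or None
    if the snapshot has never been verified (hypothetical field).
    """
    if last_verified is None:
        return True
    return now - last_verified >= timedelta(days=window_days)


now = datetime(2026, 5, 21, tzinfo=timezone.utc)
print(needs_verification(now - timedelta(days=10), now))  # verified recently: skipped
print(needs_verification(now - timedelta(days=45), now))  # outside the window: verified again
print(needs_verification(None, now))                      # never verified: verified
```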

Set notification

Attach the operational notification group. Failures dispatch immediately; successful runs dispatch on the configured success policy.

Save

Click Save.
The verification job runs on the next scheduled tick and produces a report visible under the snapshot Verification tab.

Verification Report

| Field | Description |
| --- | --- |
| Snapshot | The snapshot under verification |
| Started | When verification began |
| Duration | Elapsed verification time |
| Chunks checked | Number of chunks read and validated |
| Result | OK, CORRUPT, MISSING, or STALE |
| First failure | First chunk fingerprint and reason |
| Verifier | Job ID that produced the report |
A snapshot status flows through:

Limit Verification Load

Verification reads every chunk, which is IO-intensive. Tune for production:
| Setting | Default | Effect |
| --- | --- | --- |
| --verify-window-days | 30 | Skip snapshots verified within N days |
| --max-concurrent-snapshots | 4 | Parallel verification work |
| --bwlimit | unlimited | Cap read bandwidth |
| --io-priority | normal | Lower IO priority during verification |
For very large datastores, configure verification to run hourly with a short window (keep-last=20) — this gradually verifies every snapshot over a rolling period without a heavy weekly spike.
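
To size a rolling schedule, it helps to estimate how long one full pass takes. The sketch below assumes that each hourly run verifies at most 20 snapshots, which is one possible reading of the setting above and not a documented behaviour; the function name and the snapshot count are illustrative.

```python
import math


def full_cycle_hours(total_snapshots, per_run=20):
    """Hours needed to verify every snapshot once, at one run per hour.

    per_run is the assumed maximum number of snapshots verified per run.
    """
    return math.ceil(total_snapshots / per_run)


hours = full_cycle_hours(5000)
print(hours, round(hours / 24, 1))  # 250 hours, about 10.4 days
```

If the resulting cycle is longer than the verification interval you need, raise the per-run batch size or the run frequency.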

Mock Recovery Drills

A mock drill is a scheduled, end-to-end recovery exercise using backup data from the Backup site. The drill validates the recovery playbook, IBS data, restored application correctness, and operator readiness — without touching the production environment.

Why Bi-Annual?

Bi-annual drills satisfy common compliance frameworks (banking, government infrastructure, telecom). Two drills per year balance preparedness against operator burden. Higher-stakes environments may drill quarterly.

Drill Roles and Resources

| Role | Responsibility |
| --- | --- |
| Drill coordinator | Plans the drill, sets success criteria, captures findings |
| Backup admin | Provides Backup site access, encryption keys, restore configuration |
| Workload owner | Validates the restored workload functions correctly |
| Auditor | Records timings, deviations, and outcomes |
Required resources at the Backup site:
  • Compute resources matching the largest workload to be drilled
  • Storage capacity for rehydration of archived backup data
  • Network connectivity between the Backup site and the restored workload’s test network

Drill Workflow

Schedule and announce

Schedule the drill at least 4 weeks in advance. Announce to all participating teams and the executive sponsor. Drills run twice per year on a recurring calendar.

Select workloads

Select a representative sample:
  • One Tier-0 production VM
  • One Tier-1 production VM
  • One container
  • One physical host backup
  • One database with point-in-time recovery requirements

Provision drill infrastructure

On the Backup site, provision compute hosts and a target storage backend for the drill workloads. Use a dedicated test network — never connect drill workloads to the production network.

Execute the restore

For each selected workload:
  1. Identify the most recent weekly archival snapshot in the Backup site datastore.
  2. Restore using the Backup Solution Dashboard or CLI.
  3. Record start time, time to first guest availability (for live-restore), and time to full restore completion.
Drill restore example:
```shell
ironcore-backup vm restore \
  --snapshot ibs-archival:production/vm/12345/2026-05-19T03:00:00Z \
  --target-host drill-compute-01 \
  --storage drill-storage \
  --rename drill-vm-12345-2026-05-21 \
  --live
```
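
The three timings from step 3 are easier to compare across drills when they are recorded in a fixed structure. The following is a minimal sketch; the `RestoreTiming` class and its field names are hypothetical and not part of the IBS tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class RestoreTiming:
    """Timestamps captured by the auditor for one restored workload."""
    workload: str
    started: datetime
    first_available: Optional[datetime]  # only meaningful for live-restore
    completed: datetime

    def seconds_to_first_available(self):
        if self.first_available is None:
            return None
        return (self.first_available - self.started).total_seconds()

    def seconds_to_complete(self):
        return (self.completed - self.started).total_seconds()


timing = RestoreTiming(
    workload="vm/12345",
    started=datetime(2026, 5, 21, 9, 0, 0, tzinfo=timezone.utc),
    first_available=datetime(2026, 5, 21, 9, 0, 4, tzinfo=timezone.utc),
    completed=datetime(2026, 5, 21, 9, 42, 0, tzinfo=timezone.utc),
)
print(timing.seconds_to_first_available(), timing.seconds_to_complete())
```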

Validate the workload

The workload owner runs a defined functional test:
  • Service starts
  • Critical processes are running
  • Application responds to its standard health check
  • Database connectivity, replication state, and integrity all healthy
Record any deviations from production behaviour.

Restore individual files

From the same snapshot, restore an individual file and a directory archive. Confirm content, permissions, and modification times match expectations.
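
One way to confirm a restored file against a reference copy is to compare content hash, permission bits, and modification time. The sketch below uses only the Python standard library; the function names are illustrative, and comparing modification times at whole-second resolution is an assumption that may need adjusting for your filesystems.

```python
import hashlib
import os
import stat


def file_sha256(path):
    """Hash a file in 1 MiB blocks so large files do not load into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest()


def compare_restored(reference, restored):
    """Return a dict of booleans, one per attribute checked."""
    ref_stat, res_stat = os.stat(reference), os.stat(restored)
    return {
        "content": file_sha256(reference) == file_sha256(restored),
        "permissions": stat.S_IMODE(ref_stat.st_mode) == stat.S_IMODE(res_stat.st_mode),
        "mtime": int(ref_stat.st_mtime) == int(res_stat.st_mtime),
    }
```

Calling `compare_restored("/path/to/reference", "/path/to/restored")` returns, for example, `{"content": True, "permissions": True, "mtime": True}` when all three attributes match. Ownership and extended attributes are not covered.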

Tear down

De-provision drill infrastructure. Confirm no drill data remains accessible to production users.

Capture findings

Produce a drill report covering:
  • Snapshots restored, with timings
  • Workloads validated, with pass / fail status
  • Deviations from the recovery playbook
  • Issues found in the playbook, infrastructure, or backup data
  • Action items with owners and deadlines
File the report in the compliance documentation system.

Drill Success Criteria

| Criterion | Threshold |
| --- | --- |
| Restore RPO | Most recent weekly archival snapshot ≤ 7 days old |
| Restore RTO | Recovery completes within the published target |
| Live-restore time to running | ≤ 5 seconds for any tier |
| Application validation | All functional tests pass |
| File-level restore | Single-file restore completes in ≤ 1 minute |
| Playbook accuracy | No critical deviations from documented procedure |
A failed drill is a critical finding. Problems found in the recovery playbook, the backup data, or the drill infrastructure must be remediated before the next production change window. File a formal incident if the failure indicates data loss.
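
The thresholds above can be applied mechanically to the figures the auditor records. This is a minimal sketch: the metric key names and the `evaluate_drill` function are hypothetical, and the RTO target is passed in because the published target is not stated on this page.

```python
def evaluate_drill(metrics, rto_target_seconds):
    """Return (passed, checks) for one drilled workload.

    metrics: dict of recorded values (key names are illustrative).
    rto_target_seconds: the published recovery time target for the workload.
    """
    checks = {
        "restore_rpo": metrics["snapshot_age_days"] <= 7,
        "restore_rto": metrics["restore_seconds"] <= rto_target_seconds,
        "live_restore": metrics["live_restore_seconds"] <= 5,
        "application_validation": metrics["functional_tests_passed"],
        "file_level_restore": metrics["file_restore_seconds"] <= 60,
        "playbook_accuracy": metrics["critical_deviations"] == 0,
    }
    return all(checks.values()), checks


passed, checks = evaluate_drill(
    {
        "snapshot_age_days": 2,
        "restore_seconds": 2520,
        "live_restore_seconds": 4,
        "functional_tests_passed": True,
        "file_restore_seconds": 75,   # exceeds the 1 minute threshold
        "critical_deviations": 0,
    },
    rto_target_seconds=3600,
)
print(passed, [name for name, ok in checks.items() if not ok])
```

Keeping the per-criterion results, and not only the overall verdict, makes it straightforward to list failed criteria in the drill report.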

Automate the Drill Restore Step

The restore portion of a drill can be automated with a restore-test job that periodically restores a sample workload to a test environment. The output is compared against a known-good baseline.
Automated weekly restore-test job:
```shell
ironcore-backup restore-test create \
  --name weekly-vm-restore-test \
  --schedule "Mon 09:00" \
  --snapshot-filter "namespace:production,backup-id:vm/12345" \
  --target-host drill-compute-01 \
  --target-storage drill-storage \
  --rename "test-{{snapshot-id}}" \
  --validate-script /etc/ironcore/drill-validate.sh \
  --teardown
```
The restore-test job:
  1. Picks the latest matching snapshot
  2. Restores to the test target
  3. Runs the validation script and records exit code
  4. Tears down the test target
  5. Emits a notification with the result
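
The five steps can be sketched as a small control flow. This is an illustration of the sequence only, not the restore-test implementation: `run_restore_test` and its callable parameters are hypothetical stand-ins for the real restore, teardown, and notification mechanisms, and snapshots are assumed to be identified by ISO-8601 UTC timestamps in a uniform format.

```python
import subprocess
import sys


def run_restore_test(snapshots, restore, validate_cmd, teardown, notify):
    """Run one restore-test cycle and return the validation exit code.

    snapshots: snapshot timestamps as uniformly formatted ISO-8601 UTC strings.
    restore, teardown, notify: caller-supplied callables (hypothetical hooks).
    validate_cmd: argument list for the validation script.
    """
    latest = max(snapshots)      # uniform ISO-8601 strings sort chronologically
    target = restore(latest)     # step 2: restore to the test target
    try:
        # step 3: run the validation script and record its exit code
        exit_code = subprocess.run(validate_cmd).returncode
    finally:
        teardown(target)         # step 4: always tear down, even on error
    notify(latest, exit_code)    # step 5: report the result
    return exit_code


code = run_restore_test(
    ["2026-05-12T03:00:00Z", "2026-05-19T03:00:00Z"],
    restore=lambda snapshot: f"test-{snapshot}",
    validate_cmd=[sys.executable, "-c", "raise SystemExit(0)"],
    teardown=lambda target: None,
    notify=lambda snapshot, exit_code: print(snapshot, exit_code),
)
```

Placing the teardown in a `finally` block keeps test targets from lingering when validation cannot be launched.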

Compliance Reporting

The verification and drill systems produce evidence suitable for compliance audits:
| Evidence | Source |
| --- | --- |
| Per-snapshot verification log | Snapshot detail > Verification tab |
| Periodic verification job history | Tasks panel filtered by verify |
| Mock drill reports | Filed in compliance documentation |
| Encryption key rotation history | Audit log |
| Access grant history | Audit log |
| Replication completion log | Tasks panel filtered by sync |
Export evidence:
Open Audit Log > Export. Choose the time range and the categories to export. The export is delivered as CSV and JSON.

Troubleshooting

Verification takes too long. Reduce scope (raise --verify-window-days so that more recently verified snapshots are skipped), increase parallelism, or split verification into multiple jobs targeting different namespaces.
Drill restores are slower than expected. The Backup site typically has fewer or slower storage tiers than Primary. Provision drill targets to match the expected disaster-recovery capacity; undersized drill infrastructure can make a drill fail even when the backup data and playbook are sound.
Verification is not running. Confirm the schedule and check the task history. If the worker is offline, restart ironcore-backup-verifier.
The restore-test validation script fails. The validation script may be tied to production-specific paths or hostnames. Maintain drill-specific overrides in the validate script.

Next Steps

  • Replication and Sync: Backup site replication required for drills
  • Security and Encryption: Encryption keys required to restore during a drill
  • Notifications: Alert routing for verification and drill failures
  • Architecture: Underlying integrity and append-only design