Overview
This page covers administrator-level diagnostic procedures across every layer of Ironcore Backup Solution (IBS): service health, datastore integrity, sync failures, encryption key issues, and capacity exhaustion. For operator-level issues, see the User Troubleshooting page.
Before troubleshooting, record the following for each failure:
- Task UPID
- Exact failure text
- Affected datastore and namespace
- Time of failure (with timezone)
- Frequency (one-off or recurring)
Service Health Checks
Service Inventory
| Service | Expected State |
|---|---|
| ironcore-backup-api | active (running) |
| ironcore-backup-scheduler | active (running) |
| ironcore-backup-worker | active (running) |
| ironcore-backup-verifier | active (running) |
| ironcore-backup-gc | active (running) |
| ironcore-backup-sync | active (running) |
| ironcore-backup-notify | active (running) |
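The inventory above can be checked in one pass. The following is a minimal sketch, assuming the services run as systemd units under the names in the table. `check_services` and the `SYSTEMCTL` override are illustrative names, not part of IBS.

```shell
#!/bin/sh
# Sketch: report any IBS service that is not in the "active" state.
# Assumption: the services are systemd units named as in the table above.

IBS_SERVICES="ironcore-backup-api ironcore-backup-scheduler ironcore-backup-worker ironcore-backup-verifier ironcore-backup-gc ironcore-backup-sync ironcore-backup-notify"

# SYSTEMCTL can be overridden (for example in tests); defaults to systemctl.
SYSTEMCTL=${SYSTEMCTL:-systemctl}

check_services() {
    failed=0
    for svc in $IBS_SERVICES; do
        # "is-active" prints the unit state (active, inactive, failed, ...).
        state=$($SYSTEMCTL is-active "$svc" 2>/dev/null) || true
        if [ "$state" != "active" ]; then
            echo "$svc: ${state:-unknown}"
            failed=1
        fi
    done
    # Returns 0 when every service is active, 1 otherwise.
    return "$failed"
}
```

Calling `check_services` prints one line per unhealthy service and returns non-zero if any service is not active, so it can be used in a cron job or a monitoring probe.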
Log Locations
| Service | Log Path |
|---|---|
| API | /var/log/ironcore-backup/api.log |
| Scheduler | /var/log/ironcore-backup/scheduler.log |
| Worker | /var/log/ironcore-backup/worker.log |
| Verifier | /var/log/ironcore-backup/verifier.log |
| Garbage Collector | /var/log/ironcore-backup/gc.log |
| Sync | /var/log/ironcore-backup/sync.log |
| Notifications | /var/log/ironcore-backup/notify.log |
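To trace a failure across services, it can help to search every log for the same string, such as the Task UPID or the exact failure text. The sketch below assumes the logs are plain text under the directory listed in the table. `find_in_logs` and `IBS_LOG_DIR` are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: search all IBS service logs for a literal string.
# Assumption: logs are plain-text *.log files in the directory from the table above.

IBS_LOG_DIR=${IBS_LOG_DIR:-/var/log/ironcore-backup}

find_in_logs() {
    needle=$1
    # -F: match the needle literally; -H and -n: print file name and line number.
    grep -F -H -n -- "$needle" "$IBS_LOG_DIR"/*.log
}
```

For example, `find_in_logs "<task UPID>"` prints every matching line prefixed with the log file and line number.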
Datastore Issues
Datastore status `offline`
- For local datastores, confirm the mount: `findmnt /mnt/backup/ibs-primary`.
- For S3 datastores, confirm the endpoint is reachable: `curl -I https://s3.<your-domain>`.
- Re-probe the datastore.
- Inspect the API log for the exact reason.
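The first two checks can be wrapped in one helper. This is a sketch only: `backend_reachable` is a hypothetical name, and the mount path and endpoint URL are the placeholders used in the steps above.

```shell
#!/bin/sh
# Sketch: check whether a datastore backend is reachable.
# Usage: backend_reachable local /mnt/backup/ibs-primary
#        backend_reachable s3 "https://s3.<your-domain>"

backend_reachable() {
    kind=$1
    target=$2
    case $kind in
        local)
            # findmnt exits non-zero when nothing is mounted at the path.
            findmnt "$target" >/dev/null 2>&1
            ;;
        s3)
            # HEAD request only; any HTTP response counts as reachable.
            curl -sS -I --max-time 10 "$target" >/dev/null 2>&1
            ;;
        *)
            echo "unknown backend kind: $kind" >&2
            return 2
            ;;
    esac
}
```

A zero exit status means the backend answered. It does not confirm that credentials or bucket permissions are valid, so the API log remains the authoritative source for the exact reason.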
Datastore reports `no space` despite free disk
Possible causes:
- Garbage collection has not run; chunks are referenced but unused
- Quota set on the underlying filesystem
- Inode exhaustion (many small chunks)
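Inode exhaustion is easy to miss because plain `df` output shows free blocks. The sketch below compares block and inode usage for a path, assuming a `df` that supports `-i` (GNU and BSD do). `use_pct` and `space_report` are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: compare block usage and inode usage for a datastore path.

# Reads `df -P` or `df -Pi` output on stdin and prints the usage
# percentage (fifth column) of the first filesystem row, without "%".
use_pct() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

space_report() {
    path=$1
    echo "blocks: $(df -P "$path" | use_pct)%"
    echo "inodes: $(df -Pi "$path" | use_pct)%"
}
```

Running `space_report /mnt/backup/ibs-primary` shows both figures side by side. A low block percentage with an inode percentage near 100 points to inode exhaustion rather than a quota or garbage collection problem.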
GC takes excessively long
- Confirm the datastore size and snapshot count.
- Check IO bandwidth utilisation during GC.
- Consider partitioning by creating additional datastores.
`chunk not found` errors
Backup Job Failures
`source unreachable` for all VMs of a host
- Check compute host health from the Polystack Dashboard.
- Confirm the hypervisor agent responds.
- Fail over the affected workloads if the host cannot be recovered quickly.
Backup job hangs at `Starting`
`encryption key not registered`
Backups complete but with `WARNING`
Common causes:
- An exclude path matched no files (configuration drift)
- A file changed during read; the changed copy was captured
- A symlink target was unreachable
Restore Failures
`destination not writable`
Live-restore stays in `restoring` after hours
- Check the live-restore task metrics: chunks fetched per second, bytes remaining.
- Check the source datastore for IO contention with other jobs.
`fingerprint mismatch` during decryption
Live-restore VM crashes on first IO
Sync and Replication Failures
`TLS fingerprint mismatch`
- Verify the new fingerprint out-of-band (call the remote’s owner).
- Update the registered fingerprint.
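Comparing fingerprints by eye is error-prone, so the value obtained out-of-band can be compared mechanically with what the remote currently presents. The sketch below assumes the remote serves an X.509 certificate over TLS and that the fingerprint in question is SHA-256; neither is confirmed by this page. The helper names are hypothetical, and the `openssl` invocations are standard OpenSSL usage rather than an IBS command.

```shell
#!/bin/sh
# Sketch: compare two certificate fingerprints regardless of formatting.
# Assumption: SHA-256 fingerprints of an X.509 certificate.

# Strip colons and whitespace, lowercase the hex digits.
normalize_fp() {
    printf '%s' "$1" | tr -d ': \n' | tr 'A-F' 'a-f'
}

# Succeeds when both fingerprints are the same after normalisation.
fp_matches() {
    [ "$(normalize_fp "$1")" = "$(normalize_fp "$2")" ]
}

# Print the SHA-256 fingerprint of the certificate served at host:port.
remote_cert_fp() {
    openssl s_client -connect "$1" </dev/null 2>/dev/null \
        | openssl x509 -noout -fingerprint -sha256 \
        | cut -d= -f2
}
```

A typical check would be `fp_matches "$(remote_cert_fp remote.example:443)" "<fingerprint read out by the remote's owner>"`, where the host name and port are placeholders. Only update the registered fingerprint when the comparison succeeds.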
Sync stalls mid-run
- Check the inter-site bandwidth telemetry.
- Check remote datastore free space.
Sync repeatedly retransmits the same chunks
Pull sync from Primary fails with `not authorised`
Verification Failures
`CORRUPT` chunk detected
- Quarantine the affected datastore by disabling new writes.
- Identify every snapshot referencing affected chunks.
- Restore from the Backup site replica.
- Open an incident.
Verification report stuck `STALE`
Verification takes most of the verification window
Options: adjust `--verify-window-days`, run verification more frequently with `--max-concurrent-snapshots`, or partition the datastore.
Notification Issues
Critical events not arriving via SMTP
- Check the notification target’s last test result.
- Inspect the notification service log for delivery errors.
Webhook delivery returns 5xx repeatedly
- Check the receiver’s logs.
- Use `curl` to send the same payload manually.
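Replaying the payload by hand separates receiver faults from delivery faults. The following sketch assumes the webhook expects a JSON body sent with `POST`; the content type, the helper names, and the payload file are assumptions for illustration, not IBS specifics.

```shell
#!/bin/sh
# Sketch: replay a webhook payload and classify the response.
# Assumption: the receiver accepts a JSON POST body.

# Send the payload file to the URL and print only the HTTP status code.
resend_webhook() {
    url=$1
    payload_file=$2
    curl -sS -o /dev/null -w '%{http_code}' \
        -X POST -H 'Content-Type: application/json' \
        --data-binary "@$payload_file" "$url"
}

# Map an HTTP status code to a coarse outcome.
classify_status() {
    case $1 in
        2??) echo delivered ;;
        4??) echo rejected ;;
        5??) echo receiver-error ;;
        *)   echo unknown ;;
    esac
}
```

For example, `classify_status "$(resend_webhook "$URL" payload.json)"` prints `receiver-error` when the receiver still answers with a 5xx, which confirms the fault is on the receiving side.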
Tape Library Issues
Tape drive not detected
Check `dmesg` for hardware errors.
Autoloader inventory mismatch
Tape backup completes with `WARNING`
Encryption Key Issues
`master key not available` after a system restart
Key rotation broke some clients
Paperkey rejected during master key restore
- Re-scan with better lighting.
- Compare the checksums printed on the paperkey with the QR-decoded values.
Performance Issues
Backup throughput is below expected
Identify the bottleneck by checking:
- Datastore IO utilisation
- Backup server CPU utilisation
- Network utilisation between client and server
Possible remediations:
- Increase backup server CPU (compression and encryption are CPU-bound)
- Add faster storage to the datastore
- Tune chunk cache size on the worker
Restore is slower than backup
Escalation Procedure
For incidents that affect data integrity, security, or service-wide availability, follow this escalation path:
- Collect evidence
- Root-cause
Open a Support Case
For issues that exceed local capability, open a support case with Polystack Technologies:
| Information to Provide | Source |
|---|---|
| Platform version | Dashboard footer |
| Affected service version | ironcore-backup --version |
| Task UPIDs | Task panel |
| Logs (sanitised) | /var/log/ironcore-backup/ |
| Datastore inventory | ironcore-backup datastore list |
| Time of incident | Audit log |
| Reproduction steps | Operator narrative |
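Several rows of the table can be gathered with one script. This is a minimal sketch that uses only the commands and paths listed above. `collect_evidence`, `IBS_BIN`, and `IBS_LOG_DIR` are hypothetical names, and the script does not sanitise logs, so review the copies before attaching them.

```shell
#!/bin/sh
# Sketch: gather support-case evidence into one directory.
# Assumption: ironcore-backup is on PATH and logs are *.log files
# in the directory from the table above.

IBS_BIN=${IBS_BIN:-ironcore-backup}
IBS_LOG_DIR=${IBS_LOG_DIR:-/var/log/ironcore-backup}

collect_evidence() {
    out_dir=$1
    mkdir -p "$out_dir/logs" || return 1

    # Affected service version and datastore inventory.
    "$IBS_BIN" --version > "$out_dir/version.txt" 2>&1 \
        || echo "version query failed" >&2
    "$IBS_BIN" datastore list > "$out_dir/datastores.txt" 2>&1 \
        || echo "datastore list failed" >&2

    # Raw log copies; sanitise these before sending them.
    cp "$IBS_LOG_DIR"/*.log "$out_dir/logs/" 2>/dev/null \
        || echo "no logs copied" >&2

    # Collection time in UTC, so the timezone is unambiguous.
    date -u '+%Y-%m-%dT%H:%M:%SZ' > "$out_dir/collected_at_utc.txt"
}
```

Running `collect_evidence /tmp/ibs-case` leaves the version, datastore inventory, log copies, and a UTC timestamp in one place. Platform version, Task UPIDs, and reproduction steps still need to be added by hand from the sources in the table.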
