Skip to main content

Documentation Index

Fetch the complete documentation index at: https://docs.polystack.tech/llms.txt

Use this file to discover all available pages before exploring further.

Protect your workloads from compute host failures with automatic detection and recovery. Polystack Instance HA continuously monitors compute nodes and instances, triggering evacuation and restart workflows the moment a fault is detected — without manual intervention.

Powered by VM2Cloud

Ironcore virtualization technology is powered by VM2Cloud.

Instance High Availability

User Guide

Understand protection segments, instance protection policies, and how to monitor recovery workflows for your running workloads.

Admin Guide

Configure failover segments, host and instance monitors, notification drivers, and integrate Instance HA with your compute cluster.

CLI Reference

Complete command reference for managing failover segments, hosts, and recovery notifications using the openstack CLI.

Compute Service

Polystack Compute provides the hypervisor layer that Instance HA monitors and manages during host failover events.

Key Capabilities

Host Failure Detection

IPMI and SSH-based monitors detect unreachable hosts in seconds and immediately trigger evacuation of all protected instances.

Automatic Instance Recovery

Failed instances are automatically restarted on healthy hosts within the same protection segment, respecting affinity rules.

Reserved Host Failover

Designate standby compute hosts that remain idle until a failover event occurs — guaranteeing resource availability for recovery.

Protection Segments

Group hosts and instances into logical fault domains. Each segment has its own recovery policy, monitors, and notification targets.

Notification Drivers

Integrate with IPMI, SSH, and custom notification sources to receive precise fault signals from infrastructure monitoring tools.

Audit Trail

Every recovery event is logged with timestamps, affected instances, and resolution outcomes — fully queryable via the Dashboard and CLI.

How It Works


Platform Resilience

VM High Availability

Automatic instance restart on host failure. Configurable per-instance priority. Failover segments for per-group recovery policies. Requires Ironcore.

Power Recovery Automation

9-phase automated recovery playbook. Target recovery time: 7-13 minutes. Sequential service startup with health gates between each phase. Requires Ironcore.

Container Self-Healing

Three-tier autoheal daemon with dependency-aware restart ordering. Circuit breaker pattern prevents restart loops. Exponential backoff. Requires Ironcore.

Proactive Monitoring

Pre-configured alert rules across 13 groups covering storage, database, message queue, compute, networking, containers, APIs, system resources, disk, memory, security, and capacity. Predictive alerts for capacity forecasting. Requires Ironcore.

Network Resilience

L3 high availability and DHCP high availability with sub-3-second failover. Automatic ARP gratuitous announcements for fast VIP convergence. Requires Ironcore.

Rolling Upgrades with Rollback

Per-service container upgrades with 2-10 second swap time. Canary deployment (first node only). Image tag rollback mechanism. Previous images cached locally. Requires Ironcore.

Related Services

Polystack Compute

The hypervisor layer monitored and managed by Instance HA

Resource Optimizer

Rebalances workloads after recovery to restore cluster efficiency

Polystack Block Storage

Persistent volumes that survive host failover when using shared storage