Protect your workloads from compute host failures with automatic detection and recovery. Polystack Instance HA continuously monitors compute nodes and instances, triggering evacuation and restart workflows as soon as a fault is detected, without manual intervention.
Powered by VM2Cloud
Ironcore virtualization technology is powered by VM2Cloud.
Instance High Availability
User Guide
Understand protection segments, instance protection policies, and how to monitor
recovery workflows for your running workloads.
Admin Guide
Configure failover segments, host and instance monitors, notification drivers, and
integrate Instance HA with your compute cluster.
CLI Reference
Complete command reference for managing failover segments, hosts, and recovery
notifications using the openstack CLI.
Compute Service
Polystack Compute provides the hypervisor layer that Instance HA monitors and manages
during host failover events.
Key Capabilities
Host Failure Detection
IPMI and SSH-based monitors detect unreachable hosts in seconds and immediately
trigger evacuation of all protected instances.
Automatic Instance Recovery
Failed instances are automatically restarted on healthy hosts within the same
protection segment, respecting affinity rules.
Reserved Host Failover
Designate standby compute hosts that remain idle until a failover event occurs —
guaranteeing resource availability for recovery.
Protection Segments
Group hosts and instances into logical fault domains. Each segment has its own
recovery policy, monitors, and notification targets.
Notification Drivers
Integrate with IPMI, SSH, and custom notification sources to receive precise
fault signals from infrastructure monitoring tools.
Audit Trail
Every recovery event is logged with timestamps, affected instances, and resolution
outcomes — fully queryable via the Dashboard and CLI.
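As an illustration of how recovery within a protection segment and reserved host failover might fit together, the sketch below picks a target host for the instances on a failed host. It is not Polystack's implementation: the `Host` type, the `plan_evacuation` function, and the rule "reserved hosts first, then the least-loaded healthy host" are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical view of one compute host inside a protection segment."""
    name: str
    healthy: bool = True
    reserved: bool = False  # standby host kept idle until a failover event
    instances: list[str] = field(default_factory=list)


def plan_evacuation(segment: list[Host], failed: str) -> dict[str, str]:
    """Map each instance on the failed host to a target host in the same segment."""
    source = next(h for h in segment if h.name == failed)
    candidates = [h for h in segment if h.healthy and h.name != failed]
    if not candidates:
        raise RuntimeError(f"no healthy host left in the segment for {failed}")
    # Assumed policy: prefer reserved hosts, then the host with the fewest instances.
    candidates.sort(key=lambda h: (not h.reserved, len(h.instances)))
    target = candidates[0]
    return {instance: target.name for instance in source.instances}


segment = [
    Host("compute-1", instances=["vm-a", "vm-b"]),
    Host("compute-2", instances=["vm-c"]),
    Host("compute-3", reserved=True),
]
plan = plan_evacuation(segment, "compute-1")
```

In this example both instances are planned onto `compute-3`, the reserved host. A real scheduler would also check capacity and affinity rules, which this sketch leaves out.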
How It Works
Platform Resilience
VM High Availability
Automatic instance restart on host failure. Configurable per-instance priority. Failover segments for per-group recovery policies. Requires Ironcore.
Power Recovery Automation
9-phase automated recovery playbook. Target recovery time: 7-13 minutes. Sequential service startup with health gates between each phase. Requires Ironcore.
Container Self-Healing
Three-tier autoheal daemon with dependency-aware restart ordering. Circuit breaker pattern prevents restart loops. Exponential backoff. Requires Ironcore.
Proactive Monitoring
Pre-configured alert rules across 13 groups, including storage, database, message queue, compute, networking, containers, APIs, system resources, disk, memory, security, and capacity. Predictive alerts for capacity forecasting. Requires Ironcore.
Network Resilience
L3 high availability and DHCP high availability with sub-3-second failover. Automatic gratuitous ARP announcements for fast VIP convergence. Requires Ironcore.
Rolling Upgrades with Rollback
Per-service container upgrades with 2-10 second swap time. Canary deployment (first node only). Image tag rollback mechanism. Previous images cached locally. Requires Ironcore.
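The restart behaviour described under Container Self-Healing combines three general techniques: dependency-aware ordering, exponential backoff, and a circuit breaker. The sketch below shows one minimal way to express them in Python. The function names, parameters, and default values are hypothetical and chosen for illustration; they do not describe the actual autoheal daemon.

```python
import time
from graphlib import TopologicalSorter
from typing import Callable


def restart_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Return services ordered so that each one follows its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


def restart_with_backoff(
    restart: Callable[[], bool],
    max_failures: int = 5,       # assumed threshold at which the breaker opens
    base_delay: float = 1.0,     # assumed first delay, in seconds
    max_delay: float = 60.0,     # assumed cap on any single delay
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Retry `restart` until it succeeds or the circuit opens.

    Returns True on success. Returns False after `max_failures`
    consecutive failures, which stops further attempts and so
    prevents an endless restart loop.
    """
    for attempt in range(max_failures):
        if restart():
            return True
        if attempt < max_failures - 1:
            # The delay doubles after each failure: base, 2*base, 4*base, ...
            sleep(min(base_delay * 2 ** attempt, max_delay))
    return False


# Example: the API depends on the database and the message queue.
order = restart_order({"api": {"db", "mq"}, "db": set(), "mq": set()})
```

Passing `sleep` as a parameter keeps the function easy to test without real waiting. Once the breaker opens, a caller would typically raise an alert rather than keep retrying.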
Related Services
Polystack Compute
The hypervisor layer monitored and managed by Instance HA
Resource Optimizer
Rebalances workloads after recovery to restore cluster efficiency
Polystack Block Storage
Persistent volumes that survive host failover when using shared storage
