What is HA Policy?

HA Policy is a mechanism that keeps your business running stably when VM instances are stopped, either as scheduled or unexpectedly, or enter an error state because of errors in the compute, network, or storage resources associated with them. By enabling this feature, you can customize VM HA policies to ensure business continuity and stability.

Concepts

The HA Policy feature involves the following key concepts:
  • HA mode: Specifies whether to auto restart VM instances that are stopped as scheduled or unexpectedly, or that enter an error state because of errors in the compute, network, or storage resources associated with them. Two modes are supported, None and NeverStop:
    • None: VM instances that are stopped as scheduled or unexpectedly are not auto restarted.
    • NeverStop:
      • VM instances that are stopped as scheduled are auto restarted.
      • Unexpectedly stopped VM instances are auto restarted on another host, depending on the failover strategy you configure for them.
  • VM Failover Strategy: Specifies whether to migrate a VM instance to another host if errors occur in the compute, storage, or network resource associated with the VM instance.
    The VM failover mechanism inspects the status of the following resources:
    • Management Network Connectivity Status:
      • Indicates the status of the network that connects the management node and the host where VM instances reside.
      • This status may turn Abnormal if errors occur on the management node or the management network.
    • Storage Network Connectivity Status:
      • Indicates the connectivity status of the network that VM instances use to access the primary storage where their root volumes reside.
      • This status may turn Abnormal if errors occur on the primary storage or the storage network.
    • Business NIC Status:
      • This status may turn Abnormal if errors occur on the host business NIC that is associated with the L2 network of VM instances, or on the switch port directly connected to that NIC.
    Based on the resource status inspection, the Cloud provides the following truth table for configuring VM failover strategies:
    Management Network Connectivity Status | Storage Network Connectivity Status | Business NIC Status | Fail Over
    Normal | Normal | Abnormal | Yes/No
    Normal | Abnormal | Normal | Yes/No
    Normal | Abnormal | Abnormal | Yes/No
    Abnormal | Normal | Normal | No

Fundamentals

ZStack Cloud HA Policy has the following mechanisms:
  • The Cloud polls the running status of VM instances. If a VM instance is stopped, either as scheduled or unexpectedly, its HA mode is checked. If the HA mode of the VM instance is NeverStop, the VM instance is restarted on the current host or another host.
    Figure 1. VM HA Started After Unexpectedly Stopped


  • The Cloud polls the status of the hosts where VM instances reside. If any of the management network connectivity status, storage network connectivity status, or business NIC status of a host turns Abnormal, the corresponding VM failover strategy and the VM HA mode are checked. If the corresponding failover strategy is Yes and the VM HA mode is NeverStop, the related VM instances are migrated to another host.
    Figure 2. VM HA Started After Host Business NIC Turns Down
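
The two mechanisms above can be read as a small decision rule. The following Python sketch is illustrative only and is not the Cloud's implementation: the names HaMode, Action, on_vm_stopped, and on_host_abnormal are hypothetical, and the polling itself is not modeled.

```python
from enum import Enum


class HaMode(Enum):
    NONE = "None"
    NEVER_STOP = "NeverStop"


class Action(Enum):
    NO_ACTION = "no action"
    RESTART = "restart on the current host or another host"
    MIGRATE = "migrate to another host"


def on_vm_stopped(ha_mode: HaMode) -> Action:
    """Mechanism 1: polling finds a VM instance stopped (scheduled or unexpected)."""
    if ha_mode is HaMode.NEVER_STOP:
        return Action.RESTART
    return Action.NO_ACTION


def on_host_abnormal(ha_mode: HaMode, failover_strategy_yes: bool) -> Action:
    """Mechanism 2: polling finds an Abnormal status on the host of a VM instance.

    failover_strategy_yes is the Yes/No value configured for the matching
    row of the failover truth table.
    """
    if ha_mode is HaMode.NEVER_STOP and failover_strategy_yes:
        return Action.MIGRATE
    return Action.NO_ACTION


if __name__ == "__main__":
    print(on_vm_stopped(HaMode.NEVER_STOP).value)
    print(on_host_abnormal(HaMode.NEVER_STOP, failover_strategy_yes=False).value)
```

In this reading, a VM instance is migrated only when both conditions hold: its HA mode is NeverStop and the matching failover strategy is Yes.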


Characteristics

HA Policy has the following characteristics:
  • Comprehensive & Powerful: Covers all mainstream HA scenarios, including various failures, and ensures the stability and continuity of your business.
  • Flexible & Visualized: Provides a simple table that allows you to configure VM failover strategies with one click. This table functions together with the HA Mode that can be configured on all and individual VM instances, thus greatly improving the flexibility of your business HA configuration.

Scenarios

The following describes the scenarios of the HA Policy feature.

  • Host Business NIC Turns Down:
    If a host business NIC turns down, all VM instances associated with this NIC are expected to migrate to other hosts to keep the business highly available.
    • For example, your business VM instances run a MySQL database service that requires high availability. In this case, you can set the HA mode of these VM instances to NeverStop and turn on the switch corresponding to Abnormal Business NIC Status. Then, as long as host resources are sufficient, if a host business NIC associated with these VM instances turns down, these VM instances are auto started on other hosts.
  • VM Unexpectedly Stops:
    If a VM instance is unexpectedly stopped, it is expected to be auto started by HA.
    • For example, your VM instances run important business applications. To ensure that the business recovers automatically if VM instances stop for reasons such as host power-offs or business overloads, you can set the HA mode of these VM instances to NeverStop. Then, if these VM instances are stopped, they are auto started.

Manage HA Policy

On the main menu of ZStack Cloud, choose Settings > Platform Setting > HA Policy. Then, the HA Policy page is displayed.

HA Policy supports the following actions:
Action | Description
Enable HA Policy | Enables the HA Policy feature.
Disable HA Policy | Disables the HA Policy feature.
Note: If you disable HA Policy, VM instances will not be auto restarted if they are stopped. This may cause business interruptions. Proceed with caution.

HA Policy|Failover Policy

On the Enable HA Policy page or the Overview page of HA Policy, you can modify the following truth table to configure failover policies for VM instances.
Management Network Connectivity Status | Storage Network Connectivity Status | Business NIC Status | Fail Over
Normal | Normal | Abnormal | Yes/No
Normal | Abnormal | Normal | Yes/No
Normal | Abnormal | Abnormal | Yes/No
Abnormal | Normal | Normal | No
Notes on individual rows:
  • Normal, Abnormal, Normal: If the storage type is SharedBlock and this status is Abnormal, VM instances will auto fail over regardless of this configuration.
  • Normal, Abnormal, Abnormal: The failover policy of this scenario follows the preceding two failover policies of this table. If you set both of the preceding two policies to No, this failover policy is set to No. If you set either of the two to Yes, this failover policy is set to Yes.
  • Abnormal, Normal, Normal: If the management network is in Abnormal status, you cannot set this failover policy.
Note:
  • For Storage Network Connectivity Status, only shared storage is detected. Local storage is not supported.
  • If an L2 network of a VM instance is of the VXLAN type, or uses SR-IOV or a Smart NIC, and errors occur on the host business NIC associated with this L2 network or on the switch port directly connected to that NIC, this VM instance will not fail over.
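
The way the rows of the failover policy table combine can be sketched in Python. This is a minimal illustration under stated assumptions, not the Cloud's code: FailoverPolicy and should_fail_over are hypothetical names, the SharedBlock override is assumed to apply whenever the storage network is Abnormal while the management network is Normal, the all-Normal case is assumed to need no failover, and the VXLAN, SR-IOV, and Smart NIC exception is not modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FailoverPolicy:
    on_business_nic_abnormal: bool     # row: Normal, Normal, Abnormal
    on_storage_network_abnormal: bool  # row: Normal, Abnormal, Normal


def should_fail_over(policy: FailoverPolicy,
                     mgmt_ok: bool,
                     storage_ok: bool,
                     nic_ok: bool,
                     storage_type: Optional[str] = None) -> bool:
    """Return the Fail Over value for one combination of statuses."""
    if mgmt_ok and storage_ok and nic_ok:
        return False  # assumption: nothing is Abnormal, so nothing to fail over
    if not mgmt_ok:
        if storage_ok and nic_ok:
            return False  # row: Abnormal, Normal, Normal is fixed to No
        raise ValueError("combination not covered by the table")
    # The management network is Normal from here on.
    if not storage_ok and storage_type == "SharedBlock":
        return True  # SharedBlock fails over regardless of the configuration
    if storage_ok:
        return policy.on_business_nic_abnormal      # only the business NIC is Abnormal
    if nic_ok:
        return policy.on_storage_network_abnormal   # only the storage network is Abnormal
    # Both are Abnormal: Yes if either of the two preceding rows is Yes.
    return policy.on_business_nic_abnormal or policy.on_storage_network_abnormal


if __name__ == "__main__":
    policy = FailoverPolicy(on_business_nic_abnormal=True,
                            on_storage_network_abnormal=False)
    print(should_fail_over(policy, mgmt_ok=True, storage_ok=False, nic_ok=False))
```

Raising an error for combinations outside the table keeps the sketch from guessing at behavior that the table does not describe.
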
On the Enable HA Policy page or the Overview page of HA Policy, you can modify the following host error inspection settings to adjust the inspection intervals of the preceding failover policy.
Name | Description
Host Self-Inspection Interval | The interval at which a host inspects its own status. Default: 5. Unit: second.
Maximum Host Self-Inspection Attempts | The maximum number of times that a host inspects its own status. If the self-inspection of a host fails the maximum number of times, it is determined that network errors occur on the host. Default: 6.
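
As a rough worked example, the two settings above together bound how long a host keeps inspecting itself before network errors are determined. The Python sketch below assumes that the attempts are evenly spaced by the inspection interval and that each check completes instantly. The function name is hypothetical, and actual detection time in the Cloud may differ.

```python
def self_inspection_window_seconds(interval_s: int = 5, max_attempts: int = 6) -> int:
    """Approximate time spanned by max_attempts self-inspections.

    Assumption: one inspection every interval_s seconds, with no extra
    delay per attempt.
    """
    if interval_s <= 0 or max_attempts <= 0:
        raise ValueError("interval_s and max_attempts must be positive")
    return interval_s * max_attempts


if __name__ == "__main__":
    # With the defaults (5 seconds, 6 attempts) this prints 30.
    print(self_inspection_window_seconds())
```

Under these assumptions, lowering either value shortens the window at the cost of reacting to shorter network glitches.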

HA Policy|Advanced Settings

On the Enable HA Policy page or the Overview page of HA Policy, you can modify the advanced settings of HA Policy. The settings fall into the following two categories:
VM Instance:
  • VM Cross-Cluster HA: Specifies whether to enable VM migration across clusters to achieve high availability. Default: false. If set to true, hosts across clusters can be detected to achieve VM high availability.
    Note: Before you enable this feature, make sure that the clusters are well connected.
  • Maximum GC Retry Interval of NeverStop VM: The maximum interval between garbage collection (GC) attempts to start up NeverStop VM instances that are stopped unexpectedly. Default: 300. Unit: second.
  • Delay of NeverStop VM Startup Attempt: The delay before another attempt to start up a NeverStop VM instance after the last startup attempt fails. Default: 60. Unit: second.
  • NeverStop VM Scanning Interval: The interval at which NeverStop VM instances that fail to start up are scanned. Default: 60. Unit: second.
  • Sync Speed of HA VM State Update:
    • The synchronization speed of the state of highly available VM instances on the UI. Default: 1. Valid values: integers from -1 to 5.
    • A higher value indicates a lower synchronization speed. However, a higher value lowers system loads because outdated status update notifications are ignored.
    • The value -1 indicates that the state of HA VM instances on the UI does not change automatically.
  • VM HA Mode: Specifies whether to auto restart VM instances that are stopped as scheduled or unexpectedly, or that enter an error state because of errors in the compute, network, or storage resources associated with them. Valid values: None and NeverStop.
    • If you set HA mode to None, VM instances that are stopped as scheduled or unexpectedly are not auto restarted.
    • If you set HA mode to NeverStop:
      • VM instances that are stopped as scheduled or unexpectedly are auto restarted.
      • If errors occur in compute, network, or storage resources, the associated VM instances are auto restarted on another host, depending on the HA policy you configure for them.
    Note: You can also set the VM HA mode for an individual VM instance. If you do, this global setting does not take effect on that VM instance.
Host:
  • Abnormal Host Check Interval: The interval at which the management node pings abnormal hosts. Default: 5. Unit: second.
  • Maximum Attempts to Determine Host Disconnection: The maximum number of failed connections that are required to determine that a host is disconnected. Default: 12.
  • Host Successful Connection Period: The time limit for a successful connection to a host. If a connection request receives a response within the specified time, the connection succeeds. Default: 5. Unit: second.
  • Host Successful Connection Possibility: The proportion of successful connections, relative to failed connections, that determines whether a host is successfully connected. Default: 50. Unit: %.
  • Minimum Attempts to Determine Successful Host Connection: The minimum number of successful connections that are required to determine that a host is successfully connected. Default: 5.
  • Timeout Period of Primary Storage Inspection by Host: The timeout period for a host to check its connection with primary storage. Default: 5. Unit: second.
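
For reference, the defaults listed above can be collected in one place. The Python sketch below is only an illustration: the class and field names are hypothetical and are not ZStack Cloud configuration keys, VM HA Mode is left out because no default is stated for it, and only the documented range of Sync Speed of HA VM State Update and the percentage range are validated.

```python
from dataclasses import dataclass


@dataclass
class HaAdvancedSettings:
    # VM Instance category (defaults as documented above)
    vm_cross_cluster_ha: bool = False
    never_stop_max_gc_retry_interval_s: int = 300
    never_stop_startup_delay_s: int = 60
    never_stop_scan_interval_s: int = 60
    ha_vm_state_sync_speed: int = 1
    # Host category (defaults as documented above)
    abnormal_host_check_interval_s: int = 5
    max_attempts_host_disconnected: int = 12
    host_success_period_s: int = 5
    host_success_percent: int = 50
    min_attempts_host_connected: int = 5
    primary_storage_check_timeout_s: int = 5

    def validate(self) -> None:
        """Check the value ranges that the settings list states explicitly."""
        speed = self.ha_vm_state_sync_speed
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(speed, bool) or not isinstance(speed, int) or not -1 <= speed <= 5:
            raise ValueError("ha_vm_state_sync_speed must be an integer from -1 to 5")
        if not 0 <= self.host_success_percent <= 100:
            raise ValueError("host_success_percent must be between 0 and 100")


if __name__ == "__main__":
    settings = HaAdvancedSettings()
    settings.validate()
    print(settings)
```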

HA Log

On the main menu of ZStack Cloud, choose Settings > Platform Setting > HA Policy. Then, the HA Policy page is displayed. If HA policy is enabled and the HA mechanism is triggered, then HA logs are generated.

This page displays all VM HA logs in the Cloud. You can view log information such as the task result, VM name, VM owner, host information, and start and end time. These logs can be used for O&M and auditing.
  • You can select a time span to view HA logs. Available time spans: the last 7 days and the last month. By default, logs generated in the last 7 days are displayed.
  • You can customize a time span to view the HA logs in the specified time span.
  • You can search for HA logs by VM name or VM owner.
  • You can filter HA logs by task result. The task results include succeeded and failed.
  • You can sort HA logs by creation or completion time.
  • You can export the HA logs in CSV format.
  • You can adjust the number of HA logs displayed on each page. Optional values: 10, 20, 50, and 100.
