KVM agent PostCertificateRenewalTask fails with IllegalStateException during certificate provisioning #12795

@jmsperu

Description

When provisionCertificate is called on a KVM host, the agent's PostCertificateRenewalTask consistently fails with java.lang.IllegalStateException: Shutdown in progress. This causes the host to permanently report secured=false in StartupRoutingCommand.hostDetails, showing as "Unsecure" in the UI despite having valid TLS certificates and a working SSL connection.

Steps to Reproduce

  1. Add a new KVM host to CloudStack 4.22 with ca.plugin.root.auth.strictness=true
  2. Run provisionCertificate hostid=<uuid>
  3. The API returns {"success": true} — keystore, cert, CA cert, and key are all created correctly
  4. The PostCertificateRenewalTask attempts to restart libvirtd and reconnect, but the cert provisioning triggers an agent restart
  5. During the agent shutdown, Runtime.removeShutdownHook() throws IllegalStateException: Shutdown in progress
  6. The agent never sets secured=true — the host permanently shows "Unsecure"

Error Log (agent.log)

INFO  [resource.wrapper.LibvirtPostCertificateRenewalCommandWrapper] Restarting libvirt after certificate provisioning/renewal
WARN  [resource.wrapper.LibvirtPostCertificateRenewalCommandWrapper] Execution of process for command [sudo service libvirtd restart ] failed.
WARN  [cloud.agent.Agent] Failed to execute post certificate renewal command: java.lang.IllegalStateException: Shutdown in progress
	at java.base/java.lang.ApplicationShutdownHooks.remove(ApplicationShutdownHooks.java:82)
	at java.base/java.lang.Runtime.removeShutdownHook(Runtime.java:244)
	at com.cloud.agent.Agent$PostCertificateRenewalTask.runInContext(Agent.java:1377)

Root Cause

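At Agent.java:1377, the PostCertificateRenewalTask calls Runtime.getRuntime().removeShutdownHook() while the JVM is already shutting down. The Runtime Javadoc specifies that removeShutdownHook() throws IllegalStateException once the virtual machine is in the process of shutting down, which matches the "Shutdown in progress" message in the log. The certificate provisioning triggers an agent reconnect/restart, creating a race condition in which the PostCertificateRenewalTask runs inside the shutdown window.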
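The failure mode can be reproduced outside CloudStack. The following is a minimal standalone sketch, not the agent's code: the class name, the helper tryRemove(), and the hook names are hypothetical. A second shutdown hook stands in for a task that happens to run during the shutdown window and tries to remove the first hook.

```java
// Standalone sketch; none of these names come from CloudStack.
class ShutdownHookRaceDemo {

    /** Attempts to remove a shutdown hook and reports what happened. */
    static String tryRemove(Thread hook) {
        try {
            boolean removed = Runtime.getRuntime().removeShutdownHook(hook);
            return removed ? "removed" : "not-registered";
        } catch (IllegalStateException e) {
            // Thrown when the JVM is already in the process of shutting down.
            return "IllegalStateException: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Stand-in for the agent's own shutdown hook (hypothetical).
        Thread agentHook = new Thread(() -> { }, "hypothetical-agent-hook");
        Runtime.getRuntime().addShutdownHook(agentHook);

        // Stand-in for a task that runs while shutdown is in progress.
        Runtime.getRuntime().addShutdownHook(new Thread(
                () -> System.out.println(tryRemove(agentHook)),
                "hypothetical-late-task"));

        // Returning from main starts the shutdown sequence; the second hook
        // then calls removeShutdownHook() and should hit the exception path.
    }
}
```

Called during normal operation, tryRemove() reports "removed". Called from inside the shutdown sequence, it is expected to take the IllegalStateException branch, as in the stack trace above.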

Impact

  • Host permanently shows "Unsecure" in the UI despite valid TLS
  • The actual SSL connection works correctly (keystore loads, handshake succeeds)
  • The secured flag in host_details DB table is overwritten to false on every agent reconnect
  • Manual DB updates are overwritten by the agent's StartupRoutingCommand
  • Reproduced consistently on multiple provisionCertificate attempts

Suggested Fix

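The PostCertificateRenewalTask.runInContext() should catch IllegalStateException from removeShutdownHook() and still proceed with setting the secured flag. Alternatively, use a guard flag before calling removeShutdownHook(). Checking Thread.currentThread().isInterrupted() is a weaker option: it reflects only the interrupt status of the current thread and does not by itself indicate that the JVM is shutting down.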

// In Agent.java PostCertificateRenewalTask.runInContext()
try {
    Runtime.getRuntime().removeShutdownHook(shutdownThread);
} catch (IllegalStateException e) {
    // JVM is already shutting down, skip hook removal
    LOG.debug("Skipping shutdown hook removal during shutdown", e);
}
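The guard-flag alternative could look roughly like the sketch below. It is an illustration under assumptions, not a patch against Agent.java: the class GuardedHookRemoval, the field shuttingDown, and the method removeHookIfSafe() are hypothetical names. The flag check alone is racy, because the JVM can begin shutting down between the check and the call, or before the hook has had a chance to set the flag, so the sketch keeps the try/catch as well.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical helper; names are illustrative and not taken from CloudStack.
class GuardedHookRemoval {
    // Set by the shutdown hook itself once it starts running.
    private final AtomicBoolean shuttingDown = new AtomicBoolean(false);
    private final Thread shutdownThread;

    GuardedHookRemoval() {
        shutdownThread = new Thread(
                () -> shuttingDown.set(true), "hypothetical-agent-shutdown");
        Runtime.getRuntime().addShutdownHook(shutdownThread);
    }

    /**
     * Removes the hook unless shutdown is known or found to be in progress.
     * Returns true only if the hook was actually removed.
     */
    boolean removeHookIfSafe() {
        if (shuttingDown.get()) {
            return false; // the hook is already running; nothing to remove
        }
        try {
            return Runtime.getRuntime().removeShutdownHook(shutdownThread);
        } catch (IllegalStateException e) {
            // Shutdown began after the flag check; treat as "not removed".
            return false;
        }
    }
}
```

With a helper of this shape, the caller gets a boolean instead of an exception and can continue with whatever work follows the hook removal, such as updating the secured state.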

Environment

  • CloudStack: 4.22.0.0
  • OS: Ubuntu 22.04 (also reproduced on fresh provision)
  • Java: OpenJDK 11.0.30 and 17.0.18
  • KVM/libvirt: working correctly
  • ca.plugin.root.auth.strictness: true

Workaround

Manually update the DB: UPDATE host_details SET value='true' WHERE host_id=<id> AND name='secured';
This value is overwritten on the next agent restart, but the TLS connection is functionally secure regardless of the flag.
