Failover Cluster Service won’t start on Server 2025



Recently, a customer was trying to deploy Hyper-V in an environment where NTLM is not allowed. Policies are in place to prevent NTLM traffic.

People pointed us to this article at Microsoft that seems to declare that NTLM is no longer required. However, after some experimentation and a few support tickets, this turns out not to be entirely accurate, at least for Server 2022.

When we asked, the closest we got to an answer was that Server 2022 still requires NTLM for *some* things (I did ask which, but they could not tell me anything specific). From my own experimentation, it seems that WMI queries may still force the use of NTLM.

We were instead advised that if we wished to reduce our reliance on NTLM, we would need to redeploy our Failover Cluster with Windows Server 2025.

We again drilled for more information regarding NTLM – had it been eradicated? The response was still a little unclear: “In Server 2025, the reliance on NTLM has been significantly reduced”.

The good news seems to be that we were able to build a new environment running Server 2025 without relying on NTLM for our Failover Clustering! Happy days!

The bad news for us was that upon deploying Server 2025 and Failover Clustering, a CIS GPO setting (specifically from the LSA node introduced in 2022) now seems to treat the SSP/AP used by Failover Clustering in Server 2025 as custom, and therefore does not allow it to be loaded or used. Boo!

In order for clustering to work, this GPO setting MUST NOT be configured on Server 2025. I believe it is a bug that Microsoft’s own CLUSAUTHMGR.DLL file is declared as a custom package.

The GPO Element is known as “Allow Custom SSPs and APs to be loaded into LSASS”, and it is set to Disabled.

In the registry this GPO restriction will appear under:
HKLM\Software\Policies\Microsoft\Windows\System

With a value of:
AllowCustomSSPsAPs REG_DWORD 0
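
To confirm whether a given node has picked up this restriction, the value can be read straight from the registry. The following is a minimal Python sketch rather than an official tool: the key path and value name are the ones quoted above, the helper names `describe_policy` and `read_policy` are hypothetical, and treating any non-zero value as "enabled" is an assumption on my part.

```python
import sys

# Key path (under HKLM) and value name as quoted in the article.
POLICY_KEY = r"Software\Policies\Microsoft\Windows\System"
POLICY_VALUE = "AllowCustomSSPsAPs"


def describe_policy(value):
    """Interpret the policy DWORD; None means the value is absent."""
    if value is None:
        return "not configured"
    if value == 0:
        return "disabled (custom SSPs/APs blocked)"
    # Assumption: any non-zero value is treated as enabled.
    return "enabled (custom SSPs/APs allowed)"


def read_policy():
    """Return the DWORD from HKLM, or None if the key or value is missing."""
    import winreg  # standard library, Windows only

    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, POLICY_KEY) as key:
            value, _value_type = winreg.QueryValueEx(key, POLICY_VALUE)
            return value
    except FileNotFoundError:
        return None


if __name__ == "__main__" and sys.platform == "win32":
    print(POLICY_VALUE, "is", describe_policy(read_policy()))
```

On a Server 2025 cluster node, a result of "disabled" would match the condition described in this post; "not configured" is the state in which clustering worked for us.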

If the GPO Element is set after a cluster is formed, the Cluster Service will not start.
If the GPO Element is set BEFORE a cluster is even formed, cluster creation will seemingly hang for around 15 minutes and then fail.

Removing this setting or GPO (and rebooting) will resolve this specific condition.

The failure can be seen in the system log as Event 7024:
“A specified authentication package is unknown”

Log Name:      System
Source:        Service Control Manager
Date:          1/24/2025 4:19:29 PM
Event ID:      7024
Task Category: None
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      Server.FQDN.Name

Description:
The Cluster Service service terminated with the following service-specific error:
A specified authentication package is unknown.
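
If you are checking several nodes, it can help to recognise this event automatically. The sketch below is only an illustration: it assumes the event has been copied or exported as plain text in the layout shown above, and the function name `is_cluster_auth_package_failure` is hypothetical.

```python
import re

# Error text from the event description shown above.
ERROR_TEXT = "A specified authentication package is unknown"


def is_cluster_auth_package_failure(event_text):
    """Return True if the text looks like the Event 7024 described above."""
    has_event_id = (
        re.search(r"^Event ID:\s*7024\s*$", event_text, re.MULTILINE) is not None
    )
    return (
        has_event_id
        and "Cluster Service" in event_text
        and ERROR_TEXT in event_text
    )


# Trimmed copy of the event text from this post, used as sample input.
sample = """Log Name:      System
Source:        Service Control Manager
Event ID:      7024
Description:
The Cluster Service service terminated with the following service-specific error:
A specified authentication package is unknown.
"""

print(is_cluster_auth_package_failure(sample))
```

A match only tells you that the symptom is the same; the registry value above is still the thing to confirm.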


Environments that have not restricted LSASS in this way will obviously not have this problem. However, many active threats make use of this weakness, so not implementing this LSASS restriction is a poor option.

Have a look at what threat actors are doing on systems where this restriction is not in place.

For more details on the CIS Recommendation, check out this link at Tenable

Stay tuned – Fingers crossed – Microsoft may be fixing this issue

 

 
