Having built many Microsoft-based PKI deployments over the years, I was fairly sure I was across most of the best practices and gotchas out there (don’t jump on me Brian, Vadims and Mark, if you’re still out there *bow – we are not worthy, etc* – I also hope Greg has recovered by now :D).
It seems that I was missing one scenario relating to firewall rules and communication between the Issuing CAs (ICAs) and the standalone web servers. In this scenario, the Web Enrollment pages, CRLs and OCSP were not installed on the same servers as the ICAs (which is fairly common – but I thought I would spell out the configuration). The web servers were being used to broker certificate requests through that delightful age-old legacy ADCS Web Enrollment feature (certsrv), as depicted below:
Whilst I really don’t like loading up that ancient ADCS Web Enrollment feature, there are customers out there who demand it be installed. Yes – I do explain to them that the pages won’t work with newer templates and that the feature may suddenly disappear from a future release of ADCS. Incidentally, take a look at this ripper of an article, KB2015796, which gives what I would consider a bad workaround for that problem (yikes).
Anyway, the situation was as follows:
Clients could browse directly to the /certsrv page, but clicking on “Download a CA certificate, certificate chain, or CRL” or drilling down into one of the “create certificate” items would cause the website to hang for a very long time – it would eventually return with the correct action... minutes later!
Environment:
Servers are all in AWS (may be relevant)
Servers are all 2016 (irrelevant)
Offline Root (irrelevant)
Issuing CAs (relevant)
Web Enrollment Servers (relevant)
For ease of access, the Issuing CAs were configured to talk DCOM on a standalone port, as explained in this aging yet still very relevant article from 2014 by Tom Aafloen on how to configure ADCS to use a standalone port – good job Tom. Turns out, if I had read the comments, I might have found my answer.
I did all the things I would normally do in spinning up an environment – built an offline root, stamped the issuing CAs, restricted them to talk on a standalone DCOM port (let’s say I set 6666 for our example here). I then sent all the relevant rules over to the AWS build guys and girls – the same rules I would normally send off to a firewall team to implement in a traditional environment:
Allow inbound 135 (*shudder*) and 6666 on the Issuing CAs
Allow port 80 and 443 on the Web Enrollment servers (these enrollment servers also hosted some CRLs – hence port 80).
Enable relevant DC traffic flows.
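For anyone who prefers to hand rules over as data rather than prose, here is a minimal sketch of that initial request written as Python dictionaries. The dictionary layout follows the `IpPermissions` shape that boto3's `authorize_security_group_ingress` accepts, but nothing here calls AWS. The helper name `tcp_rule`, the client range and the descriptions are my own placeholders, and the web server address is the one used in this post's example.

```python
# Sketch only: builds rule data, does not talk to AWS.
# CLIENT_CIDR is a hypothetical placeholder, not a value from the real build.

def tcp_rule(port, cidr, description):
    """Return one single-port TCP entry in the EC2 IpPermissions shape."""
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "IpRanges": [{"CidrIp": cidr, "Description": description}],
    }

WEB_CIDR = "10.77.77.77/32"   # web enrollment server from this post's example
CLIENT_CIDR = "10.0.0.0/8"    # hypothetical client network

# Inbound to the Issuing CAs: RPC endpoint mapper plus the static DCOM port.
ica_inbound = [
    tcp_rule(135, WEB_CIDR, "RPC endpoint mapper"),
    tcp_rule(6666, WEB_CIDR, "ADCS static DCOM port"),
]

# Inbound to the web enrollment servers: CRL downloads and certsrv.
web_inbound = [
    tcp_rule(80, CLIENT_CIDR, "CRL download over HTTP"),
    tcp_rule(443, CLIENT_CIDR, "certsrv over HTTPS"),
]

for name, rules in (("ICA", ica_inbound), ("Web", web_inbound)):
    print(name, [r["FromPort"] for r in rules])
```

Note that this list only covers traffic towards the Issuing CAs and towards the web servers from clients; it contains nothing for connections that the Issuing CA itself opens towards the web server.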
I then tested auto enrollment for workstations – all looking fine. Also tested manual enrollment – also looking fine.
It wasn’t until I tried to use the ADCS Web Enrollment Server that I noticed the delay – it took so long that I initially thought it had completely hung. Checking the logs showed nothing of interest. I checked everywhere and discovered that each time I fired off a request from a client to the web enrollment server, it would in turn send a request off to the Issuing CA. Taking a look at the Issuing CA, I noticed that it was trying to open a connection back to the webserver (10.77.77.77) on port 135:
In my experience, whenever you see “SYN_SENT” in netstat output like this, there is usually a firewall blocking the request, or sometimes it is the result of asymmetric routing. In this case, port 135 back towards the webserver was most definitely blocked.
It turns out I needed to explicitly open up the return path from the CA to the webserver. I am fairly certain that I have never needed to do this before in more “traditional” environments (companies with “legacy” servers and “legacy” datacentres). This may be because the return path from the Issuing CA to the webserver would typically allow for a stateful return of packets on “legacy” firewalls (even though it might technically look like a secondary unrelated connection sent outbound from the Issuing CA to the webserver – firewalls used to be smart enough to see that this is a related flow and would allow it). I could of course be wrong and maybe traffic was less restricted between the ICAs and Web Front Ends in other environments – I really can’t be too sure now.
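If you would rather not eyeball netstat output, a small filter does the job. This is a rough sketch that assumes the column layout of Windows `netstat -ano` (Proto, Local Address, Foreign Address, State, PID). The sample text is made up for illustration and was not captured from the real servers; the Issuing CA address 10.77.77.10 is hypothetical.

```python
def find_syn_sent(netstat_output):
    """Return (local, remote) address pairs for TCP rows in the SYN_SENT state."""
    stuck = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected TCP row: Proto, Local Address, Foreign Address, State, PID
        if len(fields) >= 4 and fields[0].upper() == "TCP" and fields[3] == "SYN_SENT":
            stuck.append((fields[1], fields[2]))
    return stuck

# Illustrative sample only, in the layout of `netstat -ano`.
SAMPLE = """\
  Proto  Local Address          Foreign Address        State           PID
  TCP    10.77.77.10:6666       10.77.77.77:50112      ESTABLISHED     1234
  TCP    10.77.77.10:50321      10.77.77.77:135        SYN_SENT        1234
"""

for local, remote in find_syn_sent(SAMPLE):
    print(f"{local} is waiting on {remote}")
```

Any row this returns is a connection the local machine started and never got an answer to, which is the pattern described above.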
It seems that the AWS Access groups do not play this stateful return game with RPC/DCOM and you MUST specify the return path ports involved (port 135 included) and perhaps the ephemeral ports. In fairness, RPC/DCOM is quite messy and belongs in the past, but whilst we are all still renewing machine certificates for Wi-Fi and the like, I fear it may stay around for some time.
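To make the distinction between a reply and a brand new connection concrete, here is a toy model of plain connection tracking. It is only a sketch of the general idea (replies are matched against flows already seen) and makes no claim about how AWS or any particular firewall implements it. The class name, the port numbers chosen for the client side and the Issuing CA address 10.77.77.10 are all hypothetical.

```python
class ToyStatefulFilter:
    """Toy connection tracker: permits replies only on flows it has already allowed."""

    def __init__(self, allowed_destinations):
        # (dst_ip, dst_port) pairs that a NEW connection is allowed to target.
        self.allowed = set(allowed_destinations)
        self.flows = set()

    def permits(self, src_ip, src_port, dst_ip, dst_port):
        flow = (src_ip, src_port, dst_ip, dst_port)
        reverse = (dst_ip, dst_port, src_ip, src_port)
        if reverse in self.flows:
            return True               # reply traffic on a tracked flow
        if (dst_ip, dst_port) in self.allowed:
            self.flows.add(flow)      # new connection matching a rule
            return True
        return False                  # new connection with no matching rule

WEB, ICA = "10.77.77.77", "10.77.77.10"
fw = ToyStatefulFilter({(ICA, 135), (ICA, 6666)})

print(fw.permits(WEB, 50112, ICA, 135))   # web -> ICA, matches a rule
print(fw.permits(ICA, 135, WEB, 50112))   # reply on that same flow
print(fw.permits(ICA, 50321, WEB, 135))   # NEW connection from ICA to web:135

fw.allowed.add((WEB, 135))                # explicitly open the return path
print(fw.permits(ICA, 50321, WEB, 135))
```

In this model the third check fails because the connection from the Issuing CA to port 135 on the web server is a new flow, not a reply, and it only succeeds once a rule for it is added.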
So there you have it: if you have some kind of “firewall” between your ICA and standalone web enrollment server, make sure you HARD ALLOW in both directions. You know those old firewall requests where your firewall guys and girls would say “no no, you don’t need bidirectional”? In AWS, it does seem as though you do.
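A quick way to confirm both directions before anyone touches certsrv is a plain TCP connect test run from each side. The sketch below uses only the Python standard library. The function names are mine, and the addresses in the usage comments are examples (10.77.77.10 for the Issuing CA is hypothetical). A successful connect only proves the port is reachable, not that DCOM itself is healthy.

```python
import socket

def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port completes within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False

def check_paths(targets, timeout=3.0):
    """Probe each (host, port) pair and return a dict of results."""
    return {(host, port): tcp_port_open(host, port, timeout) for host, port in targets}

# Example usage, run on the web enrollment server (towards the Issuing CA):
#   check_paths([("10.77.77.10", 135), ("10.77.77.10", 6666)])
# Example usage, run on the Issuing CA (back towards the web server):
#   check_paths([("10.77.77.77", 135)])
```

If the second check comes back False while the first is True, you are most likely looking at the same one-way rule set described in this post.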