ConfigMgr PXE issues with DHCP Guard & Server Block in NSX

Recently, I was helping a customer progress their rollout of Windows 10. Understandably, they wanted to use PXE to build the devices, and this worked perfectly well until they upgraded to ConfigMgr Current Branch (CB) 1910 along with ADK 1903. After the upgrade, PXE suddenly stopped working. I have done countless upgrades to date, so it made no sense that such a routine upgrade would break PXE. This is where my quest starts. The write-up from here on may appear long, but bear with me as I try to capture the entire troubleshooting process.

PXE can be tricky if the setup is not configured correctly. As a first step, I checked the usual areas of the PXE configuration, starting with SMSPXE.log on the Distribution Point hosting the PXE role. I could clearly see that the client was sending the PXE request and that the PXE server was responding with the policy and the NBP file. However, the client kept sending requests from the default IP 0.0.0.0, indicating that it was not receiving a response from the DHCP server and so never obtained an IP address. That pointed to an issue at the network layer, and when I checked with the customer they revealed that they were using DHCP scope options instead of IP helpers. Naturally, at this point one would advise using IP helpers, but the customer had been using DHCP scope options for years and there was no obvious reason why they would suddenly stop working. However, some modifications had already been made to the scopes, so it was decided to configure IP helpers after all.

After doing this, as expected, things moved along a bit and at least the PXE client was able to download the NBP file. The machine now received a response from the DHCP server and started sending requests from an assigned IP. A step in the right direction.

However, the process stopped again at the next stage: the PXE client never downloaded bootmgfw.efi. The PXE server responded in a timely manner each time the PXE client made a request, yet the client sat on the following message for hours without timing out.



This seemed unusual, as I could clearly see in SMSPXE.log that the PXE server was responding with the boot image after receiving a response from the ConfigMgr server to the client lookup. There was a policy available for the device.


I captured network traffic and could see that it matched the entries in SMSPXE.log. Again, no errors at this point. Not ruling out an issue with the PXE configuration itself, I removed and re-added the PXE role in ConfigMgr. Unfortunately, this made no difference and the PXE behavior remained the same. Suspecting an issue with the boot image, I tried different versions of the ADK, but even that didn't help. By now I had also tested with the ConfigMgr PXE responder service in place of WDS to see if the issue was with WDS in some way, but that didn't help either. This is where PXE troubleshooting can become really tricky. If there are no errors, it is difficult to pinpoint the issue. The PXE client simply does not download bootmgfw.efi, and nothing is logged to say why.
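When reading a capture like this, the fields that matter most are in the fixed BOOTP/DHCP header: whether the packet is a request or a reply, which address the client is using, and whether a relay (IP helper) is involved. Below is a minimal Python sketch of decoding that header from a raw UDP payload, assuming the layout defined in RFC 2131. The function name and the sample packet are made up for illustration and are not taken from the customer's capture.

```python
import socket
import struct

# Fixed BOOTP/DHCP header (first 236 bytes of the UDP payload), per RFC 2131.
BOOTP_HEADER = struct.Struct("!BBBBIHH4s4s4s4s16s64s128s")


def parse_bootp_header(payload: bytes) -> dict:
    """Decode the fixed header fields of a BOOTP/DHCP UDP payload."""
    if len(payload) < BOOTP_HEADER.size:
        raise ValueError("payload too short for a BOOTP header")
    (op, _htype, hlen, _hops, xid, _secs, _flags,
     ciaddr, yiaddr, siaddr, giaddr, chaddr, _sname, boot_file) = (
        BOOTP_HEADER.unpack_from(payload))
    return {
        "op": {1: "request", 2: "reply"}.get(op, f"unknown({op})"),
        "xid": xid,
        "ciaddr": socket.inet_ntoa(ciaddr),   # address the client already holds
        "yiaddr": socket.inet_ntoa(yiaddr),   # address offered to the client
        "siaddr": socket.inet_ntoa(siaddr),   # next (boot) server
        "giaddr": socket.inet_ntoa(giaddr),   # relay agent (IP helper), if any
        "client_mac": chaddr[:hlen].hex(":"),
        "boot_file": boot_file.split(b"\x00", 1)[0].decode("ascii", "replace"),
    }


# Hypothetical sample: a request from a client that has no address yet.
zero = socket.inet_aton("0.0.0.0")
sample = BOOTP_HEADER.pack(1, 1, 6, 0, 0x12345678, 0, 0x8000,
                           zero, zero, zero, zero,
                           bytes.fromhex("005056abcdef"), b"", b"")
fields = parse_bootp_header(sample)
print(fields["op"], fields["ciaddr"], fields["client_mac"])
```

A client that keeps showing up as a request with `ciaddr` of 0.0.0.0 and never receives a matching reply is the pattern described earlier in this post. The DHCP options that follow the fixed header (message type, vendor class and so on) are not decoded here.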

At this point, I decided to open a case with Microsoft Premier Support, as there was clearly something going on here that was beyond my control. Boy, was I right. Soon after I opened the case, Microsoft confirmed that this was not a known issue and that it only seemed to affect virtual servers running the PXE point. In this case, the PXE point was running on a Server 2012 R2 VM and was governed by the 'Server Block' setting found under NSX-T\Networking\Segments\Segment Profiles.

This is part of segment security in VMware NSX-T, which provides stateless Layer 2 and Layer 3 security by checking traffic to the segment and dropping unauthorized packets sent from VMs, matching the IP address, MAC address, and protocols against a set of allowed addresses and protocols. Segment security protects segment integrity by filtering out malicious attacks from the VMs in the network. In a segment security profile you can configure the Bridge Protocol Data Unit (BPDU) filter, DHCP snooping, DHCP server block, and rate limiting options. A PXE server answers clients with DHCP-style replies, so with DHCP server block applied to its VM those replies can be dropped silently before they ever leave the segment, which fits the complete absence of errors on the server side.
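To make the effect on PXE easier to picture, here is a deliberately simplified Python model of a DHCP server block rule. It is only a sketch under stated assumptions: it assumes the filter keys on the well-known DHCP ports (UDP 67 for servers, 68 for clients) and, as I understand the VMware documentation, that server-to-client traffic is blocked while server-to-relay traffic is not. NSX-T's real matching logic is not described in this post, and every name below is hypothetical.

```python
from dataclasses import dataclass

DHCP_SERVER_PORT = 67  # well-known UDP port for DHCP/BOOTP servers and relays
DHCP_CLIENT_PORT = 68  # well-known UDP port for DHCP/BOOTP clients


@dataclass(frozen=True)
class UdpPacket:
    """A UDP packet leaving a VM, reduced to the two fields this model uses."""
    src_port: int
    dst_port: int


def dropped_by_server_block(pkt: UdpPacket, server_block_enabled: bool) -> bool:
    """Assumed rule: drop VM-originated DHCP server -> client replies."""
    looks_like_server_reply = (pkt.src_port == DHCP_SERVER_PORT
                               and pkt.dst_port == DHCP_CLIENT_PORT)
    return server_block_enabled and looks_like_server_reply


# Hypothetical packets sent by a VM hosting the PXE point.
reply_to_client = UdpPacket(src_port=67, dst_port=68)
reply_to_relay = UdpPacket(src_port=67, dst_port=67)

print(dropped_by_server_block(reply_to_client, server_block_enabled=True))
print(dropped_by_server_block(reply_to_relay, server_block_enabled=True))
print(dropped_by_server_block(reply_to_client, server_block_enabled=False))
```

In this model the server still logs that it sent its reply, because the drop happens after the packet has left the guest OS. That is consistent with an SMSPXE.log that shows no errors while the client never receives an answer.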

This is similar to the DHCP Guard and Router Advertisement Guard settings applied to virtual network adapters in Hyper-V.

Once the NSX-T profile was disabled, PXE started working immediately. When I checked further with the Wintel team, they confirmed that they had applied this segment security profile to all the VMs around the same time that ConfigMgr was upgraded to CB 1910. That would explain why things stopped working suddenly after the upgrade.

Since there is no official documentation from Microsoft on this, I decided to blog about my experience. I hope this helps someone and saves them time and effort.

Until next time..

Reference:
https://docs.vmware.com/en/VMware-NSX-T-Data-Center/index.html
