I’d love to hear about your journeys, lessons learned or epic mistakes!
Recently got a master class on patch policy and thought I’d document some of my journey here. One of my biggest mistakes going from an SCCM-driven world into Automox was not carrying over some of those well-tested enterprise habits. Since then, I’ve been trying to optimize my setup for the best outcome.
Here are a few pain points I’m focused on:
- Machines that are not always online, or that go offline for a few weeks and then magically show up.
- Corrupt Windows Update agents.
- Reboot expectations and user experience.
- Patches failing to install.
- Multiple reboots, with the same patch being applied again and again.
- Security tools that were working have stopped. So now I need to reinstall… I think.
- ConfigMgmt tools that were working have stopped. Okay, let’s fix that.
- Any security controls I can gather intelligence on with a script, for custom monitoring through the API…
First Win
My first win was converting a single “patch everything” policy into separate policies for OS, third-party, O365, Defender, and Teams Machine-Wide Update. This creates a lot of policies, especially if, like me, you keep all your workstations in one group and use tags to create a “ring” deployment. Mine looks something like this, where the number of days is when patches start installing (I’m using patch age to control that):
- Dev 1% of assets - 0 days
- Test 5% of assets - 2 days
- Quality 20% of assets - 4 days
- Prod everything else - 6 days
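The ring logic above comes down to a lookup from ring to minimum patch age. Below is a minimal Python sketch of that rule. It is not how Automox evaluates policies internally. It assumes each device carries exactly one hypothetical ring tag (`dev`, `test`, `quality`, `prod`) and that "patch age" means whole days since the patch was released.

```python
from datetime import date, timedelta

# Hypothetical ring tags mapped to the minimum patch age (in days) before install,
# mirroring the Dev/Test/Quality/Prod rings described above.
RING_MIN_AGE_DAYS = {"dev": 0, "test": 2, "quality": 4, "prod": 6}


def patch_age_days(released: date, today: date) -> int:
    """Whole days since the patch was released."""
    return (today - released).days


def is_eligible(ring: str, released: date, today: date) -> bool:
    """True if a device in `ring` should start installing a patch released on `released`."""
    # Unknown tags fall back to the most conservative (prod) delay.
    min_age = RING_MIN_AGE_DAYS.get(ring.lower(), RING_MIN_AGE_DAYS["prod"])
    return patch_age_days(released, today) >= min_age


if __name__ == "__main__":
    released = date(2024, 6, 11)  # example release date
    for offset in range(8):
        today = released + timedelta(days=offset)
        ready = [ring for ring in RING_MIN_AGE_DAYS if is_eligible(ring, released, today)]
        print(f"day {offset}: {', '.join(ready)}")
```

On day 0 only Dev qualifies. Test joins on day 2, Quality on day 4, and every ring is eligible from day 6. Falling back to the Prod delay for an unrecognized tag means a mistagged device waits longer rather than patching early.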
Second Win
My second win was organizing all deployment times into a schedule that is easier to communicate. I also wanted to put some distance between worklets and patching policies. On many of them, I’m still checking the box that makes a client run the policy if it misses its scheduled time. Here is an example schedule:
- Daily 12am - break/fix worklets (e.g., manual security configurations, installing missing critical software)
- Daily 6:30am - reserved for active key-initiative changes
- Daily 7am/7:30am/7pm - all those patching policies
- Daily 8am - custom reboot app
- Daily 9am - security or hygiene worklets
- 2nd Tuesday 9am - security or hygiene worklets
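As the schedule grows, one way to keep worklets and patching apart is to encode the daily slots and check the spacing. This is a minimal Python sketch, not an Automox feature. The slot lists are the daily times above in 24-hour form; the monthly 2nd Tuesday slot and the reboot app are left out. `min_gap` is a hypothetical helper that reports the closest worklet/patching pair and wraps around midnight.

```python
from itertools import product

MINUTES_PER_DAY = 24 * 60

# Daily slots from the example schedule above, in 24-hour time.
WORKLET_SLOTS = ["00:00", "06:30", "09:00"]  # break/fix, key initiatives, hygiene
PATCH_SLOTS = ["07:00", "07:30", "19:00"]    # patching policies


def to_minutes(hhmm: str) -> int:
    """Convert 'HH:MM' (24-hour clock) to minutes after midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def circular_gap(a: int, b: int) -> int:
    """Minutes between two times of day, wrapping around midnight."""
    diff = abs(a - b) % MINUTES_PER_DAY
    return min(diff, MINUTES_PER_DAY - diff)


def min_gap(worklets: list[str], patches: list[str]) -> tuple[int, str, str]:
    """Smallest gap between any worklet slot and any patching slot, plus that pair."""
    return min(
        (circular_gap(to_minutes(w), to_minutes(p)), w, p)
        for w, p in product(worklets, patches)
    )


if __name__ == "__main__":
    gap, worklet, patch = min_gap(WORKLET_SLOTS, PATCH_SLOTS)
    print(f"closest pair: worklet {worklet} vs patch {patch} ({gap} minutes apart)")
```

With the example schedule, the closest pair is the 6:30am key-initiative slot and the 7am patching slot, 30 minutes apart. The 12am worklet and the 7pm patching run are 5 hours apart once the wrap-around is counted.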
Third Win
My third win was realizing that if you point 150+ policies at a single agent, most of which are not scheduled, the eval code had better be clean. Yeah… 30-minute scan times are not cool. I’m not proud to admit this one, so please learn from my mistake: stop writing amazing eval code unless you intend to use it. I’ll likely still put the code in, but with an exit 0 at the top of any worklet that isn’t scheduled. If the day comes that I need to schedule it or get other value from it, by all means, remove that line.
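Worklet eval scripts are usually PowerShell, but the control flow behind the "exit 0 at the top" habit works the same in any language. Here is a minimal Python sketch of it. It assumes the Automox convention that an eval exit code of 0 means "compliant, skip remediation" and a non-zero code triggers remediation. `evaluate` and its `scheduled` flag are hypothetical names for illustration.

```python
from typing import Callable, Optional


def evaluate(is_compliant: Callable[[], Optional[bool]], *, scheduled: bool) -> int:
    """Return an eval exit code: 0 = compliant / nothing to do, 1 = run remediation.

    - An unscheduled worklet short-circuits to 0 before any (possibly slow) check
      runs, the equivalent of putting `exit 0` on the first line of the eval script.
    - 1 is returned only when the check positively reports non-compliance (False).
    - Every other path, including an inconclusive check (None), ends in an explicit 0.
    """
    if not scheduled:
        return 0
    if is_compliant() is False:
        return 1
    return 0


if __name__ == "__main__":
    def slow_check() -> bool:
        # Stand-in for expensive eval logic; it should never run when unscheduled.
        raise RuntimeError("eval logic ran for an unscheduled worklet")

    print(evaluate(slow_check, scheduled=False))    # prints 0; slow_check is never called
    print(evaluate(lambda: False, scheduled=True))  # prints 1
```

With the short-circuit as the very first statement, an unscheduled worklet costs the same no matter how much eval logic sits below it. That matters when one agent evaluates 150+ policies.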
Code Tips
- Watch out for 64-bit registry keys. If your script runs in a 32-bit process, Windows silently redirects reads under HKLM\SOFTWARE to the 32-bit view (WOW6432Node), so it gets confusing fast.
- If the worklet is not scheduled, drop an exit 0 at the top, unless you really want to drive those icons in the console.
- If the worklet is scheduled, write clean eval code, avoid long-running code, and check states multiple times.
- Start-Process can be dangerous if you use the -Wait parameter. What if the process never stops… That might cause a logjam of policies (not always, and I could have something else brewing). Instead, get creative: start the process, find a different way to tell PowerShell to wait (for example, Start-Process -PassThru followed by Wait-Process -Timeout), and have it time out at some point.
- Inventory software with the registry. I’m not a fan of Win32_Product, because querying it seems to trigger a Windows Installer consistency check that logs an MsiInstaller event in the Application log for every installed product.
- Exit codes are important. I’ve had agents run eval code that should have exited 0 and instead got caught running the exact same code in remediation. I’ve started ending eval code with an explicit exit 0 and only using exit 1 when it’s explicitly time to remediate.
- Validation in depth is kinda cool. I like to use multiple methods to detect “states”. Not sure about you, but I’ve observed that not every cmdlet, executable, or technique pushed in a script returns the same result on every machine. For example:
  - Service state: Win32_Service and Get-Service
  - Installed software: check the registry and Test-Path for the EXE
  - OS patch reboot required: COM object + registry
- Timeouts: think long and hard about every cmdlet, EXE, or other thing your code interacts with. Could running it result in something that never finishes? Will every system actually run this code correctly?
- There is a 50k-character limit on scripts. I wonder how many people run into that…
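The timeout and validation-in-depth tips above are not PowerShell-specific. Here is a minimal Python sketch of both. It is language-agnostic and not an Automox API.

`run_with_timeout` is the bounded alternative to waiting forever on a child process. `consensus` takes several hypothetical check functions, for instance one per detection method. It only trusts a state when they all agree.

```python
import subprocess
import sys
from typing import Callable, Optional


def run_with_timeout(cmd: list[str], timeout_s: float) -> Optional[int]:
    """Run a command, but give up after timeout_s seconds instead of waiting forever.

    Returns the exit code, or None if the command timed out. On timeout,
    subprocess.run kills the child process before re-raising TimeoutExpired.
    """
    try:
        result = subprocess.run(cmd, capture_output=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return None
    return result.returncode


def consensus(checks: list[Callable[[], bool]]) -> Optional[bool]:
    """Validation in depth: ask several independent methods about the same state.

    Returns True/False only when every method agrees. Returns None when there are
    no checks, the methods disagree, or one of them raises, so the caller can treat
    the state as unknown instead of guessing.
    """
    if not checks:
        return None
    answers = []
    for check in checks:
        try:
            answers.append(bool(check()))
        except Exception:
            return None
    if all(answers):
        return True
    if not any(answers):
        return False
    return None


if __name__ == "__main__":
    # A command that finishes quickly vs. one that would hang far past the timeout.
    print(run_with_timeout([sys.executable, "-c", "print('ok')"], timeout_s=10))  # 0
    print(run_with_timeout([sys.executable, "-c", "import time; time.sleep(60)"], timeout_s=1))  # None
```

Treating disagreement as unknown (None) fits the "only use exit 1 when it's explicitly time to" habit. An inconclusive state can be logged for follow-up instead of triggering remediation on one method's word.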
Self-Healing Environments
Ever pick up on a pattern where end users always submit a support ticket? What if you could detect that in code before they ask you to fix it? Be on the lookout for those. My most recent example was noticing that a patch would install every day, the user would reboot, and the process would repeat until someone came in, ran a worklet to reset Windows Update, and rebooted. Once that was done, things were back on track. Catching this meant creating code that could accomplish two things:
- detect reboots
- detect patching reboots
I’m cheating because I use my own reboot app, so I was able to add a step to its code that exports a CSV file with the timestamp and the action taken by the reboot app. I then correlated that data with reboot detections and set a threshold: 2 reboots and 2 reboot app pop-ups meant the user was likely stuck in a reboot-patch loop. That let me take action at that point. Now, if a user contacts us about it, Automox will have already helped us detect and remediate it.
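The correlation step above can be sketched roughly as follows. This is a minimal Python illustration under several assumptions that are not from the post:

- The reboot app's CSV export has `timestamp,action` columns with ISO timestamps.
- A pop-up is logged with the action `prompted`.
- Reboot timestamps come from whatever reboot detection you already have.
- A "patching reboot" is one that happens within 24 hours after a pop-up.
- Only the last 3 days are considered.

The 2-and-2 threshold is the one described in the post.

```python
import csv
import io
from datetime import datetime, timedelta

# Thresholds from the post: 2 reboots and 2 reboot-app pop-ups.
REBOOT_THRESHOLD = 2
PROMPT_THRESHOLD = 2
# Assumptions (not from the post): how far back to look, and how soon after a
# pop-up a reboot must happen to count as a "patching reboot".
LOOKBACK = timedelta(days=3)
FOLLOW_UP = timedelta(hours=24)


def load_prompts(csv_text: str) -> list[datetime]:
    """Parse a hypothetical reboot-app export with 'timestamp,action' columns.

    Only rows whose action is 'prompted' count as a pop-up.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        datetime.fromisoformat(row["timestamp"].strip())
        for row in reader
        if row["action"].strip().lower() == "prompted"
    ]


def patching_reboots(prompts: list[datetime], reboots: list[datetime]) -> list[datetime]:
    """Reboots that happened within FOLLOW_UP after some reboot-app pop-up."""
    return [r for r in reboots if any(p <= r <= p + FOLLOW_UP for p in prompts)]


def in_reboot_patch_loop(prompts: list[datetime], reboots: list[datetime], now: datetime) -> bool:
    """True when recent patching reboots and recent pop-ups both reach their thresholds."""
    start = now - LOOKBACK
    recent_prompts = [p for p in prompts if start <= p <= now]
    recent_reboots = [r for r in reboots if start <= r <= now]
    return (
        len(patching_reboots(recent_prompts, recent_reboots)) >= REBOOT_THRESHOLD
        and len(recent_prompts) >= PROMPT_THRESHOLD
    )


if __name__ == "__main__":
    export = (
        "timestamp,action\n"
        "2024-06-12T08:00:00,prompted\n"
        "2024-06-13T08:00:00,prompted\n"
        "2024-06-13T09:00:00,snoozed\n"
    )
    reboots = [datetime(2024, 6, 12, 10, 0), datetime(2024, 6, 13, 10, 30)]
    print(in_reboot_patch_loop(load_prompts(export), reboots, now=datetime(2024, 6, 14, 8, 0)))  # True
```

Requiring each counted reboot to follow a pop-up separates patching reboots from ordinary ones. A user who simply restarts twice in a day does not trip the detector.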