As the dust settles on the CrowdStrike event, it highlights the tension between two security risks: relying on third-party agents and automated software update systems, and keeping our own servers patched and secure.
Any automated patch system that changes the underlying behaviour of a system has the ability to affect the applications we spend time crafting, which are so critical to the operation of your business.
However, applying best practices across our software development lifecycle provides many safeguards to protect us from incidents like this - we just have to be sure we don't shortcut them as we jump on board with some vendor's fancy toolset in the name of "protecting us" - for a service fee. Key safeguards include:
- Utilise infrastructure as code, a key enabler of repeatable infrastructure build and recreation processes.
- Have ready access to a high-functioning support team able to investigate infrastructure issues, apply patches and rebuild infrastructure. This is not phone support from a team of ticket takers, but software engineers who can diagnose and resolve issues.
- Always think about dependencies you are introducing and minimise them.
- Use CI/CD pipelines to ensure updates/patches can be rolled out systematically across environments. Your pipelines should be backed by high-coverage automated testing.
- Use containerisation and/or virtualisation to bundle updates and roll out change sets between environments.
- Use cloud-native, infrastructure-light services such as DynamoDB and Lambda. Removing servers that you manage can reduce your footprint and exposure.
- Ensure you have strong observability and monitoring channels so you can "see" what is happening - critical for understanding what went wrong when things fail.
- Ensure patches are rolled out to "pre-production" environments first so you can fully test the integration with your servers. Zero-day patches need to be rolled out in a timely fashion, so balancing risk against timeliness is crucial. Security-oriented operating systems such as CentOS allow patches to flow from non-production to production in a managed process, giving our automated testing the chance to catch issues before they hit production.
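To make the "pre-production first" idea concrete, here is a minimal sketch of a staged rollout that promotes a patch one environment at a time and halts at the first failed health check. Everything in it is hypothetical and for illustration only: the `Environment` class, the `staged_rollout` function and the `apply_patch` / `health_check` callbacks are stand-ins for whatever your own pipeline and test suite provide, not part of any particular tool.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Environment:
    """Hypothetical stand-in for a deployment target (dev, staging, production)."""
    name: str
    patched: bool = False


def staged_rollout(
    patch_id: str,
    environments: List[Environment],
    apply_patch: Callable[[Environment, str], None],
    health_check: Callable[[Environment], bool],
) -> List[str]:
    """Apply a patch to each environment in order, stopping at the first failure.

    Returns the names of the environments that were patched and passed
    their health check. Environments after a failure are never touched.
    """
    promoted: List[str] = []
    for env in environments:
        apply_patch(env, patch_id)
        if not health_check(env):
            # Halt promotion so later environments (e.g. production)
            # never receive the faulty patch.
            break
        promoted.append(env.name)
    return promoted


if __name__ == "__main__":
    envs = [Environment("dev"), Environment("staging"), Environment("production")]

    def apply_patch(env: Environment, patch_id: str) -> None:
        # Assumption: a real pipeline would deploy an image or run a package update here.
        env.patched = True

    def health_check(env: Environment) -> bool:
        # Simulate a bad patch that is only caught by staging's test suite.
        return env.name != "staging"

    print(staged_rollout("patch-001", envs, apply_patch, health_check))  # ['dev']
    print(envs[2].patched)  # False: production was never patched
```

The sketch deliberately leaves out rollback of the failed environment and any fast-track path for zero-day patches; both are judgement calls about risk versus timeliness that depend on your own setup.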
These practices are not new, but they should reduce your exposure and improve your system recovery if (or when) this plays out again.
Get in touch to discuss your unique requirements and how you can better prepare for resilience in the increasingly connected world our systems operate in.