It’s been too long since I’ve updated this blog. I’ve switched to a new job as director of site reliability at WorkFusion, where I’m tasked with forging the role of SRE at a rapidly growing AI/ML software startup whose customers vary in size and run both on-prem and on various cloud platforms. Since the switch, I’ve been collecting and recollecting a lot of ideas about the role of SRE, drawing both on how others in the industry define it and on how it was defined at Palantir, where I first worked in site reliability. The challenge is to apply the same core principles of site reliability to stacks of varying sizes, rather than to one monolithic stack, or to many monolithic stacks.

More on this to come…