Plus Ça Change Management: 20 Years of SRE

Fri 19 July 2024

On July 19th 2004, I spent my first day at Google. I showed up at the Datacenter in Dublin, since that's apparently where SREs and production folks were going to sit. There wasn't a corp network connection, because this was Google in 2004. A couple of days later, I moved my stuff to Barrow Street and commandeered a desk. A few weeks after that, someone from facilities asked "Hey, isn't your team based in the DC?". "Nope", I said. They shrugged, and updated some spreadsheets, and then SRE was based in Barrow Street. Again, this was Google in 2004. We were making it up as we went along, in the best possible way.

I could claim I was doing SRE-adjacent things before this, as I suspect could many people, but I'm going with that day as when I became an SRE. The function was still figuring itself out; in many ways it still is. My work with Busy Teams as a freelancer isn't "SRE work" on paper, but in practice, it very much is. Resilience, defense in depth, common sense, backup plans for backup plans. Across industry, SRE/Reliability/Devops/ProdEng/Whatevsies is many things to many people. So, in the spirit of the core of the function, I've been thinking a little about uncomfortable gaps in capability. What have we not figured out yet?

Here's my short list of boiling hot takes:

  1. Most companies haven't figured out how much they steady-state care about reliability. I expand on this in "6 Reasons you Don't need an SRE Team".

  2. Assuming we've figured out that we care, we still can't agree how hard a problem this is. SRE (and traditional ops) begat DevOps, which at its core has the premise that busy developers could side-gig the production bits, and you can just wave your hand and say "You build it, you run it" and that'll cover you. This isn't true today, and was even less true when 'DevOps' started being a thing. If I squint my eyes, I can see a course correction happening with "Production Engineering", which has at its premise a lot more sensible of an acknowledgment that this whole area is hard (as in, requires smartness, innovation, and Real Engineering(tm)) as opposed to difficult (as in, it's boring and I don't want to do it). These are glacial shifts; it's taken more than 20 years for us to go in this circle back to acknowledgment that this is a real and specialised set of problems.

  3. We actually run less and less of our own infra, and practices need updating. For a brief period before AI was suddenly what all startups are about, there were (and still are!) great startups producing big parts of your tool-chain as SaaS/PaaS. This is great for not having to build in in-house expertise in that particular area, but can leave you dead in the water if you don't have a good strategy around vendor management. I spoke about this a bit at SRECon EMEA 2023, and I do feel like a lot of our practices involve declaring an outsourced part of our tool-chain to be an opaque cuboid, and then not having defense in depth for when it goes away (temporarily or permanently). Many of the SRE practices as set out in various books/articles kind if assume you own your whole stack. This is becoming mostly untrue, and we are currently a Frog of Moderately Troubling Temperature here.

Anyway. It's Friday, it's 22 degrees outside in Dublin, and it's time to go enjoy the next 20. No shortage of things to do.

Category: Writing Tagged: log work google sre

July 16 2024

Tue 16 July 2024


Going to be playing a bit of catch-up, since it's been a few months.

Work is still workin' -- I've got some coaching going on, and am still working mainly with one team on consulting. It's been fun times, and the gig is flexible enough that I've gone essentially part time …

Category: Log Tagged: log motorcycle travel

Read More

April 01 2024

Mon 01 April 2024

A few months further into my "not working full-time and seeing how that goes" journey.

I've picked up a contract for the consulting end of my 'job' since January, that'll likely last til the summer or thereabouts -- it'd been fun doing the good and useful parts of the job I …

Category: Log Tagged: log

Read More

November 21 2023

Tue 21 November 2023


The last few weeks were a bit of a hive of activity. I went back to work, started a small business, and went from no-busy-at-all levels of work-related concentration to quite-busy-and-not-very-stressful-yet. I'm going to be doing more than just coaching, but that's for another time.

Earlier this year, I asked …

Category: Log Tagged: log life

Read More
Page 1 of 21

Next »