log.andvari.net

Plus Ça Change Management: 20 Years of SRE

Fri 19 July 2024

On July 19th 2004, I spent my first day at Google. I showed up at the Datacenter in Dublin, since that's apparently where SREs and production folks were going to sit. There wasn't a corp network connection, because this was Google in 2004. A couple of days later, I moved my stuff to Barrow Street and commandeered a desk. A few weeks after that, someone from facilities asked "Hey, isn't your team based in the DC?". "Nope", I said. They shrugged, and updated some spreadsheets, and then SRE was based in Barrow Street. Again, this was Google in 2004. We were making it up as we went along, in the best possible way.

I could claim I was doing SRE-adjacent things before this, as I suspect could many people, but I'm going with that day as when I became an SRE. The function was still figuring itself out; in many ways it still is. My work with Busy Teams as a freelancer isn't "SRE work" on paper, but in practice, it very much is. Resilience, defense in depth, common sense, backup plans for backup plans. Across the industry, SRE/Reliability/Devops/ProdEng/Whatevsies is many things to many people. So, in the spirit of the core of the function, I've been thinking a little about uncomfortable gaps in capability. What have we not figured out yet?

Here's my short list of boiling hot takes:

Most companies haven't figured out how much they steady-state care about reliability. I expand on this in "6 Reasons you Don't need an SRE Team".
Assuming we've figured out that we care, we still can't agree how hard a problem this is. SRE (and traditional ops) begat DevOps, which at its core has the premise that busy developers could side-gig the production bits, and that you can just wave your hand, say "You build it, you run it", and that'll cover you. This isn't true today, and was even less true when 'DevOps' started being a thing. If I squint, I can see a course correction happening with "Production Engineering", which has at its premise a far more sensible acknowledgment that this whole area is hard (as in, requires smartness, innovation, and Real Engineering(tm)) as opposed to difficult (as in, it's boring and I don't want to do it). These are glacial shifts; it's taken more than 20 years for us to come full circle back to acknowledging that this is a real and specialised set of problems.
We actually run less and less of our own infra, and practices need updating. For a brief period before AI suddenly became what all startups are about, there were (and still are!) great startups producing big parts of your tool-chain as SaaS/PaaS. This is great for not having to build in-house expertise in that particular area, but can leave you dead in the water if you don't have a good strategy around vendor management. I spoke about this a bit at SRECon EMEA 2023, and I do feel like a lot of our practices involve declaring an outsourced part of our tool-chain to be an opaque cuboid, and then not having defense in depth for when it goes away (temporarily or permanently). Many of the SRE practices as set out in various books/articles kind of assume you own your whole stack. This is becoming mostly untrue, and we are currently a Frog of Moderately Troubling Temperature here.

Anyway. It's Friday, it's 22 degrees outside in Dublin, and it's time to go enjoy the next 20. No shortage of things to do.

Category: Writing Tagged: log work google sre

What Would It Take?

Fri 12 May 2023

One of the most constructive things I've found when facing a difficult work situation is to externalise and write things down -- I do quite well with talking things through with folks, and coaching and mentoring is something I get a lot of energy out of. When it's just me and …

Category: Writing Tagged: work coaching

September 12th 2014

Fri 12 September 2014

Second week back, and seeing what my job ends up looking like for the next while. Think 'busy'. I'm trying to nail down travel until the end of the year, which is proving increasingly impossible. I'll be in Zurich for a while next week, and after that is anyone's guess.

Other …

Category: Log Tagged: log motorbike work

September 7th 2014

Sun 07 September 2014

First week back at work - done!

Things have been more physically tiring than otherwise. I've been trying to nail down the next couple of months, travel-wise, and managed to make it into work on the bike two days out of five, which was good. That's been helped along by …

Category: Log Tagged: log work brain

August 31st 2014

Sun 31 August 2014

Last day of leave, and a day of normalcy and domesticity. Mainly hung around and enjoyed the last of the days since the end of June, when I last went near work. Tomorrow will see some hectic times, when I figure out what's happened while I've been away. If I've …

Category: Log Tagged: log leave work brain
