From b1308157f821fd5f864f553cb2c5f26e589cf377 Mon Sep 17 00:00:00 2001
From: Martin Ashby
Date: Sun, 5 Feb 2023 16:37:15 +0000
Subject: SRE book

---
 content/posts/2022-12-27-book-site-reliability-engineering.md | 11 -----------
 1 file changed, 11 deletions(-)
 delete mode 100644 content/posts/2022-12-27-book-site-reliability-engineering.md

(limited to 'content/posts/2022-12-27-book-site-reliability-engineering.md')

diff --git a/content/posts/2022-12-27-book-site-reliability-engineering.md b/content/posts/2022-12-27-book-site-reliability-engineering.md
deleted file mode 100644
index 84996e7..0000000
--- a/content/posts/2022-12-27-book-site-reliability-engineering.md
+++ /dev/null
@@ -1,11 +0,0 @@
----
-title: "Book - Site Reliability Engineering"
-date: 2022-12-27T16:16:43Z
-draft: true
----
-
-I've been reading [Site Reliability Engineering](https://sre.google/sre-book/table-of-contents/) from Google/O'Reilly. It's an interesting insight into how Google scales their operations work. So far I'm about 1/3 of the way through.
-
-I'm reading this looking for tips to apply at my current job. It's fairly plain that most of the advice and stories are relevant to a huge organization with sprawling complexity, but also enormous resources to manage it. It's easy to see how some advice like holding meaningful postmortems for incidents, and having and maintaining incident response plans, and having extensive monitoring is possible and useful at Google, but less clear which pieces could be applied at a smaller organization.
-
-A secondary take-away is outsourcing as much as possible: when SRE isn't your core capability, and you aren't big enough to need it, use hosted / fully managed services wherever possible; taking away as much of the maintenance burden as possible.
-- 
cgit v1.2.3-ZIG