From d8d7a2a5ff43f913322473a9714a63f1fd8d2303 Mon Sep 17 00:00:00 2001
From: Martin Ashby
Date: Tue, 27 Dec 2022 18:59:24 +0000
Subject: draft post on google SRE book

---
 content/posts/2022-12-27-book-site-reliability-engineering.md | 11 +++++++++++
 1 file changed, 11 insertions(+)
 create mode 100644 content/posts/2022-12-27-book-site-reliability-engineering.md

(limited to 'content')

diff --git a/content/posts/2022-12-27-book-site-reliability-engineering.md b/content/posts/2022-12-27-book-site-reliability-engineering.md
new file mode 100644
index 0000000..84996e7
--- /dev/null
+++ b/content/posts/2022-12-27-book-site-reliability-engineering.md
@@ -0,0 +1,11 @@
+---
+title: "Book - Site Reliability Engineering"
+date: 2022-12-27T16:16:43Z
+draft: true
+---
+
+I've been reading [Site Reliability Engineering](https://sre.google/sre-book/table-of-contents/) from Google/O'Reilly. It's an interesting insight into how Google scales its operations work. So far I'm about a third of the way through.
+
+I'm reading this looking for tips to apply at my current job. It's fairly plain that most of the advice and stories are relevant to a huge organization with sprawling complexity, but also enormous resources to manage it. It's easy to see how advice such as holding meaningful postmortems for incidents, maintaining incident response plans, and having extensive monitoring is possible and useful at Google, but it's less clear which pieces could be applied at a smaller organization.
+
+A secondary take-away is to outsource as much as possible: when SRE isn't your core capability, and you aren't big enough to need it, use hosted / fully managed services wherever possible to take away as much of the maintenance burden as you can.
-- 
cgit v1.2.3-ZIG