Fastly, Google and Amazon’s “Bug Already Present” Pattern Caused 3 Biggest Outages This Year
In all cases, a bug that wasn’t triggered until long after release caused a cascade of failures.
Jun 30, 2021

What Reliability Engineers Can Learn from Google’s December 2020 OAuth Outage
Five Takeaways to Apply to Your Reliability Strategy
Jun 30, 2021

What Reliability Engineers Can Learn from Amazon’s November 2020 Kinesis Outage
Six Takeaways to Apply to Your Reliability Strategy
Jun 30, 2021

How To Export Medium Stats for Monthly Analysis
Learn how to generate and retain monthly stats on Medium.
Published in Expedia Group Technology, Jun 23, 2021

Microservices Are Not a Technical Solution, They’re a Teamwork Solution
The first answer to “make this more reliable” isn’t “make it a set of microservices.”
Published in Expedia Group Technology, Apr 29, 2021

Traffic Shedding, Rate Limiting, Backpressure, Oh My!
How to stop your service from getting overloaded.
Published in Expedia Group Technology, Mar 25, 2021

The Cost of 100% Reliability
How do reliability costs stack up? Where do they come from?
Published in Expedia Group Technology, Mar 31, 2020

Practical JVM GC Tuning for Everyone
The modern garbage collection tuning procedure for JVMs.
Published in Expedia Group Technology, Feb 20, 2020

DevOps = Dev + ErrorBudget + Ops
In this decade we entered a new era of IT: rapid release as a standard. This practice has grown from niche to such a common practice that…
Published in Expedia Group Technology, Jun 19, 2019

Optimizing Your Server by Limiting Request Overheads
Success is a double-edged sword: increased request volume and more edge-case requests stress your server. Many a server has failed…
Published in The Hotels.com Technology Blog, Jan 17, 2019