Fastly, Google and Amazon’s “Bug Already Present” Pattern Caused 3 Biggest Outages This Year | In all cases, a bug that wasn’t triggered until long after release caused a cascade of failures | Jun 30, 2021
What Reliability Engineers can learn from Google’s December 2020 OAuth Outage | Five Takeaways to Apply to Your Reliability Strategy | Jun 30, 2021
What Reliability Engineers can learn from Amazon’s November 2020 Kinesis Outage | Six Takeaways to Apply to Your Reliability Strategy | Jun 30, 2021
Published in Expedia Group Technology | How To Export Medium Stats for Monthly Analysis | Learn how to generate and retain monthly stats on Medium | Jun 23, 2021
Published in Expedia Group Technology | Microservices Are Not a Technical Solution, They’re a Teamwork Solution | The first answer to “make this more reliable” isn’t “make it a set of microservices” | Apr 29, 2021
Published in Expedia Group Technology | Traffic Shedding, Rate Limiting, Backpressure, Oh My! | How to stop your service from getting overloaded | Mar 25, 2021
Published in Expedia Group Technology | The Cost of 100% Reliability | How do reliability costs stack up? Where do they come from? | Mar 31, 2020
Published in Expedia Group Technology | Practical JVM GC tuning for everyone | The modern garbage collection tuning procedure for JVMs | Feb 20, 2020
Published in Expedia Group Technology | DevOps = Dev + ErrorBudget + Ops | In this decade we entered a new era of IT: rapid release as a standard. This practice has grown from niche to such a common practice that… | Jun 19, 2019
Published in The Hotels.com Technology Blog | Optimizing your server by limiting request overheads | Success is a double-edged sword — increased request volume and more edge case requests stress your server. Many a server has failed… | Jan 17, 2019