As a member of the Platform team, one of my responsibilities is to plan and execute large-scale migrations. These pose a unique challenge: how do you get people to complete tasks when you have no authority over them?
Maybe it’s because I’m an immigrant desperately trying to fit in. Maybe it’s one more way of keeping the imposter syndrome at bay. Whatever the reason, I’ve always been low-key obsessed with software development lingo. I’d go through meetings dropping principles and laws every chance I got. I’d lurk in Slack channels waiting for a chance to pounce on a conversation to say “That’d make it worse, based on Brooks’s Law” or “You’re falling for confirmation bias!”. Because I thought that gave me street cred. Same way I’d cover my laptop with stickers to showcase all the frameworks I knew and the conferences I’d been to (but only if it was a modern framework and a “cool” conference). Just last week, I wrote a post teaching an imaginary audience about Conway’s Law and posted it on the social network for professional narcissists.
When I first joined the Netflix Platform team circa 2020, the Observability offering was composed of a series of tools serving different purposes. There was Atlas for metrics, Edgar for distributed tracing, Radar for logs and alerts, Lumen for dashboards, Telltale for app health, etc. It was a portfolio of about 20 different apps, big and small, ranging from business-specific tools for analyzing playback sessions to low-level tools for CPU profiling.
In this article, Pete Hodgson explored the different ways in which a Platform team works with other teams to get shit done. I thought it was interesting to see how collaboration changes based on the type of work, so I put together this visual summary to compare and contrast each type of interaction.
Extension functions are great! But if you define them all over the place, it can get confusing pretty quickly. So here’s a cool idiom to limit extension function usage to a specific context.
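To give a taste of the idiom (a minimal sketch with made-up names, not necessarily the exact example from the post): declare the extension as a member of a class, and it can only be called where that class is the receiver.

```kotlin
// The extension String.bold() is a member of HtmlBuilder, so it's only
// callable inside an HtmlBuilder scope; it can't leak into unrelated code.
class HtmlBuilder {
    private val parts = mutableListOf<String>()

    // Member extension function: visible only with an HtmlBuilder receiver.
    fun String.bold() {
        parts += "<b>$this</b>"
    }

    fun build(): String = parts.joinToString("")
}

fun html(block: HtmlBuilder.() -> Unit): String =
    HtmlBuilder().apply(block).build()

fun main() {
    val page = html {
        "Hello".bold()  // compiles: we're inside an HtmlBuilder scope
    }
    // "Hello".bold()   // does not compile out here
    println(page)       // prints <b>Hello</b>
}
```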
Distributed tracing can be ridiculously expensive if you try to trace a hundred percent of requests. A common technique to reduce costs is to sample only a small portion of the traffic. But naive sampling techniques like uniform sampling will inevitably capture more common-case executions and might miss the more interesting edge cases. Instead, Sifter’s approach is to bias sampling decisions towards outliers and anomalous traces. This way, anomalous traces have a higher chance of being sampled, and the more uninteresting traces are discarded.
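To make the bias concrete, here’s a toy sketch of the general idea (not the paper’s actual mechanism): each trace gets an anomaly score, and the probability of keeping it grows with that score.

```kotlin
import kotlin.random.Random

// Toy sketch of biased sampling. The anomalyScore is a stand-in for
// whatever model decides how "surprising" a trace is (0 = routine, 1 = very).
data class Trace(val id: String, val anomalyScore: Double)

fun shouldSample(trace: Trace, baseRate: Double = 0.01): Boolean {
    // Routine traces fall back to the base sampling rate; highly anomalous
    // ones are kept almost certainly.
    val p = baseRate + (1 - baseRate) * trace.anomalyScore
    return Random.nextDouble() < p
}
```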
Lots of businesses run on Google Docs. It’s how we write memos, define strategies, discuss proposals, document decisions, write tutorials, and plenty of other things.
Google Docs is a fantastic piece of technology. I almost can’t imagine how we worked before it (productStrategy-Jun-2004-version13.docx anyone?). And yet, I sometimes feel like it could be so much more! Like we’ll look back in 10 years and think: “My god! I can’t believe we were working that way!”. Improving Docs has the potential of completely overhauling the way information flows through an organization. Here are some ideas on how Google could improve it.
Common knowledge says that you don’t deploy on Friday if you want to have a peaceful weekend. Yet, some people will tell you that if you’re not comfortable deploying every day of the week, you’re doing it wrong. They’ll say that deploying shouldn’t be scary and that you probably don’t have enough tests. So, which one is it?
It’s Monday morning. You’re sitting at your desk with your steaming cup of Joe, ready to sink your teeth into that new feature you have to develop. The git pull downloads months’ worth of changes, and you dive into the code. Piece by piece, you start building a mental model of the system, trying to make sense of the different components. But something doesn’t feel right. Why was it built this way? It feels weird, it feels so obviously wrong, so poorly designed, so suboptimal.
You realize you need help. Whoever wrote this mess should be able to provide some context. You run git blame and your own name hits you in the face like a brick. You start thinking that maybe it’s not so wrong. That you probably had your reasons. If only you could go back in time and ask your past self…
What we include in a test is as important as what we leave out. Having the right amount of information helps us understand what the test is doing at a glance.
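Here’s a contrived sketch of what I mean: a small helper supplies defaults for every detail the test doesn’t care about, so the one detail that drives the behavior is the only one on display.

```kotlin
import org.junit.Assert.assertTrue
import org.junit.Test

data class User(val name: String, val email: String, val age: Int)

// Code under test (a stand-in for this example).
fun discountFor(user: User): Int = if (user.age >= 65) 10 else 0

// Test helper: sensible defaults for everything the test doesn't care about.
fun aUser(age: Int = 30) = User(name = "any", email = "any@example.com", age = age)

class DiscountTest {
    @Test
    fun `seniors get a discount`() {
        val senior = aUser(age = 70)  // age is the only detail that matters here
        assertTrue(discountFor(senior) > 0)
    }
}
```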
Flaky tests are those that randomly fail for no apparent reason. If you have a flaky test, you might re-run it, over and over, until it succeeds. If you have a couple of them, the chances of all passing at the same time are slim, so maybe you ignore the failures. You know, just this one time… Soon enough, you’re not paying attention to failures on this test suite. Congratulations! Your tests are now worthless.
Queues are a powerful tool for building reliable systems. In this article, I’ll describe some of the tips and tricks I came across when working with queues.
Some of the advice is specific to Amazon SQS because that’s what I’ve been using the most lately, and also because several of the tips come from this amazing article from the Amazon Builders’ Library.
We have deluded ourselves into thinking that being able to invert a binary tree on a whiteboard is the hallmark of great software engineering. It’s time we look for better ways of evaluating coding skills.
I love reading about how people do creative work. Be it writing books or designing video games, there’s something magical about peeking behind the curtain and learning how the pros do their thing.
Today I’m reviewing Shape Up, a book about the process of writing software at Basecamp.
You might think that being a backend engineer means you’ll never have to draw anything more complex than a bunch of boxes connected with arrows (or hexagons if you’re going all cloud native). This is simply not true, and that’s why you’re here.
At some point you’ll find yourself producing system diagrams, flow-charts, slides, mockups, maybe even icons! So, let me show you some tools and tricks I picked up over the years to fake it at design.
Over the last few years, Mockk has been gaining ground as the go-to mocking library in KotlinWorld™. Just recently, it was listed as “Adopt” in the ThoughtWorks Technology Radar. Want to know what all the fuss is about?
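As a teaser, here’s roughly what a test double looks like (hypothetical types, real Mockk calls):

```kotlin
import io.mockk.every
import io.mockk.mockk
import io.mockk.verify

interface PaymentGateway {
    fun charge(cents: Long): Boolean
}

class Checkout(private val gateway: PaymentGateway) {
    fun pay(cents: Long): Boolean = gateway.charge(cents)
}

fun main() {
    val gateway = mockk<PaymentGateway>()
    every { gateway.charge(any()) } returns true  // stub the collaborator

    check(Checkout(gateway).pay(499))

    verify { gateway.charge(499) }  // assert the interaction happened
}
```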
This is the third and final post of the Interviewing in Silicon Valley series. In this last piece I talk about how to make the most of your on-site, how to handle rejection and how to compare competing offers.
Welcome to the second part of the Interviewing Series! It’s time to cover the thing that terrifies most candidates: the technical questions. We’ll see what different types of questions there are, and how we can prepare for them. We have a lot of ground to cover, so let’s jump right into it.
For the past few months, I’ve been interviewing with different companies in the Valley, from some of the well-known giants to promising startups. Over the next couple of weeks, I’ll be publishing a series of articles about the things I learned along the journey. This is Part 1.
You’re already using Kotlin on your codebase. Maybe, you’ve even migrated to the new Kotlin DSL for Gradle. Wouldn’t it be nice if you could use Kotlin for your git hooks too?
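Here’s a sketch of what that could look like, assuming you have kscript installed and want to run ktlint before every commit (swap in whatever check you like):

```kotlin
#!/usr/bin/env kscript
// Save as .git/hooks/pre-commit and make it executable (chmod +x).
import kotlin.system.exitProcess

// Run ktlint and surface its output in the terminal.
val exitCode = ProcessBuilder("ktlint")
    .inheritIO()
    .start()
    .waitFor()

if (exitCode != 0) {
    System.err.println("ktlint found problems; aborting commit.")
    exitProcess(1)
}
```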
Just finished reading Migrating to Microservice Databases by Edson Yanaga. If you can relate to the 3 nouns in the title then you’ll want to check it out.
The first thing you learn about Knockout is observables. The second is computed observables. They’re dead simple; they’re even part of the Hello World example. And yet, the magic wasn’t working for me. Here’s why:

This week I needed to test a class that depended on a method from a static class. I saw we were using PowerMock and thought to myself: “Well, this sounds pretty common, I bet it’s easy to accomplish”. But of course, I ran into half a dozen issues before I was able to make it work. Here’s my two cents to make your experience easier than mine.
In this post, I’ll introduce the concept of Feature Toggles as a release alternative to Feature Branches. The technique goes by many names: feature flags, feature switches, feature flippers, etc.
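As a preview, here’s a minimal sketch of the concept (names are made up): the new code path ships to trunk behind a toggle and is flipped on through configuration, instead of living on a long-running branch.

```kotlin
interface FeatureToggles {
    fun isEnabled(feature: String): Boolean
}

// Simplest possible backing store; real setups read from config files,
// environment variables, or a toggle service.
class InMemoryToggles(private val enabled: Set<String>) : FeatureToggles {
    override fun isEnabled(feature: String) = feature in enabled
}

fun checkout(toggles: FeatureToggles) {
    if (toggles.isEnabled("new-payment-flow")) {
        // new implementation: merged to trunk, dark until toggled on
    } else {
        // current implementation
    }
}
```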
So I was learning to animate Views in Android using this video and was having trouble the second time the animation ran. On the first run, the objects ended up at their destination; on the second, it was mayhem.
I had fallen victim to the great misunderstanding everyone makes about Android animations: they are just a magic trick.
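The spoiler: the classic android.view.animation classes only transform how the view is drawn; the view’s actual properties never change, so the second run starts from a stale position. Property animators modify the real property instead. A sketch (not necessarily the exact fix from the post):

```kotlin
import android.animation.ObjectAnimator
import android.view.View

// Moves the view by updating its actual translationX property, so repeated
// runs start from wherever the view really is.
fun slideRight(view: View) {
    ObjectAnimator.ofFloat(view, View.TRANSLATION_X, view.translationX + 200f)
        .setDuration(300)
        .start()
}
```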
We were having a race condition on a server, which was “fixed” by adding a sleep to the thread so it would check again later. Yes, it sucked, so I decided to build something more sophisticated and went looking for a library to handle retries with multiple strategies. That’s when I first read about Guava Retrying.
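From memory, the builder reads roughly like this (a sketch; double-check the exact API against the library’s docs):

```kotlin
import com.github.rholder.retry.RetryerBuilder
import com.github.rholder.retry.StopStrategies
import com.github.rholder.retry.WaitStrategies
import java.util.concurrent.Callable
import java.util.concurrent.TimeUnit

// Retry while the result is still false, waiting half a second between
// attempts and giving up after five tries.
fun waitForServer(isReady: () -> Boolean): Boolean {
    val retryer = RetryerBuilder.newBuilder<Boolean>()
        .retryIfResult { ready -> ready != true }
        .withWaitStrategy(WaitStrategies.fixedWait(500, TimeUnit.MILLISECONDS))
        .withStopStrategy(StopStrategies.stopAfterAttempt(5))
        .build()
    return retryer.call(Callable { isReady() })
}
```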
Remember my last post on value types using Google’s AutoValue? Today, while doing some work on a new Android project I’m starting, I thought: “Great chance to use AutoValue!”. Guess what: there’s a port of Google’s AutoValue for the Android platform.
Value types is a fancy name for those classes where you have to implement equals() and hashCode(), and usually toString(). You’ve probably written thousands of those classes, but have you ever wondered why you have to write almost 50 lines of code to express such a common concept?
Implementing compare() and compareTo() methods was never fun. Luckily, Guava provides a utility that makes comparison methods easier to write and more pleasing to the eye.
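Here’s a sketch of the utility in question, ComparisonChain, with a hypothetical Employee type: the chain stops at the first nonzero comparison, so the whole ordering reads top to bottom.

```kotlin
import com.google.common.collect.ComparisonChain

data class Employee(val lastName: String, val firstName: String, val id: Int)

// Order by last name, then first name, then id.
val byName = Comparator<Employee> { a, b ->
    ComparisonChain.start()
        .compare(a.lastName, b.lastName)
        .compare(a.firstName, b.firstName)
        .compare(a.id, b.id)
        .result()
}
```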
I’m starting a series of posts on Guava (Google’s core libraries). Today I’m going to begin with null: how to use it, and how to avoid it when necessary.
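As a taste of where the series is headed, here’s a quick sketch of Guava’s Optional, which makes “maybe absent” explicit in the type instead of leaning on null:

```kotlin
import com.google.common.base.Optional

// Wrap a possibly-null value so callers must deal with absence explicitly.
fun findNickname(raw: String?): Optional<String> = Optional.fromNullable(raw)

fun main() {
    val nickname = findNickname(null)
    println(nickname.isPresent)        // false
    println(nickname.or("anonymous"))  // fall back to a default
}
```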
There are billions of blogs written by more experienced and talented devs; there’s stackoverflow.com, and there are communities for every tool and language ever created. So why bother creating yet another dev blog? Here are the two reasons why I started this blog.