How Platform Teams Get Shit Done

written in collaboration, migrations, platform

In this article, Pete Hodgson explores the different ways a platform team works with other teams to get shit done. I found it interesting to see how collaboration changes with the type of work, so I put together this visual summary to compare and contrast each type of interaction.


I added a few things here and there, but most of the content comes from the original article, so if you care about this topic I recommend you check it out!

What I found particularly interesting is the realization that migrations are hard because the team that owns the code that needs changing is not the team driving the migration. This misaligns incentives and turns the migration into a socio-technical problem. The article describes the different ways teams can collaborate to get it done, but it's also important to understand the tools a platform team has to remove this friction in the first place: things like sidecars, service meshes and service chassis.

I also included a section on how Google does Large-Scale Changes (LSCs). They built a tool that allows anybody to submit an LSC that is applied across the whole codebase. They advocate centralizing the migration, to the point where most changes are reviewed by a single expert and local teams hold no veto power over the LSC. They rely on code analysis and transformation tools to write the LSC, and on test infrastructure that efficiently runs all the tests a given change might affect. To read more about their approach, refer to Chapter 14 of Software Engineering at Google.
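
To make the "run all tests that a given change might affect" idea a bit more concrete, here is a minimal sketch of one way it could work: walk the dependency graph backwards from the changed targets and keep only the tests you reach. This is not Google's tooling. The function name `affected_tests`, the target names, and the dictionary-based graph format are all assumptions I made up for illustration.

```python
from collections import defaultdict, deque


def affected_tests(deps, changed, tests):
    """Return the tests that transitively depend on any changed target.

    deps:    hypothetical mapping of target -> list of its direct dependencies
    changed: targets touched by the large-scale change
    tests:   set of targets that are tests
    """
    # Invert the graph: for each target, record who depends on it directly.
    reverse = defaultdict(set)
    for target, direct_deps in deps.items():
        for dep in direct_deps:
            reverse[dep].add(target)

    # Breadth-first walk from the changed targets through their dependents.
    seen = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for dependent in reverse[current]:
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)

    # Only the reachable targets that are tests need to run.
    return sorted(seen & set(tests))


# Made-up example graph: a logging library used by an HTTP library,
# which in turn is used by a billing service.
deps = {
    "lib/logging": [],
    "lib/http": ["lib/logging"],
    "svc/billing": ["lib/http"],
    "svc/search": [],
    "test/billing": ["svc/billing"],
    "test/search": ["svc/search"],
    "test/http": ["lib/http"],
}
tests = {"test/billing", "test/search", "test/http"}

print(affected_tests(deps, ["lib/logging"], tests))
# prints ['test/billing', 'test/http']
```

In this toy graph, changing `lib/logging` skips `test/search` entirely, which is the whole point: the change author gets a signal from the tests that matter without running everything.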