May 01, 2021
Mobile At Scale dives deep into how mobile teams scale building and releasing apps. I haven’t seen a resource this good since Monzo Bank’s 2018 talk “The Release Mindset” by Andy Smart, who I noticed is now Monzo’s VP of Engineering.
I’ve picked some lessons I’ve found interesting and split these into sections below. You might find some advice useful if you’re building an app in a team of 5-10+ engineers.
This may be an egregious question, but what does the Uber app actually do that makes it so big? It displays a map with some moving dots on it, lets you pick a location, and asks the server for a route and price. Plus a bit of workflow for signup, credit card, reviews; not a huge number of screens.
To which a former Uber Engineer replies:
The “there are only a few screens” claim is not true. The app works in 60+ countries, with features shipped in the app that are often specific to a country and - in rare cases - a city.
The app has thousands of scenarios. It speaks to good design that each user thinks the app is there to support their 5 use cases, without showing all the other use cases (which are often regional, or just not relevant to that type of user - like business traveller use cases).
Uber builds and experiments with custom features all the time. An experimental screen built for London, UK would be part of the app. Multiply this by the 40-50 product teams building various features and experiments outside the core flows you are talking about (which core flows are slightly different per region as well).
I worked on payments, and this is what screens and components are in the Uber app:
Credit cards (yes, this is only a few screens)
Apple Pay / Google Pay on respective platforms
PayTM (15+ screens)
Special screens for India credit cards and 2FA, EU credit cards and SCA, Brazil combo cards and custom logic
Cash (several touch points)
AMEX rewards and other credit card rewards (several screens)
Uber credits & top-ups (several screens)
UPI SDK (India)
We used to have Campus Cards (10 screens), Airtel Money (5), Alipay (a few more), Google Wallet (a few), and other payment methods I forget about. All with native screens. Still with me? This was just payments. The part where most people assume “oh, it’s just a credit card screen”. Or people in India assume “oh it’s just UPI and PayTM”. Or people in Mexico “oh, it’s just cash”. And so on.
Then you have other features that have their own business logic and similar depths behind the scenes when you need to make them work for 60 countries:
Airport pickup (lots of specific rules per region)
Commuter card functionality
Product types (there are SO many of these with special UI: disabled vehicles, vans, mass transport in a few regions, etc.)
Uber for Business (LOTS of touchpoints)
On-trip experience business logic
Pickup special cases
Safety toolkit (have you seen it? Very neat features!)
Custom fraud features for certain regions
Customer support flows
Regional business logic: growth features for the likes of India, Brazil and other regions.
Uber Eats touchpoints
Jump / Lime integrations (you can get bikes / scooters through the app)
Transit functionality (seen it?)
A bunch of others I won’t know about.
Much of the app “bloat” has to do with how business logic and screens need to be bundled in the binary, even if they are for another region. E.g. the UPI and PayTM SDKs were part of the app, despite only being used for India. Uber Transit was in a city or two when it launched, but it also shipped worldwide. And then you have the binary size bloat with Swift that OP talks about.
A really key theme of Gergely’s Mobile at Scale is that mobile engineering has all of this hidden complexity, which often makes non-engineers (he notes PMs in particular) go “Oh, I didn’t realise that”.
When mobile apps feel really simple, that’s often by design. There’s an enormous amount of engineering work done to manage that complexity, and it means hiding a lot of functionality that’s irrelevant to a given user when you can only ship one binary.
I’d learned a few years back that Uber was doing incredible “Platform” engineering to enable hundreds of mobile engineers to develop features in its app. I’d previously read a pretty academic paper by Uber engineers “Keeping master green at scale” where they devise an incredibly clever approach to the issue of merges into master sometimes causing regressions.
I ran into these issues at a previous workplace where, as the “Core” mobile team, we accepted pull requests from feature teams across the organisation that could cause tests on master to fail (which causes issues for every engineer). Something I don’t think people realise is that simply compiling an app can take 30 minutes alone, and that’s before running UI tests. If you add Firebase to your application, you might pull in several transitive dependencies, such as LevelDB, which underpin Firebase products. And it turns out it’s pretty difficult to cache these dependencies even when they don’t change.
There are several tools that have popped up since then to help with this issue, such as MergeQueue, which cleanly explains the problem:
While CI tools can run tests on every pull request when it’s opened, and on every branch after it’s pushed, it may not be sufficient to avoid broken builds. For instance, if you have two pull requests that modify dependent code, the tests could pass on each pull request independently and GitHub would allow the merge, but the build may break after the merge.
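The fix, broadly, is to test each pull request against main *plus* everything queued ahead of it, rather than against main as it was when the PR was opened. Here’s a minimal sketch of that idea in TypeScript; the types and names are hypothetical, not MergeQueue’s or Uber’s actual implementation:

```typescript
// Sketch of a merge queue: each PR is tested against a speculative
// base (main + all previously accepted PRs), so two individually
// green PRs that conflict are caught before they break master.
type PR = { id: number; passes: (base: string[]) => boolean };

function processQueue(main: string[], queue: PR[]): number[] {
  const merged: number[] = [];
  let speculativeBase = [...main]; // main + previously accepted PRs
  for (const pr of queue) {
    if (pr.passes(speculativeBase)) {
      speculativeBase = [...speculativeBase, `pr-${pr.id}`];
      merged.push(pr.id);
    }
    // A failing PR is ejected from the queue instead of landing
    // a red commit on master.
  }
  return merged;
}
```

The key property: a PR that is green on its own but red when combined with an earlier PR in the queue never reaches master.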
During the podcast Gergely mentions when the app was re-written in 2016 to use Swift and the RIBs architecture.
There’s a great series of Tweets from another engineer on the amount of engineering just to enable that migration.
I found myself wanting to understand the RIBs architecture, which Uber has open sourced, and then adapt it for React Native where it makes sense.
The RIBs architecture provides:
- Shared architecture across iOS and Android. Build cross-platform apps that have similar architecture, enabling iOS and Android teams to cross-review business logic code.
We get this “for free” with React Native.
- Testability and Isolation. Classes must be easy to unit test and reason about in isolation. Individual RIB classes have distinct responsibilities like: routing, business, view logic, creation. Plus, most RIB logic is decoupled from child RIB logic. This makes RIB classes easy to test and reason about independently.
When we’re disciplined about separating routing, business and view logic in the same way, it becomes easier to test with Jest.
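To make that concrete, here’s a sketch of an “interactor” holding business logic with no view imports at all; the `FareInteractor` name and minimum-fare rule are hypothetical, just to show the shape:

```typescript
// Hypothetical interactor: pure business logic, no React/React Native
// imports, so it can be unit-tested without rendering a component.
type Ride = { fare: number; currency: string };

class FareInteractor {
  // The network call is injected, so a test can stub it out.
  constructor(private fetchFare: (from: string, to: string) => number) {}

  quote(from: string, to: string): Ride {
    const raw = this.fetchFare(from, to);
    // Business rule lives here, not in the component:
    // enforce a (made-up) minimum fare.
    return { fare: Math.max(raw, 2.5), currency: "GBP" };
  }
}
```

In a Jest test you’d construct `new FareInteractor(() => 1)` with a stubbed fare and assert on `quote()` directly, never touching the view layer.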
- Tooling for developer productivity. RIBs come with IDE tooling around code generation, memory leak detection, static analysis and runtime integrations - all of which improve developer productivity for teams large or small.
React Native does some of this well, but it’s not obvious.
- An architecture that scales. This architecture has proven to scale to hundreds of engineers working on the same codebase and apps with hundreds of RIBs.
I believe that RIBs as an architecture actually advocates for more separation than React Native developers might be used to.
The idea of an “Interactor” (the I in RIBs) sounds like what Wix demonstrate in their architecture for scaling React Native, and I’ve heard of companies that have split their teams into separate concerns and architected this out further. https://www.youtube.com/watch?v=IFaTQVH7elI
I attended the React Native London meetup back in 2019 and listened to this talk which details Zopa’s approach here: https://www.youtube.com/watch?v=K6secfFpl3Q
I have heard of a team that has a custom CodePush solution that allows feature teams to control releasing their modules into the app.
I’ve also seen Microsoft build libraries on top of Redux that suit this architecture.
Take a look at the product landscape for A/B testing: Optimizely Mobile, Apptimize or Firebase’s A/B testing with Remote config.
Now read what Uber has:
Broadly, we use four types of statistical methodologies: fixed horizon A/B/N tests (t-test, chi-squared, and rank-sum tests), sequential probability ratio tests (SPRT), causal inference tests (synthetic control and diff-in-diff tests), and continuous A/B/N tests using bandit algorithms (Thompson sampling, upper confidence bounds, and Bayesian optimization with contextual multi-armed-bandit tests, to name a few). We also apply block bootstrap and delta methods to estimate standard errors, as well as regression-based methods to measure bias correction when calculating the probability of type I and type II errors in our statistical analyses.
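To give a flavour of the simplest item on that list - the fixed-horizon tests - here’s a minimal two-proportion z-test in TypeScript. This is a textbook illustration, emphatically not Uber’s implementation:

```typescript
// Fixed-horizon A/B test sketch: a two-proportion z-test comparing
// conversion rates between control (A) and treatment (B).
// Illustrative only - not Uber's methodology or code.
function twoProportionZ(
  convA: number, nA: number, // conversions and sample size, control
  convB: number, nB: number  // conversions and sample size, treatment
): number {
  const pA = convA / nA;
  const pB = convB / nB;
  // Pooled proportion under the null hypothesis (no difference).
  const pPool = (convA + convB) / (nA + nB);
  const se = Math.sqrt(pPool * (1 - pPool) * (1 / nA + 1 / nB));
  return (pB - pA) / se; // |z| > 1.96 is significant at p < 0.05
}
```

Everything else in Uber’s list (SPRT, bandits, block bootstrap) exists precisely because this naive version breaks down when you peek at results early or run many variants at once.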
By really examining what is core and what is optional, you can focus on driving 99.99% reliability for the core workflows. Then you can feature flag the optionals.
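A sketch of what that split looks like in code, assuming a remote-config-style flag map (the feature names here are made up):

```typescript
// Hypothetical feature gate: core workflows always ship and are never
// flag-gated (they need the 99.99% reliability); optional regional
// features are looked up in a remote-config-style flag map.
type Flags = Record<string, boolean>;

function enabledFeatures(
  core: string[],
  optional: string[],
  flags: Flags
): string[] {
  const gated = optional.filter((feature) => flags[feature] === true);
  return [...core, ...gated];
}
```

A missing or false flag simply hides the optional feature, so a bad regional experiment can be turned off remotely without a new binary.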
Thanks for reading! If you have any comments, questions or feedback please get in contact. Have a nice Sunday.
I'm Henry Moulton, a software design and development freelancer living in London, UK.
My portfolio will be online soon.