This is part 2 of the Testing at scale series of articles, where we asked industry experts to share their testing strategies. In this article, Ryan Harter, Staff Engineer at Dropbox, shares how the shape of Dropbox’s testing pyramid changed over time, and what tools they use to get timely feedback.
With more than one billion downloads, the Dropbox app for Android has to maintain a high quality bar for a diverse set of use cases and users. With fewer than 30 Android engineers, manual testing and #yolo aren’t enough to maintain confidence in our codebase, so we employ a variety of testing strategies to ensure we can continually serve our users’ needs.
Since Dropbox makes it easy to access your files across all of your devices, the Android app has to support viewing as many of those files as possible, including media files, documents, photos, and all of the variations within those categories. Additionally, features like Camera Uploads, which automatically backs up all of your most important photos, require deep integration with the Android OS in ways that have changed significantly over the years and across Android versions. All of this needs to keep working for our users, without them having to worry about the complexity, because the last thing anyone wants is to worry that they might lose their data.
While the size and distribution of the Android team at Dropbox has changed throughout the years, it’s imperative that we’re able to consistently build and refine features across the app while maintaining the level of trust from our users that we’ve become known for. To help underscore how Dropbox has been able to foster that trust, I’d like to share some ways that our testing strategies have changed over the years.
While automated testing has always been an important part of engineering culture at Dropbox, it hasn’t always been easy on Android. Years ago Dropbox invested in testing infrastructure that leaned heavily on End-to-End (E2E) testing. Building on Android’s instrumentation tests, we developed test helpers for features in the app following the test robot pattern. This enabled us to create a large suite of tests that could simulate a user moving throughout the app, but it came with its own significant costs.
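To make the test robot pattern concrete, here is a minimal sketch in Kotlin. It is not Dropbox’s code: `FakeFileBrowser`, `FileBrowserRobot`, and `fileBrowser` are hypothetical names, and the in-memory fake stands in for a real screen that an instrumentation test would drive through a UI testing library. The point is only that the robot exposes user-level actions and assertions, so the test body reads like a user journey.

```kotlin
// Hypothetical stand-in for the app under test. In an instrumentation test
// the robot would drive real UI instead of this in-memory fake.
class FakeFileBrowser {
    private val files = mutableListOf<String>()
    var openedFile: String? = null
        private set

    fun upload(name: String) { files += name }

    fun open(name: String) {
        check(name in files) { "No such file: $name" }
        openedFile = name
    }

    fun listFiles(): List<String> = files.toList()
}

// The robot hides how the screen is driven and exposes only user-level
// actions and assertions. Each method returns the robot so calls can chain.
class FileBrowserRobot(private val app: FakeFileBrowser) {
    fun uploadFile(name: String) = apply { app.upload(name) }

    fun openFile(name: String) = apply { app.open(name) }

    fun assertFileListed(name: String) = apply {
        check(name in app.listFiles()) { "Expected $name to be listed" }
    }

    fun assertPreviewShows(name: String) = apply {
        check(app.openedFile == name) { "Expected preview of $name, got ${app.openedFile}" }
    }
}

// Small entry point so a test can be written as a block of robot steps.
fun fileBrowser(app: FakeFileBrowser, steps: FileBrowserRobot.() -> Unit): FileBrowserRobot =
    FileBrowserRobot(app).apply(steps)

fun main() {
    val app = FakeFileBrowser()
    fileBrowser(app) {
        uploadFile("vacation.jpg")
        assertFileListed("vacation.jpg")
        openFile("vacation.jpg")
        assertPreviewShows("vacation.jpg")
    }
    println("journey passed")
}
```

Because tests only talk to the robot, a UI change should require updating the robot rather than every test that touches that screen.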
Like many Android projects at the time, the Dropbox app started out as a monolithic app module, but that wasn’t sustainable in the long run. Work was done to decompose the monolith into a more modular architecture, but the E2E test suite wasn’t prioritized in this effort due to the complex interplay of dependencies. This left our E2E test suite as a monolith of its own, resulting in test code that didn’t live alongside the feature code it exercised, which allowed the tests to be easily ignored and become outdated.
Additionally, the long build times that come with monolithic modules with many dependencies, combined with the tests being executed on emulators in our custom continuous integration (CI) environment, meant that the feedback cycle for these E2E tests was slow. This resulted in engineers feeling incentivized to remove failing tests instead of updating them.
As the Android ecosystem increasingly embraced automated testing, with the introduction of helpful libraries like Espresso and Robolectric and support for unit testing built directly into Gradle, Dropbox kept up with these changes by moving from a heavy reliance on E2E tests towards more and more unit tests, filling out the bottom layer of the previously inverted testing pyramid. This was a big win for test coverage across the app, and allowed us to roll out quality assurance practices like code coverage baselines, to ensure that we continually improved the reliability of the product as it moved forward.
Over time, as unit testing became easier and engineers became more and more frustrated with the slow feedback cycles of E2E tests, our testing pyramid became lopsided in the other direction. We had confidence in our unit tests and the infrastructure supporting them, but our E2E tests aged without much support, becoming increasingly unreliable, to the point that we largely ignored their failures. Tests that can’t be trusted end up becoming a maintenance burden and provide little value, so we recognized that something needed to change.
Over the past year we’ve doubled down on our focus on reliability. We’ve invested in our test infrastructure to ensure that engineers are not only able, but also incentivized, to write valuable tests across all layers of the testing pyramid. In addition to technical investment in code and tooling, this has also required that we take the time to evaluate the things we test and how we test them, and make sure the entire team has a better understanding of which tools to use when.
Unit testing
We continue to spend most of our efforts writing unit tests. These are fast, focused tests that provide quick feedback, and serve as our first line of defense against regressions. We write JUnit tests whenever we can, and fall back to instrumentation tests when we need to. Robolectric’s interoperability with AndroidX Test has allowed us to move many of our instrumentation tests to JVM-based unit tests, making it even easier to meet our test coverage goals.
Speaking of test coverage goals, the unit testing layer is the only layer that we use to determine our code coverage. By default we target 80% test coverage, though we have a process to override this target for cases in which unit testing is either not valuable or infeasible.
- Note: While we use standard JaCoCo tooling to evaluate our test coverage, its lack of deep understanding of Kotlin presents some challenges. For instance, we haven’t yet found a way to tell JaCoCo that the generated accessors, toString and hashCode of behaviorless data classes don’t require test coverage. We’ve been experimenting and considering solutions to ensure that we’re not writing brittle tests that don’t provide value, but for now we’re stuck with issuing coverage overrides for these cases.
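As a rough illustration of how a default target with per-module overrides might be evaluated, consider the sketch below. It is an assumption-laden example rather than Dropbox’s tooling: `ModuleCoverage`, `failingModules`, the module names, and the line counts are all hypothetical, and a real setup would read these numbers from a coverage report instead of hard-coding them.

```kotlin
// Hypothetical summary of one module's line coverage.
data class ModuleCoverage(val module: String, val coveredLines: Int, val totalLines: Int) {
    // A module with no measurable lines is treated as fully covered.
    val ratio: Double
        get() = if (totalLines == 0) 1.0 else coveredLines.toDouble() / totalLines
}

// Default target taken from the article: 80% coverage.
const val DEFAULT_TARGET = 0.80

// Returns the names of modules whose coverage is below their target.
// An entry in `overrides` replaces the default target for that module.
fun failingModules(
    reports: List<ModuleCoverage>,
    overrides: Map<String, Double> = emptyMap(),
): List<String> = reports
    .filter { it.ratio < (overrides[it.module] ?: DEFAULT_TARGET) }
    .map { it.module }

fun main() {
    val reports = listOf(
        ModuleCoverage(":feature:uploads", coveredLines = 850, totalLines = 1000),
        // Imagine this module is mostly behaviorless data classes.
        ModuleCoverage(":data:models", coveredLines = 400, totalLines = 1000),
    )

    // Without an override, the data-class-heavy module misses the default target.
    println(failingModules(reports)) // [:data:models]

    // With an explicit, reviewed override, it passes.
    println(failingModules(reports, overrides = mapOf(":data:models" to 0.35))) // []
}
```

Keeping overrides in an explicit map makes each exception visible and reviewable, instead of lowering the default for everyone.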
E2E testing
Over the past several months we’ve been renewing investment in our automated E2E test suite. This test suite is able to alert us to extremely important issues that unit tests simply can’t identify, like OS integration issues or unexpected API responses. Therefore we’ve worked hard to improve our infrastructure to make tests easier for engineers to run locally, audited and removed flaky or invalid tests, and worked on documentation and training to ensure that we support our engineers in the creation and maintenance of our E2E test suite.
Change in E2E test counts before and after the test suite improvement effort.
As I mentioned above, our E2E tests simulate a user moving throughout the app. This means that the task of defining our E2E test cases is more than simply an engineering problem. Therefore, we developed guidance to help engineers work with product and design partners to define test cases that represent true use cases.
We recently introduced a practice of using a proper Definition of Done for development work. This amounts to a checklist of items that must be completed in order for a project to be considered “done”, which is defined and agreed upon at the start of the project. Our standard checklist includes the declaration of E2E test cases for the project, which ensures that we’re adding test cases in a thoughtful manner, taking into account the value and purpose of those tests, instead of targeting arbitrary coverage numbers.
Screenshot testing
Another dimension of our tests that we’ve ramped up recently is screenshot testing. Screenshot tests allow us to validate against visual regressions, ensuring that views render properly in light and dark mode, different orientations, and different form factors.
In unit tests we leverage Paparazzi for screenshot testing. This allows us to write fast, isolated tests, and we find it’s best suited for testing individual view or composable layouts, including our design system components.
We also find value in executing screenshot tests in more full-featured instrumentation tests. For this, we use our own Dropshots library, which supports screenshot testing on devices and emulators. Since Dropshots executes screenshot tests on real (or emulated) devices, it’s a great way to validate system integrations like edge-to-edge display, the default window mode on Android 15 devices.
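Whatever tool captures the image, a screenshot test ultimately compares a new rendering against a stored reference. The sketch below shows that comparison step in plain Kotlin under simplifying assumptions: `Screenshot`, `diffRatio`, and `matches` are hypothetical names, pixels are compared for exact equality, and this is not how Paparazzi or Dropshots are implemented internally.

```kotlin
// A screenshot reduced to a flat array of ARGB pixel values (hypothetical type).
class Screenshot(val width: Int, val height: Int, val pixels: IntArray) {
    init {
        require(pixels.size == width * height) { "pixel count must match dimensions" }
    }
}

// Fraction of pixels that differ between the reference and the new rendering.
fun diffRatio(reference: Screenshot, actual: Screenshot): Double {
    require(reference.width == actual.width && reference.height == actual.height) {
        "screenshots must have the same dimensions"
    }
    if (reference.pixels.isEmpty()) return 0.0
    val differing = reference.pixels.indices.count { reference.pixels[it] != actual.pixels[it] }
    return differing.toDouble() / reference.pixels.size
}

// A rendering matches when the differing fraction is within the allowed threshold.
// The default of 0.0 demands a pixel-perfect match.
fun matches(reference: Screenshot, actual: Screenshot, maxDiffRatio: Double = 0.0): Boolean =
    diffRatio(reference, actual) <= maxDiffRatio

fun main() {
    val white = 0xFFFFFFFF.toInt()
    val black = 0xFF000000.toInt()
    val reference = Screenshot(2, 2, intArrayOf(white, white, white, white))
    val regressed = Screenshot(2, 2, intArrayOf(white, black, white, white))

    println(diffRatio(reference, regressed))     // 0.25
    println(matches(reference, regressed))       // false
    println(matches(reference, regressed, 0.25)) // true
}
```

A non-zero threshold is one way to tolerate small rendering differences between devices, at the cost of possibly missing small regressions.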
Manual testing
With all of the investment we’ve made in automated testing you’d be forgiven for thinking that we do no manual testing, but even today that’s simply not feasible. There are many workflows for which automated tests would either be too hard to write or too hard to validate. For example, we have both unit and E2E tests to validate that the app behaves correctly when rendering file content, but it can be hard to programmatically validate file content, and screenshot tests can sometimes prove too flaky.
For these cases, we use a web-based test case management tool to maintain a full set of manual test cases, and a third-party testing service to execute the tests prior to each release. This allows us to catch issues for which we haven’t yet written tests, or which require human judgement.
Testing has proven invaluable in identifying quality issues before they make it to users, allowing us to earn our customers’ trust. Given that value, we intend to continue investing in testing to ensure that we can maintain high quality and reliability. There are a few things that we’re looking forward to in the future.
I’m currently in the process of expanding the functionality of Dropshots to support multiple device configurations, which will allow us to perform screenshot tests across a wide range of devices with a single set of tests. Since the Dropbox app works across many different form factors, it will be valuable for us to simultaneously run our screenshot test suite on a variety of devices or emulators to prevent regressions on less common form factors.
Additionally, we’re beginning to experiment with Compose Preview Screenshot Testing, which allows our Compose Preview functions to serve double duty by speeding up development cycles while also being used to protect against regressions.
Finally, we intend to continue ensuring that we have a good balance of the right types of tests, balancing our testing pyramid so that our investment in testing serves our reliability goals instead of chasing arbitrary coverage targets. We’ve already seen the value that a healthy test suite can provide, and we’ll continue investing in this area to ensure that we continue to be worthy of trust.