A New Look At End-to-End Testing - Polymorphic and Fast
At the end of this post there is a list of reasons to work with end-to-end tests, but first please consider the post's idea on its own. After that, I'm happy to discuss alternative/better solutions to the described (or omitted) contexts.
Please, read it first ;)
Terms used
The below points are not intended to be full-blown definitions, but rather pointers.
- End-to-End testing - as Nat Pryce has said in one of his presentations, the ends are always farther apart than one thinks they are. The purpose is to execute the tests through as much of the application stack as possible - from the front end at least till the storage mechanism.
- Polymorphism - I use it here mostly in the way demonstrated by the Liskov Substitution Principle - code written against an abstraction should be unaffected by which concrete implementation of that abstraction is given to it.
Polymorphic tests - we are already doing it
Some of us already run the same tests on top of different code - we can have multiple build platforms (x86 and x64, Windows and Linux, Python 2 and 3, etc.), multiple configurations (SQLite and PostgreSQL), as well as multiple versions of our dependent libraries (stable, latest release, and latest). What's common though among these scenarios is that the polymorphism happens largely outside of our codebase, and we don't have to think much about it when writing tests.
An example of executing multiple drivers inside our code is the use of Selenium tests - the same tests are run against Chrome, Firefox, etc. While each driver tests at the same level (the web UI), the actual browser drivers have different implementations, exposed via a common abstraction - DOM selectors and event invocations.
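To make this concrete, here is a minimal sketch of that idea - one test, several browser drivers - using pytest and the selenium package (the URL and the expected title are made up for the example):

```python
# A minimal sketch: one test run against several browser drivers.
# Assumes pytest and selenium are installed and the browser drivers
# (chromedriver, geckodriver) are available on the PATH.
import pytest
from selenium import webdriver


@pytest.fixture(params=[webdriver.Chrome, webdriver.Firefox])
def browser(request):
    driver = request.param()
    yield driver
    driver.quit()


def test_homepage_title(browser):
    # The URL and expected title are placeholders for illustration.
    browser.get("http://localhost:8000/")
    assert "My Shop" in browser.title
```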
Of course, most test code uses some level of abstraction to separate the test logic from the actual page implementations.
Abstractions - Page objects
The Page Object pattern is used to help create maintainable tests. Instead of writing tests coupled to the implementation (go to this concrete URL, wait N seconds for it to load, find and select the form elements for username and password, etc.), these implementation details are hidden behind well-named methods (e.g.: open_login_form, login_with_credentials, etc.), and thus are domain (client) friendly and readable. And Page Objects can be composed together to build Application Objects.
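As a sketch (the locators and URL are invented for illustration; only the method names come from the text above), such a Page Object could look like this:

```python
# A Page Object sketch - the locators and the URL are hypothetical.
from selenium.webdriver.common.by import By


class LoginPage:
    def __init__(self, driver, base_url):
        self.driver = driver
        self.base_url = base_url

    def open_login_form(self):
        # The concrete URL (and any waiting logic) is hidden from the tests.
        self.driver.get(self.base_url + "/login")
        return self

    def login_with_credentials(self, username, password):
        self.driver.find_element(By.NAME, "username").send_keys(username)
        self.driver.find_element(By.NAME, "password").send_keys(password)
        self.driver.find_element(By.CSS_SELECTOR, "form button[type=submit]").click()
        return self
```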
A similar abstraction is used by the various Acceptance Testing tools, such as FitNesse, Cucumber, and the other Gherkin tools - the spec texts contain terms and values important for the business domain, and separate code translates the spec's terms and values into calls into the application and transforms its state into the format the tool expects.
Stripped down tests - only the script
As seen above, the AT tools separate application logic from the test scenario's description.
Assertions have also been separated from test cases - either by developer choice (using a separate assertions library like Hamcrest instead of the unit testing library's own assertFoo methods), or by the tool's design (Mocha ships without an assertions library).
Thus tests can really be focused just on the scenario being tested.
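Put together, a stripped-down test is nothing but the script. Here is a sketch where the app driver is injected from the outside (e.g. as a pytest fixture) and the assertions come from PyHamcrest; the driver methods are the hypothetical ones from the examples above:

```python
# A sketch: the test is pure scenario - no page details, no built-in asserts.
# `app` is assumed to be provided from outside (e.g. a pytest fixture); its
# methods are the hypothetical driver methods from the examples above.
from hamcrest import assert_that, equal_to


def test_successful_login_lands_on_dashboard(app):
    app.open_login_form()
    app.login_with_credentials("alice", "s3cret")
    assert_that(app.current_page_title(), equal_to("Dashboard"))
```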
Fast tests
The single biggest disadvantage of end-to-end tests is their speed. They are slow. And the more of them there are, the slower the suite becomes.
This is one reason why the Test Pyramid recommends not having too many of them. Many architectural approaches (hexagonal, DDD, etc.) suggest keeping a lightweight core application and attaching the persistence and UI layers to it at its boundaries, keeping these ports and adapters lightweight too. Most of the testing then happens against the core, dependency-independent code, making the tests fast.
Fast end to end tests
Drumroll... we'll do a bit of cheating, of course.
Not all the tests have to run every single time. Performance tests are usually not run while TDDing - they kick in either later in the deployment pipeline, or run daily. Teams organize their tests into fast, smoke, and slow suites. Locally (and as the first step in the build process) only the fast and smoke tests are run.
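With pytest, for example, such a split can be a marker away (a sketch - the marker name just mirrors the grouping above and would need to be registered in pytest.ini):

```python
# A sketch of splitting suites with markers (register "slow" in pytest.ini
# to avoid warnings). Only the slow suite drives a real browser.
import pytest


@pytest.mark.slow
def test_full_checkout_through_selenium():
    ...


def test_checkout_against_the_core():
    ...


# Locally and as the first build step:   pytest -m "not slow"
# Later in the pipeline (or nightly):    pytest
```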
Putting all the above together: writing systems with two self-contained cores (the app domain itself and the test scenarios) easily lends itself to end-to-end testing that can be run in multiple configurations - giving confidence that the app works with all its components and dependencies in production, yet enabling the fast feedback developers require. The same tests can be run:
- directly against the core application with mocks, stubs, etc.
- through the app's (HTTP) UI via the given framework's/library's testing tools (e.g.: django.test.client.Client) with an in-memory database
- through Selenium against the full stack
And of course, we can mix and match - Selenium against SQLite, etc.
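As a sketch of what this could look like (the driver classes and their home module are hypothetical - the point is only that all three implement the same interface):

```python
# A sketch: one scenario, three hypothetical drivers behind one abstraction.
# CoreDriver calls the domain objects directly, DjangoClientDriver goes
# through django.test.client.Client, SeleniumDriver through a real browser.
import pytest

from myapp.testing.drivers import CoreDriver, DjangoClientDriver, SeleniumDriver


@pytest.fixture(params=[CoreDriver, DjangoClientDriver, SeleniumDriver])
def app(request):
    driver = request.param()
    yield driver
    driver.close()


def test_checkout_shows_a_confirmation(app):
    app.add_to_cart("book-42")
    app.check_out()
    assert app.order_confirmation_is_shown()
```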
While TDDing, one can run the tests only against the fast core; once that is complete, run the relevant tests with the end-to-end driver, fix any mistakes that occur, check in, and let the build server run all the integrated tests (using existing build practices to achieve speed)!
In Which Contexts Could It Make Sense?
Thank you for reading this far - assuming you didn't just scroll ahead :)
The below list is by no means exhaustive, and as mentioned in the introduction, there might be alternative approaches (please, let me know!) - it's not a coincidence this blog is called "Exercises in public learning"!
With that out of the way, here are some contexts where this approach could make sense:
- Working with a team where the skills both for testing and for writing good code are (yet) missing (chaotic team phase).
As the joke goes, the only way to eat an elephant is one bite at a time. The same goes for learning - people can be overwhelmed by the mental jump from manual to automated testing, and throwing in good programming practices on top can be too much.
Getting started with end-to-end tests that have decoupled driver methods (even if on the TestCase class itself - see the sketch after this list) is a great start. By the time the tests become slow, if the team is bought into the idea of automated testing, the code can be refactored towards a core domain - and inside that domain there still doesn't have to be proper clean code (one step at a time). In brief: for slow, gradual improvement.
- The app actually has multiple interfaces for the same thing.
This can be due to A/B testing, or simply to accommodate the different needs of different users (e.g.: for a webshop - there is the public shop, the internal UI geared at the company's sales people, and the API), or a multi-platform application (e.g.: mobile and desktop web, iOS and Android), etc.
If you test the checkout process end-to-end, then running the same set of tests against each UI makes sense too - a single set of tests to maintain, and you know immediately whether all features work across all the views.
- Catching unexpected bugs.
There is a class of bugs that can be caught by rigor, but I do slip occasionally, ending up in a place where the unit tests are all green, but the application itself doesn't actually work.
Some real-life examples of such bugs I have run into:
- forgetting to place the actual input element on the page
- encoding-persistence issues - a UTF-8 database with a column that is windows-1250 encoded is ... unexpected
- synchronizing data with another database where, after the required mappings, it turned out the other database truncates our data
All of the above can be addressed retroactively by adding targeted tests for that specific integration point, but if we are already testing the corner cases (length, encoding, etc.) in our code, it is nicer not to have to learn about these "unknown unknowns" from production problems...
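For reference, here is a sketch of the starting point mentioned in the first item above - driver methods living on the TestCase itself, ready to be extracted later (all names here are invented):

```python
# A sketch of "driver methods on the TestCase itself": the test method only
# speaks the domain language; the helpers below can later move out into a
# real driver or Page Object. All names here are invented.
import unittest


class CheckoutTests(unittest.TestCase):
    def test_checkout_shows_a_confirmation(self):
        self.add_to_cart("book-42")
        self.check_out()
        self.assert_confirmation_is_shown()

    # --- driver methods: implementation details live below this line ---

    def add_to_cart(self, sku):
        ...  # e.g. drive Selenium or an HTTP test client here

    def check_out(self):
        ...

    def assert_confirmation_is_shown(self):
        ...
```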
Finally, some related posts from other people:
- Ayende has multiple posts: on swapping out the infrastructure, separating assertions from tests, and about which tests add value in his opinion
- Sebastien Lambla on Vertical Testing
There is much more to be said about other benefits of end to end testing, but this post is already too long, so that will have to wait for another post (while waiting, you can read Jason Gorman's 101 Uses for Polymorphic testing)!
P.S.: I would like to thank (in (first name based) alphabetical order): Arjan Molenaar, Cirilo Wortel, Douglas Squirrel, Jeffrey Frederick, Kishen Simbhoedatpanday, Marco Emrich, Michael Feathers, and of course my colleagues at Paessler AG - I much appreciate that you all listened to me while I tried to figure out how to explain this and gave feedback both about the content and the format (*). Thank you!
(*) just to be crystal clear, this does not mean they endorsed it, just that they listened and gave feedback!