Tradeoffs are mathematically inevitable in software design
In this article we’ll demonstrate why there’s no such thing as a perfect software system. We’ll discuss the iron triangle, the CAP Theorem, and other “pick two” tradeoffs to illustrate why every design benefit implies a cost. Armed with this knowledge we can guide software teams to think in practical tradeoffs and avoid chasing unattainable ideals.
Time, features, quality — pick two
The iron triangle of software development has three vertices: time, features, and quality. In practice we can optimize for any two of these vertices, but never for all three. For instance if we have a short schedule and lots of features, quality will suffer. Or, if we wish to deliver lots of features at high quality, schedule will suffer. And if we wish to deliver high quality on a short schedule, we must reduce the feature set. We can therefore think of these three possible tradeoffs as choosing exactly one edge of the iron triangle (Fig. 1).
In practice, developers find that the iron triangle holds, but is there a deeper logic to it? There is. Remember rate, distance, and time from basic algebra? That relationship gives us an intuition for what drives the iron triangle:
rate = distance / time
By way of example, if you travel a distance of 60 miles in 1 hour, your rate is 60 miles per hour. If we squint hard enough we can think of distance as features and time as itself; rate then plays the role of defects, since the defect rate climbs as we pack more features into less time. That gives us the following equation:
defects = features / time
Therefore given a fixed number of features, if we shorten the schedule (i.e. decrease time) then defects will increase.
In light of these equations it becomes clear why designers can choose just two of the three vertices in the iron triangle: once two variables in a trivariate equation are fixed, so is the third.
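We can sketch this constraint directly. The function below is a toy illustration, not a real defect-estimation formula: given any two of the three variables in defects = features / time, it solves for the third.

```python
# A minimal sketch of the "pick two" constraint: in the model
# defects = features / time, fixing any two variables determines the third.
# The function and its units are illustrative assumptions only.

def solve(defects=None, features=None, time=None):
    """Given exactly two of the three variables, return the third."""
    known = sum(v is not None for v in (defects, features, time))
    if known != 2:
        raise ValueError("fix exactly two variables")
    if defects is None:
        return features / time
    if features is None:
        return defects * time
    return features / defects  # solving for time

# Fix the feature count and shorten the schedule: defects rise.
print(solve(features=60, time=2))  # 30.0
print(solve(features=60, time=1))  # 60.0
```

Halving the schedule for the same feature set doubles the defect count; the third variable is never free.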
As we’ll see below, such “pick two” patterns occur throughout software engineering. In fact, they occur throughout engineering generally. We could just as easily have used Ohm’s Law (voltage = current × resistance) instead of rate, distance, and time to model the iron triangle.
Consistency, availability, partition tolerance — pick two
The CAP Theorem demonstrates that a database can have no more than two of the following properties: consistency, availability, and partition tolerance. Here again we can understand the proof of the CAP Theorem as a model in three dependent variables. If we set any two of the variables, there remains but one solution for the third variable.
The CAP Theorem has been treated extensively elsewhere; we mention it here only as an example of a “pick two” tradeoff. For an illustrated proof of the CAP Theorem, see Michael Whittaker’s proof. For an applied tour of the CAP Theorem in practice, and to understand why it’s ultimately more subtle than “pick two,” see Eric Brewer’s Spanner paper.
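To make the tradeoff concrete, here is a toy two-replica store (a sketch, not a proof; the class and method names are our own invention). During a simulated network partition it must either refuse writes, staying consistent but unavailable, or accept them, staying available but letting the replicas diverge.

```python
# A toy illustration of CAP: under partition, a two-replica store picks
# either consistency (refuse the write) or availability (accept it and
# let replicas diverge). All names here are illustrative assumptions.

class Replica:
    def __init__(self):
        self.value = None

class TinyStore:
    def __init__(self, mode):
        self.mode = mode                    # "CP" or "AP"
        self.a, self.b = Replica(), Replica()
        self.partitioned = False

    def write(self, value):
        if not self.partitioned:
            self.a.value = self.b.value = value
            return True
        if self.mode == "CP":
            return False                    # sacrifice availability
        self.a.value = value                # sacrifice consistency
        return True

cp, ap = TinyStore("CP"), TinyStore("AP")
cp.partitioned = ap.partitioned = True
assert cp.write("x") is False                      # unavailable, consistent
assert ap.write("x") and ap.a.value != ap.b.value  # available, divergent
```

Once the partition heals, an AP system must reconcile the divergent replicas, which is exactly the subtlety Brewer explores in the Spanner paper.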
Flexibility, speed, and scale — pick two
Kurt Brown, formerly Netflix’s Data Platform Director, observes that database systems tend to display at most two of the following characteristics: flexibility, speed, and scale.
Anyone who’s tried to scale MySQL, make Hive run faster, or execute complex queries over Cassandra has experienced the FLESS Theorem. To be clear, FLESS is only a theorem over the definitions given below, just as CAP is only a theorem over its definitions of consistency, availability, and partition tolerance. Rate = distance / time is a definition, and defects = features / time is simply a model. Given that FLESS’s definitions of flexibility, speed, and scale are open for debate, it’s more accurate to think of FLESS as a model; so we’ll call it the FLESS triangle from now on.
Now let’s see if we can sharpen our intuition for why the FLESS triangle holds in practice.
Modeling FLESS
Let’s model a database system as a graph with N nodes, where N represents the scale of the system. The total execution time is the time the nodes spend executing plus the time spent coordinating their partial results into a final answer. Further assume that a flexible database pays for its flexibility with an execution time greater than or equal to a constant, K.
We say that a system has speed if it returns a result in a time less than or equal to K.
At a high level, then, total time is the sum of a flexibility-dependent execution term and a scale-dependent coordination term:

time = execution(flexibility) + coordination(scale)
Now suppose that we have a flexible, scalable database, D. Since D is flexible it requires an execution time greater than or equal to K. Since D is scalable it requires a coordination time greater than zero. Therefore D has a total execution time greater than K and, by definition, is not fast. Similarly, a flexible system that is fast will scale poorly. And a fast system that scales will have poor flexibility.
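The argument above can be sketched numerically. The constants below are arbitrary illustrations of the model’s assumptions: a flexible system pays execution time of at least K, and every additional node adds coordination time.

```python
# A sketch of the FLESS model: total time is execution plus coordination.
# K and the per-node coordination cost are arbitrary illustrative values.

K = 1.0  # speed threshold: "fast" means total time <= K

def total_time(nodes, flexible, per_node_coordination=0.1):
    execution = K if flexible else 0.5          # flexibility costs >= K
    coordination = (nodes - 1) * per_node_coordination
    return execution + coordination

# Flexible and scalable (nodes > 1): total time exceeds K, so not fast.
assert total_time(nodes=10, flexible=True) > K
# Flexible and fast is possible only without scale (a single node).
assert total_time(nodes=1, flexible=True) <= K
# Fast and scalable is possible only by giving up flexibility.
assert total_time(nodes=6, flexible=False) <= K
```

Whatever constants we pick, the structure is the same: any two of flexibility, speed, and scale can be had, but the third is forced.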
Conclusion
Every design benefit implies an inevitable cost. We can model such tradeoffs as trivariate equations that force designers to optimize two, and only two, of the variables. Said another way, good design implies good compromises. By accepting that there is no free lunch we can choose which design characteristics we need, and which we’re willing to sacrifice.
Deciding what not to do is as important as deciding what to do. —Steve Jobs
Postscript
I am looking for deeper, more accurate mathematical models of the “pick two” tradeoffs. If you have ideas please comment.