Abstractions and Systems -- Part 3

Published on 2020-06-27

Intro

In Part 2 of this series, we discussed how to model abstract systems and some of the high-level tradeoffs of adding an abstraction.

Now let’s go through examples of actual systems that utilize this concept.

Examples

Object Oriented Programming (OOP)

A typical example is a farmer who has a list of fruits. We have a Farmer class that is composed of a list of Fruit objects. The Fruit class has a type (an enum) representing, for example, a mango, banana, or apple.

Problem with this: Each time you add a new fruit type, you have to touch the fruitType enum as well as the cost function. The cost function would contain if/elif-style code that chooses an action based on the fruitType. This won’t scale as more fruitTypes are added, and it breaks the Single Responsibility Principle.

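Solution: Add a Fruit abstraction (aka an interface) that every concrete fruit implements, so that each fruit provides its own cost and the Farmer no longer branches on a type.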
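A minimal sketch of that idea in Python, assuming the behaviour we care about is a `cost` method. The class names `Mango` and `Banana`, the `total_cost` method, and the prices are made up for illustration:

```python
from abc import ABC, abstractmethod
from typing import List


class Fruit(ABC):
    """Abstraction (interface) that every concrete fruit implements."""

    @abstractmethod
    def cost(self) -> float:
        """Return the cost of one unit of this fruit."""


class Mango(Fruit):
    def cost(self) -> float:
        return 2.0  # illustrative price


class Banana(Fruit):
    def cost(self) -> float:
        return 0.5  # illustrative price


class Farmer:
    def __init__(self, fruits: List[Fruit]) -> None:
        self.fruits = fruits

    def total_cost(self) -> float:
        # No if/elif on a fruit type: each fruit knows its own cost.
        return sum(fruit.cost() for fruit in self.fruits)


farmer = Farmer([Mango(), Banana(), Banana()])
total = farmer.total_cost()
```

Adding an apple now means adding one new `Apple` class. Neither `Farmer` nor the existing fruits need to change.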

Compilers (LLVM IR)

Compilers convert data/code from one form (source) to another (destination), e.g. source code to binary code that the CPU can execute (via the kernel).

In compilers, the term front-end refers to a source and back-end to a destination. So C++, Rust, and Go can be front-ends, and x86-64, ARM, and MIPS can be back-ends.

If every front-end were to support every back-end, the conversion combinations would look like this:

Problem: Let’s say we have M front-ends and N back-ends. The total number of combinations (conversions we’ll need to implement) would be M × N.

Solution: Have an abstract representation (called an Intermediate Representation, or IR, in compiler terminology). Each front-end only has to emit IR and each back-end only has to consume it, so the total number of conversions drops to M + N, and adding a front-end or back-end costs one new conversion instead of N or M. You can see that we now have fewer arrows:

This is the approach LLVM takes with LLVM IR.
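To make the M + N counting concrete, here is a toy sketch in Python. It is not how LLVM is implemented: the IR is a made-up tuple, and the front-end and back-end names and their output formats are invented for illustration.

```python
from typing import Callable, Dict, Tuple

# Toy IR (an assumption for this sketch): ("add", left, right) with integer operands.
IR = Tuple[str, int, int]


def infix_frontend(source: str) -> IR:
    """Parse 'a + b' into the IR."""
    left, _, right = source.split()
    return ("add", int(left), int(right))


def prefix_frontend(source: str) -> IR:
    """Parse '+ a b' into the IR."""
    _, left, right = source.split()
    return ("add", int(left), int(right))


def stack_backend(ir: IR) -> str:
    _, left, right = ir
    return f"PUSH {left}; PUSH {right}; ADD"


def register_backend(ir: IR) -> str:
    _, left, right = ir
    return f"MOV r0, {left}; ADD r0, {right}"


def expression_backend(ir: IR) -> str:
    _, left, right = ir
    return f"({left} + {right})"


FRONTENDS: Dict[str, Callable[[str], IR]] = {
    "infix": infix_frontend,
    "prefix": prefix_frontend,
}
BACKENDS: Dict[str, Callable[[IR], str]] = {
    "stack": stack_backend,
    "register": register_backend,
    "expression": expression_backend,
}


def compile_source(source: str, frontend: str, backend: str) -> str:
    # Every source/target pair goes through the same IR.
    return BACKENDS[backend](FRONTENDS[frontend](source))
```

With 2 front-ends and 3 back-ends, 5 functions cover all 6 pairs. The gap widens as M and N grow: 10 front-ends and 10 back-ends would need 20 components instead of 100.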

Platform centric web services

In a service-oriented architecture, you typically have a bunch of micro-services that talk to each other (via some API protocol like RPC, REST, GraphQL, etc.). The communication model is typically client-server.

So let’s look at an example:

Here C1, C2, C3 are clients. Each client talks to the App Server.

Problem: C1 wants feature set 1, C2 wants feature set 2, etc. Your app has to accommodate all of these features. There is strong coupling between client and server here: the server has to know about client-specific requirements in order to serve requests. This is bad, and won’t scale as the number of clients and their feature requests increases. The server will likely be maintained by only a few engineers, and they can’t possibly keep up with the requirements of every client.

Solution: Let’s think in a platform-centric way, and have an abstract platform layer in between. That layer decouples clients and servers. The server only has platform features, not client features. Each time a client wants to onboard onto the server, it must use the platform features. This means that, inherently, client product features must map to platform features. If a client wants a new feature, engineers carefully craft a platform feature so that it’s applicable not only to that client, but to other clients as well. This is simpler and more maintainable. Here is how it looks:
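As a rough sketch of that mapping in Python: the `Platform` class, the `notify` feature, and the two client functions below are all hypothetical names chosen for illustration, not part of any real service.

```python
from typing import Callable, Dict


class Platform:
    """Hypothetical platform layer: exposes generic features, knows nothing about clients."""

    def __init__(self) -> None:
        self._features: Dict[str, Callable[..., str]] = {}

    def register(self, name: str, handler: Callable[..., str]) -> None:
        self._features[name] = handler

    def call(self, name: str, **params: str) -> str:
        if name not in self._features:
            raise KeyError(f"unknown platform feature: {name}")
        return self._features[name](**params)


def send_notification(recipient: str, message: str) -> str:
    # Generic platform feature, usable by any client.
    return f"to={recipient}: {message}"


platform = Platform()
platform.register("notify", send_notification)


# Client-specific product features live on the client side and map
# onto the same platform feature.
def c1_order_shipped(order_id: str, user: str) -> str:
    return platform.call("notify", recipient=user, message=f"Order {order_id} shipped")


def c2_password_reset(user: str) -> str:
    return platform.call("notify", recipient=user, message="Reset your password")
```

The server side only ever sees `notify`. Onboarding a third client means writing a new mapping on the client side, not a new branch on the server.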

Log abstraction in Kafka

There is a brilliant post by Jay Kreps about The Log. This abstraction is used in a lot of databases, and also in Kafka, a distributed streaming platform.

Here is an image¹ that illustrates what a message streaming system would look like without a central abstraction:

Problem: Similar to the compiler example above, for M sources and N destinations, the total number of combinations we’ll need to implement is M × N.

Solution: Instead, we can use a central abstraction, a Unified Log. Amongst other things that Jay discusses in the blog post above, this reduces the combinations we’ll need to implement to M + N, which is much simpler than before. Here is the new image¹:
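The core of the idea can be sketched in a few lines of Python. This is a simplified in-memory illustration, not Kafka's API; the `Log` class, its method names, and the consumer names are assumptions made for this example.

```python
from typing import Any, Dict, List


class Log:
    """Append-only, ordered sequence of records; each consumer tracks its own offset."""

    def __init__(self) -> None:
        self._records: List[Any] = []
        self._offsets: Dict[str, int] = {}

    def append(self, record: Any) -> int:
        self._records.append(record)
        return len(self._records) - 1  # offset of the new record

    def read(self, consumer: str) -> List[Any]:
        # Return everything this consumer has not seen yet, then advance its offset.
        start = self._offsets.get(consumer, 0)
        batch = self._records[start:]
        self._offsets[consumer] = len(self._records)
        return batch


log = Log()

# Sources only know how to append to the log.
log.append("db: user created")
log.append("app: page viewed")

# Destinations only know how to read from the log, each at its own pace.
search_batch = log.read("search")
log.append("db: user deleted")
search_next = log.read("search")
warehouse_batch = log.read("warehouse")
```

Each source needs one integration (append) and each destination needs one (read), which is where the M + N count comes from.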

Linux Kernel

Linux has several abstractions such as:

All of these help solve problems similar to the ones we discussed above.

¹ Taken from https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying