Few Reactive Programming Concepts

Praveen Ray
7 min readJun 1, 2021

--

Reactive Pipeline is like Gears

I recently started working with Quarkus and one of the first concepts you encounter is their emphasis on Reactive way of doing things. This stems from the fact that Quarkus is built on top of Vertx and Vertx forces you to be have an event driven paradigm which in-turn leads to Reactive Programming concepts.

Reactive programming is still relatively new concept for many programmers out there and it's benefits and pitfalls are not well understood. Switching from imperative programming to Reactive requires a mind shift and it's not an easy exercise. To compound the issue, there are many concepts that overlap and seem confusingly related and sometimes contradictory:

  • Event driven framework
  • Callbacks based coding
  • Event Loop and I/O thread
  • ThreadPools vs Single Threaded execution
  • Futures based code
  • Reactive programming
  • Non-Blocking APIs
  • Coroutines
  • await and async

Try googling some of these and you’d face a somewhat contradictory picture soon. One must utilize all the cores of modern CPUs. Only way to do that is to use Threads. But Threads cause performance issues due to excessive context switching. One must use a single threaded EventLoop based system to gain performance! Dig slightly deeper and you start encountering (worker) ThreadPools hidden inside (single threaded)event loop frameworks! Combine these with Reactive concepts which immediately advise you to not use Blocking APIs in their pipeline and prefer non-blocking APIs or risk freezes in your execution pipeline!

If this is not confusing, I don’t know what is!

Let’s see if we can build a somewhat sane story around all of these concepts. I’m referring to JDK but concepts apply to any other platform as well.

All of these are trying to achieve high performance with lower cost. The premise is to fully utilize each core in your computer by not keeping a core idle at any time.

CPU Cores only work with Threads. There are hardware threads which are mapped to OS threads but the basic idea is that there is a “Thread” per core and if you need to get a CPU Core to execute code, you must first get access to it’s corresponding Thread and schedule your code to this Thread.

So, as you can see, Threads are fundamental unit of concurrency and can not be avoided.

Moving up the application stack. The OS threads are generally available in JDK and your code can be bundled as a Unit of work (called Runnable) . You simply create a ‘Thread Object’ in Java and give it your Runnable. JDK works with the OS to map this thread to the underlying OS/hardware Thread to execute your Runnable. You have no control over when your Runnable will get executed.

This works great until performance goals become stringent. You are informed that ‘Creating a JDK Thread’ and ‘mapping it to the OS thread’ is too much of an overhead. ThreadPools are introduced — ThreadPools simply pre-create these Threads and your job now is to simply submit your Runnables to the thread pool — avoiding the cost of creating threads on-demand. But when your Runnable starts to wait on a Blocking call, the bookkeeping cost of Thread Suspension and Resumption still needs to be paid. Only gain with ThreadPool is not paying the price of creating and destroying Threads continuously.

Going further on performance scale. Switching Runnables across threads in a ThreadPool still involves bookkeeping and is an overhead that can be optimized. Scheduling JDK threads onto CPU threads involves more book-keeping that should also be avoided. And any data shared between threads must be properly guarded with locks — leading to code complexity and more bookkeeping delays associated with lock management.

Facing all of this complexity, folks decided to go back to first principles and ditched Threads altogether! If we could get one thread to do most of our bidding and keep this (one) thread running always and there would be no need to pay the price of context switching and locks. Single threaded ‘Event Loops’ were invented (thanks nodejs!) where there is only one thread and all units of work is submitted to this one thread (called I/O thread, main thread or application thread). The I/O thread keeps a queue of ‘work to be done’ and simply runs through this queue infinitely.

This leads to the obvious question:

  • How do we achieve parallelism with only one Thread? Are we not regressing back to early 90s when PCs had only one CPU and one thread of Execution?

The answer is twofold:

  • Squeeze out every bit of performance from this one I/O thread. Make it so it’s never waiting for anything and is always executing some piece of useful code. Something as simple as to reading a byte from a file could lead to delays (even milliseconds are too much of a delay for today’s fast CPUs). You don’t want your (only) thread to be ‘waiting’ on file bytes to become available from the (slow moving) harddisk. A new set of ‘non-blocking’ APIs were invented to avoid this kind of delays. Non-Blocking File Read API, for example, doesn’t make the caller wait until disk has spun — it takes a function that it calls back when data from disk is available — freeing the caller to perform other tasks. This is similar to taking a number at your local deli counter and continuing with other work. The deli counter calls out your number when it’s ready.
  • Bring back a ‘controlled’ ThreadPool. Pre-create this Threadpool. Let your main I/O thread perform fast and non-blocking operations and if you must perform an Blocking action (because it’s non-blocking variant is not yet available), offload this to the ThreadPool. This way your main thread is always performing useful action and all ‘waiting’ happens on a ThreadPool. There is still context switching but that’s the price you pay for using Blocking APIs.

All other concepts (callbacks, futures, coroutines etc) are related to these two broad ideas.

  • Callbacks: This is how non-blocking APIs are implemented. If you are not going to wait for that file byte to become available, how does the JDK transfer that byte to your code? Answer is to supply a callback function to the non-blocking File API which is called back by the JDK when it has fetched that byte from the disk.
  • Callbacks lead to callback hell making code hard to read/maintain
  • Callback hell is avoided by making use of Reactive programming paradigm. Reactive libraries let you deal with non-blocking APIs with callbacks in a more structured way. Reactive programming literature also
    talks about being 'event-driven'. However, in my opinion, event-driven is simply a variant of getting results out of APIs. An API returning results can simply be viewed as emitting 'event'. It's helpful to think of Reactive model as being a way to deal with Results of APIs - the results can come from Blocking or Non-Blocking APIs.
  • Futures were invented before Reactive Libraries became popular. Futures are used with Blocking APIs and ThreadPools. There isn’t a whole lot of overlap between Futures and Reactive. Let's say you offloaded 10 units of work on 10 different threads to a ThreadPool. Meanwhile, your I/O thread needs the result of all of these 10 units of work to make progress. It needs to 'wait' for these 10 threads to finish. Such 'controlled' waiting can be achieved by using Futures. In other words. Futures let you offload a parallel unit of work to a thread and make progress on main thread doing something else. The main thread can 'wait' on the Future only when it absolutely needs the data from other unit of work and cannot make any progress otherwise.
  • Async/Await is a variant of the callback based system. Instead of supplying a callback to be called when Blocking API is ready with results, some languages provide you with 'async/await' keywords. Let's assume following sequence:
    - perform action blocking action A
    - use results from A
    - perform action blocking action B
    This code is executing on Thread A (which can be a I/O thread or any other thread in a threadpool). Normally when action A is encountered, your thread would be suspended. This would mean performing time-consuming book-keeping for suspension and resuming of the thread. There are two ways to prevent this suspension. Either change A to use a non-blocking call with a callback or use await if your language supports it.
    Upon seeing await, action A would be put on a queue to be executed later, but your main thread would not be suspended. It would simply be given some other chunk of work (by the framework - JVM or CLR for example). When results become available from blocking action A, the execution would move onto action B. Note that action B can happen in a different thread than action A. Action B can rely on the fact that A has completed (just like in single threaded world) - however, all the callback mechanism is taken care of by the framework. The only caveat being A and B can execute of different Threads.
  • Coroutines are similar to async/await in the sense that coroutines provide an abstraction to hide callbacks. Just like async/await, a coroutine gets ‘suspended’ at specific points (e.g. when a suspend function is called in kotlin). When suspended function returns, the coroutine starts running again. So, many coroutines can execute concurrently assuming they all suspend themselves at some point (to give others a chance). This is what makes them ‘co-operative’.
  • And Finally, the Reactive programming style fits into the callback non-blocking paradigm by providing yet another way to deal with nested callbacks. You tell the Reactive Library what to do next when a non-blocking callback returns by setting up a Reactive ‘Pipeline’. The difference between callbacks and Pipeline is subtle. Instead of the programmer wiring a callback function to a Non-Blocking API, the programmer tells what to do next with the callback’s results. The Reactive library makes the Non-Blocking API call, handles callback and when callback returns, it calls the ‘Next’ step in Reactive Pipeline that was setup by the Programmer.

So, which approach should you use for concurrent programming? In my opinion, most of us do not have to perform at Google scale. Pick simplicity and ease of programming over complex solutions. Error handling and debugging using any of the non-blocking approaches could get challenging very fast. Stick with ThreadPools if it serves your purpose. Only pay the price of going Reactive if you must squeeze every bit of performance from your code and hardware.

--

--