Copyright - 2017 Ⓒ 475 Cumulus Ltd. All rights reserved.




WSGI Is Not Enough Anymore - Part III

December 12, 2017

In the second part of this series we discussed the concurrency model and event-driven architecture. In this part we will discuss various Python libraries and how they're used to build asynchronous web applications. We will focus primarily on Asyncio and walk through some example code as a tutorial for this library.

 

 

Leaving the comforts of the WSGI standard is not an easy thing to do. But if you're reading the third instalment of this series the assumption is that you think it may be worth it.

Fortunately, there are some great Python libraries that are structured, documented and well understood, that make it easier to write web applications and services on a technology stack that is not bound to the WSGI standard. 

 

In order to understand asynchronous web applications, we need to take a closer look at writing concurrent code. While most of this post will focus on Asyncio, Python's very own event loop and concurrency library, we will begin by mentioning some of the other options that preceded it. All of the libraries mentioned in this post are published as open-source projects.

 

Twisted

 

Twisted is an event-driven networking library written for Python. It has been around since 2002 and has been used in countless projects. Its main focus is to provide asynchronous network operations for various network protocols such as TCP, IMAP and HTTP. The basic concept is that an asynchronous operation is given a callback, which is called once the operation has completed. If you've used jQuery before you'll feel right at home with Twisted's Deferreds, which closely resemble jQuery's Deferreds and Promises. It was originally written for Python 2. At the time of this writing, large portions of Twisted have been ported to Python 3 as well. This great PyCon talk argues the place of Twisted in the age of Asyncio.

Twisted is a great option if you're bound to Python 2. The awesome Channels project for Django is based on Twisted. 

 

Gevent

 

Gevent is another library that implements an event loop. Unlike Twisted, it is based on co-routines (greenlets) rather than callbacks. It provides a higher-level API on top of the libev event loop (early versions used Libevent). Unlike Asyncio (which we will cover shortly), Gevent provides implicit rather than explicit concurrency. This means that Gevent co-routines are not decorated or marked explicitly as such.

 

Tornado

 

Originally developed by FriendFeed, which was later acquired by Facebook, Tornado is an asynchronous network library as well as a web framework. It provides all the tools needed to build asynchronous web applications that support the HTTP, TCP and WebSocket protocols, making it a superb option for full-duplex web applications. Before Asyncio made it into the Python standard library, Tornado was considered the best option for writing high-performance web applications in Python. It has its own HTML template library and can also work with Jinja. The internet is full of blog posts, articles, tutorials and books on how to use Tornado for tasks that other web frameworks, such as Django and Flask, are not suited for.

 

Asyncio 

 

Asyncio has been part of the Python standard library since version 3.4. It is not available for Python 2 or for earlier Python 3 releases, except as a separately installed package on Python 3.3.

According to its documentation Asyncio:

 

"provides infrastructure for writing single-threaded concurrent code using coroutines, multiplexing I/O access over sockets and other resources, running network clients and servers, and other related primitives".

 

Looking deeper into the documentation, we learn that Asyncio provides:

  1. An event loop - Used for writing single-threaded applications that execute an event loop rather than multiple threads

  2. Support for various networking protocols such as TCP, which is used for HTTP and Websockets 

  3. Synchronisation primitives such as locks and semaphores for co-routine synchronisation 

  4. Futures and co-routines. Co-routines are functions that can be suspended and resumed, which lets them run concurrently; futures represent results that are not yet available.
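As an illustration of the synchronisation primitives in item 3, here is a minimal sketch that uses asyncio.Semaphore to cap how many co-routines run a section of code at the same time. The worker and run_demo names, the limit of 2 and the 0.01 second sleep are assumptions made for this example only.

```python
import asyncio


async def worker(name, semaphore, active, peak):
    # At most two workers may be inside this block at any moment.
    async with semaphore:
        active[0] += 1
        peak[0] = max(peak[0], active[0])
        await asyncio.sleep(0.01)  # stand-in for real I/O
        active[0] -= 1
    return name


async def main():
    semaphore = asyncio.Semaphore(2)
    active, peak = [0], [0]  # one-element lists used as mutable counters
    names = await asyncio.gather(
        *(worker(i, semaphore, active, peak) for i in range(5))
    )
    return names, peak[0]


def run_demo():
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(main())
    finally:
        loop.close()
```

Calling run_demo() returns the worker names together with the highest number of workers that were ever inside the guarded block at once, which never exceeds the semaphore's limit.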

The first thing to state is that, unlike Tornado, which was previously mentioned, Asyncio is not a web framework or a web server. It can be used to develop web servers and frameworks which utilize the concurrent model of programming. 

 

 

Let's look at the simplest example and break it down:

 

 

The simplest example

 

 

This code sample shows the greet co-routine, which takes a name and prints it to the console as a greeting. The function declaration is preceded by the async keyword, which denotes that it is not a standard function but rather a co-routine. Calling a co-routine does not execute it. Rather, it returns a co-routine object, which needs to be handed to the application's event loop; the loop wraps it in a Task, which is a kind of Future. The event loop starts the scheduled co-routines in order of insertion, and switches from one to another whenever a co-routine awaits something that is not ready yet.

The rest of the code demonstrates the creation of an event loop, running the event loop until the co-routine has completed, and closing the event loop.
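A minimal sketch of the kind of program described above might look as follows. The greeting text and the main helper are assumptions; asyncio.new_event_loop is used here so that the sketch also runs on recent Python versions.

```python
import asyncio


async def greet(name):
    # A co-routine: the body runs only once the event loop executes it.
    print("Hello, {}!".format(name))


def main():
    coroutine = greet("World")      # nothing is printed yet
    loop = asyncio.new_event_loop()  # create an event loop
    try:
        loop.run_until_complete(coroutine)  # run it until greet finishes
    finally:
        loop.close()                 # close the event loop


if __name__ == "__main__":
    main()
```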

Obviously, using an event loop for such a simple program makes no sense at all. The whole purpose of using an event loop is to execute things concurrently. This means that while one co-routine is waiting for something to happen, another co-routine is executed. Let's look at another example:

 

 

Calling multiple co-routines

 

This sample code implements the wait_for_stuff co-routine which:

  1. Keeps a variable of the start time

  2. Runs a for loop with 5 iterations (counting from 0 to 4), calling the special asyncio.sleep co-routine to wait 1 second in each iteration.

  3. Prints the time that elapsed between the start and finish of this co-routine
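The three steps above can be sketched as follows. The delay parameter and the main helper are assumptions added so that the sketch can be tried with shorter waits; with the default of 1 second it behaves as described.

```python
import asyncio
import time


async def wait_for_stuff(delay=1.0):
    start = time.monotonic()        # 1. remember the start time
    for _ in range(5):              # 2. five iterations, one sleep each
        await asyncio.sleep(delay)
    elapsed = time.monotonic() - start
    print("Elapsed: {:.1f} seconds".format(elapsed))  # 3. report the total
    return elapsed


def main(delay=1.0):
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(wait_for_stuff(delay))
    finally:
        loop.close()


if __name__ == "__main__":
    main()
```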

Note that we use a special version of the sleep function implemented by Asyncio, rather than Python's standard time.sleep. Asyncio implements sleep as a co-routine rather than a normal function. This means that calling it returns an awaitable object that is handed to the event loop, so only the calling co-routine is suspended and the thread as a whole is not blocked.

 

In this example the call to sleep is preceded by the await keyword, which is new in Python 3.5. It tells Python to suspend the calling co-routine, let the event loop run the awaited co-routine, and resume the caller once it is done.

Futures work differently than, say, Ajax calls in JavaScript, in the sense that the co-routine is not supplied with a callback but is rather awaited. The lines of code that follow the await are executed after the co-routine has completed. This provides a much easier-to-follow syntax and program flow. Co-routines can be awaited only within other co-routines. In this example the top-level co-routine is fed explicitly into the event loop, while it awaits other co-routines within its own code.
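To make the contrast concrete, here is a small sketch that consumes the same kind of future twice: once through a callback and once with await. The fetch_value helper, the value 42 and the short delay are hypothetical and exist only for this illustration.

```python
import asyncio


def fetch_value(loop):
    # Hypothetical helper: returns a Future that is resolved shortly afterwards.
    future = loop.create_future()
    loop.call_later(0.01, future.set_result, 42)
    return future


async def main():
    loop = asyncio.get_event_loop()  # inside a co-routine this is the running loop
    seen = []

    # Callback style: the code that handles the result lives in another function.
    def on_done(fut):
        seen.append(("callback", fut.result()))

    fetch_value(loop).add_done_callback(on_done)

    # Await style: the next line simply runs once the result is available.
    value = await fetch_value(loop)
    seen.append(("await", value))
    return seen


def run_demo():
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(main())
    finally:
        loop.close()
```

Both styles receive the same value; the awaited version keeps the handling code in the same place as the call, which is what makes the program flow easier to read.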

 

When running the example code above, the script prints to the console that the total time elapsed from start to finish was 5 seconds. We know that the for loop had 5 iterations, each sleeping for 1 second. This means that the for loop ran sequentially and not concurrently. Oh!

 

Then how is this code different from a standard Python for loop that uses the standard sleep method?

Well, not by a lot. But there is a difference. The for loop executed sequentially and not concurrently, but the calling co-routine was running on the event loop. If other co-routines had been scheduled at the same time, this co-routine would not have blocked the entire event loop for a total of 5 seconds, but would instead have allowed them to run while it was sleeping.
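The following sketch illustrates that point under a few assumptions: two instances of a hypothetical count co-routine each sleep sequentially in their own loop, and asyncio.gather (discussed later in this post) is used only to schedule both on the same event loop.

```python
import asyncio
import time


async def count(name, log, delay):
    # Each instance sleeps sequentially, but yields to the loop while waiting.
    for i in range(3):
        await asyncio.sleep(delay)
        log.append((name, i))


async def main(delay=1.0):
    log = []
    start = time.monotonic()
    await asyncio.gather(
        count("first", log, delay),
        count("second", log, delay),
    )
    return log, time.monotonic() - start


def run_demo(delay=1.0):
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(main(delay))
    finally:
        loop.close()
```

Each co-routine on its own takes three delays to finish, yet the pair finishes in roughly three delays rather than six, because neither blocks the event loop while it sleeps.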

 

Still, this code makes little sense so let us try to make a better version of it:

 

 

Calling multiple co-routines together

 

 

In this code example, the for loop appends the awaitable returned by each call to sleep to a Python list instead of awaiting it. Once again, calling a co-routine returns an awaitable object rather than running it.

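Instead of awaiting each co-routine in turn, the code collects them all and waits on them together by calling asyncio.wait, passing the list as the argument. (Python 3.11 and later require the list to contain Tasks or Futures, created for example with asyncio.ensure_future, rather than bare co-routine objects.)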

Unlike the previous example, here all sleep methods are executed concurrently and not sequentially. 

The result of this code is a message printed to the console stating that the elapsed time was 1 second. Since every call to sleep waited for 1 second, and they all ran concurrently, the total elapsed time is still about 1 second.
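A sketch of this concurrent version is shown below. It assumes the same wait_for_stuff name as before and adds a delay parameter for experimentation; each call to sleep is wrapped with asyncio.ensure_future so that asyncio.wait also accepts the list on recent Python versions.

```python
import asyncio
import time


async def wait_for_stuff(delay=1.0):
    start = time.monotonic()
    futures = []
    for _ in range(5):
        # Schedule the sleep without awaiting it yet.
        futures.append(asyncio.ensure_future(asyncio.sleep(delay)))
    # Wait for all of the scheduled sleeps together.
    await asyncio.wait(futures)
    elapsed = time.monotonic() - start
    print("Elapsed: {:.1f} seconds".format(elapsed))
    return elapsed


def main(delay=1.0):
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(wait_for_stuff(delay))
    finally:
        loop.close()


if __name__ == "__main__":
    main()
```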

 

Now that we know we can run co-routines concurrently, let us take a look at a real-world example. 

 

 

A real-world example

 

This code example illustrates how to make concurrent HTTP calls to a backend server. It uses GitHub's jobs API to search for open positions using multiple keywords. We use Aiohttp, a library built on top of Asyncio that provides non-blocking client and web-server components.

 

The example code demonstrates how to send 3 distinct HTTP requests to the jobs API with 3 different keywords. We chose to look for jobs involving 3 programming languages: Python, JavaScript and Clojure.

If we were to write this code sequentially, perhaps by using Python's requests library, each HTTP call would block the thread while requesting data from the server. The total amount of time it would take to complete all 3 requests would be the sum of the durations of the individual HTTP requests.

We can utilize concurrency and the non-blocking aiohttp client to invoke all 3 requests simultaneously, let them run concurrently, and gather all results into a single list. 

 

Similarly to the previous example, we make 3 calls to our fetch_job_opportunities co-routine without awaiting each one. Then we call asyncio.gather, passing the resulting awaitables as its arguments. The gather function works very similarly to the wait function we used in our previous example. Unlike wait, it returns the list of results, one for each of the co-routines and in the order in which they were passed. Note that we await only once when calling gather, thus awaiting all co-routines. Gather returns only when all co-routines have completed. This is similar to Promise.all in JavaScript.

Since we made 3 http calls, we get a list of 3 results. We then flatten the 3 lists into one, and display the total number of jobs that were fetched. 
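The gather-and-flatten pattern can be sketched without any network access. In the sketch below the HTTP call is replaced by a simulated fetch: the FAKE_JOBS data and the 0.05 second latency are invented for illustration, and a real version would issue the request with an Aiohttp client instead.

```python
import asyncio

# Hypothetical canned data standing in for the jobs API responses.
FAKE_JOBS = {
    "python": ["Backend Developer", "Data Engineer"],
    "javascript": ["Frontend Developer"],
    "clojure": [],
}


async def fetch_job_opportunities(keyword):
    await asyncio.sleep(0.05)  # simulated network latency
    return FAKE_JOBS[keyword]


async def main():
    keywords = ["python", "javascript", "clojure"]
    # Call the co-routine three times without awaiting each call.
    coroutines = [fetch_job_opportunities(keyword) for keyword in keywords]
    # Await once; gather returns one result per co-routine, in order.
    results = await asyncio.gather(*coroutines)
    # Flatten the three lists into one.
    jobs = [job for result in results for job in result]
    print("Total jobs fetched: {}".format(len(jobs)))
    return results, jobs


def run_demo():
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(main())
    finally:
        loop.close()
```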

 

Using concurrency and co-routines really shines when the code involves IO-bound operations such as fetching data over the network. It allows us to simultaneously run multiple requests, possibly even to multiple resources, for fetching or manipulating data. 

 

 

Periodic Tasks

 

We've seen how to use co-routines and non-blocking code to improve the performance and parallelise some of our code execution. But there are other advantages to using concurrency. 

Let us take a look at another example where we use co-routines to schedule periodic tasks. The following code can be used to periodically check that a given website is active and responding. We use the HTTP HEAD method to 'ping' the Google website and print the response status and the time to the console. A HEAD request is answered with status 200 when it is successful.

 

 

Let us point out the important bits of this code snippet:

 

  1. check_status_periodically is a co-routine that implements a continuous loop, which stops only when the running flag is explicitly set to False. Inside the loop we send an HTTP HEAD request to Google and print the response status and the current time. We then pause the loop for 2 seconds by calling sleep.

  2. Unlike previous examples, we call loop.run_forever rather than loop.run_until_complete to start our event loop. That is because the script needs to run continuously until it is asked to stop. run_forever is a blocking call, and no code following it is executed until the loop is stopped or interrupted. run_forever is especially useful when additional co-routines are scheduled after the event loop has started. The most obvious case for this is a web server.

  3. When we previously used run_until_complete we passed, as an argument, the return value of calling the co-routine we wanted executed. As run_forever takes no such argument, it is not bound to a given set of co-routines. Therefore, we must feed the co-routine into the event loop before we start it. For that we call asyncio.ensure_future, passing the co-routine object returned by calling check_status_periodically. We cannot use await on this co-routine because we are not inside a co-routine and the event loop hasn't started yet. ensure_future wraps the co-routine in a Task and adds it to the event loop's queue. Once the event loop is started by calling run_forever, the co-routine is executed. ensure_future is a great way to schedule co-routines on the event loop, making sure they get executed without actually waiting for them to complete.

  4. Stopping the continuous loop is done by setting the running flag to False. We do this by intercepting the keyboard interrupt ctrl+c. However, it's possible that when we set running to False there are still tasks waiting to be executed in the event loop's queue. In order to provide a graceful shutdown, we collect all remaining tasks in the event loop's queue by calling asyncio.Task.all_tasks() (asyncio.all_tasks() in Python 3.7 and later). We then gather them and wait until they have completed before closing the loop. By running this code in the console we can see that, depending on when we hit ctrl+c, it may take up to 2 seconds for the script to finish. That is the time the periodic check sleeps between HTTP calls.
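The four points above can be sketched in a self-contained form under several assumptions. The HTTP HEAD request is replaced by a simulated status so that no network access is needed, the loop is asked to stop by a timer (the hypothetical request_stop helper) in addition to ctrl+c so that the sketch ends on its own, and the single scheduled task is tracked directly instead of being collected with all_tasks.

```python
import asyncio
import time

running = True


async def check_status_periodically(log, interval):
    while running:
        # Stand-in for the HTTP HEAD request to the website being checked.
        status = "200 (simulated)"
        log.append((status, time.strftime("%H:%M:%S")))
        print(log[-1])
        await asyncio.sleep(interval)


def request_stop(loop):
    # Hypothetical helper: clears the flag and stops run_forever.
    global running
    running = False
    loop.stop()


def main(interval=2.0, run_for=5.0):
    global running
    running = True
    log = []
    loop = asyncio.new_event_loop()
    # Schedule the co-routine before the loop is started.
    task = asyncio.ensure_future(check_status_periodically(log, interval), loop=loop)
    loop.call_later(run_for, request_stop, loop)
    try:
        loop.run_forever()
    except KeyboardInterrupt:
        running = False
    finally:
        # Graceful shutdown: let the pending iteration finish its sleep.
        loop.run_until_complete(task)
        loop.close()
    return log


if __name__ == "__main__":
    main()
```

As in the description, the script may keep running for up to one interval after the stop request, because the co-routine only notices the flag once its current sleep has finished.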

 

It is important to state that this poor man's example of tasks running periodically and asynchronously is by no means a replacement for production-quality distributed task tools such as Celery.

 

 

 

Summary

 

The code examples in this post were provided as a tutorial on how Asyncio operates and what its capabilities are. This was done outside the context of web applications in order to better understand how co-routines are used. In the next instalment of this series of posts we will learn about libraries which implement HTTP and WebSocket servers using Asyncio.

 

 

 

 

 

 

 

 

 

 
