Why Eliot?¶

Suppose we turn from outside estimates of a man, to wonder, with keener interest, what is the report of his own consciousness about his doings or capacity: with what hindrances he is carrying on his daily labors; what fading of hopes, or what deeper fixity of self-delusion the years are marking off within him; and with what spirit he wrestles against universal pressure, which will one day be too heavy for him, and bring his heart to its final pause.

— George Eliot, Middlemarch

The log messages generated by a piece of software tell a story: what, where, when, even why and how if you’re lucky. The readers of this story are more often than not other programs: monitoring systems, performance tools, or just filtering the messages down to something a human can actually comprehend. Unfortunately the output of most logging systems is ill-suited to being read by programs. Even worse, most logging systems omit critical information that both humans and their programs need.

Problem #1: Text is hard to search¶

Let’s say you want to find all the log messages about a specific person. A first pass of log messages might look like this:

Sir James Chettam was going to dine at the Grange to-day with another gentleman whom the girls had never seen, and about whom Dorothea felt some venerating expectation. … If Miss Brooke ever attained perfect meekness, it would not be for lack of inward fire.

You could do a text search for log messages containing the text “Dorothea”, but this is likely to fail for some types of searches. You might want to searching for actions involving dinner, but then you would need to search for “dine” and “dinner” and perhaps other words well. A library like structlog that can generate structured log messages will solve this first problem. You could define a “person” field in your messages and then you can search for all messages where person == "Dorothea" as well as other structured queries.

Problem #2: Referring to Entities¶

Every time a log message is written out you need to decide how to refer to the objects being logged. In the messages we saw above “Dorothea” and “Miss Brooke” are in fact different identifiers for the same person. Having structured messages doesn’t help us find all messages about a specific entity if the object is referred to inconsistently. What you need is infrastructure for converting specific kinds of objects in your code to fields in your structured log messages. Then you can just say “log a message that refers to this Person” and that reusable code will make sure the correct identifier is generated.

Problem #3: Actions¶

Most log messages in your program are going to involve actions:

Not long after that dinner-party she had become Mrs. Casaubon, and was on her way to Rome.

A marriage has a beginning and eventually an end. The end may be successful, presuming “until death do us part” is a form of success, or a failure. The same is true of all actions, much like function calls in Python are started and eventually return a result or throw an exception. Actions may of course span multiple function calls or extended periods of time.

Actions also generate other actions: a marriage leads to a trip to Rome, the trip to Rome might lead to a visit to the Vatican Museum, and so on. Other unrelated actions are occurring at the same time, resulting in a forest of actions, with root actions that grow a tree of child actions.

You might want to trace an action from beginning to end, e.g. to measure how long it took to run. You might want to know what high-level action caused a particular unexpected low-level action. You might want to know what actions a specific entity was involved with. None of these are possible in most logging systems since they have no concept of actions to begin with.

Problem #4: Cross-Process Actions¶

A story may involve many characters in many places at many times. The novelist has the power to present the internal consciousness of not just one character but many: their ways of thinking, their different perceptions of reality.

Similarly, actions in a distributed system may span multiple processes. An incoming request to one server may cause a ripple of effects reaching many other processes; the logs from a single process in isolation are insufficient to understand what happened and why.

The Solution: Eliot¶

Eliot is designed to solve all of these problems. For simplicity’s sake this example focuses on problems 1 and 3; problem 2 is covered by the type system and problem 4 by cross-process actions.

from sys import stdout
from eliot import Message, to_file
to_file(stdout)


class Place(object):
    def __init__(self, name, contained=()):
        self.name = name
        self.contained = contained

    def visited(self, people):
        Message.log(message_type="visited",
                    people=people, place=self.name)
        for thing in self.contained:
            thing.visited(people)


def honeymoon(family, destination):
    Message.log(message_type="honeymoon", people=family)
    destination.visited(family)


honeymoon(["Mrs. Casaubon", "Mr. Casaubon"],
          Place("Rome, Italy",
                [Place("Vatican Museum",
                       [Place("Statue #1"), Place("Statue #2")])]))

Here’s how the log messages generated by the code look, as summarized by the eliot-tree tool:

68c12428-5d60-49f5-a269-3fb681938f98
+-- honeymoon@1
    |-- people: [u'Mrs. Casaubon', u'Mr. Casaubon']

361298ae-b6b7-439a-bc9b-ffde68b7860d
+-- visited@1
    |-- people: [u'Mrs. Casaubon', u'Mr. Casaubon']
    |-- place: Rome, Italy

7fe1615c-e442-4bca-b667-7bb435ac6cb8
+-- visited@1
    |-- people: [u'Mrs. Casaubon', u'Mr. Casaubon']
    |-- place: Vatican Museum

c746230c-627e-4ff9-9173-135568df976c
+-- visited@1
    |-- people: [u'Mrs. Casaubon', u'Mr. Casaubon']
    |-- place: Statue #1

5482ec10-36c6-4194-964f-074e325b9329
+-- visited@1
    |-- people: [u'Mrs. Casaubon', u'Mr. Casaubon']
    |-- place: Statue #2

We can see different messages are related insofar as they refer to the same person, or the same thing… but we can’t trace the relationship in terms of actions. Was looking at a statue the result of visiting Rome? There’s no way we can tell from the log messages. We could manually log start and finish messages but that won’t suffice when we have many interleaved actions involving the same objects. Which of twenty parallel HTTP request tried to insert a row into the database? Chronological messages simply cannot tell us that.

The solution is to introduce two new concepts: actions and tasks. An “action” is something with a start and an end; the end can be successful or it can fail due to an exception. Log messages, as well as log actions, know the log action whose context they are running in. The result is a tree of actions. A “task” is a top-level action, a basic entry point into the program which drives other actions. The task is therefore the root of the tree of actions. For example, an HTTP request received by a web server might be a task.

In our example we have one task (the honeymoon), an action (travel). We will leave looking as a normal log message because it always succeeds, and no other log message will ever need to run its context. Here’s how our code looks now:

from sys import stdout
from eliot import start_action, start_task, to_file
to_file(stdout)


class Place(object):
    def __init__(self, name, contained=()):
        self.name = name
        self.contained = contained

    def visited(self, people):
        # No need to repetitively log people, since caller will:
        with start_action(action_type="visited", place=self.name):
            for thing in self.contained:
                thing.visited(people)


def honeymoon(family, destination):
    with start_task(action_type="honeymoon", people=family):
        destination.visited(family)


honeymoon(["Mrs. Casaubon", "Mr. Casaubon"],
          Place("Rome, Italy",
                [Place("Vatican Museum",
                       [Place("Statue #1"), Place("Statue #2")])]))

Actions provide a Python context manager. When the action or task starts a start message is logged. If the block finishes successfully a success message is logged for the action; if an exception is thrown a failure message is logged for the action with the exception type and contents. Not shown here but supported by the API is the ability to add fields to the success messages for an action. A similar API supports Twisted’s Deferreds.

Here’s how the log messages generated by the new code look, as summarized by the eliot-tree tool:

f9dcc74f-ecda-4543-9e9a-1bb062d199f0
+-- honeymoon@1/started
    |-- people: [u'Mrs. Casaubon', u'Mr. Casaubon']
    +-- visited@2,1/started
        |-- place: Rome, Italy
        +-- visited@2,2,1/started
            |-- place: Vatican Museum
            +-- visited@2,2,2,1/started
                |-- place: Statue #1
                +-- visited@2,2,2,2/succeeded
            +-- visited@2,2,3,1/started
                |-- place: Statue #2
                +-- visited@2,2,3,2/succeeded
            +-- visited@2,2,4/succeeded
        +-- visited@2,3/succeeded
    +-- honeymoon@3/succeeded

No longer isolated fragments of meaning, our log messages are now a story. Log events have context, you can tell where they came from and what they led to without guesswork. Was looking at a statue the result of the honeymoon? It most definitely was.