An oral history of Bank Python (2021)

November 2021

The strange world of Python, as used by big investment banks

High finance is a foreign country; they do things differently there

Today I will take you through the keyhole to look at a group of software systems not well known to the public, which I call "Bank Python". Bank Python implementations are effectively proprietary forks of the entire Python ecosystem which are in use at many (but not all) of the biggest investment banks. Bank Python differs considerably from the common-or-garden Python that most people know and love (or hate).

Thousands of people work on - or rather, inside - these systems, but there is not a lot about them on the public web. When I've tried to explain Bank Python in conversation, people have often dismissed what I've said as the ravings of a swivel-eyed loon. It all just sounds too bonkers.

I will discuss a fictional, amalgamated Bank Python system called "Minerva". The names of subsystems have been changed, and though I'll try to be accurate I will have to stylise some details. Of course, I don't know every single detail, and I might even make the odd mistake. Hopefully I get the broad strokes right.

Barbara, the great key value store

The first thing to know about Minerva is that it is built on a global database of Python objects.

import barbara

# open a connection to the default database "ring"
db = barbara.open()

# pull out some bond
my_gilt = db["/Instruments/UKGILT201510yZXhhbXBsZQ=="]

# calculate the current value of the bond (according to
# the bank's modellers)
current_value: float = my_gilt.value()

Barbara is a simple key value store with a hierarchical key space. It's brutally simple: it is made from nothing more than pickle and zip.
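To make "pickle and zip" concrete, here is a minimal in-memory sketch of a store in that spirit. It is not Barbara: the class name ToyKeyValueStore is hypothetical, a plain dict stands in for the networked, replicated backend, and zlib stands in for whatever compression Barbara actually uses.

```python
import pickle
import zlib


class ToyKeyValueStore:
    """Hypothetical in-memory stand-in for a Barbara-style store.

    Values are pickled and then zlib-compressed on write; reads reverse
    both steps. Keys are absolute, '/'-separated paths.
    """

    def __init__(self):
        # Maps path -> compressed pickle bytes (a real store would persist these).
        self._data = {}

    def __setitem__(self, key, value):
        if not key.startswith("/"):
            raise KeyError(f"keys must be absolute paths: {key!r}")
        self._data[key] = zlib.compress(pickle.dumps(value))

    def __getitem__(self, key):
        return pickle.loads(zlib.decompress(self._data[key]))

    def __contains__(self, key):
        return key in self._data


db = ToyKeyValueStore()
db["/Instruments/EXAMPLE-BOND"] = {"coupon": 0.02, "currency": "GBP"}
print(db["/Instruments/EXAMPLE-BOND"]["coupon"])  # prints 0.02
```

Because every read unpickles a fresh copy, mutating the returned object does not change what is stored in this sketch; you have to write it back.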

Barbara has multiple "rings", or namespaces, but the default ring is more or less a single, global object database for the entire bank. From the default ring you can pull out trade data, instrument data (as above), market data and so on. A huge fraction of the data used day-to-day (in fact, the majority) comes out of Barbara.

Applications also commonly store their internal state in Barbara, writing dataclasses straight in and out with only very simple locking and transactions (if any). There is no filesystem available to Minerva scripts, so the little bits of data that scripts pick up have to be put into Barbara.
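Storing application state in this style might look like the following sketch. The JobState dataclass, the key path and the helper functions are all hypothetical; a module-level dict and a single threading.Lock stand in for Barbara and its "very simple locking".

```python
import pickle
import threading
from dataclasses import dataclass, field


@dataclass
class JobState:
    """Hypothetical application state, e.g. progress of a nightly batch."""
    last_processed_deal: str = ""
    processed_count: int = 0
    errors: list = field(default_factory=list)


# A plain dict stands in for the database; values are pickled bytes.
_store = {}
_lock = threading.Lock()


def save_state(key, state):
    with _lock:
        _store[key] = pickle.dumps(state)


def load_state(key):
    with _lock:
        raw = _store.get(key)
    # Missing key: start from a fresh, default state.
    return pickle.loads(raw) if raw is not None else JobState()


def record_progress(key, deal_id):
    # Read-modify-write under one lock, so two threads updating the
    # same key cannot lose each other's increments.
    with _lock:
        raw = _store.get(key)
        state = pickle.loads(raw) if raw is not None else JobState()
        state.last_processed_deal = deal_id
        state.processed_count += 1
        _store[key] = pickle.dumps(state)
    return state
```

Holding the lock across the whole read-modify-write is the minimum needed for correctness here; without it, concurrent updates to the same key could silently overwrite each other.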

Internally, Barbara nodes replicate writes within their rings, a bit like how Dynamo and BigTable work. When you call barbara.open() it connects to the nearest working instance of the default ring. Within that single instance, reads and writes are strongly consistent. Writes made through other instances turn up quickly, but not straight away. If consistency matters, you simply ensure that you always connect to a specific instance, a practice that is discouraged unless necessary. Barbara is surprisingly robust, probably because it is so simple. Outright failures are exceptionally rare, and degraded states are only a little more common.
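The consistency model described above (strongly consistent within one instance, eventually consistent across instances) can be illustrated with a toy simulation. The Ring and Replica classes and the explicit replicate() step are hypothetical; real replication would happen in the background over the network, and this sketch ignores failures and conflicting writes entirely.

```python
from collections import deque


class Replica:
    """One instance in a ring: local data plus an inbox of pending writes."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.inbox = deque()  # writes from peers, not yet applied


class Ring:
    """Hypothetical ring: a write is applied at once on the instance that
    received it and queued for every other instance to apply later."""

    def __init__(self, names):
        self.replicas = {n: Replica(n) for n in names}

    def write(self, via, key, value):
        self.replicas[via].data[key] = value  # visible immediately on this instance
        for name, peer in self.replicas.items():
            if name != via:
                peer.inbox.append((key, value))  # delivered later

    def read(self, via, key):
        return self.replicas[via].data.get(key)

    def replicate(self):
        """Simulate the background process draining every inbox."""
        for peer in self.replicas.values():
            while peer.inbox:
                key, value = peer.inbox.popleft()
                peer.data[key] = value


ring = Ring(["london", "new_york"])
ring.write("london", "/Deals/123", "booked")
print(ring.read("london", "/Deals/123"))    # booked: same instance sees it at once
print(ring.read("new_york", "/Deals/123"))  # None: not replicated yet
ring.replicate()
print(ring.read("new_york", "/Deals/123"))  # booked
```

Pinning a script to one instance (always passing the same via name here) is what gives it read-your-writes consistency.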

Some example paths from the default ring:

| Path | Description |
|---|---|
| /Instruments | Directory for financial instruments (bonds, stocks, etc.) |
| /Deals | Directory for deals (trades that happened) |
| /FX | Foreign exchange division's general area |
| /Equities/XLON/VODA/ | Directory for things to do with Vodafone's shares |
| /MIFID2/TR/20180103/01 | Intermediate object from some business process |
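A hierarchical key space like this can be stored as flat path strings, with "directories" existing only as shared prefixes. The sketch below assumes that representation (an assumption, not a documented detail of Barbara) and lists the immediate children of a directory; list_dir is a hypothetical helper and the keys are made up in the style of the example paths above.

```python
def list_dir(keys, directory):
    """Return the immediate children of `directory` among flat '/'-separated
    keys, with a trailing '/' marking sub-directories."""
    prefix = directory.rstrip("/") + "/"
    children = set()
    for key in keys:
        if key.startswith(prefix):
            # Keep only the first path component below the directory.
            head, sep, _ = key[len(prefix):].partition("/")
            if head:
                children.add(head + sep)
    return sorted(children)


keys = [
    "/Instruments/UKGILT-EXAMPLE",
    "/Equities/XLON/VODA/price",
    "/Equities/XLON/VODA/dividends",
    "/Equities/XLON/BARC/price",
    "/MIFID2/TR/20180103/01",
]
print(list_dir(keys, "/Equities/XLON"))  # ['BARC/', 'VODA/']
print(list_dir(keys, "/"))               # ['Equities/', 'Instruments/', 'MIFID2/']
```

This scans every key, which is fine for a sketch; a real store would keep a sorted index so a prefix lookup touches only the matching range.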

Barbara also has some "overlay" features:

# connect to multiple rings: keys are 'overlaid' in order of
# the provided ring names
db = barbara.open("middleoffice;ficc;default")

# get /Etc/Something from the 'middleoffice' ring if it exists there,
# otherwise try 'ficc' and finally the default ring
some_obj = db["/Etc/Something"]

You can list rings in a stack, and each read will try the first ring and then, if the key is absent there, the second ring, then the third and so on. Writes can either always go to the first ring or to the uppermost ring where that key already exists (determined by configuration that I have not shown).
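The read fall-through and the two write policies can be sketched with plain dicts as rings. OverlayView, open_overlay and the write_to_first flag are hypothetical names, and sending brand-new keys to the first ring under the "uppermost existing" policy is an assumption, since the configuration isn't shown.

```python
class OverlayView:
    """Hypothetical overlay of several rings. `rings` is ordered from the
    uppermost ring (tried first) down to the lowest (e.g. the default ring)."""

    def __init__(self, rings, write_to_first=True):
        self.rings = rings  # list of dict-like mappings
        self.write_to_first = write_to_first

    def __getitem__(self, key):
        # Reads fall through the stack until some ring holds the key.
        for ring in self.rings:
            if key in ring:
                return ring[key]
        raise KeyError(key)

    def __setitem__(self, key, value):
        if not self.write_to_first:
            # Alternative policy: update the uppermost ring that already has the key.
            for ring in self.rings:
                if key in ring:
                    ring[key] = value
                    return
        # Default policy, and the assumed fallback for brand-new keys: the first ring.
        self.rings[0][key] = value


def open_overlay(spec, available, **kwargs):
    """Build an overlay from a spec such as "middleoffice;ficc;default"."""
    return OverlayView([available[name] for name in spec.split(";")], **kwargs)


rings = {
    "middleoffice": {},
    "ficc": {"/Etc/Something": "from ficc"},
    "default": {"/Etc/Something": "from default", "/Etc/Other": "from default"},
}
db = open_overlay("middleoffice;ficc;default", rings)
print(db["/Etc/Something"])  # from ficc
print(db["/Etc/Other"])      # from default
```

Writing to the first ring lets a team shadow shared objects without touching them, while the "uppermost existing" policy keeps edits in whichever ring owns the key.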

There are some good reasons not to use Barbara. If your dataset is large it may be a good idea to look elsewhere, perhaps to a traditional SQL database or kdb+. The soft limit on (compressed) Barbara object sizes is about 16MB. Zipped pickles are pretty small already, so this is actually quite a large size. Barbara does feature secondary indices on object attributes, but if secondary indices are a very important part of your program, it is also a good idea to look elsewhere.
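Because the limit applies to the compressed object, a pre-write check has to serialise first. The sketch below assumes the zlib-compressed pickle is what gets measured and treats "about 16MB" as exactly 16 MiB; the constant and both helper names are hypothetical.

```python
import pickle
import zlib

# "About 16MB"; treating it as exactly 16 MiB is an assumption.
SOFT_LIMIT_BYTES = 16 * 1024 * 1024


def compressed_size(obj):
    """Size in bytes of `obj` as a zlib-compressed pickle."""
    return len(zlib.compress(pickle.dumps(obj)))


def within_soft_limit(obj, limit=SOFT_LIMIT_BYTES):
    """True if the compressed pickle fits under the (assumed) soft limit."""
    return compressed_size(obj) <= limit


# Several MB once pickled, but so repetitive that it compresses to very little.
prices = [100.0] * 1_000_000
print(len(pickle.dumps(prices)), compressed_size(prices), within_soft_limit(prices))
```

Repetitive data like this compresses extremely well, which is why a 16MB compressed limit goes a long way; random-looking data, by contrast, barely compresses at all.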

Dagger, a directed, acyclic graph of financial instruments

One important thing that investment banks do is estimate the value of financial instruments: "asset pricing". For example, a bond is valued as all the money that you'll get for owning it, discounted a bit for the danger of the issuer of the bond going bust. Bonds are probably (conceptually!) the simplest instrument going, and of much greater interest is the valuation of other, "derivative",...
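As a rough illustration of "all the money that you'll get, discounted a bit for the danger of the issuer going bust", here is a textbook discounted-cash-flow sketch. It assumes annual coupons and a single flat discount rate made of a risk-free rate plus a credit spread, a common simplification and not any bank's actual model; bond_value and its parameters are hypothetical names. Real pricing would use full yield curves and day-count conventions.

```python
def bond_value(face, coupon_rate, years, risk_free_rate, credit_spread):
    """Present value of a bond paying annual coupons, discounting each cash
    flow at a flat rate: the risk-free rate plus a credit spread for default risk."""
    rate = risk_free_rate + credit_spread
    value = 0.0
    for t in range(1, years + 1):
        cash_flow = face * coupon_rate
        if t == years:
            cash_flow += face  # principal repaid at maturity
        value += cash_flow / (1 + rate) ** t
    return value


# A riskier issuer (wider spread) means a lower price for the same cash flows.
safe = bond_value(100, 0.05, 10, risk_free_rate=0.03, credit_spread=0.02)
risky = bond_value(100, 0.05, 10, risk_free_rate=0.03, credit_spread=0.04)
print(round(safe, 2), round(risky, 2))
```

When the coupon rate equals the total discount rate, as in the first call, the bond prices at its face value; widening the spread pushes the price below par.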
