An oral history of Bank Python
November 2021
The strange world of Python, as used by big investment banks
High finance is a foreign country; they do things differently there
Today I will take you through the keyhole
to look at a group of software systems not well known to the public, which I
call "Bank Python". Bank Python implementations are effectively proprietary
forks of the entire Python ecosystem which are in use at many (but not all)
of the biggest investment banks. Bank Python differs considerably from the
common or garden variety Python that most people know and love (or hate).
Thousands of people work on - or rather, inside - these systems but there is<br>not a lot about them on the public web. When I've tried to explain Bank Python<br>in conversations people have often dismissed what I've said as the ravings of a<br>swivel-eyed loon. It all just sounds too bonkers.
I will discuss a fictional, amalgamated Bank Python system called "Minerva".
The names of subsystems have been changed, and though I'll try to be accurate I
will have to stylise some details and, of course, I don't know every single
detail. I might even make the odd mistake. Hopefully I get the broad strokes
right.
Barbara, the great key value store
The first thing to know about Minerva is that it is built on a global database<br>of Python objects.
import barbara
# open a connection to the default database "ring"<br>db = barbara.open()
# pull out some bond<br>my_gilt = db["/Instruments/UKGILT201510yZXhhbXBsZQ=="]
# calculate the current value of the bond (according to<br># the bank's modellers)<br>current_value: float = my_gilt.value()
Barbara is a simple key value store with a hierarchical key space. It's<br>brutally simple: made just from<br>pickle and<br>zip.
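To make "just pickle and zip" concrete, here is a minimal sketch of a key value store built that way. It assumes "zip" means zlib-style compression of each pickled value and keeps everything in a local dict; `TinyStore` and the example key are hypothetical and say nothing about how Barbara's network protocol or storage engine actually work.

```python
import pickle
import zlib


class TinyStore:
    """Toy, in-memory stand-in for a Barbara-like store (hypothetical).

    Values are pickled, then zlib-compressed, then kept in a plain dict
    keyed by slash-separated paths such as "/Instruments/...".
    """

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def __setitem__(self, key: str, value: object) -> None:
        if not key.startswith("/"):
            raise KeyError(f"keys must be absolute paths: {key!r}")
        # serialise the object with pickle, then compress the bytes
        self._blobs[key] = zlib.compress(pickle.dumps(value))

    def __getitem__(self, key: str) -> object:
        # decompress, then unpickle back into a fresh Python object
        return pickle.loads(zlib.decompress(self._blobs[key]))

    def __contains__(self, key: object) -> bool:
        return key in self._blobs


db = TinyStore()
db["/Instruments/EXAMPLE_BOND"] = {"name": "example bond", "coupon": 0.05}
print(db["/Instruments/EXAMPLE_BOND"]["coupon"])  # 0.05
```

Because every read unpickles a fresh copy, mutating the returned object does not change what is stored in this sketch; you have to write it back explicitly.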
Barbara has multiple "rings", or namespaces, but the default ring is more or
less a single, global, object database for the entire bank. From the default
ring you can pull out trade data, instrument data (as above), market data and
so on. The majority of the data used day-to-day comes out of Barbara.
Applications also commonly store their internal state in Barbara - writing
dataclasses straight in and out with only very simple locking and transactions
(if any). There is no filesystem available to Minerva scripts, so the little
bits of data that scripts pick up have to be put into Barbara.
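Writing dataclasses straight in and out, with a single coarse lock as the only "transaction", might look roughly like the sketch below. It assumes an in-memory store and uses `copy.deepcopy` in place of pickling so that callers never share mutable objects with the store; `SimpleStore`, `JobState` and the key path are all hypothetical.

```python
import copy
import threading
from dataclasses import dataclass, field


@dataclass
class JobState:
    """Hypothetical internal state for a batch script."""
    runs: int = 0
    seen_ids: list[int] = field(default_factory=list)


class SimpleStore:
    """In-memory toy store guarded by one coarse lock (an assumption).

    Values are deep-copied on the way in and out, standing in for the
    isolation that serialising each object would give.
    """

    def __init__(self) -> None:
        self._data: dict[str, object] = {}
        self._lock = threading.Lock()

    def get(self, key, default=None):
        with self._lock:
            return copy.deepcopy(self._data.get(key, default))

    def put(self, key, value) -> None:
        with self._lock:
            self._data[key] = copy.deepcopy(value)

    def update(self, key, fn, default_factory):
        """Read-modify-write under the lock: the only 'transaction' on offer."""
        with self._lock:
            if key in self._data:
                current = copy.deepcopy(self._data[key])
            else:
                current = default_factory()
            new_value = fn(current)
            self._data[key] = copy.deepcopy(new_value)
            return copy.deepcopy(new_value)


store = SimpleStore()
KEY = "/Apps/example_job/state"  # hypothetical path


def record_run(state: JobState) -> JobState:
    state.runs += 1
    state.seen_ids.append(state.runs)
    return state


for _ in range(3):
    store.update(KEY, record_run, JobState)

print(store.get(KEY))  # JobState(runs=3, seen_ids=[1, 2, 3])
```

Holding one lock for the whole read-modify-write is crude, since it serialises every writer, but it is enough to stop concurrent updates to a single key from overwriting each other.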
Internally, Barbara nodes replicate writes within their rings, a bit like how<br>Dynamo<br>and BigTable work.<br>When you call barbara.open() it connects to the nearest working instance of<br>the default ring. Within that single instance reads and writes are strongly<br>consistent. Reads and writes from other instances turn up quickly, but not<br>straight away. If consistency matters you simply ensure that you are always<br>connecting to a specific instance - a practice which is discouraged if not<br>necessary. Barbara is surprisingly robust, probably because it is so simple.<br>Outright failures are exceptionally rare and degraded states only a little more<br>common.
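The consistency behaviour described above can be illustrated with a toy model: each instance applies a write to its own copy immediately and queues it for its peers, which only see it after a `sync()`. This is a sketch under stated assumptions (two in-process replicas, explicit syncing, last-applied write wins), not a description of Barbara's real replication protocol; `Instance`, `sync` and the instance names are hypothetical.

```python
from collections import deque


class Instance:
    """One replica of a ring in a toy, hypothetical replication model."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: dict[str, object] = {}
        self.peers: list["Instance"] = []
        self.inbox: deque = deque()

    def write(self, key: str, value: object) -> None:
        # applied locally straight away: reads here see it immediately
        self.data[key] = value
        # ...but only queued for the other replicas
        for peer in self.peers:
            peer.inbox.append((key, value))

    def read(self, key: str, default: object = None) -> object:
        return self.data.get(key, default)

    def sync(self) -> None:
        # apply replicated writes in the order they arrived
        while self.inbox:
            key, value = self.inbox.popleft()
            self.data[key] = value


london, new_york = Instance("london"), Instance("new_york")
london.peers.append(new_york)
new_york.peers.append(london)

london.write("/FX/example", 1)
print(london.read("/FX/example"))    # 1    (same instance: visible at once)
print(new_york.read("/FX/example"))  # None (other instance: not yet replicated)
new_york.sync()
print(new_york.read("/FX/example"))  # 1    (turns up after replication)
```

Pinning all reads and writes to one `Instance` in this model gives the strong consistency the text describes; spreading them across instances exposes the replication lag.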
Some example paths from the default ring:
| Path | Description |
|---|---|
| /Instruments | Directory for financial instruments (bonds, stocks, etc.) |
| /Deals | Directory for deals (trades that happened) |
| /FX | Foreign exchange division's general area |
| /Equities/XLON/VODA/ | Directory for things to do with Vodafone's shares |
| /MIFID2/TR/20180103/01 | Intermediate object from some business process |
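These paths behave like directories even though the store itself only holds keys and values. One common way to get that effect, assumed here rather than taken from Barbara, is to treat "/" as a separator and derive directory listings from key prefixes; `list_dir` and the extra leaf key are hypothetical.

```python
def list_dir(keys: list[str], directory: str) -> list[str]:
    """List the immediate children of `directory` in a flat key space.

    Sub-directories are returned with a trailing "/" so they can be told
    apart from leaf objects.
    """
    prefix = directory.rstrip("/") + "/"
    children: set[str] = set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        head, sep, _ = key[len(prefix):].partition("/")
        if head:  # skip a key that names the directory itself
            children.add(head + sep)
    return sorted(children)


keys = [
    "/Instruments/UKGILT201510yZXhhbXBsZQ==",
    "/Equities/XLON/VODA/example_object",  # hypothetical leaf object
    "/MIFID2/TR/20180103/01",
]
print(list_dir(keys, "/"))           # ['Equities/', 'Instruments/', 'MIFID2/']
print(list_dir(keys, "/MIFID2/TR"))  # ['20180103/']
```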
Barbara also has some "overlay" features:
# connect to multiple rings: keys are 'overlaid' in order of<br># the provided ring names<br>db = barbara.open("middleoffice;ficc;default")
# get /Etc/Something from the 'middleoffice' ring if it exists there,<br># otherwise try 'ficc' and finally the default ring<br>some_obj = db["/Etc/Something"]
You can list rings in a stack and then each read will try the first ring, and<br>then, if the key is absent there, it will try the second ring, then the third<br>and so on. Writes can either always go to the first ring or to the uppermost<br>ring where that key already exists (determined by configuration that I have not<br>shown).
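The read-through behaviour of overlaid rings maps closely onto Python's standard `collections.ChainMap`, which searches its underlying mappings in order and, by default, sends all writes to the first one. The sketch below uses plain dicts as stand-in rings (their contents are hypothetical) and shows both write policies from the paragraph above; the second is a small subclass, `UppermostWriteChainMap`, modelled on the `DeepChainMap` recipe in the `collections` documentation. It illustrates the semantics only, not how Barbara implements overlays.

```python
from collections import ChainMap


def make_rings():
    """Three plain dicts standing in for rings (hypothetical contents)."""
    middleoffice = {"/Etc/Something": "middleoffice copy"}
    ficc = {"/Etc/Something": "ficc copy", "/Etc/Other": "ficc other"}
    default = {"/Etc/Other": "default other", "/Etc/Base": "default base"}
    return middleoffice, ficc, default


# Policy 1: reads search each ring in order; ChainMap sends every write
# to the first mapping (here, the 'middleoffice' ring).
middleoffice, ficc, default = make_rings()
db = ChainMap(middleoffice, ficc, default)
print(db["/Etc/Something"])  # middleoffice copy
print(db["/Etc/Base"])       # default base
db["/Etc/Base"] = "updated"
print(middleoffice["/Etc/Base"], default["/Etc/Base"])  # updated default base


class UppermostWriteChainMap(ChainMap):
    """Policy 2: write to the uppermost ring that already holds the key,
    falling back to the first ring for brand-new keys."""

    def __setitem__(self, key, value):
        for mapping in self.maps:
            if key in mapping:
                mapping[key] = value
                return
        self.maps[0][key] = value


middleoffice, ficc, default = make_rings()
db2 = UppermostWriteChainMap(middleoffice, ficc, default)
db2["/Etc/Base"] = "updated"
print(default["/Etc/Base"], "/Etc/Base" in middleoffice)  # updated False
```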
There are some good reasons not to use Barbara. If your dataset is large it<br>may be a good idea to look elsewhere - perhaps a traditional SQL database or<br>kdb+. The soft limit on (compressed)<br>Barbara object sizes is about 16MB. Zipped pickles are pretty small already so<br>this is actually quite a large size. Barbara does feature secondary indices<br>on object attributes but if secondary indices are a very important part of<br>your program, it is also a good idea to look elsewhere.
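Because the limit applies to the compressed pickle, the number that matters is the size after both steps. A minimal sketch, assuming pickle plus zlib as the storage format and 16 MiB as the threshold (both are assumptions; `compressed_size` and `fits_soft_limit` are hypothetical helpers, not part of Barbara):

```python
import pickle
import zlib

# "About 16MB" per the text; treating it as exactly 16 MiB is an assumption.
SOFT_LIMIT_BYTES = 16 * 1024 * 1024


def compressed_size(obj: object) -> int:
    """Size in bytes of obj once pickled and zlib-compressed."""
    return len(zlib.compress(pickle.dumps(obj)))


def fits_soft_limit(obj: object, limit: int = SOFT_LIMIT_BYTES) -> bool:
    """True if the compressed pickle is within the soft limit."""
    return compressed_size(obj) <= limit


# Repetitive data (a hypothetical list of a million identical prices)
# compresses extremely well, so its stored size is far below its pickled size.
prices = [101.25] * 1_000_000
print(len(pickle.dumps(prices)), compressed_size(prices))
print(fits_soft_limit(prices))  # True
```

Real datasets are rarely this repetitive, so a check like this is best run on representative objects rather than assumed.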
Dagger, a directed, acyclic graph of financial instruments
One important thing that investment banks do is estimate the value of financial<br>instruments - "asset pricing". For example a bond is valued as all the money<br>that you'll get for owning it, discounted a bit for the danger of the issuer of<br>the bond going bust. Bonds are probably (conceptually!) the simplest<br>instrument going and of much greater interest is the valuation of other,<br>"derivative",...