Carl de Marcken: Inside Orbitz -->
(Here's an email Carl de Marcken of ITA Software sent to<br>a friend, describing their experiences using Lisp in one of the software<br>industry's most demanding<br>applications.)
Date: Fri, 12 Jan 2001 15:42:34 -0500
From: Carl de Marcken
Geoffrey,
Here are some tidbits...
1. Right now Sabre, Galileo, Amadeus and Worldspan operate many<br>millions of dollars of IBM and Unisys mainframes each to<br>answer the vast majority of queries done by airline phone agents,<br>airport desk agents, travel agents, and travel web sites (other than<br>our own and our customers'). Their computers are housed in<br>bomb-proof, fire-walled (literally) complexes in Kansas City, Denver,<br>Germany and Atlanta, and mostly run assembly language code for<br>performance reasons. From what we can discern, their algorithms are<br>basic: until we pointed it out to them I don't think they had any<br>understanding of how hard the problem they're trying to solve is, or<br>how far their solutions are from optimal.
2. ITA Software is slowly replacing the industry's hardware and<br>software with Common Lisp code running on Linux PCs, that uses<br>relatively involved algorithms that show off our academic CS<br>background. The easiest place to see the code in action is on our web<br>site,<br>www.itasoftware.com.
3. The vast majority of our "thinking" code is in Common Lisp. We run<br>both<br>CMUCL and<br>Franz,<br>under Linux/Intel, HPUX/PA, and NT/Intel, and<br>have about 200,000 lines of Lisp in our base search engine. Our web<br>site page generation code is also largely written in Common Lisp,<br>though there's also fair bit of Java there.
4. Because we have about 2 gigs of static data we need rapid access<br>to, we use C++ code to memory-map huge files containing pointerless C<br>structs (of flights, fares, etc), and then access these from Common<br>Lisp using foreign data accesses. A struct field access compiles into<br>two or three instructions, so there's not really any performance.<br>penalty for accessing C rather than Lisp objects. By doing this, we<br>keep the Lisp garbage collector from seeing the data (to Lisp, each<br>pointer to a C object is just a fixnum, though we do often temporarily<br>wrap these pointers in Lisp objects to improve debuggability). Our<br>Lisp images are therefore only about 250 megs of "working" data<br>structures and code.
5. Every query that hits our site gets sent via tcpip to a Lisp<br>process running on an dual 800mhz x86 Linux box with 2g of ram ($3000,<br>vs about $1,000,000 for a similarly capable mainframe), and the<br>process devotes between 5 and 15 seconds of CPU time to it. One of<br>our customers will have 200 such boxes, each running 2 or 3 Lisp<br>processes. We save on ram by putting multiple processes on one box,<br>since the virtual memory system automatically shares our read-only<br>memory-mapped files between processes.
6. If you want to do a simple round-trip from BOS to LAX in two weeks,<br>coming back in three, willing to entertain a 24 hour departure window<br>for both parts, then limiting to "reasonable" routes (at most 3<br>flights and at most 10 hours or so) you have about 5,000 ways to get<br>there and 5,000 ways to get back. Listing them is a mostly trivial<br>graph-search (there are a few minor complications, but not many), that<br>anybody could do in a fraction of a second.
7. The real challenge is that a single fixed itinerary (a fixed set of<br>flights from BOS to LAX and a fixed set back) with only two flights in<br>each direction may have more than 10,000 possible combinations of<br>applicable "fares", each fare with complex restrictions that must be<br>checked against the flights and the other fares. That means that the<br>search space for this simple trip is of the order 5000 x 5000 x 10000,<br>and a naive program would need to do a _lot_ of computation just to<br>validate each of these possibilities. Suitably formalized, its not<br>even clear that the problem of finding the cheapest flight is<br>NP-complete, since it is difficult to put a bound on the size of the<br>solution that will result in the cheapest price. If you're willing to<br>dispense with restrictions on the energy in the universe, then it is<br>actually possible to formalize the cheapest-price problem in a<br>not-too-unreasonable way that leads to a proof of undecidability by<br>reduction to the Post correspondance problem :-).
8. Our Lisp code is running very clever algorithms that let us produce<br>in a reasonable time a data structure we call the "pricing-graph" from<br>which we can very efficiently answer a query of the form "give me the<br>k-th best solution (a validated set of flights and fares), ordered<br>according to the function f", assuming of course certain restrictions<br>on f, where the number of answers represented by the pricing-graph is<br>10^20 - 10^30 depending on the type of trip. In this way, we can<br>reasonably claim that in 10 seconds we can produce 10^30 answers, even<br>if we could not possibly enumerate the list of such answers.
9. We can do 10 seconds of Lisp computation on a 800mhz box and cons<br>less than 5k of data. This is because we...