How Big Banks Thread The Software Performance Needle

Tariq Hook Blog


While parallel programming on distributed systems is difficult, making applications scale across multiple machines – or hybrid compute elements that mix CPUs with FPGAs, GPUs, DSPs, or other motors – linked by a network is not the only problem that coders have to deal with. Inside each machine, the number of cores and threads have ballooned in the past decade, and each socket is as complex as a symmetric multiprocessing system from two decades ago was in its own right.

With so many cores and usually multiple threads per core to execute software, getting the performance out of software can be a tricky business. At the world’s hyperscalers, financial services behemoths, HPC centers, and database and middleware providers, the smartest programmers in the world are often off in a corner, with pencil and paper, mapping out the dependencies in the hairball of code they and their peers have created to find out the affinities between threads within that application. Having sorted out these dependencies, they engage in the unnatural act of pinning software processes or threads to specific cores in a physical system to optimize their performance.

Pinning threads is a bit like doing air traffic control in your head, and Leonardo Martins had such an onerous task a few years back. Martins got his start in the IT sector two decades ago as an engineer at middleware software makers Talarian and TIBCO before moving to Lehman Brothers to introduce Monte Carlo simulation systems for risk management to the bank. In 2004, he moved to Barclays Capital to introduce its first Linux-based systems as its senior middleware program manager and architect, and in 2010, he was the low latency senior architect at HSBC. While at HSBC, Martins was one of the wizards that would map out the applications and figure out how to pin their threads to specific cores in a system to maximize performance – a process that might take anywhere from two to eight weeks.

This is not big deal, right? Wrong. At the major financial institutions, the trading applications are updated at least monthly and sometimes as much as 200 times a year, so having the tuning process take weeks to months means code is never as optimized as it needs to be for a competitive edge. Martins looked around for a tool that would automate this thread pinning, and when he could not find one he found a few peers and set out to create one.

Martins founded Pontus Networks back in 2010 as a consultancy specializing in the tuning of latency sensitive applications, and was joined by Martin Raumann, an FPGA designer and specialist in low latency, high frequency trading hardware, and Deepak Aggarwal, another C, C++, C#, and Java programmer with deep expertise in distributed systems who built front office and back office systems for equities, foreign exchange, and fixed income asset trading at Barclays Capital, Credit Suisse, Citigroup, ABN, and Standard Chartered. They started work on the Pontus Vision Thread Manager and filed their first patents relating to the automated thread pinning in August 2014. The alpha version of Thread Manager debuted quietly at the end of November last year with its first customer, and the product is now available and has been acquired by three customers – all of whom are in the financial services sector. It is a fair guess that these companies are probably the ones where the founders of Pontus Networks used to work and do such painstaking thread pinning work, but that is just a guess.

Several other HPC-related users in government and university labs as well as a few Formula One racing teams are kicking the tires to see how Thread Manager might remove the human bottleneck and help get tuned software into production faster. In this latter case, Thread Manager is expected to help boost the performance of the mechanical engineering design and simulation programs as well as some of the post-processing that is done on designs to test them.) The company is also getting ready to do some performance tests on Hadoop clusters as well, and thinks that performance boosts on HDFS storage will be similar to what it has seen on Extract-Test-Load (ETL) applications that front end data warehouses. (Informatica is working with Pontus Networks on these tests.)

Read the rest of the article by Timothy Prickett Morgan over at The Next Platform.