to boldly go
It’s about Federation. You know, The Federation.
Star Trek?
No? Nothing?
Okay, fine. Having outed myself as a Trekkie (lite: I will gladly cop to having watched every episode of The Next Generation, but the rest is a little patchy), I’ll move on.
Federated Learning. That’s what I’m getting at. What is it, I hear you rumble? At its simplest, you can think of it as distributed collaboration without data centralization.
Why is it important? Well, imagine that you work in an industry that is highly protective of its intellectual property, or where data are extremely sensitive. In the service of your work, let’s say you build predictive models using Machine Learning (ML) strategies - but sharing data across organizations is verboten.
Federated Learning offers a mechanism through which such models can be shared and subsequently trained across many organizations, each of which can keep their data local (and hidden). So, instead of sending data to a central location for training, each participant trains the model on its own data, and only sends back updates to the model (weights or gradients). The coordinator then aggregates the updates to improve the global model. Nifty, eh?
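For the code-curious, here’s a minimal sketch of that loop, in the spirit of the federated averaging (FedAvg) idea. Everything in it is a made-up stand-in: a toy linear model, three synthetic “organizations”, and plain NumPy rather than any real federated framework like the ones discussed below.

```python
# Toy federated averaging: each client trains on its own (private) data;
# only model weights - never the data - travel to the coordinator,
# which averages them into the global model.
import numpy as np

rng = np.random.default_rng(0)

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One client's training pass: gradient descent on a linear model,
    run on data that never leaves the client."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)  # gradient of mean squared error
        w -= lr * grad
    return w

# Three "organizations", each holding private data drawn from the same
# underlying relationship (y ~ X @ [3, -2]) plus noise.
true_w = np.array([3.0, -2.0])
clients = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=50)
    clients.append((X, y))

# The coordinator's loop: broadcast the global model, collect each
# client's locally trained weights, and average them.
global_w = np.zeros(2)
for _ in range(20):
    updates = [local_update(global_w, X, y) for X, y in clients]
    global_w = np.mean(updates, axis=0)  # the aggregation step

print(global_w)  # lands near [3, -2] without pooling any raw data
```

The thing to notice is that `X` and `y` never appear in the coordinator’s loop; only weight vectors do. (A production system would weight the average by each client’s data size and add privacy protections, but the shape of the exchange is the same.)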
I recently learnt of the work of Apheris during a presentation at the Computer-Aided Drug Design Gordon Research Conference (CADD GRC) earlier this year. They’ve built an infrastructure to enable collaboration across industrial members, in the service of the collective. It’s quite cool. Their AI Structural Biology (AISB) network is a wonderful example of industrial partners coming together to improve OpenFold3, for instance.
Another recent example of Federated Learning is Eli Lilly’s TuneLab, a platform built on top of the Rhino Federated Computing platform that allows participants to use, and improve, Lilly’s historical ML models. The TuneLab landing page has a nice explanatory video that’s worth a watch. Interestingly, Rhino appears to be industry agnostic, while Apheris appeals squarely to Life Sciences participants, at least for now.
Lilly have a wonderful track record of exploring innovative ways of working, although as we noted in the third article of our Cornetto Trilogy, much of their original Crowd Sourcing work is no longer available online. We hope that the long-term maintenance and sustainability of these efforts is something they’ve considered.
I had an opportunity to chat briefly with the GRC speaker, and suggested one little tweak: maybe there’s a way to let other folks play as well? The genesis of my interest in Crowd Sourcing was my experience with Kaggle, the community of machine learning experts who compete, often for money, to build the best models they can. What if there were a way to add that kind of dynamic to the efforts described above? Would that take this to the next level?
Federated Learning solves for the problem that no one organization has all the data. A competitive marketplace for model improvement, sitting on top of that, would solve for the problem that no one organization has all the smart people (a point recently revisited in the Economist).
Anyways, all exciting stuff, and closer and closer to the Vulcan principle of ‘Infinite Diversity in Infinite Combinations’.
Live Long and Prosper peeps 🖖.