pairSEQ breaks down another barrier to fixing, well, everything.

There are a ton of things that make Adaptive a special place — but for me, the real magic is our team’s unique ability to blur the line between biology and computation. A year ago when I was trying to decide if I should join the company, it was Harlan describing pairSEQ that tipped me over the edge. I’ve wanted to write about it here ever since, and now that we’ve published the paper I can finally do it. Woo hoo!

pairSEQ is one of the key advancements driving our expansion into therapeutics. And that’s great, because it puts us one step closer to actually fixing immune-related diseases. But it’s also just a mind-blowing festival of geeky awesome brain crack. That’s what I’m focused on today. 😉

Single-chain immunosequencing is super-useful…

If you’ve been paying attention, you’ll remember that our core technology uses next-generation sequencing to determine the “thumbprints” for millions of adaptive immune cells in a sample of blood, bone marrow or other tissue. These thumbprints, and the way they group together, tell us some incredible things about the state of the immune system. We can use them to track the progression and recurrence of certain blood cancers, predict the likelihood that immunotherapy will be effective, and much more. It’s cool stuff.

But it’s also only part of the story. T-Cell and B-Cell receptors are “heterodimers” … a word that kind of sounds dirty but really just means that they’re composed of a pair of distinct protein chains. T-Cell receptors have “alpha” and “beta” (or “gamma” and “delta”) chains; B-Cell receptors have “heavy” and “light”.

Our core immunosequencing process measures the gene sequences for these chains independently. That is, we can tell you which alpha clones are in a sample, and which beta clones are in the same sample, but we historically haven’t been able to tell you which alphas were paired with which betas.

For diagnostic purposes, this isn’t a big deal. The TCR Beta sequence is incredibly diverse, and its CDR3 “thumbprint” is more than enough to use as a marker for disease (and similarly for heavy-chain sequences in B-Cells). This is our bread and butter and frankly we’re rocking the world with it.

… but sometimes you just gotta find the pairs.

Still, as useful as the individual chains are, they only get you so far. It turns out that the “shape” of a T-Cell receptor is determined by the alpha and beta chains together — and it’s that unique shape that enables the cell to precisely target one specific antigen.

This targeting is the basis of some of the most exciting work in immunotherapy: identify a T-Cell that attacks a particular bad guy, then copy that receptor shape to create a therapeutic (the idea behind CAR T-Cell Therapies). Simple in concept, but until you can identify paired chains, basically impossible.

Past attempts focused on the physical biology of single cells. For example: extract the genes from a single cell, paste them together using bridge PCR, and then sequence them as a unit. It works — but only one cell at a time. A more recent approach tries to automate the process by isolating single cells in tiny droplets of oil. This seems to work better, and has identified thousands of pairs, but is cumbersome to manage at scale.

Hooray for math!

This is where the Adaptive magic — combining biology and computation — makes the difference. Harlan and his team realized that we didn’t need to isolate the individual cells at all. Because the sequences are so highly diverse, we can instead use probability and combinatorics to do the hard work for us. Here’s how:

  1. Take a sample and distribute it randomly across N (we used 96) wells.
  2. Amplify the alpha and beta chains within each well, just as we’d do for traditional immunosequencing.
  3. Use standard barcode adapters to tag each chain with a unique identifier corresponding to the well it was placed in.
  4. Mix the whole soup back together and run it through the sequencer.
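The effect of those four steps is easy to picture with a few lines of Python. This is a toy model, not Adaptive’s actual pipeline; the clone sizes are made up for illustration:

```python
import random

random.seed(0)
N_WELLS = 96  # same well count used in our experiment

def clone_wells(n_cells):
    # each cell of a clone lands in one random well, so the clone's alpha
    # and beta chains show up in exactly the same set of wells
    return {random.randrange(N_WELLS) for _ in range(n_cells)}

# a true pair: the alpha and beta share their clone's well pattern
pair_wells = clone_wells(3)
alpha_wells, beta_wells = pair_wells, pair_wells

# an unrelated beta comes from a different clone with its own pattern
other_beta_wells = clone_wells(3)

shared_true = len(alpha_wells & beta_wells)          # always the full pattern
shared_random = len(alpha_wells & other_beta_wells)  # almost always zero
```

With 96 wells and only a handful of cells per clone, unrelated chains almost never share a well pattern by accident — and that asymmetry is the whole trick.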

Now, say we’ve found alpha sequence A in wells A1, B5 and E3, and beta sequence B in exactly the same wells. Because we know the number of wells and the total number of cells we started with, we can compute how unlikely that shared pattern would be by chance, and say with X% confidence that the two chains came from the same cell. Want to be more than X% sure? Just add more wells.
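That confidence can be made concrete. Under a null hypothesis that the two chains landed in wells independently of each other, the overlap between their well sets follows a hypergeometric distribution. Here’s my own back-of-envelope sketch of that tail probability (not the paper’s full model):

```python
from math import comb

def cooccurrence_pvalue(n_wells, wells_a, wells_b, overlap):
    """Probability of seeing at least `overlap` shared wells if the two
    chains were distributed across wells independently of each other."""
    total = comb(n_wells, wells_b)
    return sum(
        comb(wells_a, k) * comb(n_wells - wells_a, wells_b - k)
        for k in range(overlap, min(wells_a, wells_b) + 1)
    ) / total

# alpha in 3 wells, beta in the same 3 wells, 96 wells total
p = cooccurrence_pvalue(96, 3, 3, 3)  # about 7e-6: very unlikely by chance
```

Notice that adding wells shrinks this p-value fast — that’s the “just add more wells” knob.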

Of course, the math is a bit more complicated than that, because there are a bunch of confounding factors. For example, even though sequence B may actually have been present in a well, our PCR process may have missed it. So the paper is, like any good scientific piece, full of impenetrable equation porn.

But the basics are pretty simple — and incredibly effective. Our first run identified more than an order of magnitude more pairs than previously known methods, and did so using standard lab equipment and consumables. Therapeutics here we come.

This is why bringing both biology and computation to the party makes such a difference. We simply have double the weaponry at our disposal to attack hard problems. And dang if we don’t use those weapons really well. I’m super-proud to be a part of it.

Yeah, just another day at the coolest company around.


Noodling on samples, the real currency of the life sciences.

While it’s obvious in retrospect, until recently I didn’t really understand the one thing that fundamentally makes it difficult to move the life sciences forward. It comes down to exactly one word: samples.

We have no credible virtual models of the human body, so the only way we can figure things out is to measure “stuff” in real people. Stuff that usually requires cutting or poking into their bodies. Often many times, over the course of many years. After giving them drugs or other agents that we’re not really sure about. And not just any people, but those that match certain disease or other criteria that may be pretty rare. And not just a couple of people, but enough to be able to draw statistically significant conclusions.

Looked at in this way … it’s amazing we learn anything, actually.

We’ve tried to create mathematical models of biology. We’ll get there someday, but so far we’ve struggled. Which is why, despite all the ethical questions around humanely managing the practice, we do so much of our experimentation using animals — preferably animals like mice that reproduce quickly and can be engineered to approximate the conditions that affect humans.

But the gold standard, and the one required by civilized society before calling any new drug or intervention “good” … is human studies. And that is just a long, long, expensive road.

This reality has a surprising impact on just about every facet of our work. For example, over the last week or so I’ve been helping a couple of our scientists set up a public data set we’ve created. It’s pretty awesome — immune sequencing data that we’ve created at significant expense — and we’re giving it away for free. (Of course, it’s not completely altruistic; we want folks to see for themselves the value of the assays we’ve developed and sell, but still.)

The route to publicizing data like this is to publish it in a “respectable” journal. To do that, you have to get through a gauntlet of peer review that passes judgment on its overall scientific value. And human nature being what it is, this is an EXTREMELY political process. It’s a perfect vehicle for cranky scientists to show just how much smarter they are than everyone else. Super annoying.

There must be a better way, I thought. Certainly in a connected world like this we have plenty of options to make people aware of the data. But our CSO pushed back on me — how would this really work? Folks need some confidence beyond “we say so” to believe that the data we share is valid and trustworthy, because all science is building one discovery atop another.

And here’s the rub — it is so expensive and time-consuming to run life science experiments that there is simply no practical way to validate them all in the real world. So we’ve adopted peer review, an inherently political process, as a proxy that at least ensures we aren’t actively lying and can back up our processes and methods.

Fields backed by well-understood math don’t have this problem. In theoretical physics, anyone (ok not anyone, but enough) with a computer or maybe even a pencil can validate published results with minimal investment. And they do, so bogus research gets noticed pretty quickly. Not so for us.

It’s frustrating when I can’t come up with a better solution for something that seems so obviously broken. But I guess that’s one more reason that the work we’re doing now is important … it gets us closer to understanding the systems, and with enough understanding we will begin to create the mathematical models that will accelerate progress and eliminate human ego from the game.

Wait, would that be the singularity?

Dude.