Big data for Thanksgiving

Got up late, went for a run, helped get things ready for Thanksgiving, ate a TON, and watched a bit of football, all with the entire Nolan West clan in attendance*. Not a bad way to spend my favorite holiday!

Before I doze off again, I wanted to share a quick story from yesterday at work that illustrates just how awesome Adaptive is, and why I am thankful to be part of it. “Big data” at Adaptive isn’t just empty marketing; it’s a tool we use to help real people in incredibly concrete ways.

One of my goals for early in 2015 is to help scale up clonoSEQ, our diagnostic test that helps detect relapse in blood cancer patients. As part of planning this, I checked in with our Director of Translational Medicine to understand the work he does to develop the narrative “interpretations” that accompany the numeric results of these tests.

As with many types of tests, we use thresholds to understand what is meaningful vs. not. In a very rough way, if a particular clone makes up more than 5% of a patient’s immune repertoire, we believe it’s a diagnostic marker for their cancer. When numbers are way higher or lower than this, life is easy. But what about when they’re bouncing right around the cutoff?

We know that there are some sequences that show up in many different people at relatively higher concentrations, just because they’re “easier” for the body to make (less mutation, etc.). Further, repertoires can fluctuate for lots of common reasons that we never even notice. So when we see these borderline sequences, they very well could be signal OR noise.

We use lots of tools to make these distinctions. One that is incredibly cool: effectively in real time, we can search the billions of sequences we have seen in the real world to find out whether a borderline sequence has shown up in other patients. If it has, odds are high that the sequence is unrelated to this patient's cancer.
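To make the logic concrete, here is a toy sketch in Python. It is not our actual code: the 5% cutoff is the very rough number from above, while the width of the "borderline" band, the function name, the dictionary standing in for the sequence database, and the example sequences are all made up for illustration.

```python
from typing import Dict, Set

# Assumptions for illustration only: 5% is the "very rough" cutoff from the
# post; the width of the borderline band is invented.
CUTOFF = 0.05
BORDERLINE_BAND = 0.02


def classify_clone(sequence: str, frequency: float, patient_id: str,
                   seen_in: Dict[str, Set[str]]) -> str:
    """Label a clone by its share of the repertoire, using other patients
    as a tie-breaker when the share sits near the cutoff."""
    if frequency > CUTOFF + BORDERLINE_BAND:
        return "likely marker"
    if frequency < CUTOFF - BORDERLINE_BAND:
        return "not a marker"
    # Borderline: has anyone other than this patient carried the sequence?
    other_patients = seen_in.get(sequence, set()) - {patient_id}
    if other_patients:
        return "borderline, probably noise"
    return "borderline, possibly signal"


# Toy stand-in for the real database (hypothetical sequences and IDs).
seen_in = {
    "CASSLGQGAYEQYF": {"patient-1", "patient-2", "patient-3"},
    "CASSPTDRGNTIYF": {"patient-1"},
}

print(classify_clone("CASSLGQGAYEQYF", 0.05, "patient-1", seen_in))
print(classify_clone("CASSPTDRGNTIYF", 0.05, "patient-1", seen_in))
```

The first sequence shows up in other patients, so it gets flagged as probable noise; the second has only been seen in this one patient, so it stays a candidate.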

Think about that for a second. We’re creating both the technology and the data sets to determine not just what ONE immune system looks like, but thousands and millions. Armed with this information we can start to detect population-level patterns and understand mechanisms that nobody has ever had the remotest chance of seeing before. And that means better diagnostics and treatments in the real world.

So this year, I’m thankful to be a part of a new company that is helping real people in amazing ways.

And of course, for my wicked awesome new nephew Asher!

Hope you all have a great holiday.

* Hmm, now that Ben, Kelly and Asher are in Boulder, does that mean “Nolan West” has new citizens? Still strictly on the wrong side of the Divide, but it’s pretty close. Will have to think on this.

PCR, aka Xerox for DNA

When folks hear about my gig here at Adaptive, their first questions are always about the sequencing — and indeed we have some wicked cool machines for this in the lab. But our special sauce really is what comes before, and after, the Illuminas do their thing.

The “before” magic comes in the form of chemicals — our proprietary “primer mix” that drives the PCR or “Polymerase Chain Reaction” step. PCR is a pretty amazing process that won a dude the Nobel Prize back in 1993 and drives everything from criminal forensics to the immunosequencing we do here.

For our purposes, in order to get a good representation of the T- or B-Cells in your repertoire, we need to “amplify” the samples we receive (blood, tissue, etc.) pretty dramatically. Remember, your immune system is incredibly diverse, and many of the clones may only appear a couple of times in the sample — like just a few actual cells. We need to multiply the heck out of these guys to get enough that we can detect them during sequencing. We use PCR to do that.

The process works pretty much exactly like the old Faberge commercial that very few of you are old enough to remember but that you can still watch on YouTube. You start with one strand, and turn that into two, then four, then eight, and so on. A typical 20-cycle run of PCR will turn one strand into just over a million (2^20 = 1,048,576). Each cycle works like this:

  1. Mix your sample with an enzyme like Taq polymerase, a bunch of free-floating nucleotides and a bunch of “primer” DNA fragments. The primer matches a constant section of DNA you know is present on your sample, and kickstarts the DNA assembly process much like a seed crystal starts the awesome rock candy creation process.
  1. The primer attaches ("anneals") at the matching spot on your sample DNA strand, and the enzyme then adds matching nucleotides one by one from there along the sample. Once you’ve let that jiggle around for a while, each strand will have been fully copied and bound to its mate.
  1. Now heat up the mix to about 95 degrees Celsius. This “denatures” the DNA strands, breaking the hydrogen bonds between them so they separate into individual strands again.
  1. Cool things down and start over with a new cycle.
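The doubling math behind those cycles is easy to check in a few lines of Python. This is just arithmetic, not a model of the chemistry; the function name and the `efficiency` parameter (1.0 meaning every strand is copied in every cycle) are my own invention for the sketch.

```python
def copies_after(cycles: int, starting_copies: int = 1,
                 efficiency: float = 1.0) -> float:
    """Copies present after a number of PCR cycles.

    With efficiency 1.0 every strand is copied each cycle, so the count
    doubles; lower values model a reaction that copies only some strands.
    """
    return starting_copies * (1 + efficiency) ** cycles


for n in (1, 2, 3, 10, 20):
    print(n, int(copies_after(n)))
```

At 20 cycles with perfect doubling that comes out to 1,048,576 copies from a single starting strand.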


So again pulling this back to Adaptive — remember from my last post that our task is to identify the key receptor sequences for T- and B-Cells that are created by randomly combining alleles from specific parts of your genome (“V” and “J” at the ends). While there are a bunch of possible alleles in play (and mutations that can happen too), we’ve been able to create primer libraries that capture the necessary sequences to hit pretty much the full complement. Those primers are a key part of our magic.

But of course, it’s never quite that simple. Our process is called “Multiplex PCR” because it uses a bunch of primers simultaneously — we have to do this in order to capture all the variants in your repertoire. But as it turns out, different primers can replicate at different rates. Oh crap.

This matters because we’re not measuring your original cells, we’re measuring a set that has been multiplied many times over. In order to be useful, the ratio of each sequence to the total has to stay constant — that is, if 10% of the cells in the amplified set are sequence X, then we need to be able to trust that 10% of the cells in the original set were too. But if things are multiplying at different rates, we’re obviously, well, screwed.

Happily, there is more Adaptive magic to the rescue. We’ve designed ways to track and measure this “amplification bias” for our primers and correct it with a combination of chemistry and computational tools. This is where the software starts to get pretty awesome too.
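To see why the bias matters, here is a toy simulation in Python. It is only a sketch under made-up assumptions: two sequences, invented per-primer efficiencies, and a "correction" that simply divides out a known growth factor. Our real approach combines chemistry and computation and is not shown here.

```python
def amplify(counts, efficiency, cycles=20):
    """Grow each sequence at its own (hypothetical) per-cycle efficiency."""
    return {seq: n * (1 + efficiency[seq]) ** cycles
            for seq, n in counts.items()}


def fractions(counts):
    """Share of the total that each sequence represents."""
    total = sum(counts.values())
    return {seq: n / total for seq, n in counts.items()}


def correct(amplified, efficiency, cycles=20):
    """Undo the growth, assuming each efficiency has been measured."""
    return {seq: n / (1 + efficiency[seq]) ** cycles
            for seq, n in amplified.items()}


original = {"X": 10, "Y": 90}        # X is 10% of the starting material
efficiency = {"X": 1.0, "Y": 0.9}    # invented: Y's primer is a bit slower

amplified = amplify(original, efficiency)
print("observed share of X: ", round(fractions(amplified)["X"], 3))
print("corrected share of X:", round(fractions(correct(amplified, efficiency))["X"], 3))
```

Even that small gap in efficiency more than doubles X's apparent share after 20 cycles, which is why the bias has to be measured and corrected before the ratios can be trusted.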

But it’s also where I’m going to quit for the day. Hope you have a great weekend!

Making the world a better place … through software-defined data centers for cloud computing

Best. Show. Ever. Go watch, then come back: https://www.youtube.com/watch?v=J-GVd_HLlps.

Back? OK.

Sooooo…. here at Adaptive, we’re also trying to make the world a better place … BY HELPING DISCOVER TESTS AND TREATMENTS TO STOP CRAP DISEASES FROM ACTUALLY KILLING PEOPLE.

It’s kind of awesome.

And we need more help. I’m looking to hire a number of people onto my engineering team over the next few months. Not a ton, and I’m happy to be patient — but if you’re interested in building software that is about as close to the edge of medicine as it gets, I’d love to hear from you.

My team is responsible for three key chunks of work:

  1. The “pipeline” is a body of code that takes the raw data files produced by next-generation sequencers and boils them up into usable, normalized data. This involves error correction, sequence alignment, gene identification, stuff like that. This is serious computation — lots of parallel processing, and code efficiency really matters.
  1. The “analyzer” is all about taking immunosequencing data and turning it into information. We’ve got a bunch of web-based charts and tools, but have only scratched the surface of what we really want to do. Our next steps here will involve more flexible visualization tools, some big bets on helping researchers collaborate and publish, and integrations with popular scientific platforms like R.
  1. We’re also committed to creating the best and most efficient place for researchers to conduct their experiments — this is all about great web-based logistics for ordering products and services and coordinating the exchange of physical samples and data. Sounds simple, but it’s anything but — making Multiplex PCR and high-throughput sequencing manageable? Not easy.

All three of these areas would benefit from more focused brainpower. We work largely in Java plus a little Python, up and down the stack from front end to back. We need hands-on people with enough capacity to both design great solutions and make them real. And to be successful here, you need to take it personally: we all screw up (ask my team about the double-bug commit I made this morning), but when we do, we make it right. I’m proud of my work and expect you to be proud of yours as well.

We are differentiating Adaptive through our chemistry and through our software. If we execute well, it’s pretty much an unbeatable combination. So send me a resume — you can contact me using @importimmunity on Twitter, or drop me an email at sean-recruit AT adaptivebiotech DOT com.

Looking forward!

PS. We work out of offices on the southeast side of Lake Union in Seattle, just off the water and with a great rooftop view of the seaplanes. It’s a great location and we just pretty much tripled our office space to support our growth plans. I’m happy to consider relocating the right person here, but obviously it’s easier if you’re local already. Either way, let’s chat.