The Amazing that is PD-1 Blockade

The office is clearly closing up shop for the holiday; this’ll be my last task before heading out myself for a few days. I actually really like these quiet days at work; I can often pick up a few of the long-running side projects that are so hard to squeeze in when everybody’s here. This week it’s been some integration work with Illumina’s BaseSpace service, which is honestly pretty sweet.

End of the year holiday time is also always good for a bit of reflection, and I’ve been appreciating the opportunity I’ve had to start working here with the folks at Adaptive. In particular, a paper was just published by a customer that just has me shaking my head at how incredible this space is.

PD-1 Blockade drugs like Nivolumab are the current face of immunology — these incredible therapies seem to be showing more success against cancers than anything we’ve tried in years. Rather than just trying to knock out tumor cells with radiation or toxic chemicals, PD-1 blockades unleash our own immune systems to do the job. Here’s the deal:

Our T- and B-Cells express a particular protein called “programmed cell death 1,” or PD-1. Like other receptors, PD-1 is anchored in the cell kind of like a blade of grass sits in the ground, with part of it inside the cell membrane, and the rest dangling outside.

PD-1’s job is to slow down our immune response. It waits for special proteins (PD Ligands) to float by and bind to its receptor end, which can result in one of two behaviors. Normal immune cells just commit suicide; regulatory T-Cells do the opposite and get busy. The net effect is that when PD-1 is activated, your immune system starts to go quiet.

This is normally a good thing — for example, PD-1 keeps our immune system from attacking itself. But many types of cancer have “figured out” the game; they artificially accelerate the production of PD ligands themselves! This is amazing to think about — the cancers have actually evolved to suppress our own immune system so that we can’t fight back.

Once you see how this works, PD-1 blockade therapies seem pretty obvious — create a drug that stops PD-1 from binding with its ligands, and the immune system is freed up to go nuts. And, holy crap, this actually works!

But it’s also expensive, and it doesn’t always work. That’s where Adaptive comes in: we’ve now shown that the clonality* of a person’s immune repertoire can predict their response to PD-1 blockade. This makes sense if you think about it — all the drug is doing is opening the gates; there has to be an immune response ready to fight in the first place.

Well, it turns out that we basically invented the technology that can measure immune system clonality (amongst other things) using next-generation sequencing. Anybody else see value in a quick lab test that predicts the effectiveness of a miracle drug costing hundreds of thousands of dollars?


* When your immune system is gearing up to fight a particular bad guy, it creates millions of copies or “clones” of the specific sequences that recognize just that antigen. This is quite different when you’re healthy — your immune system then is much more diverse, and has smaller amounts of lots of different sequences, all hunting for invaders to show up.
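
For the code-minded, here is a minimal sketch of one way to put a number on clonality. It assumes clonality is computed as 1 minus the normalized Shannon entropy of the clone frequencies, which is a common convention but not necessarily the exact measure used in the paper. The function name and the example counts are made up for illustration.

```python
import math

def clonality(clone_counts):
    """Return 1 - normalized Shannon entropy (0 = perfectly even, 1 = one clone)."""
    counts = [c for c in clone_counts if c > 0]
    if len(counts) < 2:
        # With a single clone (or nothing at all) evenness is undefined;
        # treating one clone as fully clonal is an assumption of this sketch.
        return 1.0 if counts else 0.0
    total = sum(counts)
    entropy = -sum((c / total) * math.log(c / total) for c in counts)
    return 1.0 - entropy / math.log(len(counts))

healthy_like = clonality([25, 25, 25, 25])  # evenly spread repertoire
expanded = clonality([97, 1, 1, 1])         # one clone dominates
```

The evenly spread repertoire scores essentially zero, while the one dominated by a single clone scores close to one.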


Context is king!

Our marketing materials often quote the fact that our database contains billions of unique T- and B-Cell sequences. Seriously, that is an insane number. But in isolation, it can also be misleading. After all, your body pretty much creates sequences at random in its attempt to find ones that work — and a billion rolls of a die don’t tell you very much.

The magic happens when you add context to these sequences — tracking diagnoses and outcomes, demographics, medication use, other markers, really anything that can help complete the whole picture. This stuff combined with our sequence data is what’s (really) changing the world.

In fact, one of the toughest things about good science is picking the right attributes — traditionally they’re expensive and complex to track, and lots of them don’t even matter. This is the reason we try to help our customers with Analyzer features like projects and tagging. And over the next few years, we’ll continue to add more and more features that make it easier and quicker to figure out exactly what matters most in immunology.

Zooming out, though: what if it wasn’t so hard to collect and track this context? What if we could just track “everything” about our subjects and then use statistics to automatically figure out which attributes matter? Yes, this is the promise of big data we all love to talk about, but there are many more barriers to making it real than just more and bigger computers.

Companies like Patients Like Me and 23andMe have taken a really novel approach to this challenge. What if we enlist the subjects themselves to contribute data over time, from lots of different sources? What if researchers could re-contact those subjects to ask them new questions along the way? And what if the subjects gave consent for lots of different folks to use their information in flexible and informal ways, freeing up at least a little work from the slow-moving IRB process?

The tradeoff is a fascinating one — are we better off using small bits of highly reliable and curated data, or using tons and tons that we know is noisy? Well, actually it’s not a tradeoff at all. So long as you know what you’re working with, both can be incredibly productive ways to increase our understanding of the world.

We think about this a lot, and are making real investments to help us all better understand the adaptive immune system. Thanks for joining us on this really, really fun ride.

The definition of insanity…. works!

Not everything we do here is about fancy biology; sometimes it’s about fancy web engineering. Late last week was a good example — my favorite bug since starting at Adaptive. Fair warning, this post ranks pretty high on the geek scale.

Nothing hurts my stomach more than knowing my systems are misbehaving in some way I can’t explain. I just don’t get how folks can sit by and just ignore this; it’s way too much of a threat to my ego. Screw you, Skynet — I tell YOU what to do!

Anyways, here’s the setup. Quite frequently — not enough to reproduce it in a debugger, but often enough that we were getting a steady stream of user complaints — our web servers were sending garbled responses. This manifested in a bunch of different ways. Sometimes the browser would just render a bunch of un-interpreted HTML. Other times it would screw up AJAX logic and just make the pages act wonky. It wasn’t clear at first that these were all the same things — it just felt like the site was on fire, and we had no obvious leads to work from. But we had just propped new code before this started happening, so of course that was the obvious target.

If you want to get good at debugging, especially in distributed systems, here is the #1 thing you have to remember: KEEP LOOKING. Our local hardware store has one of those big signs they put pithy statements on, and one of their favorites is “The definition of insanity is doing the same thing and expecting a different result.” At least inasmuch as it applies to debugging, this is crap.

Again and again, it’s been made clear to me that good debuggers are the ones that keep looking at the data over, and over, and over, until the patterns finally pop out. Most people peter out and say “that’s impossible” or “there’s nothing to see here” … and that is simply WRONG. The pattern is always hiding in there somewhere, and if you keep looking you will find it.

In this case, I looked at the same logs dozens of times, and followed a bunch of dead ends, before the pattern finally peeked out. Not exactly at the same time, but really close to it, we were always seeing “HEAD” requests to the server right around the calls that would fail. I ignored these for hours because they shouldn’t have made any difference. But…..

OK, here’s where things get super-nerdy. Starting way at the beginning … your web browser talks to web servers using something called “HTTP” or Hypertext Transfer Protocol. In a nutshell, the first version of HTTP worked like this:

  1. The browser opens up a connection to the server computer. This is like dialing a phone and having the server answer.
  2. The browser sends a message over the connection that says “I’d like your homepage, please.”
  3. The server sends the HTML code that represents the site’s homepage and then hangs up the connection.

This worked great, except that step #1 was kind of slow — typically a browser will need to request not just one but many different pages and resources from the server, so “redialing” over and over was wasteful. So the protocol was updated with something called “keep-alive”, in which case the connection is kept open and used for multiple requests.

But this presented a small problem. The only way the browser knew the page was “done” was by noticing that the server had hung up the connection. If that connection stays open, how does the client figure this out? Very simply — in this new version, the server tells the browser how much data it’s going to send:

  1. The browser opens up a connection to the server computer.
  2. The browser asks for page #1.
  3. The server says “ok, this page is 4,000 bytes long. Here you go.” And then sends the data.
  4. The browser reads out those 4,000 bytes and then using the same connection asks for page #2.
  5. The server says “ok, this one is 2,000 bytes long. Here you go.” And so on.

This is way more efficient. OK, so file that one away for a moment.
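
If it helps to see it, here is a rough sketch of that length-based framing in Python. It fakes the connection with an in-memory byte stream, and the `read_response` helper and the tiny page bodies are invented for illustration; a real HTTP client handles many more cases (chunked encoding, for one).

```python
import io

def read_response(stream):
    """Read one HTTP-style response, using Content-Length to find its end."""
    status_line = stream.readline().decode("ascii").strip()
    headers = {}
    while True:
        line = stream.readline().decode("ascii").strip()
        if not line:  # a blank line ends the header section
            break
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    length = int(headers.get("content-length", "0"))
    body = stream.read(length)  # read exactly the promised number of bytes
    return status_line, headers, body

# Two responses sent back to back over one kept-alive "connection".
connection = io.BytesIO(
    b"HTTP/1.1 200 OK\r\nContent-Length: 11\r\n\r\nhello page1"
    b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\npage2"
)
first = read_response(connection)
second = read_response(connection)
```

Because the reader stops after exactly the advertised number of bytes, the second call starts cleanly at the next status line.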

Another feature of HTTP is that the browser can ask for data in a few ways. The most common is “GET”, which just asks the server to send the data for the page, thank you very much. But sometimes the browser doesn’t need the actual data for a page, it just needs to see if it’s still there and check if it’s changed since the last time it looked. For this, it can make a “HEAD” request. The HEAD request works like this:

  1. The browser opens up a connection to the server computer, like normal.
  2. The browser makes a “HEAD” request for page #1.
  3. The server says “ok, this page is 4,000 bytes long, and it last changed on 12/1/2014.” But it doesn’t send the actual data … just general information like the size of the page.

These two concepts — “keep-alive” and “HEAD vs. GET” — were the key to this bug.

Last setup: our app is built on an open-source technology called the “Play Framework.” Play helps us match up browser requests to code, makes it easier to build pages, blah blah … not very important here. But what *is* important is that we don’t expose the Play application directly to browsers. We use a common technique called “proxying” that isolates the system a bit from the Internet. We do this with another open-source tool called the Apache web server. So our setup looks like this:

  1. Browser makes an HTTP request to Apache.
  2. Apache “relays” this request to Play.
  3. Play responds to Apache.
  4. Apache sends the response back to the browser.

The key here is that those connections between Apache and Play just use plain old HTTP. And they use keep-alives, so that many different browser requests can “reuse” the same proxy connection between Apache and Play.

Back to those HEAD requests. When a browser makes one, Apache dutifully relays it to Play. And FINALLY, here is the bug: Play was answering “ok, this page is 4,000 bytes long, and it last changed on 12/1/2014.” BUT IT WAS ALSO SENDING THE PAGE DATA, even though this was a HEAD request. This is a violation of the HTTP protocol! So after Apache read off the first part, it just stopped reading, which left all the other stuff waiting, unread, in the connection buffer.

But remember, because of keep-alive, that connection is still open. So the NEXT time that a browser asks for a page, Apache again dutifully relays it to Play over that connection, and then tries to read the response. But because it never read out the contents from the first request, all it sees is what now looks like a bunch of garbage!

From here on out things can go a bunch of different ways, depending on the specific garbage that is sent back. But it doesn’t really matter, the damage is done. Until that connection gets reset, every browser request that uses it ends up being wonked up.
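
Here is a toy reproduction of that failure mode, for anyone who wants to poke at it. This is not our actual Apache or Play code; the `proxy_read` helper, the page contents and the in-memory "connection" are all stand-ins I made up to show the mechanics.

```python
import io

def read_headers(stream):
    """Read a status line and headers, stopping at the blank line."""
    status_line = stream.readline().decode("latin-1").strip()
    headers = {}
    while True:
        line = stream.readline().decode("latin-1").strip()
        if not line:
            break
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_line, headers

def proxy_read(stream, method):
    """Read one response the way a well-behaved proxy would."""
    status_line, headers = read_headers(stream)
    body = b""
    if method != "HEAD":  # a HEAD response must not carry a body
        body = stream.read(int(headers.get("content-length", "0")))
    return status_line, body

page = b"<html>home</html>"
head_reply = b"HTTP/1.1 200 OK\r\nContent-Length: %d\r\n\r\n" % len(page)
get_reply = head_reply + page

# Buggy upstream: it sends the page body after the HEAD headers anyway.
buggy = io.BytesIO(head_reply + page + get_reply)
head_status, head_body = proxy_read(buggy, "HEAD")
next_status, next_body = proxy_read(buggy, "GET")

# Correct upstream, for contrast: headers only for HEAD.
good = io.BytesIO(head_reply + get_reply)
proxy_read(good, "HEAD")
good_status, good_body = proxy_read(good, "GET")
```

In the buggy case the HEAD itself looks fine, but the unread body is still sitting in the stream, so the "status line" of the next response starts with leftover HTML.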

And guess what? This bug has been sitting in our code since the site launched, long before I even started working at Adaptive. But it was never really exposed, because HEAD requests are generally pretty rare. As it turns out, our operations team had (ironically) just turned on a new monitoring tool that, quite legitimately, used HEAD as one of its ways to see if the site was working properly. So the bug had nothing to do with that code prop. It was classic Heisenberg.

DAMN, SON. That was a long way to go for a stupid little bug.

But there was a point, and it’s worth saying again: KEEP LOOKING. Look at the logs, again. Try running the same request, again. Look at source, again. Look at network traces, again. Look at the code, again. It is the only way to break some of these logjams. Eventually, you will pick out the pattern.

If you’re good at this — I will hire you in a millisecond. You’re gold.

We made a thing!

I’ve had a lot of jobs, but they’ve always been about building software or services (virtual stuff). One previous job was fun because we shipped actual things, and visiting the warehouse was like a super-cool playground of awesome machines (pick-to-light was my absolute fav). But I’ve never actually been a part of making real, physical things for sale, until last week!

Check out our announcement of the immunoSEQ (TM!) hsTCRB kit. “hsTCRB” is apparently obvious secret code for “human t-cell receptors, beta chain,” i.e., the first of many versions of the assay that we’ll be selling in this form for research use.

This is cool because it basically explodes the volume of tests we can do. Traditionally, we’ve run a service business — folks send us physical samples (blood, tissue, etc.) and our lab deals with everything from there — DNA extraction and concentration, both amplification steps and sequencing. Only then does my team jump in, run the data through our processing pipeline and deliver results and visualizations through the immunoSEQ Analyzer.

Running a lab is a big deal: it takes equipment, sequencers, reagents, and perhaps most of all lots of people. I love our lab team and we’ll need them and more of them forever, but we simply wouldn’t be able to scale the physical processes fast and cost-effectively enough to achieve our goals with an exclusively service-based business. Beyond that, lots of institutions just want to do their own chemistry, for reasons ranging from economics to privacy and environmental control.

The kit (mostly) frees us from these physical limitations and lets us scale up digitally, something I know pretty well. All those steps before the data pipeline can be done by our customers, then they use a little utility tool to send the raw sequencer output my way and we’re off to the races. Especially as we transition our pipeline up into the cloud (“on the line,” Steve!) … this gives us near infinite ability to accommodate new customers. Pretty sweet!

(You know what the limiting factor becomes? It’s the dirty secret of big data — bandwidth. Our first kit works exclusively with the Illumina MiSeq, which is a cool but mini-sequencer that generates about 1-4GB of data per run. Internally we mostly use HiSeqs, which generate 250GB of base call data alone. This stuff takes a long time to move! So much so that even on our internal networks we consider data transfer time when we’re forecasting how much we can process. Crazy.)
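
To make the bandwidth point concrete, here is some back-of-the-envelope arithmetic. The run sizes come from the paragraph above; the link speeds are hypothetical round numbers I picked for illustration, and the math ignores protocol overhead and contention.

```python
def transfer_hours(gigabytes, megabits_per_second):
    """Idealized time to move a file: size in bits divided by link speed."""
    bits = gigabytes * 8 * 1000**3
    seconds = bits / (megabits_per_second * 1000**2)
    return seconds / 3600

hiseq_slow = transfer_hours(250, 100)    # 250 GB over an assumed 100 Mbps link
hiseq_fast = transfer_hours(250, 1000)   # same run over an assumed 1 Gbps link
miseq_slow = transfer_hours(4, 100)      # a large MiSeq run over 100 Mbps
```

Even in this idealized model, a single 250GB run over a 100 Mbps link takes several hours, versus minutes for a MiSeq run.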

Anyways. Some fun issues:

  • Hey, the label printed about half a centimeter askew, and now you can’t read the unique number that is the KEY TO THE WHOLE PROCESS.
  • Wait, our customers aren’t all in Seattle. Does relative primer efficiency change at different ambient temperatures? Humidity? Altitude? Better do some tests ….. a LOT of tests.
  • Well, this is an interesting race between customs processing time and dry ice melt time…..
  • Wow, these screenshots we printed into the manual LAST YEAR don’t look right anymore…
  • I’m not actually sure if our supplier has a shelf big enough for this stuff.
  • Expiration date? Hmm. More tests.

And now, finally, our brand new sales team can get out there and sell the heck out of this thing. I’m thinking stocking stuffers for your whole family. Interested? Hit me up!

Big data for Thanksgiving

Got up late, went for a run, helped get things ready for Thanksgiving, ate a TON, and watched a bit of football, all with the entire Nolan West clan in attendance*. Not a bad way to spend my favorite holiday!

Before I doze off again, I wanted to share a quick story from yesterday at work that illustrates just how awesome Adaptive is, and why I am thankful to be part of it. “Big data” at Adaptive isn’t just empty marketing; it’s a tool we use to help real people in incredibly concrete ways.

One of my goals for early in 2015 is to help scale up clonoSEQ, our diagnostic test that helps detect relapse in blood cancer patients. As part of planning this, I checked in with our Director of Translational Medicine to understand the work he does to develop the narrative “interpretations” that accompany the numeric results of these tests.

As with many types of tests, we use thresholds to understand what is meaningful vs. not. In a very rough way, if a particular clone makes up more than 5% of a patient’s immune repertoire, we believe it’s a diagnostic marker for their cancer. When numbers are way higher or lower than this, life is easy. But what about when they’re bouncing right around the cutoff?

We know that there are some sequences that show up in many different people at relatively higher concentrations, just because they’re “easier” for the body to make (less mutation, etc.). Further, repertoires can fluctuate for lots of common reasons that we never even notice. So when we see these borderline sequences, they very well could be signal OR noise.

We use lots of tools to make these distinctions. But one that is incredibly cool is — effectively in real time, we can search the billions of sequences we have seen in the real world to understand if a borderline sequence has been seen in other patients. If it has, odds are high that the sequence is unrelated to their cancer.
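
As a very loose sketch of that logic (and nothing like our real interpretation process), here is how the decision might look in code. The 5% cutoff is the rough figure from above; the `margin`, the function name, the labels and the sequences are all hypothetical, and a plain Python set stands in for the sequence database.

```python
def interpret_clone(sequence, frequency, seen_elsewhere, threshold=0.05, margin=0.01):
    """Classify a clone by frequency, checking borderline cases against other patients."""
    if frequency > threshold + margin:
        return "likely marker"
    if frequency < threshold - margin:
        return "below threshold"
    # Borderline: use population data to help separate signal from noise.
    if sequence in seen_elsewhere:
        return "borderline, seen in other patients: probably noise"
    return "borderline, not seen elsewhere: worth a closer look"

# Made-up sequences standing in for ones observed in other patients.
seen_elsewhere = {"CASSLGQGAYEQYF", "CASSPGTGGNEQFF"}

clear_call = interpret_clone("CASSQDRGTEAFF", 0.30, seen_elsewhere)
common_call = interpret_clone("CASSLGQGAYEQYF", 0.052, seen_elsewhere)
novel_call = interpret_clone("CASSQDRGTEAFF", 0.052, seen_elsewhere)
```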

Think about that for a second. We’re creating both the technology and the data sets to determine not just what ONE immune system looks like, but thousands and millions. Armed with this information we can start to detect population-level patterns and understand mechanisms that nobody has ever had the remotest chance of seeing before. And that means better diagnostics and treatments in the real world.

So this year, I’m thankful to be a part of a new company that is helping real people in amazing ways.

And of course, for my wicked awesome new nephew Asher!

Hope you all have a great holiday.

* Hmm, now that Ben, Kelly and Asher are in Boulder, does that mean “Nolan West” has new citizens? Still strictly on the wrong side of the Divide, but it’s pretty close. Will have to think on this.

PCR, aka Xerox for DNA

When folks hear about my gig here at Adaptive, their first questions are always about the sequencing — and indeed we have some wicked cool machines for this in the lab. But our special sauce really is what comes before, and after, the Illuminas do their thing.

The “before” magic comes in the form of chemicals — our proprietary “primer mix” that drives the PCR or “Polymerase Chain Reaction” step. PCR is a pretty amazing process that won a dude the Nobel Prize back in 1993 and drives everything from criminal forensics to the immunosequencing we do here.

For our purposes, in order to get a good representation of the T- or B-Cells in your repertoire, we need to “amplify” the samples we receive (blood, tissue, etc.) pretty dramatically. Remember, your immune system is incredibly diverse, and many of the clones may only appear a couple of times in the sample — like just a few actual cells. We need to multiply the heck out of these guys to get enough that we can detect them during sequencing. We use PCR to do that.

The process works pretty much exactly like the old Faberge commercial that very few of you are old enough to remember but you can watch on YouTube. You start with one strand, and turn that into two, then four, then eight, and so on. A typical 20-cycle run of PCR will turn one strand into just over a million. Each cycle works like this:

  1. Mix your sample with an enzyme like Taq polymerase, a bunch of free-floating nucleotides and a bunch of “primer” DNA fragments. The primer matches a constant section of DNA you know is present on your sample, and kickstarts the DNA assembly process much like a seed crystal starts the awesome rock candy creation process.
  2. The enzyme induces the primer to attach at the right place to your sample DNA strand, and then adds matching nucleotides one by one from there along the sample. Once you’ve let that jiggle around for a while, each strand will have been fully copied and bound to its mate.
  3. Now heat up the mix to about 95 degrees Celsius. This “denatures” the DNA strands, breaking the hydrogen bonds between them so they separate into individual strands again.
  4. Cool things down and start over with a new cycle.
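
The doubling math is easy to check for yourself. This little sketch assumes perfect doubling on every cycle, which real chemistry only approximates; the function name is just for illustration.

```python
def strands_after(cycles, starting_strands=1):
    """Idealized PCR: every cycle doubles the number of strands."""
    strands = starting_strands
    for _ in range(cycles):
        strands *= 2
    return strands

growth = [strands_after(n) for n in (1, 2, 3, 20)]
# growth is [2, 4, 8, 1048576]: twenty doublings is just over a million
```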


So again pulling this back to Adaptive — remember from my last post that our task is to identify the key receptor sequences for T- and B-Cells that are created by randomly combining alleles from specific parts of your genome (“V” and “J” at the ends). While there are a bunch of possible alleles in play (and mutations that can happen too), we’ve been able to create primer libraries that capture the necessary sequences to hit pretty much the full complement. Those primers are a key part of our magic.

But of course, it’s never quite that simple. Our process is called “Multiplex PCR” because it uses a bunch of primers simultaneously — we have to do this in order to capture all the variants in your repertoire. But as it turns out, different primers can replicate at different rates. Oh crap.

This matters because we’re not measuring your original cells, we’re measuring a set that has been multiplied many times over. In order to be useful, the ratio of each sequence to the total has to stay constant — that is, if 10% of the cells in the amplified set are sequence X, then we need to be able to trust that 10% of the cells in the original set were too. But if things are multiplying at different rates, we’re obviously, well, screwed.
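
To see how quickly this goes sideways, here is a toy model. The starting counts, the per-cycle efficiencies and the function names are all invented for illustration; this is not a description of how our chemistry actually behaves.

```python
def amplify(counts, efficiencies, cycles):
    """Each cycle multiplies a sequence's count by (1 + its efficiency)."""
    return {seq: n * (1 + efficiencies[seq]) ** cycles for seq, n in counts.items()}

def fractions(counts):
    """Convert raw counts into each sequence's share of the total."""
    total = sum(counts.values())
    return {seq: n / total for seq, n in counts.items()}

original = {"X": 10, "Y": 90}  # X really is 10% of the sample

# Every primer doubles perfectly: the ratios survive amplification.
even = fractions(amplify(original, {"X": 1.0, "Y": 1.0}, 20))

# Y's primer is assumed to be only 90% efficient per cycle.
biased = fractions(amplify(original, {"X": 1.0, "Y": 0.9}, 20))
```

With equal efficiencies X stays at 10%. With that small per-cycle gap, X ends up looking like more than double its true share after 20 cycles, because the difference compounds every cycle.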

Happily, there is more Adaptive magic to the rescue. We’ve designed ways to track and measure this “amplification bias” for our primers and correct it with a combination of chemistry and computational tools. This is where the software starts to get pretty awesome too.

But it’s also where I’m going to quit for the day. Hope you have a great weekend!

Making the world a better place … through software-defined data centers for cloud computing

Best. Show. Ever. Go watch, then come back:

Back? OK.

Sooooo…. here at Adaptive, we’re also trying to make the world a better place … BY HELPING DISCOVER TESTS AND TREATMENTS TO STOP CRAP DISEASES FROM ACTUALLY KILLING PEOPLE.

It’s kind of awesome.

And we need more help. I’m looking to hire a number of people onto my engineering team over the next few months. Not a ton, and I’m happy to be patient — but if you’re interested in building software that is about as close to the edge of medicine as it gets, I’d love to hear from you.

My team is responsible for three key chunks of work:

  1. The “pipeline” is a body of code that takes the raw data files produced by next-generation sequencers and boils them up into usable, normalized data. This involves error correction, sequence alignment, gene identification, stuff like that. This is serious computation: lots of parallel processing, and code efficiency really matters.
  2. The “analyzer” is all about taking immunosequencing data and turning it into information. We’ve got a bunch of web-based charts and tools, but have only scratched the surface of what we really want to do. Our next steps here will involve more flexible visualization tools, some big bets on helping researchers collaborate and publish, and integrations with popular scientific platforms like R.
  3. We’re also committed to creating the best and most efficient place for researchers to conduct their experiments. This is all about great web-based logistics for ordering products and services and coordinating the exchange of physical samples and data. Sounds simple, but it’s anything but. Making Multiplex PCR and high-throughput sequencing manageable? Not easy.

All three of these areas would benefit from more focused brainpower. We work largely in Java plus a little Python; up and down the stack from front end to back. We need hands-on people with enough capacity to both design great solutions and make them real. And to be successful here, you need to take it personally. We all screw up (ask my team about the double-bug commit I made this morning), but when we do, we make it right. I’m proud of my work and expect you to be so as well.

We are differentiating Adaptive through our chemistry and through our software. If we execute well, it’s pretty much an unbeatable combination. So send me a resume — you can contact me using @importimmunity on Twitter, or drop me an email at sean-recruit AT adaptivebiotech DOT com.

Looking forward!

PS. We work out of offices on the southeast side of Lake Union in Seattle, just off the water and with a great rooftop view of the seaplanes. It’s a great location and we just pretty much tripled our office space to support our growth plans. I’m happy to consider relocating the right person here, but obviously it’s easier if you’re local already. Either way, let’s chat.

A very good place to start!

Lucky for me I long ago got over my fear of asking dumb questions — because as a software engineer suddenly responsible for code dealing in advanced biology, dumb questions come up a lot. And they start right at the basics — what are we looking for and why does it matter? I thought that especially for my traditional HIT homies it’d be interesting to start a post or two there.

Adaptive’s technologies quantify the adaptive immune system. That is, they allow us to measure specific features that help describe what the immune system is doing at a given point in time, and as things change over time. All well and good, but what are those features and why do they matter?

Our immune system is broken into two parts. The first is the “innate” immune system, which can respond quickly but pretty generically to attackers. Barriers like skin, flytraps like snot and mucus, and general-purpose killer cells like macrophages are all part of our innate immune function.

The second part — the “adaptive” immune system — is slower to respond, but is way more targeted and effective against hardcore nasties. The adaptive system coordinates laser-focused attacks against specific antigens (e.g., chicken pox).

In order to respond with this kind of specificity, the cells of the adaptive immune system (in particular T-Cells and B-Cells) first have to be able to recognize specific antigens. This recognition is done by expressing proteins that work like a lock and key — they present a surface area that only a very specific antigen fits into. When something lands in the “lock” — it’s time to go to work.

So we make “receptors” that match specific antigens. But there are millions of possible bad guys, and worse they’re changing and evolving all the time. So how do we possibly keep up? This stuff just gets cooler and cooler.

As with everything, it starts with our genes. Our bodies are constantly just randomly “recombining” certain genes (called “V”, “D” and “J”) into new sequences and throwing them out into the wild. Some of these “naïve” cells actually attack our own healthy cells, so those get filtered out (assuming all is working properly). The rest wait around, hoping that eventually an antigen will fit into their “lock.”

When that happens, the activated cell starts replicating itself and recruiting other parts of the immune system to fight. Eventually (hopefully) the battle is won, and many of those specific cells fade away, but a few remain and create a “memory” for that specific antigen. When it shows up again, the system can respond much more quickly. If this sounds like a vaccine, it’s because that’s exactly what it is.

Tons and tons of detail left out, but that’s the basic story — we are constantly creating random receptor cells, each of which can match up with one specific antigen. When a match happens, that receptor gets cloned into an army to fight on our behalf. After the battle when the drawdown occurs, we leave a few scouts out in the field so we don’t get caught by surprise again.

With that, we can get back to Adaptive. Our stuff can pick out the millions of different receptor sequences floating around in your body and create a report that shows how much of each one you have in your “repertoire”. There are great things we can do right now with this information — like track the effectiveness of cancer treatments, and even greater things we’re working on with collaborators in our labs. It’s not so crazy to imagine that by matching specific sequences to specific antigens, we will one day be able to custom-create treatments to super-charge focused immune response. In fact, this exact idea is already showing promise in the real world.

WOO HOO indeed.

More to come.

How hard can it be?

Back in 1994 I joined the team that was building the “online” component of Windows 95 — a competitor to America Online that eventually became known as MSN. None of us really knew what we were doing, but trusted we could figure it out along the way, a state of mind my then-boss Jeff Lill immortalized with the following shirt:

As it turns out, this has been kind of a repeating theme for me — one of the best things about writing software is that it applies to everything, so there’s an endless supply of new problems to explore.

Cut to my new home at Adaptive Biotechnologies — a company that has developed insanely awesome new techniques to measure and quantify the adaptive immune system. Our tools isolate the sections of T-Cell and B-Cell DNA that bind to specific antigens, amplify and sequence those fragments using next-generation sequencing machines, error-correct and normalize the data, and finally make it available to researchers and clinicians for custom online analysis.

The resulting picture of the immune system is many, many times more granular than anything the world has seen before … which opens up the door to some pretty amazing applications. For example, our clonoSEQ diagnostic tool can monitor residual disease in post-treatment cancer patients far more accurately than traditional approaches — and that can save real lives.

The team at Adaptive also believes that software has to be a key differentiator for the company. There is a metric ton of additional value we can create on top of the raw data, from better error-correction at the front of the process to new ways of analyzing and sharing data at the end. It’s a wide open world — and the most exciting problem I’ve attacked in a long time. I may be a biology newbie, but hey, how hard can it be, right?

Anyways — the point of this new blog is to share some of the great things I’m learning here at Adaptive. There’ll be some biology-for-dummies, conversations about software challenges both unique to and shared across industries, and when it makes sense a bit of personal anecdote as well.

If you spent any time reading about my last gig, you’ll be right at home here. You can also find me on Twitter as @importimmunity. Woo hoo!