R + immunoSEQ = Awesome!

We are really proud of the immunoSEQ Analyzer — it’s a great tool with a ton of charts and visualizations specifically tareted at immunosequencing. And since the field is new to many folks, it’s great to have an environment where we can share the approaches and techniques we’ve developed over the years. If you’re an Adaptive customer and you’re not using it — you’re missing out.

But at the same time, there are a zillion analysis and visualization tools out there, and each of them has their own strengths. Our goal is to make sure that immunoSEQ data shines in all of those environments, because better tools == faster discoveries == better treatments for real people.

I’m really excited today to show off our latest such integration. “rSEQ” is an installable package that makes it super-easy to work with immunoSEQ data with R. You can read all the gory details in this technote (free Analyzer login required), and I’ll walk through a simple example here so you can get an idea of just how awesome it is.

You’ll need the popular devtools package in order to install rSEQ. The technote provides some details on this, but in its simplest form all you need to do is:

  • packages("devtools")
  • library(devtools)
  • install_url("https://clients.adaptivebiotech.com/assets/misc/rSEQ.zip")


Now you’re ready to create an authenticated session to immunoSEQ. For this, you’ll use the same credentials you use to log into the web site; as in:

  • library(rSEQ)
  • rseq = rSEQ_init('you@example.com','yourpassword')


For this demo, though, we’ll use the public immunoSEQ demo account — so even if you don’t have your own immunoSEQ account, you can follow along:

  • library(rSEQ)
  • rseq = rSEQ_initDemo()


We hold onto that “rseq” variable because it’s the token that we’ll use in future calls into immunoSEQ. Now let’s look at a few calls:

  • w = rSEQ_workspaces(rseq)
  • s = rSEQ_samples(rseq, w[1,"Workspace.ID"])


Snipping from my RStudio environment window, you see that w and s both contain data frames. w lists all the workspaces I have access to (just one for the demo account), and s is the list of all samples in that workspace (the rSEQ_samples call takes a workspace ID, which I’ve cut out of the workspace data frame above).

w_s

Once I have samples with results, I can fetch the actual sequence data for those samples, and pretty easily do some neat things. One of our most common real-world use cases is tracking “minimum residual disease” in blood cancers — in short, identifying the dominant clones in a pre-treatment sample, then watching post-treatment to make sure that clone is (mostly) gone. Our demo workspace includes a few MRD cases, so let’s build a really simple clone tracker that identifies those dominant clones and then shows before and after values.

To be sure, this is a simplistic and only marginally-useful example — it only includes one “after” sample, and uses a very naïve threshold to decide what is “dominant.” But nonetheless, it’s a pretty sweet example of what you can do with immunoSEQ data in R with a VERY small amount of code.

First, let’s download the sequence data for a pair of before and after sequences. I’ll cut out sample IDs by name to do this:

  • seqDiag = rSEQ_sequences(rseq, s[s$Name=="TCRG_MRD_Pre_Case1","Sample.ID"])
  • seqMRD = rSEQ_sequences(rseq, s[s$Name=="TCRG_MRD_Day29_Case1","Sample.ID"])


These guys are downloading a lot of data, so be patient! (Fun exercise for the reader — why does seqMRD take so much longer to download than seqDiag? I’m not telling, you’ll have to figure that out for yourself.)

Go ahead and take a look inside these guys … there is a TON of useful information in there, ranging from the nucleotide and amino acid strings, to V and J gene usage, N insertions, and much more. These are the same columns that our “Sample Export” feature returns.

Right away, you can do some neat things; try a species histogram of CDR3 lengths for the Diagnostic sample:

  • hist(seqDiag[,"cdr3Length"])


Pretty neat. But back to MRD. What we’re interested here is just finding the clones that make up more than 5% of the repertoire pre-treatment. Paste this function into your R session:

  cloneTracker = function(seqDiag, seqMRD, thresholdPct) {
    diags = seqDiag[seqDiag$frequencyCount.... >
      thresholdPct,c("nucleotide","frequencyCount....")]
    track = merge(diags, seqMRD, by = "nucleotide")
    track = track[,c("nucleotide","frequencyCount.....x","frequencyCount.....y")]
    ct = t(data.matrix(track[,2:3]))
    colnames(ct) = track[,1]
    rownames(ct) = c("Diag","MRD")
    attr(ct, "thresholdPct") = thresholdPct
    return(ct)
  }

Then call it, passing in your sequences and a threshold of 5%:

  • ct = cloneTracker(seqDiag, seqMRD, 5)


This function first selects out the relevant clones from the diagnostic sample, joins those rows with the MRD sample and then massages them into a simple table that shows before and after frequencies for all identified clones (two in this case):

preposttable

Cool! Even better is showing these guys in a simple time series plot, using this function:


  plotCloneTracker = function(ct) {
    plot.ts(ct, plot.type="single", col=2:(ncol(ct)+1), axes=FALSE,
      ylab="Frequency Count (%)", xlab="",
      main=paste("Simple Clone Tracker (",ncol(ct)," clones)",sep=""))
    box()
    axis(2)
    axis(1, at=1:2, lab=c("Diag","MRD"))
    abline(attr(ct, "thresholdPct"), 0, col=1)
  }

And of course, call it:

  • plotCloneTracker(ct)


ctplot

FANCY! And in case it’s not obvious, I’m no R virtuoso by a longshot. I am really, really excited to see what our research customers can do with this stuff. Have a great function or plot? Let me know and I would love to share it here.

So. Much. Fun. And just wait until you see what’s coming next. Until then!

Advertisement

Flu Season, ugh ….

What better time to ponder the flu than during a winter plane trip, sitting next to a hipster 20-year-old you’ve never met coughing and sleeping on your shoulder? At least I had an aisle seat. And New Year’s with my new nephew and family was super-fun. And I have Purell — lots of Purell.

Anyways, it is that time of year, and according to the CDC it’s starting to look pretty ugly here in the USA. The key challenge seems to be that the annual vaccine isn’t working as well as it typically does. This isn’t a reason not to get a shot (it still appears at least 40-50% effective), but it is interesting. We create the Northern Hemisphere’s vaccine cocktail each year by analyzing what happened during our summer in the Southern Hemisphere. Unfortunately, one “flavor” of the flu has mutated too much since then — antibodies generated from H3N2 in the vaccine can’t fight off the strain that’s now in the wild.

fludata

Our ongoing war with Influenza is pretty amazing, actually. There’s new drama every year as public health teams try to keep ahead of a diverse and rapidly-changing enemy. My first exposure to the cycle was during the “swine flu” freakout in 2009-2010, when we worked with Emory to build tools to help people assess the best course of action for their families. But only now through my work at Adaptive am I starting to really grok the mechanics of what is going on.

Not surprisingly, it’s all about receptors again. A bunch of spiky receptors stick out from each flu “virion” (the cool name for a unit of virus). About 80% of them are hemagglutinin (“HA”), which has the job of binding to and breaking into the host cell where it can reproduce. The remaining 20% are neuramindase (“NA”), which helps it break back out of the cell to expand the infection. ***

There’s pretty much a perfect storm of factors making the flu so successful:

  • HA binds to cells that express sialic acid — which is a whole ton of our cells, especially in the respiratory system . So no matter how it enters your body, it’s likely to find a match.
  • As an RNA virus, flu mutates more quickly than our DNA can — so it has an advantage in the evolutionary arms race (this mutation-based evolution is called “antigenic drift”).
  • A host that contracts two flu strains at once can easily end up producing a brand new combination of H and N thanks to genetic reassortment… this “antigenic shift” is much more dramatic that drift and typically is the cause of pandemic events.
  • The mechanisms behind the flu work in other mammals too, and while it’s unusual for viruses to “jump” between species, it does happen — so we’ve got a bunch of helpers transmitting influenza strains around the world.

Because there are so many different strains of flu (with more being created all the time), it’s hard to create a vaccine that can hit them all. You have to create the right cocktail, and you have to predict it pretty far in advance so that we can make enough to cover the population.

So what about Tamiflu? More evidence that science is pretty cool, and at the same time that we don’t know that much. Vaccination aside, we haven’t yet been successful at stopping these viruses once they get inside the body. But remember the “NA” side of the equation! Neuramindase enables manufactured flu virions to escape the host cell they were created in, so they can move around the body and infect more cells. Tamiflu and its ilk inhibit the action of neuramindase, which limits just how much the infections can grow before our adaptive immune system finally catches up — shortening the duration and reducing the severity of symptoms. Cool stuff. It feels a little lame to be running along behind the virus trying to slow it down, but I’ll take it over nothing! The WHO tracks effectiveness of these NA inhibitors with various strains so that we can use them optimally.

I’m not sure yet if learning these details makes me more or less of a germophobe! But the complexity of the system is amazing, and I’m proud to be going into 2015 fighting for the good guys. Bottom line — get your shot, wash your hands, and stay home if you get sick. Always comes back to the basics.

Take care!

*** As an aside, the “H” and “N” here are why we call strains “H1N1”, “H3N2”, and so on. There’s something like eighteen basic classes of hemagglutinin (three that impact humans), and nine for neuramindase. But like all things in immunology, it’s not that simple. Even with the same H and N, strains can act very differently — for example, our current “2009” version of H1N1 is fundamentally different from the H1N1 that had come before, so nobody was ready for it. There’s a whole international classification convention for this stuff.