Blog

Introducing the oasis R package


Introducing the oasis R package: an R package implementation of OASIS (OASIS Is Automated Statistical Inference for Segmentation), currently hosted on GitHub. The OASIS algorithm takes structural brain MRI studies with a FLAIR, T1, T2, and PD volume from patients with multiple sclerosis (MS) and produces OASIS probability maps of MS lesion presence, which can be thresholded into a binary lesion segmentation. With the oasis R package, you can make OASIS probability maps of lesion presence using the model described in the OASIS paper. You can also retrain an OASIS model using your own data and gold standard lesion segmentations and make predictions from this model. This blog post provides a detailed tutorial on how to use the oasis R package. The post is divided into four sections:

  1. SETUP: setting up your workstation to use the oasis R package
  2. CREATING FEATURES FOR THE OASIS MODEL: using the oasis R package to create dataframes of OASIS feature vectors from MRI studies
  3. TRAINING THE OASIS MODEL: training an OASIS model on your own data
  4. OASIS PROBABILITY MAPS: using your trained OASIS model, or the OASIS model from the paper, to make an OASIS probability map of lesion presence

Many thanks to my package coauthors John Muschelli and Taki Shinohara, as well as Amanda Mejia who helped test the package. I especially could not have made this package without my coauthor John, who taught me how to make an R package and helped with each step along the way — John you are the best!

SETUP

The first thing you need to do is install FSL, which we will call through R for image preprocessing and image smoothing using the fslr R package. It is a good idea to have the most up-to-date version of FSL possible. Instructions for installing FSL can be found here:

http://fsl.fmrib.ox.ac.uk/fsl/fslwiki/FslInstallation

After installing FSL, install and load the oasis package from GitHub. The oasis package depends on the packages oro.nifti and fslr, so both of these packages will be loaded when you load oasis.

library(devtools)
install_github('emsweene/oasis')
library(oasis)

Next, determine the location of FSL on your machine. You can set the path for FSL for fslr, as follows:

options(fsl.path = '/path/to/fsl')
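Before moving on, it can be worth checking that the path you set actually exists. Below is a quick base-R sanity check; check_fsl_path is a small helper written for this post, not an fslr function, and '/path/to/fsl' is a placeholder for your real FSL location:

```r
## Simple base-R sanity check (not an fslr function) that fsl.path
## points at an existing directory. Replace '/path/to/fsl' with the
## actual location of your FSL installation.
check_fsl_path <- function() {
  fsl_path <- getOption("fsl.path")
  !is.null(fsl_path) && dir.exists(fsl_path)
}
options(fsl.path = "/path/to/fsl")
check_fsl_path()
```

If this returns FALSE, fslr will not be able to find your FSL installation.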

CREATING FEATURES FOR THE OASIS MODEL

To train your model and make OASIS segmentations you need a FLAIR, T1, T2, and PD volume for each MRI study. For training the model you will also need a binary mask of 'gold standard' lesion segmentations for each study, in the same space as the T1 volume. The code below loads the images from a single MRI study using the oro.nifti package.

FLAIR = readNIfTI('/path/to/flair', reorient = FALSE)
T2 = readNIfTI('/path/to/t2', reorient = FALSE)
T1 = readNIfTI('/path/to/t1', reorient = FALSE)
PD = readNIfTI('/path/to/pd', reorient = FALSE)
GOLD_STANDARD = readNIfTI('/path/to/gold_standard', reorient = FALSE)

We can visualize the data by plotting the T1 volume with the 'gold standard' manual lesion segmentation overlaid on the image.

overlay = GOLD_STANDARD
overlay[GOLD_STANDARD == FALSE] = NA
orthographic(T1, overlay, col.y = "orange")

(Figure: orthographic view of the T1 volume with the gold standard lesion segmentation overlaid in orange)

Next you can prepare the data for fitting the OASIS model using the function oasis_train_dataframe. This function returns a dataframe of OASIS features for the voxels that pass the voxel selection procedure. The output of oasis_train_dataframe can then be passed to the oasis_training function. If you have not preprocessed the data, the following code will preprocess it for you. The preprocessing steps are image inhomogeneity correction and rigid registration to the space of the T1 volume. Note that if you have already preprocessed or skull stripped your data, the FSL preprocessing steps may fail, so it is best to only use these options on data that has not yet been preprocessed.

study_1_dataframe = oasis_train_dataframe(flair = FLAIR,
                                          t1 = T1,
                                          t2 = T2,
                                          pd = PD,
                                          gold_standard = GOLD_STANDARD,
                                          brain_mask = NULL,
                                          preproc = TRUE,
                                          normalize = TRUE,
                                          cores = 4)

Here we have set brain_mask = NULL, which runs the FSL brain extraction tool through fslr and creates a brain mask. If you have already skull stripped your data, running the brain extraction tool may produce poor results. We have also set preproc = TRUE, which means that the data will be preprocessed using FSL. We have set normalize = TRUE, which means that z-score normalization over the brain mask will be performed on all images, as described in the paper. Setting cores = 4 allows some of the image preprocessing steps and feature extraction to run in parallel. The package is written so that no more than 4 cores are ever used, so setting cores higher than 4 will not speed up the process.
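To make the normalization step concrete, here is a rough sketch of z-score normalization over a brain mask; zscore_normalize is a toy helper written for this post, a hypothetical stand-in for the package internals rather than the actual oasis code:

```r
## Sketch of z-score normalization over a brain mask: each image is
## centered and scaled by the mean and standard deviation of the
## intensities inside the mask. (Toy stand-in, not the oasis internals.)
zscore_normalize <- function(image, brain_mask) {
  vals <- image[brain_mask == 1]
  (image - mean(vals)) / sd(vals)
}

## toy arrays standing in for an image volume and its brain mask
set.seed(1)
img <- array(rnorm(8 * 8 * 8, mean = 100, sd = 15), dim = c(8, 8, 8))
mask <- array(1, dim = c(8, 8, 8))
norm_img <- zscore_normalize(img, mask)
```

After normalization the voxel intensities inside the mask have mean 0 and standard deviation 1, which puts images from different subjects and scanners on a comparable scale.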

names(study_1_dataframe)
##[1] "FLAIR" "T1" "T2" "PD" "FLAIR_10"
##[6] "T1_10" "T2_10" "PD_10" "FLAIR_20" "T1_20"
##[11] "T2_20" "PD_20" "GoldStandard"

Setting return_preproc = TRUE returns the preprocessed data and allows you to quality check the preprocessing. In the case where you set return_preproc = TRUE, the function oasis_train_dataframe will return a list containing a dataframe for use with the oasis_training function, the FLAIR volume, the T1 volume, the T2 volume, the PD volume, the brain mask for the subject, and the voxel selection mask.

study_1_dataframe_preproc = oasis_train_dataframe(flair = FLAIR,
                                                  t1 = T1,
                                                  t2 = T2,
                                                  pd = PD,
                                                  gold_standard = GOLD_STANDARD,
                                                  brain_mask = NULL,
                                                  preproc = TRUE,
                                                  normalize = TRUE,
                                                  return_preproc = TRUE,
                                                  cores = 4)

names(study_1_dataframe_preproc)
##[1] "oasis_dataframe" "flair" "t1" "t2"
##[5] "pd" "brain_mask" "voxel_selection"

names(study_1_dataframe_preproc[[1]])
##[1] "FLAIR" "T1" "T2" "PD" "FLAIR_10"
##[6] "T1_10" "T2_10" "PD_10" "FLAIR_20" "T1_20"
##[11] "T2_20" "PD_20" "GoldStandard"

Often you may only have a subset of slices with gold standard segmentations. In this case you can pass in the slice numbers and the orientation of the slices using the slices and orientation options, as shown below.

study_1_dataframe_slices = oasis_train_dataframe(flair = FLAIR,
                                                 t1 = T1,
                                                 t2 = T2,
                                                 pd = PD,
                                                 gold_standard = GOLD_STANDARD,
                                                 brain_mask = NULL,
                                                 preproc = TRUE,
                                                 normalize = TRUE,
                                                 slices = c(80, 90, 100),
                                                 orientation = "axial",
                                                 cores = 4)

If you have your own brain mask in the space of the T1 volume and/or you have processed your own data, you can use the following code:

BRAIN_MASK = readNIfTI('/path/to/brain_mask', reorient = FALSE)

study_1_dataframe_brain_mask = oasis_train_dataframe(flair = FLAIR,
                                                     t1 = T1,
                                                     t2 = T2,
                                                     pd = PD,
                                                     gold_standard = GOLD_STANDARD,
                                                     brain_mask = BRAIN_MASK,
                                                     preproc = FALSE,
                                                     cores = 4)

TRAINING THE OASIS MODEL

If you have data from just one subject, you can train the OASIS model as shown below. The oasis_training function works on objects returned by oasis_train_dataframe. The function oasis_training produces a glm object, which can be used to make probability maps of lesion presence on future MRI studies.

oasis_model_one_subject = oasis_training(study_1_dataframe)

In the case where you have multiple studies, say study_1_dataframe, study_2_dataframe, and study_3_dataframe, all outputs of the function oasis_train_dataframe, you can simply run the function on all of the dataframes:

oasis_model_multi_subject = oasis_training(study_1_dataframe,
                                           study_2_dataframe,
                                           study_3_dataframe)

When oasis_train_dataframe has returned a list that contains the preprocessed images, you need to set the option remove_preproc = TRUE in oasis_training, so that these images are removed before fitting the model. For example, we returned the preprocessed images in study_1_dataframe_preproc, so we would run

oasis_model_one_subject = oasis_training(study_1_dataframe_preproc,
                                         remove_preproc = TRUE)

OASIS PROBABILITY MAPS

Making predictions from the OASIS model can be done using the oasis_predict function. To make predictions using the model from the paper, we can use the option model = NULL. Again, we have the option to preprocess the data, using the option preproc = TRUE.

oasis_maps_paper_model = oasis_predict(flair = FLAIR,
                                       t1 = T1,
                                       t2 = T2,
                                       pd = PD,
                                       brain_mask = NULL,
                                       preproc = TRUE,
                                       normalize = TRUE,
                                       model = NULL,
                                       cores = 4)

We can also make predictions with our own model, using the model option.

oasis_maps_user_model = oasis_predict(flair = FLAIR,
                                      t1 = T1,
                                      t2 = T2,
                                      pd = PD,
                                      brain_mask = NULL,
                                      preproc = TRUE,
                                      normalize = TRUE,
                                      model = oasis_model_one_subject,
                                      cores = 4)

In addition, you can return the binary lesion segmentation created from the OASIS probability map by setting the parameter binary = TRUE. By default the segmentation is created using the threshold of 0.16 from the OASIS paper (this threshold has a Dice similarity coefficient of 0.61 on the validation set in the paper). You can also supply your own threshold value using the threshold parameter.

oasis_maps_user_model_binary = oasis_predict(flair = FLAIR,
                                             t1 = T1,
                                             t2 = T2,
                                             pd = PD,
                                             brain_mask = NULL,
                                             preproc = TRUE,
                                             normalize = TRUE,
                                             model = oasis_model_one_subject,
                                             binary = TRUE,
                                             threshold = 0.50,
                                             cores = 4)
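For intuition, the thresholding step itself amounts to a simple voxelwise comparison. A toy sketch, with a simulated array standing in for a real OASIS probability map:

```r
## Toy sketch of thresholding a probability map into a binary
## segmentation: a simulated array stands in for a real OASIS
## probability map of lesion presence.
set.seed(1)
prob_map <- array(runif(4 * 4 * 4), dim = c(4, 4, 4))
threshold <- 0.16
binary_map <- (prob_map > threshold) * 1
```

Lowering the threshold labels more voxels as lesion (higher sensitivity, lower specificity); raising it does the opposite.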

We can now look at the binary segmentation overlaid on the T1 volume:

names(oasis_maps_user_model_binary)
##[1] "oasis_map" "binary_map"

overlay = oasis_maps_user_model_binary$binary_map
overlay[oasis_maps_user_model_binary$binary_map == FALSE] = NA
orthographic(T1, overlay, col.y = "orange")

(Figure: orthographic view of the T1 volume with the binary OASIS segmentation overlaid in orange)

This concludes the tutorial for the oasis R package. If you have any questions or suggestions for package functionality, please leave them in the comments section below.


Same Job, Different Seed

Often it is desirable to submit the same job to a computing cluster, changing only a few variables each time. For example, you may wish to run a simulation where random variables are generated from the same distribution, but each time with a different seed. Here we will generate 100 observations from an exponential distribution with 100 different seeds to illustrate how to submit jobs with different seeds. All of the files from this post can be downloaded from my GitHub.

First, move to the directory on the cluster that contains the files for the job. You can then submit the jobs with the following code:

#!/bin/bash
for seed in {1..100}
do
  qsub -cwd batch.sh $seed
done

Here we have 100 seeds; in this example I am just using the numbers 1 through 100. We use a for loop that passes a different seed when submitting each job, indicated by the $seed at the end of the qsub command. If you want to pass n variables to your job, you can do so by writing $var1 $var2 … $varn at the end of the qsub command. The '-cwd' flag tells qsub that the output from the job should go in the current working directory. Each job is submitted through the file batch.sh. The batch.sh file reads:

Rscript random_exponential.R $1

Rscript is the scripting front end to R and can be used inside a shell script. It is followed by the name of the R file we wish to run. The $1 at the end indicates that we will be passing the seed to the R job. If you wish to pass n variables to your R job, you can pass them by putting $1 $2 … $n after the name of the file.
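As a sketch of passing more than one variable, the loop below submits a seed and a sample size per job; echo stands in for the qsub command here so you can try the loop anywhere:

```shell
#!/bin/bash
# Sketch of submitting jobs with two variables (a seed and a sample
# size) instead of one. 'echo' stands in for 'qsub -cwd batch.sh'
# so the loop can be tried anywhere; on the cluster, replace echo
# with the qsub command.
for seed in 1 2 3
do
  for n in 50 100
  do
    echo "batch.sh $seed $n"
  done
done
```

Inside batch.sh, the two values then arrive as $1 and $2.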

Now, let's look at the file random_exponential.R

## read the seed from the command line
## (commandArgs returns a character vector, so convert to numeric)
seed <- as.numeric(commandArgs(TRUE))
## set the seed
set.seed(seed)
## generate 100 exponential random variables with rate = 1
random.exp <- rexp(100, rate = 1)
## take the mean of the random variables
mean.exp <- mean(random.exp)
## save the mean, using the seed in the file name
save(mean.exp, file = paste0(seed, 'output.Rdata'))

The command commandArgs() provides access to a copy of the command line arguments supplied when the R session was invoked; because these arguments arrive as character strings, the seed must be converted to numeric before it is passed to set.seed(). We have assigned it to the variable ‘seed’. Note that if you have passed n variables to your R job, commandArgs(TRUE) will return a vector of length n. The remainder of the R code generates the 100 exponential random variables, calculates their mean, and saves the mean. Notice that in the last line the seed is used in the name of the output file, which will be extremely useful for the next step.
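One detail worth emphasizing: commandArgs(TRUE) returns a character vector, so numeric arguments need to be converted before use. A small sketch, with a literal vector standing in for the real command line:

```r
## commandArgs(TRUE) returns a *character* vector, so numeric
## arguments must be converted before use. The literal vector below
## stands in for what 'Rscript file.R 7 100' would produce.
args <- c("7", "100")
seed <- as.numeric(args[1])
n <- as.numeric(args[2])
set.seed(seed)
random.sample <- rexp(n, rate = 1)
length(random.sample)  # 100
```

Forgetting the as.numeric conversion is a common source of errors, since set.seed() will not accept a character string.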

To combine the means, open an R session on the cluster and run the following code from the combine.R file:

## set working directory to the location of the output of random_exponential.R
setwd('/path/to/output')
## create a vector to collect the means
means <- c()
## load in the mean from each seed
for (i in 1:100){
  load(paste0(i, 'output.Rdata'))
  means[i] <- mean.exp
}
## plot a histogram of the means
pdf('Hist.pdf')
hist(means, main = 'Histogram of Means')
dev.off()
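Incidentally, the same experiment can be reproduced in a single R session without a cluster; a quick sketch:

```r
## Single-session sketch of the experiment: 100 means of 100 Exp(1)
## draws, one seed per replicate, without submitting cluster jobs.
means <- sapply(1:100, function(seed) {
  set.seed(seed)
  mean(rexp(100, rate = 1))
})
hist(means, main = 'Histogram of Means')
```

The means cluster around 1, the mean of an Exp(1) distribution, with the roughly normal spread the Central Limit Theorem predicts.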

The file combine.R produces this plot (an illustration of the Central Limit Theorem):

(Figure: histogram of the 100 means)

Acknowledgments:  Sincerest thanks to Taki Shinohara who taught me how to do this!  And also many thanks to David Lenis and Jean-Philippe Fortin for the useful feedback on this post.

6 Writing Tips

This post was inspired by reading the book How to Write a Lot and John Muschelli’s post on the book, fittingly entitled How to Write a Lot.  John provides an excellent summary of the book along with his reflections.  Here I share 6 tips for writing that have worked well for me:

1) Don’t outline alone.  The first time I wrote a paper, my coauthor and long-time collaborator Taki Shinohara was generous enough to write the outline with me.  We discussed what belongs in each section, the best angle from which to present the paper, and the figures for the paper.  It was a tremendous help, as I had no idea what I was doing.  And after a thorough outlining, that paper pretty much wrote itself.

2) “Copy” someone else’s great paper.    This is great advice for scientific writing, although I bet it is terrible advice for creative writing.  Typically when I sit down to write a paper I find three or four papers that I deem to be of high quality that are similar to my paper and are from the journal that I plan to submit to (first…).  I keep these papers close and use them as a guide while outlining and writing my paper.  There is no sense or benefit in starting from scratch when so many people have already written great papers.

3) Always write.  I stole this great habit from Mandy Mejia.  Mandy and I are working on a project together and at every group meeting she always has a two or three page document detailing everything she has done the last few weeks.   Aside from making meetings much more productive, having documents like this helps the paper writing process go much more smoothly.  You can copy and paste text from these documents into the final paper.  And writing about the topic consistently puts you in the correct mind frame of how to begin writing the final paper.  It also stops you from forgetting what you have done.

4) Know yourself.  How to Write a Lot encourages scientific writers to carve out a time to write.  I find it is really important to make that time fit nicely with your natural schedule.  Here is how I have successfully done so.  I wake up pretty early; when I am on a good schedule, this is somewhere between 4 and 5.  So when I am writing, I make a rule that I can’t do anything else (except for making a cup of coffee and a quick breakfast) until I’ve written for 4 hours.  By the time I come into school, I’ve already accomplished 4 hours of solid writing!  And while this would probably be torture for someone else, it is relatively painless for me.  A great blog post from James Clear details the daily routines of 12 famous writers; I’m a Kurt Vonnegut!

5) Do drink and draft.  Opening a blank document to start outlining and writing the final paper is really intimidating.  To overcome this, I always have a small glass of wine to ease me into my first writing session (and just for those who are paying attention, I will typically start a paper in the evening, after work; my life is not an episode of Mad Men!).  I find myself to be less critical of my writing and it is a great way to get something down that I can start working with.  There is even a bit of research to support this strategy.  Brunke and Gilbert had 11 social drinkers perform a writing task and found that the writers produced a significantly larger amount of creative writing while intoxicated.  And while scientific writing is not creative writing, I still find having a small drink to be a helpful strategy to start the writing process.

6) Trade papers.  Once you have finished writing (or sometimes even at a halfway point), it is great to trade papers with someone who is not a coauthor and see what they think.  I’m often trading papers with John Muschelli or Jean-Philippe Fortin to get feedback.  The key is to trade: you have to find someone who also needs a paper read, because typically no one really wants to edit an entire paper unless they are an author.  I’m lucky that my Dad will often proofread papers for me (for nothing in return!).  There is a great story where, after 5 or so coauthors had read the paper, my Dad found that we had used the word “asses” instead of “assess” throughout the manuscript.  Nice save!

Coursera Motivation

Over the past year, I have signed up for over 10 Coursera classes.   My research involves working with Neuroimaging data and I want to take Coursera classes in order to learn some of the basics of Neuroscience.   I’m tired of being in meetings with collaborators and referring to parts of the brain by frantically gesturing to that point on my head.  And please do not tell anyone that I imagine the blood brain barrier as a trash bag covering the brain, because I have no better biological intuition to go on.

I love the idea of Coursera, but more specifically I really love the idea when I sign up for the courses months in advance.  Whenever the courses start I always come up with a slew of excuses.  But this time, something is different.  A few months ago, I emailed my dad about a course I thought we would both enjoy: Drugs and the Brain by Henry A. Lester at the California Institute of Technology.  My dad is a retired chemist from a pharmaceutical company, so this is where our interests overlap.  And when the course started, I did what I always do and made up excuses.  Until I got this email:

To: Elizabeth Sweeney
From: Daniel Sweeney

Subject: Graduate School at CalTech

Elizabeth,

I passed my quiz with a 17.5/22 for the first week at Caltech and I am now off to a bar with some of my brighter classmates to celebrate until Saturday.  I hope you and some of the other students manage to pass your quiz.

Best of luck,

Dad

Oh snap. Ever confident, I started watching the lectures and replied:

To: Daniel Sweeney
From: Elizabeth Sweeney
Subject: Re: Graduate School at CalTech

Dad,

I’m done with 3 of the 11 mini lectures.  Myself and the other more studious members of our class are taking our time with the material and intend to score a bit higher on this week’s quiz. Have fun at the bar with the rest of the C students!

Elizabeth

And today I am proud to say that I have finished my first week of Coursera lectures.  But I need to be a little more careful when I trash talk the C students in this course.  This week’s lectures and quiz proved difficult for me, as they required a background in Chemistry.  On my first quiz attempt I scored an 11.75/22.  After reviewing my mistakes and going over some of the material again, I took the quiz a second time and scored a 20.50/22.  Not so bad (although all the questions were exactly the same…).  I am happy that I am picking up many of the broader topics from the course, which I think will be useful.  For example, I now know that the blood brain barrier is made up of tight junctions between endothelial cells (not a trash bag).  Polar molecules and proteins cannot diffuse across the blood brain barrier, but non polar molecules, such as nicotine, can.  I’m planning to continue with the course, as I am sure to receive more motivating emails from my dad along the way!

What I am Reading: Part I

Just Kids (Patti Smith). This memoir details the close friendship and rise to fame of the author, singer-songwriter Patti Smith, and photographer Robert Mapplethorpe.  The pair met in Brooklyn, New York in the late 1960s.  I have to be honest and admit that I only read half the book.  I loved the beginning: a description of an intense romance and friendship where both partners inspire each other to create and explore their artistic depths in a variety of mediums: drawing, painting, poetry, photography, theater and even jewelry making.  The book is also a beautiful description of “starving artists” trying to make money to survive, while maintaining time to devote to their work.  It was interesting to read about lives that are so different from mine (for example, I have never had the experience of going hungry because I needed money to buy art supplies), but that I could still strongly relate to, as both Smith and Mapplethorpe are driven by intense passion and dedication to their work.  But when the two start to meet iconic people from the late 60s and early 70s, I could not follow, and subsequently lost interest.  While I could catch names like Andy Warhol, Janis Joplin, and Jimi Hendrix, there are dozens of other people they meet at the Hotel Chelsea and elsewhere that I just did not know.  And, sadly, understanding and enjoying the book really hinges on this knowledge.

An Unquiet Mind: A Memoir of Moods and Madness (Kay Redfield Jamison). A memoir of life with manic-depressive illness by an expert on the disorder; Kay Redfield Jamison is currently a professor of psychiatry at Johns Hopkins who specializes in mood disorders. Unlike Just Kids, I devoured this book in less than a week.  Jamison describes the highs and lows of being an academic, intensified by the fact that she has manic-depressive disorder.  She details periods of amazingly productive and passionate manias, followed by periods of struggle, depression, and at one point a suicide attempt.  She also discusses how, even as a trained clinician, she was resistant to treatment with Lithium.  Like many patients, she found the drug to be dulling and did not want to stop having manic phases.  Jamison also discusses the experience of being an objective researcher of an illness she herself has, and how difficult it was to publicly disclose her illness for fear of how her colleagues would react.  This book is deeply personal, very honest, and informative.  It is not just a description of manic-depressive disorder, but also of life as an academic.

Next on my list are Touched with Fire: Manic-Depressive Illness and the Artistic Temperament (Kay Redfield Jamison) and Naked to the Bone: Medical Imaging in the Twentieth Century (Bettyann Holtzmann Kevles).  I am especially excited about Naked to the Bone.  This book gives a history of a variety of medical imaging modalities (X-ray, CT, MRI, PET, and ultrasound imaging), which should provide some historical context for my current area of research.

Converting .dcm to .nii Image Format in R

## This code reads a DICOM image, converts it to NIfTI, and then writes the NIfTI file.
##  Requires the packages oro.dicom and oro.nifti.
##
## Elizabeth Sweeney
## August 26th 2013

library(oro.dicom)
library(oro.nifti)

dicom <- readDICOM('~/folder', flipud = FALSE)
nifti <- dicom2nifti(dicom)
writeNIfTI(nifti, filename = "nifti", verbose = TRUE, gzipped = TRUE)