Introducing the oasis R package


Introducing the oasis R package, an R package implementation of OASIS Is Automated Statistical Inference for Segmentation (OASIS), currently hosted on GitHub. The OASIS algorithm takes structural brain MRI studies with a FLAIR, T1, T2, and PD volume from patients with multiple sclerosis (MS) and produces OASIS probability maps of MS lesion presence, which can be thresholded into a binary lesion segmentation. With the oasis R package, you can make OASIS probability maps of lesion presence using the model described in the OASIS paper. You can also retrain an OASIS model using your own data and gold standard lesion segmentations, and make predictions from this model. This blog post provides a detailed tutorial on how to use the oasis R package. The post is divided into four sections, namely:

  1. SETUP: setting up your workstation to use the oasis R package
  2. CREATING FEATURES FOR THE OASIS MODEL: using the oasis R package to create dataframes of OASIS feature vectors from MRI studies
  3. TRAINING THE OASIS MODEL: training an OASIS model on your data
  4. OASIS PROBABILITY MAPS: using your trained OASIS model, or the OASIS model from the paper, to make an OASIS probability map of lesion presence

Many thanks to my package coauthors John Muschelli and Taki Shinohara, as well as Amanda Mejia who helped test the package. I especially could not have made this package without my coauthor John, who taught me how to make an R package and helped with each step along the way — John you are the best!

SETUP

The first thing you need to do is install FSL, which we will call through R for image preprocessing and image smoothing using the fslr R package. It is a good idea to have the most up-to-date version of FSL possible. Instructions for installing FSL can be found here:

http://fsl.fmrib.ox.ac.uk/fsl/fslwiki/FslInstallation

After installing FSL, install and load the oasis package from GitHub. The oasis package depends on the packages oro.nifti and fslr, so both of these packages will be loaded when you load oasis.

library(devtools)
install_github('emsweene/oasis')
library(oasis)

Next, determine the location of FSL on your machine. You can then set the FSL path for fslr as follows:

options(fsl.path = '/path/to/fsl')
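As a quick sanity check, you can ask fslr whether it can find your FSL installation (a minimal sketch; have.fsl() should return TRUE once the path is set correctly):

library(fslr)
## returns TRUE if fslr can locate FSL
have.fsl()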

CREATING FEATURES FOR THE OASIS MODEL

To train your model and make OASIS segmentations, you need a FLAIR, T1, T2, and PD volume for each MRI study. For training the model, you will also need a binary mask of 'gold standard' lesion segmentations for each study, in the same space as the T1 volume. The code below loads the images from a single MRI study using the oro.nifti package.

FLAIR = readNIfTI('/path/to/flair', reorient = FALSE)
T2 = readNIfTI('/path/to/t2', reorient = FALSE)
T1 = readNIfTI('/path/to/t1', reorient = FALSE)
PD = readNIfTI('/path/to/pd', reorient = FALSE)
GOLD_STANDARD = readNIfTI('/path/to/gold_standard', reorient = FALSE)

We can visualize the data by plotting the T1 volume with the 'gold standard' manual lesion segmentation overlaid on the image.

## set non-lesion voxels to NA so that only the lesions are displayed
overlay = GOLD_STANDARD
overlay[GOLD_STANDARD == FALSE] = NA
orthographic(T1, overlay, col.y = "orange")

[Figure: T1 volume with the manual lesion segmentation overlaid in orange]

Next, you can prepare the data for fitting the OASIS model using the function oasis_train_dataframe. This function returns a dataframe of OASIS features for the voxels that pass the voxel selection procedure; its output can be used with the oasis_training function. If you have not preprocessed your data, the following code will preprocess it for you. The preprocessing steps are image inhomogeneity correction and rigid registration to the space of the T1 volume. Note that if you have already preprocessed or skull stripped your data, the FSL preprocessing steps may fail, so it is best to only use this option on data that has not yet been preprocessed.

study_1_dataframe = oasis_train_dataframe(flair = FLAIR,
                                          t1 = T1,
                                          t2 = T2,
                                          pd = PD,
                                          gold_standard = GOLD_STANDARD,
                                          brain_mask = NULL,
                                          preproc = TRUE,
                                          normalize = TRUE,
                                          cores = 4)

Here we have set brain_mask = NULL, which runs the FSL brain extraction tool through fslr to create a brain mask. If you have already skull stripped your data, running the brain extraction tool may produce poor results. We have also set preproc = TRUE, which means the data will be preprocessed using FSL, and normalize = TRUE, which means z-score normalization over the brain mask will be performed on all images, as described in the paper. Setting cores = 4 allows some of the image preprocessing steps and feature extraction to run in parallel. The package is written so that no more than 4 cores are ever used, so setting cores higher than 4 will not speed up the process.
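If you are not sure how many cores your machine has, the parallel package that ships with R can tell you. A small sketch, capping the value at 4 since oasis never uses more:

## find the number of available cores, capped at 4
library(parallel)
n.cores = min(detectCores(), 4)

The dataframe returned by oasis_train_dataframe contains the following OASIS feature columns: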

names(study_1_dataframe)
##[1] "FLAIR" "T1" "T2" "PD" "FLAIR_10"
##[6] "T1_10" "T2_10" "PD_10" "FLAIR_20" "T1_20"
##[11] "T2_20" "PD_20" "GoldStandard"

Setting return_preproc = TRUE returns the preprocessed data, which allows you to quality check the preprocessing. In this case, the function oasis_train_dataframe returns a list containing a dataframe for use with the oasis_training function, the FLAIR, T1, T2, and PD volumes, the brain mask for the subject, and the voxel selection mask.

study_1_dataframe_preproc = oasis_train_dataframe(flair = FLAIR,
                                                  t1 = T1,
                                                  t2 = T2,
                                                  pd = PD,
                                                  gold_standard = GOLD_STANDARD,
                                                  brain_mask = NULL,
                                                  preproc = TRUE,
                                                  normalize = TRUE,
                                                  return_preproc = TRUE,
                                                  cores = 4)

names(study_1_dataframe_preproc)
##[1] "oasis_dataframe" "flair" "t1" "t2"
##[5] "pd" "brain_mask" "voxel_selection"

names(study_1_dataframe_preproc[[1]])
##[1] "FLAIR" "T1" "T2" "PD" "FLAIR_10"
##[6] "T1_10" "T2_10" "PD_10" "FLAIR_20" "T1_20"
##[11] "T2_20" "PD_20" "GoldStandard"

Often you may only have gold standard segmentations for a subset of slices. In this case, you can pass in the slice numbers and the orientation of the slices using the slices and orientation options, as shown below.

study_1_dataframe_slices = oasis_train_dataframe(flair = FLAIR,
                                                 t1 = T1,
                                                 t2 = T2,
                                                 pd = PD,
                                                 gold_standard = GOLD_STANDARD,
                                                 brain_mask = NULL,
                                                 preproc = TRUE,
                                                 normalize = TRUE,
                                                 slices = c(80, 90, 100),
                                                 orientation = "axial",
                                                 cores = 4)

If you have your own brain mask in the space of the T1 volume and/or you have processed your own data, you can use the following code:

BRAIN_MASK = readNIfTI('/path/to/brain_mask', reorient = FALSE)

study_1_dataframe_brain_mask = oasis_train_dataframe(flair = FLAIR,
                                                     t1 = T1,
                                                     t2 = T2,
                                                     pd = PD,
                                                     gold_standard = GOLD_STANDARD,
                                                     brain_mask = BRAIN_MASK,
                                                     preproc = FALSE,
                                                     cores = 4)

TRAINING THE OASIS MODEL

If you have data from just one subject, you can train the OASIS model as shown below. The oasis_training function works on objects returned by oasis_train_dataframe and produces a glm object, which can be used to make probability maps of lesion presence for future MRI studies.

oasis_model_one_subject = oasis_training(study_1_dataframe)

If you have multiple studies, say study_1_dataframe, study_2_dataframe, and study_3_dataframe, all outputs of the function oasis_train_dataframe, you can run the function on all of the dataframes:

oasis_model_multi_subject = oasis_training(study_1_dataframe,
                                           study_2_dataframe,
                                           study_3_dataframe)

When oasis_train_dataframe has returned a list containing the preprocessed images, you need to set remove_preproc = TRUE in the oasis_training function, so that these images are removed before fitting the model. For example, we returned the preprocessed images in study_1_dataframe_preproc, so we would run:

oasis_model_one_subject = oasis_training(study_1_dataframe_preproc, remove_preproc = TRUE)
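Since oasis_training returns a standard glm object, you can inspect the fitted coefficients and save the model for later use with oasis_predict (a minimal sketch; the file name is just an example):

## the trained model is an ordinary glm object
summary(oasis_model_one_subject)
## save the model for later prediction
save(oasis_model_one_subject, file = 'oasis_model.Rdata')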

OASIS PROBABILITY MAPS

Making predictions from the OASIS model is done with the oasis_predict function. To make predictions using the model from the paper, use the option model = NULL. Again, we have the option to preprocess the data, using preproc = TRUE.

oasis_maps_paper_model = oasis_predict(flair = FLAIR,
                                       t1 = T1,
                                       t2 = T2,
                                       pd = PD,
                                       brain_mask = NULL,
                                       preproc = TRUE,
                                       normalize = TRUE,
                                       model = NULL,
                                       cores = 4)
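You can view the resulting probability map with orthographic (a minimal sketch; this assumes the probability map is returned in the oasis_map element of the output, as in the binary example further below):

## view the OASIS probability map from the paper model
orthographic(oasis_maps_paper_model$oasis_map)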

We can also make predictions with our own model, using the model option.

oasis_maps_user_model = oasis_predict(flair = FLAIR,
                                      t1 = T1,
                                      t2 = T2,
                                      pd = PD,
                                      brain_mask = NULL,
                                      preproc = TRUE,
                                      normalize = TRUE,
                                      model = oasis_model_one_subject,
                                      cores = 4)

In addition, you can return the binary lesion segmentation created from the OASIS probability map by setting the parameter binary = TRUE. By default, the segmentation is created using the threshold of 0.16 from the OASIS paper (this threshold had a Dice Similarity Coefficient of 0.61 on the validation set in the paper). You can also supply your own threshold value using the threshold parameter.

oasis_maps_user_model_binary = oasis_predict(flair = FLAIR,
                                             t1 = T1,
                                             t2 = T2,
                                             pd = PD,
                                             brain_mask = NULL,
                                             preproc = TRUE,
                                             normalize = TRUE,
                                             model = oasis_model_one_subject,
                                             binary = TRUE,
                                             threshold = 0.50,
                                             cores = 4)

We can now look at the binary segmentation overlaid on the T1 volume:

names(oasis_maps_user_model_binary)
##[1] "oasis_map" "binary_map"

## set non-lesion voxels to NA so that only the segmented lesions are displayed
overlay = oasis_maps_user_model_binary$binary_map
overlay[oasis_maps_user_model_binary$binary_map == FALSE] = NA
orthographic(T1, overlay, col.y = "orange")

[Figure: binary lesion segmentation overlaid on the T1 volume in orange]
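To use the results in other software, you can write the probability map and binary segmentation to disk with oro.nifti's writeNIfTI (a small sketch, assuming the returned maps are nifti objects):

## write the probability map and binary segmentation as NIfTI files
writeNIfTI(oasis_maps_user_model_binary$oasis_map, filename = 'oasis_map')
writeNIfTI(oasis_maps_user_model_binary$binary_map, filename = 'oasis_binary_map')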

This concludes the tutorial for the oasis R package. If you have any questions or suggestions for package functionality, please leave them in the comments section below.


Same Job, Different Seed

Often it is desirable to submit the same job to a computing cluster, changing only a few variables each time. For example, you may wish to run a simulation where random variables are generated from the same distribution, but each time with a different seed. Here we will generate 100 observations from an exponential distribution with 100 different seeds to illustrate how to submit jobs with different seeds. All of the files from this post can be downloaded from my GitHub.

First, move to the directory on the cluster that contains the file you want to run. You can then submit the jobs to the cluster with the following code:

#!/bin/bash
for seed in {1..100}
do
qsub -cwd batch.sh $seed
done

Here we have 100 seeds; in this example I am just using the numbers 1 through 100. We use a for loop that passes a different seed with each submitted job, indicated by the $seed at the end of the qsub command. If you want to pass n variables to your job, simply write $var1 $var2 ... $varn at the end of the qsub command. The flag -cwd tells the scheduler to place the output from the job in the current working directory. Each job is submitted through the file batch.sh, which reads:

Rscript random_exponential.R $1

Rscript is the scripting front end for R, which lets you run an R file from the command line. It is followed by the name of the R file we wish to run. The $1 at the end indicates that we will pass the seed to the R job. If you wish to pass n variables to your R job, you can do so by putting $1 $2 ... $n after the name of the file.

Now, let's look at the file random_exponential.R:

## read the seed from the command line (command line arguments arrive as
## character strings, so convert to numeric)
seed <- as.numeric(commandArgs(TRUE))
## set the seed
set.seed(seed)
## generate 100 exponential random variables with rate = 1
random.exp <- rexp(100, rate = 1)
## take the mean of the random variables
mean.exp <- mean(random.exp)
## save the mean, using the seed in the file name
save(mean.exp, file = paste0(seed, 'output.Rdata'))

The command commandArgs(TRUE) provides access to the command line arguments supplied when the R session was invoked. These arguments arrive as character strings, which is why we convert the seed to numeric before using it. Note that if you passed n variables to your R job, commandArgs(TRUE) returns a vector of length n. The remainder of the R code generates the 100 exponential random variables, calculates their mean, and saves the mean. Notice in the last line that the seed is used in the name of the output file, which will be extremely useful in the next step.
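For example, if you submitted a job with both a seed and a sample size, you could unpack the arguments like this (a minimal sketch; the second argument is hypothetical and not part of the example above):

## all trailing command line arguments, as character strings
args <- commandArgs(TRUE)
seed <- as.numeric(args[1])   ## first argument: the seed
n.obs <- as.numeric(args[2])  ## second (hypothetical) argument: sample size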

To combine the means, open an R session on the cluster and run the following code from the combine.R file:

## set working directory to the location of the output of random_exponential.R
setwd('/path/to/output')
## create a vector to collect the means
means <- c()
## load in the mean from each seed
for (i in 1:100){
  load(paste0(i, 'output.Rdata'))
  means[i] <- mean.exp
}
## plot a histogram of the means
pdf('Hist.pdf')
hist(means, main = 'Histogram of Means')
dev.off()

The file combine.R produces this plot (an illustration of the Central Limit Theorem):

[Figure: histogram of the 100 means]

Acknowledgments: Sincerest thanks to Taki Shinohara, who taught me how to do this! And also many thanks to David Lenis and Jean-Philippe Fortin for the useful feedback on this post.

Converting .dcm to .nii Image Format in R

## This code reads a DICOM image, converts it to a NIfTI, and then writes
## the NIfTI file.
## Requires the packages oro.dicom and oro.nifti.
##
## Elizabeth Sweeney
## August 26th 2013

library(oro.dicom)
library(oro.nifti)

dicom <- readDICOM('~/folder', flipud = FALSE)
nifti <- dicom2nifti(dicom)
writeNIfTI(nifti, filename = "nifti", verbose = TRUE, gzipped = TRUE)
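To confirm the conversion worked, you can read the written file back in and inspect its dimensions (a small sketch):

## sanity check: read the written file back and look at its dimensions
nifti_check <- readNIfTI('nifti', reorient = FALSE)
dim(nifti_check)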