Overview of `surgeRy`

surgeRy is a package that helps the user organize and implement efficient, augmented training with ANTsRNet. the primary target audience is those who are working with volumetric data but 2D will work too. the basic idea is to make training more efficient through numpy-based I/O of the “infinite” training data that we can often generate.

the augmentation strategy will use:

geometric transformation of images (rigid, affine, etc)
gaussian noise
simulated bias field
histogram warping

to create new versions of images from an input image list. Segmentations will be augmented as well and there are options for treating the segmentations as points. One can convert points (in physical space) to segmentation images via ANTsR function makePointsImage or by converting physical points to an index via antsTransformPhysicalPointToIndex. If one uses the latter approach, one should encode each point by a unique label.

Setting up the data

Run on both threads when setting up

Environment variables control multi-threading for CPU in ANTs and tensorflow. See below for how to set these at the beginning on an experiment. We set the number of threads to 12 below.

Sys.setenv("TF_NUM_INTEROP_THREADS"=12)
Sys.setenv("TF_NUM_INTRAOP_THREADS"=12)
Sys.setenv("ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS"=12)

See the code below for how to assemble the images for augmentation.

Run on both threads when setting up

# record the GPU
gpuid = Sys.getenv(x = "CUDA_VISIBLE_DEVICES")
library( ANTsR )
library( ANTsRNet )
library( tensorflow )
library( keras )
library( tfdatasets )
library( reticulate )
library( patchMatchR )
library( surgeRy )
np <- import("numpy")
mytype = "float32"
nlayers = 4   # for unet
downsam = 4   # downsamples images
# collect your images - a list of lists - multiple entries in each list are fine
rids <- function( x, downsammer=4 ) {
  img = ri( x )
  resampleImage( img, dim(img)/downsammer, useVoxels=TRUE )
}
upperhalf = rids( 1 ) * 0 + 1 # this just gives us more segmentation labels
upperhalf[33:64,1:32] = 2
upperhalf[1:32,33:64] = 3
upperhalf[33:64,33:64] = 4
ilist = list(
  list( rids( 1 ), getMask( rids( 1 ) ) ),
  list( rids( 2 ), getMask( rids( 2 ) ) ),
  list( rids( 3 ), getMask( rids( 3 ) ) ),
  list( rids( 4 ), getMask( rids( 4 ) ) ),
  list( rids( 6 ), getMask( rids( 6 ) ) ) )

slist = list()
for ( k in 1:length( ilist ) ) {
  temp = thresholdImage( ilist[[k]][[1]], "Otsu", 3 )
  temp = temp * ( ilist[[k]][[2]] - iMath( ilist[[k]][[2]], "ME", 6 ) )
  slist[[k]] = thresholdImage( temp, 2, 2 ) * upperhalf
  plot( ilist[[k]][[1]], slist[[k]] )
}

Here are definitions for some variables important for training: the numbers defining the segmentation labels and which images are for training and testing.

Run on both threads when setting up

# identify the number of segmentation classes
segregions = sort( unique( slist[[1]]  ))[-1] # the seg numbers
isTrain = c( rep(TRUE,length(slist)-1), FALSE )

This example will do landmark learning. If doing segmentation, then include the zero label i.e. just sort( unique( slist[[1]] )).

Preparing to write the data to disk

We will generate (in this case) 5 files. For each, we generate a random file unique ID and then create file extensions that map to the type of augmented data we will be generating. In this case, we will generate augmented images, the associated masks, the points, coord conv output and a heatmap. These all derive from the image lists we defined above. This stuff will be run on CPU separately from the training loop.

Run on CPU thread when setting up

nFiles = 5
if ( ! exists( "uid" ) )
  uid = paste0("LM",sample(1:1000000,nFiles),"Z")
types = c("Images.npy", "Points.npy", "mask.npy", "coordconv.npy", "heatmap.npy" )
nmats = length(types) # 5 arrays out
trainTestFileNames = data.frame( matrix("",nrow=nFiles,ncol=nmats*2))
colnamesTT = c(
  paste0("train",gsub(".npy","",types)),
  paste0("test",gsub(".npy","",types)) )
colnames( trainTestFileNames ) = colnamesTT
for ( k in 1:nFiles ) {
  twogroups = paste0("numpy/",uid[k],c("train","test"))
  npextsTr = paste0( twogroups[1], types )
  npextsTe = paste0( twogroups[2], types )
  trainTestFileNames[k,]=as.character( c(npextsTr,npextsTe) )
}
write.csv( trainTestFileNames, "numpy/LMtrainttestfiles.csv", row.names=FALSE)

Augmenting the data

We generate the data based on default parameters. If you want to learn more about the possibilities here, then please change the parameters and take a look at the results. We will show an example that looks at some parameter variations below.

The call below will write a test file to numpy. We just write one in this case as we keep it constant during training. Write as many as you want for your application

Run on CPU thread when setting up

testfilename = as.character(trainTestFileNames[1,grep("test",colnames(trainTestFileNames))])
gg = generateDiskData(
    inputImageList = ilist,
    segmentationImageList = slist,
    segmentationNumbers = segregions,
    selector = !isTrain,
    addCoordConv = TRUE,
    segmentationsArePoints = TRUE,
    smoothHeatMaps = 3.0,
    maskIndex = 2,
    transformType = "scaleShear",
    noiseParameters = c(0, 0.05),
    sdSimulatedBiasField = 0.01,
    sdHistogramWarping = 0.01,
    sdAffine = 0.1, # limited
    numpynames = testfilename,
    numberOfSimulations = 8
    )
# visualize example augmented images
layout( matrix(1:8,nrow=2))
for ( k in 1:8 ) {
  temp = as.antsImage( gg[[1]][k,,,1] )
  plot( temp )
  }

We would typically run something like the code below in a continuous loop on a CPU thread separate from the GPU training loop. In practice, you may keep the parameters the same across train and test or set minimal augmentation parameters for test data.

Run on CPU thread when setting up

while( TRUE ) {
  for ( k in 1:nFiles ) {
    trnfilename = as.character(trainTestFileNames[k,grep("train",colnames(trainTestFileNames))])
    print( paste(k, trnfilename[1] ) )
    gg = generateDiskData(
        inputImageList = ilist,
        segmentationImageList = slist,
        segmentationNumbers = segregions,
        selector = isTrain,
        addCoordConv = TRUE,
        segmentationsArePoints = TRUE,
        smoothHeatMaps = 3.0,
        maskIndex = 2,
        transformType = "scaleShear",
        noiseParameters = c(0, 0.05),
        sdSimulatedBiasField = 0.01,
        sdHistogramWarping = 0.01,
        sdAffine = 0.1,
        numpynames = trnfilename,
        numberOfSimulations = 64
        )
    }
  }

Getting the above steps correct is very critical to successful training. This package currently hides some of that complexity for basic segmentation and landmark prediction. However, it is always possible to make mistakes if the training decisions are inconsistent with the underlying theory. A forthcoming publication will explain the theory associated with this package.

set up the unet

Below is the default set up for unet-based lanmdark prediction given the above augmentation choices.

Run on GPU thread when setting up

library( ANTsR )
library( ANTsRNet )
library( tensorflow )
library( keras )
library( tfdatasets )
library( reticulate )
library( patchMatchR )
library( surgeRy )
np <- import("numpy")
mytype = "float32"
nlayers = 4   # for unet
downsam = 4   # downsamples images
# set up the network - all parameters below could be optimized for the application
unet = createUnetModel2D(
       list( NULL, NULL, 1 ),
       numberOfOutputs = length( segregions ),
       numberOfLayers = nlayers, # should optimize this wrt criterion
       numberOfFiltersAtBaseLayer = 32, # should optimize this wrt criterion
       convolutionKernelSize = 3, # maybe should optimize this wrt criterion
       deconvolutionKernelSize = 2,
       poolSize = 2,
       strides = 2,
       dropoutRate = 0,
       weightDecay = 0,
       additionalOptions = c( "nnUnetActivationStyle" ),
       mode = c("regression")
     )
maskinput = layer_input( list( NULL, NULL, 1 ) )
posteriorMask <- layer_multiply(
  list( unet$outputs[[1]] , maskinput ), name='maskTimesPosteriors'  )
unet = keras_model( list( unet$inputs[[1]], maskinput ), posteriorMask )
unetLM = patchMatchR::deepLandmarkRegressionWithHeatmaps( unet,
  activation = 'none', theta=0 )
unetLM1 = patchMatchR::deepLandmarkRegressionWithHeatmaps( unet,
  activation = 'relu', theta=0 )

Reading the data from disk

Now we have to read the data from disk in a parallel thread. We handle this by looking at the file modification times and we, by default, choose to read the file that was modified 2nd most recently.

The functions below handle how to choose which image to read for training. Some experimentation may be needed to get the timing correct. You want to balance the time it takes to augment versus the time it takes to train through a batch.

Run on GPU thread when setting up

trtefns = read.csv( "numpy/LMtrainttestfiles.csv" ) # critical - same file name
trnnames = colnames(trtefns)[grep("train", colnames(trtefns) )]
tstnames = colnames(trtefns)[grep("test", colnames(trtefns) )]
loadfirst = chooseTrainingFilesToRead( trtefns[,trnnames] )
Xtr = surgeRy::loadNPData( loadfirst )
mybs = dim( Xtr[[1]] )[1]
Xte = surgeRy::loadNPData( trtefns[1,tstnames] )

spatial tensors are indexed in the following way:

first index: subject (or batch) index
second index: x coordinate
third index : y coordinate
fourth index : z or channels
last/fifth index (if available): channels

e.g in this example we see:

> dim( Xtr[[1]] )
[1] 64 64 64  1

which is the size of the batch (64) and then spatial dimension of the input image (64x64) with just one channel.

point coordinate tensors are indexed in the following way:

first index: subject (or batch) index
second index: which landmark we are going to index (eg the first, second etc)
third index : the point / landmark coordinate itself which will be a 2 vector in 2D and 3 vector in 3D

e.g.

> dim( Xtr[[2]] )
[1] 64  4  2

which is the size of the batch (64) and then number of landmarks (4 here) with 2 spatial dimensions.

set up the training history data frame

Run on GPU thread when setting up

mydf = data.frame()
epoch = 1
wtfn=paste0('lm_weights_gpu', gpuid,'.h5')
csvfn = paste0('lm_weights_gpu', gpuid,'.csv')

Using the data in a training loop for a neural network

below is a “standard” tensorflow training loop with some alterations that make some efficiencies / choices a little more explicit but still clear. at least, that is what’s intended.

Run on GPU thread when setting up

epoch = 1
for ( ptwt in c( 0.001, 0.005, 0.01, 0.05, 0.1 ) ) {
  if ( ptwt == 0.01 ) unetLM = unetLM1
  ptWeight = tf$cast( ptwt, mytype )
  num_epochs <- 100
  if ( ptwt == 0.1 ) num_epochs = 1500
  optimizerE <- tf$keras$optimizers$Adam(1.e-6)
  batchsize = 2
  for (epoch in 1:num_epochs ) {
    if ( (epoch %% round(mybs/batchsize) ) == 0 & epoch > 1  ) {
        # refresh the data
        locfns = chooseTrainingFilesToRead( trtefns[,trnnames] )
        print( locfns[1] )
        Xtr = surgeRy::loadNPData( locfns )
      }
    ct = nrow( mydf ) + 1
    mysam = sample( 1:nrow(Xtr[[1]]), batchsize )
    datalist = list()
    datalist[[2]] = array( Xtr[[2]][mysam,,], dim=dim(Xtr[[2]][mysam,,]) ) %>% tf$cast( mytype )
    for ( jj in c(1,3:5) )
      datalist[[jj]] = array( Xtr[[jj]][mysam,,,],
        dim=c(batchsize,tail(dim(Xtr[[jj]]),3)) ) %>% tf$cast( mytype )
    with(tf$GradientTape(persistent = FALSE) %as% tape, {
      preds = unetLM( datalist[c(1,3:4)] )
      lossht = tf$keras$losses$mse( datalist[[5]], preds[[1]] ) %>% tf$reduce_mean( )
      losspt = tf$keras$losses$mse( datalist[[2]], preds[[2]] ) %>% tf$reduce_mean( )
      loss = losspt * ptWeight + lossht
      })
    unet_gradients <- tape$gradient(loss, unetLM$trainable_variables)
    optimizerE$apply_gradients(purrr::transpose(list(
        unet_gradients, unetLM$trainable_variables )))
    mydf[ct,'train_loss'] = as.numeric( loss )
    mydf[ct,'train_ptloss'] = as.numeric( losspt )
    mydf[ct,'train_htloss'] = as.numeric( lossht )
    mydf[ct,'ptWeight'] = as.numeric( ptWeight )
    if( epoch > 3 & epoch %% 10 == 0 ) {
      with(tf$device("/cpu:0"), {
        preds = predict( unetLM, Xte[c(1,3:4)] )
        lossht = tf$keras$losses$mse( Xte[[5]], preds[[1]] ) %>% tf$reduce_mean( )
        losspt = tf$keras$losses$mse( Xte[[2]], preds[[2]] ) %>% tf$reduce_mean( )
        loss = tf$cast(losspt, mytype) * ptWeight + tf$cast(lossht, mytype)
      })
      # compute the same thing in test data
      mydf[ct,'test_loss'] = as.numeric( loss )
      mydf[ct,'test_ptloss'] = as.numeric( losspt )
      mydf[ct,'test_htloss'] = as.numeric( lossht )
      if ( mydf[ct,'test_ptloss'] <= min(mydf[1:epoch,'test_ptloss'],na.rm=TRUE) ) {
        print(paste("Saving",epoch))
        keras::save_model_weights_hdf5( unetLM, wtfn )
        gc()
      }
    }
  print( mydf[ct,] )
  write.csv( mydf, csvfn, row.names=FALSE )
  }
}

monitoring training progess

take a look at the training-test convergence curves.

Run on CPU thread

if ( ! exists( "mydf" ) ) mydf = read.csv( 'lm_weights_gpu.csv' )
mydfnona = mydf[ !is.na( mydf$test_loss),  ]
plot( ts( mydfnona ) )

show the predicted points in the test data

Visualize the predicted points.

layout(matrix(1:2,nrow=1))
wsub = 1
testimg = as.antsImage( Xte[[1]][wsub,,,1] ) %>%
  antsCopyImageInfo2( rids( 1 ) )
preds = predict( unetLM, Xte[c(1,3:4)] )
# these are in physical space
predPoints = as.array( preds[[2]] )[wsub,,]
truPoints = Xte[[2]][wsub,,] # true points
print(paste("Error",norm(truPoints-predPoints,"F")))
print(paste("%Error",norm(truPoints-predPoints,"F")/norm(truPoints,"F")*100,"%"))
truIndex = round( antsTransformPhysicalPointToIndex( testimg, truPoints ) )
predIndex = round( antsTransformPhysicalPointToIndex( testimg, predPoints ) )
# make point images
truPointsImage = predPointsImage = testimg * 0
for ( j in 1:nrow( truPoints ) ) {
  truPointsImage[truIndex[j,1],truIndex[j,2]]=j
  predPointsImage[predIndex[j,1],predIndex[j,2]]=j
  }
plot( testimg, iMath( truPointsImage, "GD",2) )
plot( testimg, iMath( predPointsImage, "GD",2) )

look at some of the underlying images

Visualize the predicted images.

layout(matrix(1:8,nrow=2))
wsub = 6
for ( wpt in 1:length(segregions) ) {
  preds = predict( unetLM, Xte[c(1,3:4)] )
  testimg = as.antsImage( Xte[[1]][wsub,,,1] )
  testmsk = as.antsImage( Xte[[3]][wsub,,,1] )
  testht = as.antsImage( Xte[[5]][wsub,,,wpt] )
  testcc = as.antsImage( Xte[[4]][wsub,,,2] ) # coord conv
  testhtp = as.antsImage( preds[[1]][wsub,,,wpt] )
  plot( testimg, testht )
  plot( testimg, testhtp )
  }

Inference in completely out of sample data

Inference follows the same patterns as above but there are many ways that one can implement it. Please implement inference on your own.

Run on CPU thread when done

Future work

The package will evolve with the associated needs of related projects. NOTE: this example is really very trivial and is not meant to “work” on a real problem. However, it has been tested in very close form on real non-trivial problems with very similar parameters.

surgeRy: basic example explained

2021-12-05

Overview of `surgeRy`

Setting up the data

Preparing to write the data to disk

Augmenting the data

set up the unet

Reading the data from disk

set up the training history data frame

Using the data in a training loop for a neural network

monitoring training progess

show the predicted points in the test data

look at some of the underlying images

Inference in completely out of sample data

Future work

surgeRy: basic example explained

2021-12-05

Overview of surgeRy

Setting up the data

Preparing to write the data to disk

Augmenting the data

set up the unet

Reading the data from disk

set up the training history data frame

Using the data in a training loop for a neural network

monitoring training progess

show the predicted points in the test data

look at some of the underlying images

Inference in completely out of sample data

Future work

Overview of `surgeRy`