REIMS Pre-processing

This workflow uses functions within the Turner class to pre-process a series of 1D REIMS files. It is designed to replace/complement Alvaro's original workflow which is unfortunately too hard to maintain indefinitely with the various bugs/requirements across the Python/R environments.

The basic pre-processing workflow consists of the following stages:

Basic checks to ensure that all code is available, and to update the paths relative to the data on your computer. Any errors here should will relate to missing code and incorrectly specified file paths/names.
File and metadata Ensure that filenames, sample and classes are entered in the metadata table, along with scan indices for each burn.
Average spectra The average spectrum for each file is determined, using the scans defined in the metadata table.
Recalibrate & determine average spectrum At least 2 reference ions need to be identified from the average spectra, which are to be used for recalibration of the raw data. These ions are likely specific to individual datasets. Using the offsets, the average spectrum for each file is imported. Misalignment of files should have been removed, which can be verified with PCA.
Peak picking Using the average of the average spectra, FT-based peak picking is performed. This provides a list of peaks which can then subsequently be extracted from the original files.
Extraction Using the defined list of peaks, intensities are extracted from the raw data, which is recalibrated on import. This data matrix can then be verified by means of a few QC checks, and then saved as a CSV files prior to further statistical analysis.

1. Check code requirements

When running this cell, all existing data are cleared, and checks are made for the code. It should complete without error; if any, then ensure that you have added the two repositories to the MATLAB path.

The assert function returns an error if the condition is not satisfied. In the two cases below, an error occurs if the required code cannot be found (to solve this, ensure that both folders are added to the MATLAB search path). If the code can be found, no error occurs and execution of the cell completes without any output to the command window.

clc;
clearvars;
assert(~isempty(which('Turner.m')),'Cannot find Turner code.');
assert(~isempty(which('Octopus.m')),'Cannot find Octopus code.');

2. Files and metadata

The files should be in a single folder, and the metadata CSV file should have the following columns: FILE, SAMPLE, CLASS, START_SCAN, END_SCAN. See the example metadata.csv file for a template.

% Data file location
dataPath = '/Users/jmckenzi/Dropbox/ICL/Collaborations/PP/Data/Wellplate/';
 
% Metadata (CSV file, for now)
csvPath = '/Users/jmckenzi/Dropbox/ICL/Collaborations/PP/Data/Wellplate/';
csvName = 'metadataMZ5.csv';
 
% Where to finally save the results?
exportPath = '/Users/jmckenzi/Dropbox/ICL/Collaborations/PP/Data/Wellplate_Output/';
exportName = 'Test_SelectedScans.csv';
 
% Check that the folders exist
assert(exist(dataPath,'dir') > 0,'Folder `dataPath` does not exist');
assert(exist([csvPath csvName],'file') > 0,'CSV file cannot be found');

3. Create instance of class

Construct instance of class, containing table and the default parameters. The likely causes of an error in this cell are related to the delimiter in the CSV file, which is ',' by default but can be changed to alternatives such as ';' which may solve the problem.

j = Turner(dataPath,[csvPath csvName],',');
 
% Check that the table has been imported correctly
disp(head(j.tab));
                 FILE                     SAMPLE          CELL               CLASS            START_SCAN    END_SCAN    SCANNUMBER
    _______________________________    _____________    _________    _____________________    __________    ________    __________

    {'2020_10_30_stef_HEK293T.mz5'}    {'HEK_TR1'  }    {'HEK'  }    {'QC'               }         3           13           11    
    {'2020_10_30_stef_HEK293T.mz5'}    {'HEK_TR2'  }    {'HEK'  }    {'QC'               }        15           26           12    
    {'2020_10_30_stef_HEK293T.mz5'}    {'HEK_TR3'  }    {'HEK'  }    {'QC'               }        28           38           11    
    {'2020_10_30_stef_HEK293T.mz5'}    {'HEK_TR4'  }    {'HEK'  }    {'QC'               }        40           50           11    
    {'2020_10_30_stef_HEK293T.mz5'}    {'HEK_TR5'  }    {'HEK'  }    {'QC'               }        52           62           11    
    {'2020_10_30_stef_LS180.mz5'  }    {'LS180_TR1'}    {'LS180'}    {'Colorectal Cancer'}         4           13           10    
    {'2020_10_30_stef_LS180.mz5'  }    {'LS180_TR2'}    {'LS180'}    {'Colorectal Cancer'}        15           25           11    
    {'2020_10_30_stef_LS180.mz5'  }    {'LS180_TR3'}    {'LS180'}    {'Colorectal Cancer'}        28           37           10    

4. Modify parameters

The Turner class allows some parameters to be changed prior to import.

The scanMethod property determines which scans are considered for each file. By default the selected scans are used, which are those defined in the CSV file (START_SCAN:END_SCAN). Alternatively, you can set it to maximum which uses only the single most intense scan between the specified start and end scans.

Other properties to consider are:

scanMethod: either 'maximum' or 'selected'
mzRange: which by default is [50 1500]
mzRes: which by default is 0.001

% Change to extract all selected scans
j.scanMethod = 'selected';

5. Import files

Run the openFiles method to open each file and read in the TIC traces.

j = j.openFiles;

6. Average spectra

Determine average spectra for each file. The mz range is already specified by default, but can be changed if required.

j = j.averageSpectra;

7. Plot average spectra

Plot of the spectra. Note that this will likely be slow due to the high interpolated resolution of the data.

mask = j.raw.mz > 885 & j.raw.mz < 889;

figure;

plot(j.raw.mz(mask),j.raw.sp(:,mask));

8. PCA of the average spectra

Useful to identify regions/peaks which may be misaligned. Because the average spectra are from interpolated profile mode data, there is more than one point in each peak. As such, the loadings scatter plot looks slightly odd (second axes); the circular nature of the loadings is broadly normal. Saw-tooth profiles within the third axes denote likely misalignment of the data, but can be caused by differences in peak shapes. Peak picking is performed later which removes much of the redundant data.

% Few methods currently implemented

normalisation = 'tic'; % 'tic'

logtransform = true; % true | false

% Calculate average spectra

j.pcaAverage('raw',normalisation,logtransform);

9. Plot ions

The plotIons method can be used to plot an array of specific ions for all files. This might be useful to identify peaks to use for recalibration, or assess how well a subsequent recalibration performed.

The required input arguments are:

ions: specify one or more m/z values to be plotted
data: plot either the 'raw' or 'recal' data
type: the data can be plotted as a 'plot' or 'image'

j.plotIons([255.23 885.55],'raw','plot');

10. Recalibrate

Using a series of reference peaks, determine the shift in ppm of each of these. These ppm deviations will then be used for determining an offset function to be applied to all spectra from that file.

Parameters to vary include:

recalpeaks (fundamental; must be specified)
recalmethod (default = 'linear')
recalunits (default = 'ppm')
recalmaxdev (default = 20 [ppm])

% These need to be specified uniquely for each dataset
j.recalpeaks = [255.2330 885.5499];
 
% Run the function to perform recalibration
j = j.recalSpectra;
Warning: Polynomial is not unique; degree >= number of data points.
Warning: Polynomial is not unique; degree >= number of data points.
Warning: Polynomial is not unique; degree >= number of data points.
Warning: Polynomial is not unique; degree >= number of data points.
Warning: Polynomial is not unique; degree >= number of data points.
Warning: Polynomial is not unique; degree >= number of data points.
Warning: Polynomial is not unique; degree >= number of data points.
Warning: Polynomial is not unique; degree >= number of data points.
Warning: Polynomial is not unique; degree >= number of data points.
Warning: Polynomial is not unique; degree >= number of data points.
Warning: Polynomial is not unique; degree >= number of data points.

11. PCA to visualise effects of normalisation

Assess if the recalibration procedured has improved the data with respect to possible misalignment. Large misalignment should be corrected in this stage, but there will still be circular traces within the central plot.

% Few methods currently implemented

normalisation = 'tic'; % 'tic'

logtransform = true; % true | false

j.pcaAverage('recal',normalisation,logtransform);

12. Plot the peaks

Again use the plotIons method to visualise how the recalibration has affected certain peaks. You can include other peaks (i.e. those not used for recalibration) as required.

j.plotIons([j.recalpeaks 281.25],'recal','plot'); %281.25

13. Peak picking

Using the normed average spectrum over all files, perform FT-based peak picking. Note that the m/z resolution should be 0.001 for the function to work as expected.

Parameters to vary include:

peakpickthreshold: essentially consider as a signal to noise ratio; higher values result in fewer peaks.

% Pick the peaks, with optional figure

plotFig = true;

j = j.peakPick(plotFig);

Number of peaks = 729

14. Peak extract

For each of the peaks identified in the previous stage, return to the raw+recal data and extract the intensities.

Parameters to vary include:

peakpicktol: the distance (in ppm) between a local maximum in a spectrum and one of the peaks identified during Turner.peakPick. Higher values increase the risk of picking an incorrect neighbouring peak; values that are too small fail to account for remaining m/z deviation and will result in a large number of zero intensity values.

% Extract the picked peaks
j = j.peakExtract;
Warning: Might need to open/import/recal files
Warning: Polynomial is not unique; degree >= number of data points.
Warning: Polynomial is not unique; degree >= number of data points.
Warning: Polynomial is not unique; degree >= number of data points.
Warning: Polynomial is not unique; degree >= number of data points.
Warning: Polynomial is not unique; degree >= number of data points.
Warning: Polynomial is not unique; degree >= number of data points.
Warning: Polynomial is not unique; degree >= number of data points.
Warning: Polynomial is not unique; degree >= number of data points.
Warning: Polynomial is not unique; degree >= number of data points.
Warning: Polynomial is not unique; degree >= number of data points.
Warning: Polynomial is not unique; degree >= number of data points.

15. PCA of extracted variables

Quick analysis to identify trends within the data, which might hint at further problems.

% Few methods currently implemented

normalisation = 'tic'; % 'tic'

logtransform = true; % true | false

% PCA and plotting function

j.pcaPeaks(normalisation,logtransform,'CLASS');

Warning: While saving an object of class 'matlab.graphics.chart.primitive.Line':
Recursion limit exceeded when saving instance of class to a MAT-file. This is either because the instance contains an extremely long chain of references, or the class definition contains an error.

Warning: While saving an object of class 'matlab.graphics.chart.primitive.Line':
Recursion limit exceeded when saving instance of class to a MAT-file. This is either because the instance contains an extremely long chain of references, or the class definition contains an error.

16. Export

Write the data to an output file, so that it might be able to be imported elsewhere. This needs to include the metadata as well.

% The quickest way is to convert data and metadata into a table, and export
% that...
if ~exist(exportPath,'dir')
    mkdir(exportPath);
end
 
% Format m/z values to be non numeric
mzFun = @(x) ['x' sprintf('%0.4f',x)];
mzLab = arrayfun(mzFun,j.peaks.mz,'UniformOutput',false);
 
% Convert to table and append the metadata at the front
tmp = array2table(j.peaks.sp,'VariableNames',mzLab);
at = cat(2,j.peaks.meta,tmp);
 
% Write to a file
writetable(at,[exportPath exportName]);