REIMS Pre-processing

This workflow uses functions within the Turner class to pre-process a series of 1D REIMS files. It is designed to replace/complement Alvaro's original workflow which is unfortunately too hard to maintain indefinitely with the various bugs/requirements across the Python/R environments.
The basic pre-processing workflow consists of the following stages:

1. Check code requirements

When running this cell, all existing data are cleared, and checks are made for the code. It should complete without error; if any, then ensure that you have added the two repositories to the MATLAB path.
The assert function returns an error if the condition is not satisfied. In the two cases below, an error occurs if the required code cannot be found (to solve this, ensure that both folders are added to the MATLAB search path). If the code can be found, no error occurs and execution of the cell completes without any output to the command window.
clc;
clearvars;
assert(~isempty(which('Turner.m')),'Cannot find Turner code.');
assert(~isempty(which('Octopus.m')),'Cannot find Octopus code.');

2. Files and metadata

The files should be in a single folder, and the metadata CSV file should have the following columns: FILE, SAMPLE, CLASS, START_SCAN, END_SCAN. See the example metadata.csv file for a template.
% Data file location
dataPath = '/Users/jmckenzi/Dropbox/ICL/Collaborations/PP/Data/Wellplate/';
 
% Metadata (CSV file, for now)
csvPath = '/Users/jmckenzi/Dropbox/ICL/Collaborations/PP/Data/Wellplate/';
csvName = 'metadataMZ5.csv';
 
% Where to finally save the results?
exportPath = '/Users/jmckenzi/Dropbox/ICL/Collaborations/PP/Data/Wellplate_Output/';
exportName = 'Test_SelectedScans.csv';
 
% Check that the folders exist
assert(exist(dataPath,'dir') > 0,'Folder `dataPath` does not exist');
assert(exist([csvPath csvName],'file') > 0,'CSV file cannot be found');

3. Create instance of class

Construct instance of class, containing table and the default parameters. The likely causes of an error in this cell are related to the delimiter in the CSV file, which is ',' by default but can be changed to alternatives such as ';' which may solve the problem.
j = Turner(dataPath,[csvPath csvName],',');
 
% Check that the table has been imported correctly
disp(head(j.tab));
FILE SAMPLE CELL CLASS START_SCAN END_SCAN SCANNUMBER _______________________________ _____________ _________ _____________________ __________ ________ __________ {'2020_10_30_stef_HEK293T.mz5'} {'HEK_TR1' } {'HEK' } {'QC' } 3 13 11 {'2020_10_30_stef_HEK293T.mz5'} {'HEK_TR2' } {'HEK' } {'QC' } 15 26 12 {'2020_10_30_stef_HEK293T.mz5'} {'HEK_TR3' } {'HEK' } {'QC' } 28 38 11 {'2020_10_30_stef_HEK293T.mz5'} {'HEK_TR4' } {'HEK' } {'QC' } 40 50 11 {'2020_10_30_stef_HEK293T.mz5'} {'HEK_TR5' } {'HEK' } {'QC' } 52 62 11 {'2020_10_30_stef_LS180.mz5' } {'LS180_TR1'} {'LS180'} {'Colorectal Cancer'} 4 13 10 {'2020_10_30_stef_LS180.mz5' } {'LS180_TR2'} {'LS180'} {'Colorectal Cancer'} 15 25 11 {'2020_10_30_stef_LS180.mz5' } {'LS180_TR3'} {'LS180'} {'Colorectal Cancer'} 28 37 10

4. Modify parameters

The Turner class allows some parameters to be changed prior to import.
The scanMethod property determines which scans are considered for each file. By default the selected scans are used, which are those defined in the CSV file (START_SCAN:END_SCAN). Alternatively, you can set it to maximum which uses only the single most intense scan between the specified start and end scans.
Other properties to consider are:
% Change to extract all selected scans
j.scanMethod = 'selected';

5. Import files

Run the openFiles method to open each file and read in the TIC traces.
j = j.openFiles;

6. Average spectra

Determine average spectra for each file. The mz range is already specified by default, but can be changed if required.
j = j.averageSpectra;

7. Plot average spectra

Plot of the spectra. Note that this will likely be slow due to the high interpolated resolution of the data.
mask = j.raw.mz > 885 & j.raw.mz < 889;
figure;
plot(j.raw.mz(mask),j.raw.sp(:,mask));

8. PCA of the average spectra

Useful to identify regions/peaks which may be misaligned. Because the average spectra are from interpolated profile mode data, there is more than one point in each peak. As such, the loadings scatter plot looks slightly odd (second axes); the circular nature of the loadings is broadly normal. Saw-tooth profiles within the third axes denote likely misalignment of the data, but can be caused by differences in peak shapes. Peak picking is performed later which removes much of the redundant data.
% Few methods currently implemented
normalisation = 'tic'; % 'tic'
logtransform = true; % true | false
 
% Calculate average spectra
j.pcaAverage('raw',normalisation,logtransform);

9. Plot ions

The plotIons method can be used to plot an array of specific ions for all files. This might be useful to identify peaks to use for recalibration, or assess how well a subsequent recalibration performed.
The required input arguments are:
j.plotIons([255.23 885.55],'raw','plot');

10. Recalibrate

Using a series of reference peaks, determine the shift in ppm of each of these. These ppm deviations will then be used for determining an offset function to be applied to all spectra from that file.
Parameters to vary include:
% These need to be specified uniquely for each dataset
j.recalpeaks = [255.2330 885.5499];
 
% Run the function to perform recalibration
j = j.recalSpectra;
Warning: Polynomial is not unique; degree >= number of data points.
Warning: Polynomial is not unique; degree >= number of data points.
Warning: Polynomial is not unique; degree >= number of data points.
Warning: Polynomial is not unique; degree >= number of data points.
Warning: Polynomial is not unique; degree >= number of data points.
Warning: Polynomial is not unique; degree >= number of data points.
Warning: Polynomial is not unique; degree >= number of data points.
Warning: Polynomial is not unique; degree >= number of data points.
Warning: Polynomial is not unique; degree >= number of data points.
Warning: Polynomial is not unique; degree >= number of data points.
Warning: Polynomial is not unique; degree >= number of data points.

11. PCA to visualise effects of normalisation

Assess if the recalibration procedured has improved the data with respect to possible misalignment. Large misalignment should be corrected in this stage, but there will still be circular traces within the central plot.
% Few methods currently implemented
normalisation = 'tic'; % 'tic'
logtransform = true; % true | false
 
j.pcaAverage('recal',normalisation,logtransform);

12. Plot the peaks

Again use the plotIons method to visualise how the recalibration has affected certain peaks. You can include other peaks (i.e. those not used for recalibration) as required.
j.plotIons([j.recalpeaks 281.25],'recal','plot'); %281.25

13. Peak picking

Using the normed average spectrum over all files, perform FT-based peak picking. Note that the m/z resolution should be 0.001 for the function to work as expected.
Parameters to vary include:
% Pick the peaks, with optional figure
plotFig = true;
j = j.peakPick(plotFig);
Number of peaks = 729

14. Peak extract

For each of the peaks identified in the previous stage, return to the raw+recal data and extract the intensities.
Parameters to vary include:
% Extract the picked peaks
j = j.peakExtract;
Warning: Might need to open/import/recal files
Warning: Polynomial is not unique; degree >= number of data points.
Warning: Polynomial is not unique; degree >= number of data points.
Warning: Polynomial is not unique; degree >= number of data points.
Warning: Polynomial is not unique; degree >= number of data points.
Warning: Polynomial is not unique; degree >= number of data points.
Warning: Polynomial is not unique; degree >= number of data points.
Warning: Polynomial is not unique; degree >= number of data points.
Warning: Polynomial is not unique; degree >= number of data points.
Warning: Polynomial is not unique; degree >= number of data points.
Warning: Polynomial is not unique; degree >= number of data points.
Warning: Polynomial is not unique; degree >= number of data points.

15. PCA of extracted variables

Quick analysis to identify trends within the data, which might hint at further problems.
% Few methods currently implemented
normalisation = 'tic'; % 'tic'
logtransform = true; % true | false
 
% PCA and plotting function
j.pcaPeaks(normalisation,logtransform,'CLASS');
Warning: While saving an object of class 'matlab.graphics.chart.primitive.Line':
Recursion limit exceeded when saving instance of class to a MAT-file. This is either because the instance contains an extremely long chain of references, or the class definition contains an error.
Warning: While saving an object of class 'matlab.graphics.chart.primitive.Line':
Recursion limit exceeded when saving instance of class to a MAT-file. This is either because the instance contains an extremely long chain of references, or the class definition contains an error.

16. Export

Write the data to an output file, so that it might be able to be imported elsewhere. This needs to include the metadata as well.
% The quickest way is to convert data and metadata into a table, and export
% that...
if ~exist(exportPath,'dir')
mkdir(exportPath);
end
 
% Format m/z values to be non numeric
mzFun = @(x) ['x' sprintf('%0.4f',x)];
mzLab = arrayfun(mzFun,j.peaks.mz,'UniformOutput',false);
 
% Convert to table and append the metadata at the front
tmp = array2table(j.peaks.sp,'VariableNames',mzLab);
at = cat(2,j.peaks.meta,tmp);
 
% Write to a file
writetable(at,[exportPath exportName]);