Analysis of English Vocabulary Data in Metabonomics Articles


Pre-processing
Generic term for the methods that convert raw instrumental data into clean data ready for data processing.
Pre-treatment
Transforming the clean data to make them ready for data processing (scaling, centering, etc.).
Processing
The actual data analysis (PCA, PLS, etc.).
Post-processing
Transforming the results of the processing step for interpretation and visualization.
Validation
All activities aimed at assuring the quality of the conclusions drawn from the data analysis.
Interpretation
Generating hypotheses, identifying affected pathways, or visualizing the data.
Pre-processing
Deconvolution
Resolving overlapping peaks in an NMR spectrum or a GC or LC chromatogram by using the second dimension (usually MS). For GC or LC data, this generates a peak table in which each metabolite is represented by one variable.
Peak-picking
Selecting the peaks in an NMR spectrum or a GC or LC-MS chromatogram that may represent real signals. This results in a table of rt_m/z (retention time and m/z) channels and their corresponding intensities.
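As a rough illustration of peak picking on a single one-dimensional trace (for example one m/z channel over retention time), the sketch below assumes that a peak is simply a strict local maximum above a fixed intensity threshold. The function `pick_peaks` and the threshold value are hypothetical; practical peak pickers also estimate noise, model peak shape, and work across the m/z dimension.

```python
from typing import List, Tuple


def pick_peaks(intensities: List[float], threshold: float) -> List[Tuple[int, float]]:
    """Return (index, intensity) for every strict local maximum above threshold.

    Hypothetical helper: a point counts as a peak when it is higher than
    both neighbours and exceeds a fixed intensity threshold.
    """
    peaks = []
    for i in range(1, len(intensities) - 1):
        y = intensities[i]
        if y > threshold and y > intensities[i - 1] and y > intensities[i + 1]:
            peaks.append((i, y))
    return peaks


trace = [0.1, 0.3, 2.5, 0.4, 0.2, 0.9, 3.1, 1.0, 0.1]
print(pick_peaks(trace, threshold=1.0))  # [(2, 2.5), (6, 3.1)]
```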
Alignment
Synchronizing the chromatograms or NMR spectra such that each metabolite signal has the same retention time or chemical shift in each sample.
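A minimal sketch of alignment, assuming the misalignment is one global shift of whole data points: the sample is shifted by the offset (within `max_shift`) that maximizes its unnormalized cross-correlation with a reference trace. The helpers `shift_trace`, `best_shift`, and `align` are hypothetical; established alignment methods usually warp the axis piecewise rather than applying a single global shift.

```python
from typing import List


def shift_trace(trace: List[float], shift: int) -> List[float]:
    """Move a trace by `shift` points (positive = later), zero-padding the ends."""
    n = len(trace)
    out = [0.0] * n
    for i in range(n):
        j = i - shift
        if 0 <= j < n:
            out[i] = trace[j]
    return out


def best_shift(reference: List[float], trace: List[float], max_shift: int) -> int:
    """Shift in [-max_shift, max_shift] whose result has the largest dot product with the reference."""
    def score(shift: int) -> float:
        return sum(r * x for r, x in zip(reference, shift_trace(trace, shift)))
    return max(range(-max_shift, max_shift + 1), key=score)


def align(reference: List[float], trace: List[float], max_shift: int = 5) -> List[float]:
    """Return the trace shifted so that it best matches the reference."""
    return shift_trace(trace, best_shift(reference, trace, max_shift))


reference = [0.0, 0.0, 1.0, 5.0, 1.0, 0.0, 0.0, 0.0]
sample = [0.0, 0.0, 0.0, 0.0, 1.0, 5.0, 1.0, 0.0]  # same peak, two points later
print(best_shift(reference, sample, max_shift=3))  # -2
print(align(reference, sample, max_shift=3) == reference)  # True
```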
Pre-treatment
Normalization
Operation performed within or across rows to make the row profiles comparable in size.
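One common within-row normalization is total-sum (constant-sum) normalization, sketched below under the assumption that rows are samples and columns are variables. The function name `total_sum_normalize` is hypothetical, and other choices exist (for example normalization to an internal standard or probabilistic quotient normalization). In the example, the second sample is a twofold more concentrated copy of the first, and normalization makes the two rows identical.

```python
from typing import List


def total_sum_normalize(data: List[List[float]], target: float = 1.0) -> List[List[float]]:
    """Scale each row (sample) so that its entries sum to `target`."""
    result = []
    for row in data:
        total = sum(row)
        if total == 0:
            raise ValueError("cannot normalize a row whose sum is zero")
        result.append([target * x / total for x in row])
    return result


X = [[1.0, 3.0], [2.0, 6.0]]
print(total_sum_normalize(X))  # [[0.25, 0.75], [0.25, 0.75]]
```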
Centering
Operation across the rows to translate the center of gravity of the dataset.
Mean-centering
Commonly used method of centering in which each column is expressed as deviations from its mean (computed across the rows). Subtracting the column mean translates the center of gravity of the data to the origin.
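A minimal sketch of mean-centering, assuming the data table is a list of rows (samples) with one column per variable; the function name `mean_center` is hypothetical. After centering, every column sums to zero.

```python
from statistics import mean
from typing import List


def mean_center(data: List[List[float]]) -> List[List[float]]:
    """Subtract each column's mean (computed across the rows) from that column."""
    columns = list(zip(*data))
    means = [mean(col) for col in columns]
    return [[x - m for x, m in zip(row, means)] for row in data]


X = [[1.0, 10.0], [3.0, 20.0], [5.0, 30.0]]
print(mean_center(X))  # [[-2.0, -10.0], [0.0, 0.0], [2.0, 10.0]]
```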
Scaling
Operation performed within a column to make the column profiles more comparable.
Autoscaling
A form of scaling in which each column is first mean-centered and each entry is then divided by the standard deviation of that column. Also called UV (unit variance) scaling or Z-scaling.
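A minimal sketch of autoscaling on the same row-per-sample layout. It assumes the sample standard deviation (denominator n − 1), which is what `statistics.stdev` returns; some software uses the population standard deviation instead. The function name `autoscale` is hypothetical. Both columns end up with unit variance even though their original spreads differ tenfold.

```python
from statistics import mean, stdev
from typing import List


def autoscale(data: List[List[float]]) -> List[List[float]]:
    """Mean-center each column, then divide it by its sample standard deviation."""
    columns = list(zip(*data))
    means = [mean(col) for col in columns]
    sds = [stdev(col) for col in columns]  # n - 1 denominator
    if any(sd == 0 for sd in sds):
        raise ValueError("a constant column cannot be autoscaled")
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in data]


X = [[1.0, 10.0], [3.0, 20.0], [5.0, 30.0]]
print(autoscale(X))  # [[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]]
```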
Pareto scaling
Mean-centering followed by dividing each entry of a column by the square root of that column's standard deviation.
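A minimal sketch of Pareto scaling, again assuming rows are samples and using the sample standard deviation; the function name `pareto_scale` is hypothetical. Compared with autoscaling, the column with the larger original spread keeps a larger spread after scaling, so large signals are down-weighted less aggressively.

```python
import math
from statistics import mean, stdev
from typing import List


def pareto_scale(data: List[List[float]]) -> List[List[float]]:
    """Mean-center each column, then divide it by the square root of its standard deviation."""
    columns = list(zip(*data))
    means = [mean(col) for col in columns]
    factors = [math.sqrt(stdev(col)) for col in columns]
    if any(f == 0 for f in factors):
        raise ValueError("a constant column cannot be Pareto-scaled")
    return [[(x - m) / f for x, m, f in zip(row, means, factors)] for row in data]


X = [[1.0, 10.0], [3.0, 20.0], [5.0, 30.0]]
print([[round(v, 3) for v in row] for row in pareto_scale(X)])
# [[-1.414, -3.162], [0.0, 0.0], [1.414, 3.162]]
```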
Transformations
Operations that linearize or otherwise change the scale of the data, e.g., to remove heteroscedastic noise.
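A familiar example is the log transformation, which turns multiplicative (proportional) noise into approximately additive noise of constant variance. The sketch below assumes base-10 logarithms and a hypothetical offset of 1 so that zero intensities do not produce log(0); the offset value is an assumption and mainly affects small intensities. The function name `log_transform` is hypothetical.

```python
import math
from typing import List


def log_transform(data: List[List[float]], offset: float = 1.0) -> List[List[float]]:
    """Apply log10(x + offset) to every entry; entries must satisfy x + offset > 0."""
    return [[math.log10(x + offset) for x in row] for row in data]


X = [[0.0, 9.0, 99.0]]
print(log_transform(X))  # [[0.0, 1.0, 2.0]]
```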
Missing values
Entries in the data table that are not available for the analysis.
Outliers
Data points (samples, variables or a specific combination of both) which deviate from the distribution of the majority of the data.
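For a single variable, one way to flag values that deviate from the majority is a robust z-score built from the median and the median absolute deviation (MAD), so that the outliers themselves do not inflate the spread estimate. The sketch assumes the 0.6745 scaling constant and a cut-off of 3.5, a commonly quoted rule of thumb; the function name `robust_outliers` is hypothetical. Outlying samples in a multivariate sense (unusual combinations of variables) need model-based diagnostics instead.

```python
from statistics import median
from typing import List


def robust_outliers(values: List[float], threshold: float = 3.5) -> List[int]:
    """Return indices whose robust z-score 0.6745 * |x - median| / MAD exceeds threshold."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; robust z-scores are undefined")
    return [i for i, v in enumerate(values) if 0.6745 * abs(v - med) / mad > threshold]


intensities = [10.1, 9.8, 10.0, 10.3, 9.9, 25.0]
print(robust_outliers(intensities))  # [5]
```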
Processing
Model
The model selected for analyzing the data (PCA, PLS, OPLS, etc.).
Parameter
Parameters in models/methods that have to be fitted to the data.
Meta-parameter
A parameter that helps define the structure and optimization of the model, such as the number of components in a PCA or PLS model.
Post-processing
Back-transformation
Transforming the data back to the original domain (if a transformation was performed prior to the analysis).
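A minimal sketch of back-transformation, assuming the forward transformation was log10(x + offset) with a hypothetical offset of 1; the inverse is 10^y − offset. Both function names are hypothetical, and the forward function is repeated here only so the round trip can be checked.

```python
import math
from typing import List


def log_transform(data: List[List[float]], offset: float = 1.0) -> List[List[float]]:
    """Forward transformation: log10(x + offset)."""
    return [[math.log10(x + offset) for x in row] for row in data]


def back_transform(data: List[List[float]], offset: float = 1.0) -> List[List[float]]:
    """Inverse of log_transform: 10**y - offset, returning values to the original domain."""
    return [[10.0 ** y - offset for y in row] for row in data]


X = [[9.0, 99.0]]
print(back_transform(log_transform(X)))  # [[9.0, 99.0]]
```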
Visualization
Plots that represent the original data or the results of the data analysis in a way that facilitates interpretation.
Validation
Training set
Subset of samples used to estimate the parameters.
Monitoring set
Subset of samples used to estimate the meta-parameters.
Test set
Subset of samples used to establish the generalizability of the model/method.
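A minimal sketch of dividing samples into the three subsets above, assuming a simple random 60/20/20 split with a fixed seed for reproducibility. The function `split_samples`, the sample IDs, and the fractions are hypothetical; real studies often need stratified or subject-wise splits so that related samples (for example repeated measurements of one subject) do not end up in different subsets.

```python
import math
import random
from typing import List, Sequence, Tuple


def split_samples(
    sample_ids: Sequence[str],
    fractions: Tuple[float, float, float] = (0.6, 0.2, 0.2),
    seed: int = 0,
) -> Tuple[List[str], List[str], List[str]]:
    """Randomly split sample IDs into training, monitoring, and test sets."""
    if len(fractions) != 3 or not math.isclose(sum(fractions), 1.0):
        raise ValueError("need three fractions that sum to 1")
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    n_train = round(fractions[0] * len(ids))
    n_monitor = round(fractions[1] * len(ids))
    training = ids[:n_train]
    monitoring = ids[n_train:n_train + n_monitor]
    test = ids[n_train + n_monitor:]  # remainder goes to the test set
    return training, monitoring, test


samples = [f"S{i:02d}" for i in range(1, 11)]
train, monitor, test = split_samples(samples)
print(len(train), len(monitor), len(test))  # 6 2 2
```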