What is Multivariate Analysis
Multivariate analysis summarizes data tables with many variables by
creating a few new variables containing most of the information.
These new variables are then used for problem solving and display, e.g.,
classification, relationships, control charts, and more.
The new variables, the scores, denoted by t, are created as weighted linear
combinations of the original variables. Each observation has t-values.
PCA, the basic MV method, summarizes one data table.
Plotting the scores (t’s) gives an overview of the observations (objects).
PLS simultaneously summarizes two data tables (X, the predictor variables, and
Y, the response variables) in order to develop a relationship between them.
PCA and PLS are called Projection methods
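To illustrate how scores arise as weighted linear combinations of the original variables, here is a minimal pure-Python sketch of the first principal component, computed with a NIPALS-style iteration. The data table and the function names (center, first_component) are assumptions made for illustration; this is not SIMCA's implementation.

```python
# Sketch only: first principal component by NIPALS-style iteration.
# Data and function names are illustrative assumptions.

def center(X):
    """Subtract each column mean so the data cloud is centered at the origin."""
    n, k = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(k)]
    return [[row[j] - means[j] for j in range(k)] for row in X]

def first_component(X, iters=500, tol=1e-12):
    """Return (t, p): scores t = X p and unit-length loadings p."""
    n, k = len(X), len(X[0])
    # Start from the column with the largest sum of squares.
    j0 = max(range(k), key=lambda j: sum(row[j] ** 2 for row in X))
    t = [row[j0] for row in X]
    for _ in range(iters):
        tt = sum(v * v for v in t)
        # Loadings: regress every column of X on the current scores.
        p = [sum(X[i][j] * t[i] for i in range(n)) / tt for j in range(k)]
        norm = sum(v * v for v in p) ** 0.5
        p = [v / norm for v in p]
        # Scores: weighted linear combination of the original variables.
        t_new = [sum(X[i][j] * p[j] for j in range(k)) for i in range(n)]
        diff = sum((a - b) ** 2 for a, b in zip(t, t_new))
        t = t_new
        if diff < tol:
            break
    return t, p

X = center([
    [2.0, 1.9, 0.1],
    [1.0, 1.1, -0.1],
    [0.0, 0.1, 0.2],
    [-1.0, -0.9, 0.0],
    [-2.0, -2.2, -0.2],
])
t, p = first_component(X)
print("loadings p:", [round(v, 3) for v in p])
print("scores t:  ", [round(v, 3) for v in t])
```

Each observation gets one t-value per component, and each variable gets one p-value, which matches the description above.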
What is a Projection?
Reduction of dimensionality, model in latent variables
– Summarizes the information in the observations as a few new (latent) variables
– The swarm of points in a K-dimensional space (K = number of variables) is
approximated by a (hyper)plane, and the points are projected onto that plane.
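The projection itself can be sketched in a few lines. The example below assumes a plane in K = 3 dimensions spanned by two orthonormal loading vectors. The plane, the point, and the function name project are hypothetical values chosen for illustration.

```python
# Sketch only: project one point onto a plane spanned by orthonormal loadings.
# The plane and the point are hypothetical illustration values.

def project(x, center, loadings):
    """Return (t, x_hat, e): scores, projected point, and residual."""
    d = [xi - ci for xi, ci in zip(x, center)]
    # Scores: coordinates of the point within the plane.
    t = [sum(di * pi for di, pi in zip(d, p)) for p in loadings]
    # Projected point: center plus scores times loadings.
    x_hat = [
        center[j] + sum(t[a] * loadings[a][j] for a in range(len(loadings)))
        for j in range(len(x))
    ]
    # Residual: the part of the point that the plane does not describe.
    e = [xi - hi for xi, hi in zip(x, x_hat)]
    return t, x_hat, e

center_point = [0.0, 0.0, 0.0]
loadings = [[1.0, 0.0, 0.0], [0.0, 0.6, 0.8]]   # two orthonormal directions
x = [1.0, 2.0, 3.0]

t, x_hat, e = project(x, center_point, loadings)
distance = sum(v * v for v in e) ** 0.5
print("scores:", t)
print("projected point:", x_hat)
print("distance to plane:", distance)
```

The residual is orthogonal to the plane, so its length is the shortest distance from the point to the model plane.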
Each obs has values of t (and u) – Each variable has values of p (and w and c)
t: the X scores; the new summarizing variables (coordinates in the
hyperplane of X-space)
u: the Y scores in PLS; the new summarizing variables (coordinates in the
hyperplane of Y-space, when Y is multidimensional)
p: the PC loadings. These are the weights that in PCA combine the original
variables in X to form the new variables, scores t.
w*: the PLS weights. These are the weights that in PLS combine the
original variables in X to form the new variables, scores t.
c: the weights used to combine the Y's to form the scores u.
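As a rough illustration of how t, u, w, c and p relate in PLS, the sketch below computes a single PLS component with the classical NIPALS iteration on small, already centered X and Y tables. The data and helper names are assumptions. For the first component the weights w coincide with w*, and the difference for later components is not covered here.

```python
# Sketch only: one PLS component by NIPALS on centered X and Y.
# Data and helper names are illustrative assumptions.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mat_vec(M, v):
    """M v: one value per row (observation)."""
    return [dot(row, v) for row in M]

def mat_t_vec(M, v):
    """M' v: one value per column (variable)."""
    return [sum(M[i][j] * v[i] for i in range(len(M))) for j in range(len(M[0]))]

def pls_component(X, Y, iters=100, tol=1e-12):
    u = [row[0] for row in Y]            # start with the first Y column
    t_old = None
    for _ in range(iters):
        uu = dot(u, u)
        w = [v / uu for v in mat_t_vec(X, u)]
        norm = dot(w, w) ** 0.5
        w = [v / norm for v in w]        # X weights, unit length
        t = mat_vec(X, w)                # X scores
        tt = dot(t, t)
        c = [v / tt for v in mat_t_vec(Y, t)]   # Y weights
        cc = dot(c, c)
        u = [v / cc for v in mat_vec(Y, c)]     # Y scores
        if t_old is not None and sum((a - b) ** 2 for a, b in zip(t, t_old)) < tol:
            break
        t_old = t
    tt = dot(t, t)
    p = [v / tt for v in mat_t_vec(X, t)]       # X loadings
    return t, u, w, c, p

X = [[-1.5, -1.0], [-0.5, 0.5], [0.5, -0.5], [1.5, 1.0]]
Y = [[-3.0, -1.4], [-0.8, -0.6], [0.9, 0.4], [2.9, 1.6]]
t, u, w, c, p = pls_component(X, Y)
print("w:", w, "c:", c)
```

Plotting t against u for each component corresponds to the t1-u1 score plot used to inspect the X to Y relationship.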
One Component consists of one t and one p (PCA) or t, p, w, u, c (PLS).
The total number of components is A.
Model: The data are approximated by a plane or hyperplane (the model)
with as many dimensions as components extracted.
DModX: also called distance to the model; the distance of a given
observation to the model plane.
T2: Hotelling’s T2 is a combination of all the scores (t) of all A components.
T2 measures how far away an observation is from the center of a PC or PLS model.
R2X: The fraction of the variation of the X variables explained by the model.
R2Y: The fraction of the variation of the Y variables explained by the model.
Q2X: The fraction of the variation of the X variables predicted by the model.
Q2Y: The fraction of the variation of the Y variables predicted by the model.
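The diagnostics above can be made concrete with a small sketch. It fits a one-component PCA model (A = 1) to an already centered table and then computes Hotelling's T2, a residual distance in the spirit of DModX, and R2X. The data are made up, and SIMCA applies additional normalization and correction factors to DModX that are not reproduced here.

```python
# Sketch only: T2, a DModX-like residual distance, and R2X for a
# one-component PCA model. Data are illustrative; SIMCA's exact DModX
# normalization is NOT reproduced here.

def nipals_one(X, iters=500):
    """One principal component: scores t and unit-length loadings p."""
    n, k = len(X), len(X[0])
    t = [row[0] for row in X]
    for _ in range(iters):
        tt = sum(v * v for v in t)
        p = [sum(X[i][j] * t[i] for i in range(n)) / tt for j in range(k)]
        norm = sum(v * v for v in p) ** 0.5
        p = [v / norm for v in p]
        t = [sum(X[i][j] * p[j] for j in range(k)) for i in range(n)]
    return t, p

X = [  # already centered: every column sums to zero
    [2.0, 1.9, 0.1],
    [1.0, 1.1, -0.1],
    [0.0, 0.1, 0.2],
    [-1.0, -0.9, 0.0],
    [-2.0, -2.2, -0.2],
]
n, K, A = len(X), len(X[0]), 1
t, p = nipals_one(X)

# Residuals: what is left of X after removing the model part t p'.
E = [[X[i][j] - t[i] * p[j] for j in range(K)] for i in range(n)]

# Hotelling's T2: squared scores divided by the score variance.
s2 = sum(v * v for v in t) / (n - 1)
T2 = [v * v / s2 for v in t]

# Residual distance per observation (unnormalized DModX-like quantity).
dist = [(sum(e * e for e in row) / (K - A)) ** 0.5 for row in E]

# R2X: fraction of the sum of squares of X explained by the model.
ss_x = sum(v * v for row in X for v in row)
ss_e = sum(v * v for row in E for v in row)
r2x = 1.0 - ss_e / ss_x

print("T2:", [round(v, 3) for v in T2])
print("distance:", [round(v, 3) for v in dist])
print("R2X:", round(r2x, 3))
```

Q2 is computed in the same way as R2 but from cross-validated predictions of left-out data instead of the fitted values.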
MVA – SIMCA Road Map
Methods available
Preprocessing; trimming and Winsorizing (take away extremes)
Principal Components Analysis (PCA; overview of data)
Projection to Latent Structures (PLS; relationships X↔Y)
SIMCA classification
PLS-discriminant analysis (classification)
Hierarchical PCA and PLS
Predictions and classification of new data using any model
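Trimming and Winsorizing, listed first among the methods above, differ in how they treat extreme values. The sketch below shows one simple variant for a single variable, where k is the number of values treated in each tail. The function names and the choice of a count-based limit are assumptions, not a description of SIMCA's options.

```python
# Sketch only: count-based trimming and Winsorizing of one variable.
# Function names and the count-based rule are illustrative assumptions.

def winsorize(values, k):
    """Replace the k smallest and k largest values by their nearest kept neighbor."""
    ordered = sorted(values)
    low, high = ordered[k], ordered[-k - 1]
    return [min(max(v, low), high) for v in values]

def trim(values, k):
    """Drop the k smallest and k largest values entirely."""
    ordered = sorted(values)
    return ordered[k:len(ordered) - k]

data = [1, 2, 3, 4, 100]
print(winsorize(data, 1))   # extremes pulled in to the nearest kept value
print(trim(data, 1))        # extremes removed
```

Winsorizing keeps the number of observations unchanged, while trimming reduces it.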
MVA – SIMCA Road Map
Data set = all data; Work set = working copy of data
1. Start a project
File New
Read Data File
Specify Label Cols & Rows
2. Look at the data
Data set
Quick Info; Variables or Obs.
Preprocessing, Trim, etc.
3. Prepare a work copy
variables, observations
Preprocessing, Class spec.
4. Fit the model
Analysis / Autofit, or fast button
5. Plot results
Scores, Loadings
Distance to Model
Tip: work main menus from left to right, and pop-up menus from top to bottom
6a. Outliers in scores
Polish data
Prepare new workset
Graphically or via Workset
6b. No outliers in scores
Interpret model (plots)
Relate to Objective
Note: Plot / List allows you to plot or list anything non-standard, not
found under Analysis
7. New data
Select Pred.set (observations)
T_pred, Y_pred, DModX, etc.
Steps in using SIMCA-P using the wizard
Start a new project and import the data set
Use the workset wizard to guide you through building the workset and fitting the model
Generate the report writer to walk through the model results and
When displaying Simca-P plots always use the Analysis adviser to guide
Workset wizard on
Workset wizard
Autotransform variables
Mark the check box to transform all variables that need it
Automatic creation of classes for classification or
Selection and Fit of model
Report writer
Walks you through the model results with interpretation: File | Generate Report
Steps in Using SIMCA-P, Advanced Mode
Start a new project and import the data set
Explore and preprocess the data
Make working copy of selected data (workset) for model building
Specify model type and fit it to the workset
Review fit (plots, diagnostics, coefficients, etc.)
Generate Report
1a. File New
Starting a new project
Select the data file containing the raw data of the project
– directory, file type (XLS, DIF, TXT, …), file name
A Wizard opens (see next page) allowing you to specify (optionally) the
row containing the Variable names, and (optionally) the columns with
the Obs. Numbers and Names
Here (Commands) you can also do additional things such as
– transposing the input data matrix
Use simple mode with workset wizard
At the last Wizard page, you can (optionally) specify another name and
directory for the project.
A map of the missing data is shown
The Wizard finishes and puts you in the Simca-window
A starting work set (M1, all data, all X-s, UV-scaled) is ready
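UV scaling (unit variance scaling) is commonly understood as centering each variable and dividing it by its standard deviation. A minimal sketch under that assumption follows. The function name uv_scale and the handling of constant columns are choices made here for illustration.

```python
# Sketch only: unit-variance (UV) scaling of a data table, column by column.
# The treatment of constant columns is an assumption for this example.

def uv_scale(X):
    """Center each column and divide by its standard deviation (n - 1)."""
    n, k = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(k)]
    sds = []
    for j in range(k):
        var = sum((row[j] - means[j]) ** 2 for row in X) / (n - 1)
        sds.append(var ** 0.5)
    scaled = [
        # A constant column has sd 0; leave it centered instead of dividing.
        [(row[j] - means[j]) / sds[j] if sds[j] > 0 else 0.0 for j in range(k)]
        for row in X
    ]
    return scaled, means, sds

table = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
scaled, means, sds = uv_scale(table)
print(scaled)
```

After scaling, every variable has the same variance, so no variable dominates the model only because of its measurement units.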
1b. The second screen of the Wizard
2. Looking at the data
With the data set table open (Data set edit):
Quick Info (both var and obs windows can be open)
– variables
– observations
Moving the cursor in the data set table up and down, or sideways, changes
the displayed variable and observation
In the quick info options you can specify what you want to look at
(histograms, auto-correlations, …), as well as which items should be the
basis for the plots
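The quantities behind two of the Quick Info displays can be sketched as follows: a histogram with equal-width bins and the autocorrelation of a variable at a given lag. Both functions are simplified illustrations with assumed names, not the computations SIMCA performs.

```python
# Sketch only: histogram counts and lag autocorrelation for one variable.
# Function names and binning rules are illustrative assumptions.

def histogram(values, bins):
    """Count values in equal-width bins between the minimum and maximum."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum value is placed in the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts

def autocorrelation(values, lag):
    """Correlation of the series with itself shifted by `lag` positions."""
    n = len(values)
    mean = sum(values) / n
    d = [v - mean for v in values]
    denom = sum(v * v for v in d)
    return sum(d[i] * d[i + lag] for i in range(n - lag)) / denom

series = [1.0, 2.0, 3.0, 4.0, 5.0]
print(histogram(series, 2))
print(autocorrelation(series, 1))
```

A slowly decaying autocorrelation indicates a trend or drift in the variable, which is worth knowing before modeling.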
View variables or Observations, Trim, etc.
Quick Info
3. Prepare a work copy: The Workset
Simple Mode with guidance, or Advanced Mode
In Workset, you prepare a working copy of the part of the data you will
analyze, i.e., use as the basis of your model.
Here you specify transformation, scaling, and roles of variables (X or Y or
Also, you select the observations (your “training set”).
You can start with the previous workset (Workset / New as model xx) and
then modify it, e.g., excluding observations.
Whatever you do in Workset does NOT touch the raw data
Note that outliers are just specified as “not included” in the next workset (the
“polished” data). Outliers are NEVER removed from the raw data set.
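The idea that a workset is a copy defined by inclusion and exclusion, while the raw data stay untouched, can be sketched like this. The structure (an immutable raw table plus a set of excluded observation names) and all names are assumptions made for illustration.

```python
# Sketch only: a workset as a filtered copy of an immutable raw data set.
# Data structure and names are illustrative assumptions.

RAW = (  # tuples cannot be modified, mirroring "raw data are never touched"
    ("obs1", (1.0, 2.0, 3.0)),
    ("obs2", (1.1, 2.1, 2.9)),
    ("obs3", (9.0, 0.1, 7.5)),   # suspected outlier
    ("obs4", (0.9, 1.9, 3.1)),
)

def make_workset(raw, excluded=frozenset()):
    """Return a working copy that omits the excluded observations."""
    return [(name, list(values)) for name, values in raw if name not in excluded]

# First workset: all observations.
workset_1 = make_workset(RAW)

# "New as model": start from the previous selection and exclude the outlier.
workset_2 = make_workset(RAW, excluded=frozenset({"obs3"}))

print(len(RAW), len(workset_1), len(workset_2))
```

Because the exclusion lives in the workset definition, the outlier can be brought back at any time by building another workset.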
Workset: two Modes, Simple and Advanced
4. Analysis
Fit the Model to the Workset Data
Either menu “Analysis / Autofit” or Fast Button
A model with appropriate number of components is found
– If autofit extracts no components, compute the first two components
(also menu or fast button)
A table appears showing the model, component by component.
More components can be added (menu or fast button)
Double click on a model to specify a title
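The rules Autofit uses to decide on the number of components are not described here. A common basis for such decisions is cross-validation, summarized by Q2. The sketch below computes a leave-one-out Q2Y for a one-component PLS model with a single y. The data, function names, and the leave-one-out scheme are assumptions for illustration only.

```python
# Sketch only: leave-one-out Q2Y for a one-component, single-y PLS model.
# This is NOT SIMCA's autofit rule; data and names are illustrative.

def pls1_fit(X, y):
    """Fit one PLS component; return means, weights w and y-weight c."""
    n, k = len(X), len(X[0])
    mx = [sum(r[j] for r in X) / n for j in range(k)]
    my = sum(y) / n
    Xc = [[r[j] - mx[j] for j in range(k)] for r in X]
    yc = [v - my for v in y]
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(k)]
    norm = sum(v * v for v in w) ** 0.5
    w = [v / norm for v in w]
    t = [sum(a * b for a, b in zip(r, w)) for r in Xc]
    c = sum(a * b for a, b in zip(t, yc)) / sum(a * a for a in t)
    return mx, my, w, c

def pls1_predict(model, x):
    mx, my, w, c = model
    t_new = sum((x[j] - mx[j]) * w[j] for j in range(len(x)))
    return my + c * t_new

def q2_loo(X, y):
    """Q2 = 1 - PRESS / SS, predicting each observation from the others."""
    press = 0.0
    for i in range(len(X)):
        model = pls1_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - pls1_predict(model, X[i])) ** 2
    my = sum(y) / len(y)
    ss = sum((v - my) ** 2 for v in y)
    return 1.0 - press / ss

X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 6.0], [6.0, 5.0]]
y = [3.1, 2.9, 7.2, 6.8, 11.1, 10.9]
q2 = q2_loo(X, y)
print("Q2Y (leave-one-out):", round(q2, 3))
```

A component that raises R2 but does not raise Q2 is fitting noise, which is one reason to stop adding components.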
5. Plot results
Analysis / menu (or fast buttons)
Summary / X/Y-Overview shows R2 and Q2 for all variables
Scores – scatter plot, t1-t2 and t1-u1 & t2-u2 (PLS)
Loadings – scatter plot (p1-p2 for PCA, wc1-wc2 for PLS)
Distance to Model – line plot
Contribution plots to interpret interesting observations, e.g. outliers, jumps,
For all plots, right-click and choose Properties to select plot
markers, and more
The graphical tool box allows further modifications
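A contribution plot shows which variables drive an observation's position. In its simplest form, the score of an observation is a sum of one term per variable, and those terms can be inspected directly. The sketch below assumes centered and scaled values and uses made-up loadings. SIMCA's contribution plots apply their own weighting, which is not reproduced here.

```python
# Sketch only: per-variable contributions to one score value.
# Values and loadings are hypothetical; SIMCA's weighting is not reproduced.

def score_contributions(x_scaled, loadings):
    """Split t = sum_j x_j * p_j into one term per variable."""
    return [xj * pj for xj, pj in zip(x_scaled, loadings)]

variables = ["var1", "var2", "var3"]
x_scaled = [2.0, -1.0, 0.5]       # one observation, centered and scaled
p = [0.6, 0.0, 0.8]               # loadings of one component

contrib = score_contributions(x_scaled, p)
t = sum(contrib)
largest = max(range(len(contrib)), key=lambda j: abs(contrib[j]))

for name, value in zip(variables, contrib):
    print(name, round(value, 3))
print("score t:", round(t, 3), "largest contributor:", variables[largest])
```

The variable with the largest absolute term is the first place to look when explaining why an observation is an outlier or shows a jump.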
6a. Outliers were seen in the score plot
(well outside the Hotelling ellipse)
Start another workset
(either from Workset / New as model xx, or using the graphical tool-box to
remove outliers from the score plot)
Note that outliers should NOT be deleted from the data by Edit/Data set
When the new workset is all right, return to “4. Analysis” to fit a new model
to the new work set
(fast button or Analysis/Autofit)
6b. No outliers were seen in the score plots
(or they have been excluded, and the score plots now look all right)
Now, interpret the model
Look at “patterns”, trends, etc., in the score plots
Inspect the loading plots to interpret the above patterns
Look at DModX
What do these patterns say about the objective of the investigation?
Analysis Advisor to understand and interpret model results
7. Predictions
New Data, Prediction Set
Under Predictions, specify the set of observations for which predictions will
be made, the prediction set
New data can be read in as a secondary data set
(File / Import) and predictions can be made for these
Prediction set / Complement WS, gives a prediction set with those
observations that were not in the training set
Predictions / Y-predicted, T-predicted, etc., calculates and displays the
predicted values accordingly
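Predicting for new observations means applying the training-set scaling and the fitted loadings to data the model has not seen. The sketch below builds a complement prediction set and computes a predicted score and a residual distance for one new observation. All numbers (means, standard deviations, loadings) are hypothetical stand-ins for a fitted one-component model, and the distance is not normalized the way DModX is in SIMCA.

```python
# Sketch only: complement prediction set, predicted score, residual distance.
# Model parameters are hypothetical; the distance is an unnormalized
# stand-in for DModX.

def complement(dataset_ids, workset_ids):
    """Observations in the data set that were not used for training."""
    return [obs for obs in dataset_ids if obs not in workset_ids]

def predict_score(x_new, means, sds, loadings):
    """Scale with training parameters, then project onto the loadings."""
    z = [(x - m) / s for x, m, s in zip(x_new, means, sds)]
    t_pred = sum(zj * pj for zj, pj in zip(z, loadings))
    residual = [zj - t_pred * pj for zj, pj in zip(z, loadings)]
    distance = sum(e * e for e in residual) ** 0.5
    return t_pred, distance

pred_ids = complement(["o1", "o2", "o3", "o4"], {"o1", "o3"})

train_means = [10.0, 20.0]
train_sds = [2.0, 4.0]
p = [0.6, 0.8]                    # unit-length loadings of the component

t_pred, dist = predict_score([12.0, 28.0], train_means, train_sds, p)
print(pred_ids, round(t_pred, 3), round(dist, 3))
```

A new observation with a large residual distance is unlike the training data, so its predictions should be treated with caution.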
8. Generate the report, with customizable templates
Use of these slides
You may use any or all of these slides in your own presentations, provided
that you keep (and do not modify) the Umetrics logo and web reference
