Stepwise Discriminant Analysis

The Principal Component Analysis (PCA) and Correspondence Analysis of the Prairie du Chien data have provided some useful information about the data set, but they have not enabled discrimination of distinct source areas within the field area. The next statistical tool to be employed for this purpose is Discriminant Analysis. First, however, there is the question of which variables (elements) to input into the process. During the Principal Component Analysis above, nine elements were chosen as most useful based upon their relative variability and their influence on the distribution of the data set. Four element ratios were also identified as potentially useful. The nine elements with the largest influence in PCA will not necessarily be the most useful elements for Discriminant Analysis, however.

In order to see which variables have the greatest power for discriminating between sample locations, all of the elements and the four element ratios were entered into a Stepwise Discriminant Analysis process using the SAS procedure stepdisc. This technique is somewhat misleadingly named, since it does not actually classify the samples in the data set. Instead, it selects the subset of variables that best separates the sample groups, for use in a subsequent Discriminant Analysis. The process is iterative: it starts with a single variable (element), measures how well that variable separates the groups, adds another variable, measures the resulting change in separation, and so on until no remaining variable improves the results. The statistics reported at each step (partial R**2, F statistic, and Wilks' Lambda) are shown in the tables below. In this stepwise manner, the program chooses which variables in a data set have the best discriminatory power and which ones are not useful.
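The selection loop described above can be sketched in Python. This is a simplified illustration under stated assumptions, not the SAS stepdisc procedure itself: it only adds variables (there is no removal step), it stops on a hypothetical partial R**2 threshold (`min_partial_r2`) rather than on an F-test significance level, and the function names and the toy data are invented for the example.

```python
import numpy as np

def wilks_lambda(X, groups):
    """Wilks' lambda = det(within-group SSCP) / det(total SSCP)."""
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    centered = X - X.mean(axis=0)
    total = centered.T @ centered
    within = np.zeros_like(total)
    for g in np.unique(groups):
        d = X[groups == g] - X[groups == g].mean(axis=0)
        within += d.T @ d
    return np.linalg.det(within) / np.linalg.det(total)

def forward_select(X, groups, names, min_partial_r2=0.05):
    """Forward-only selection; the threshold is an assumed stopping rule."""
    X = np.asarray(X, dtype=float)
    selected, remaining = [], list(range(X.shape[1]))
    current = 1.0  # Wilks' lambda with no variables in the model
    steps = []
    while remaining:
        # Wilks' lambda if each remaining variable were added next
        trial = {j: wilks_lambda(X[:, selected + [j]], groups) for j in remaining}
        best = min(trial, key=trial.get)
        partial_r2 = 1.0 - trial[best] / current
        if partial_r2 < min_partial_r2:
            break  # no remaining variable improves the separation enough
        selected.append(best)
        remaining.remove(best)
        current = trial[best]
        steps.append((names[best], partial_r2, current))
    return steps

# Toy data (synthetic, NOT the Prairie du Chien measurements):
# three groups of four samples; "a" and "b" differ between groups, "noise" does not.
groups = np.repeat([0, 1, 2], 4)
a = np.repeat([0.0, 10.0, 20.0], 4) + 0.5 * np.tile([-1, 1, 1, -1], 3)
b = np.repeat([0.0, 0.0, 5.0], 4) + 0.5 * np.tile([1, -1, 1, -1], 3)
noise = 2.5 + 0.5 * np.tile([1, 1, -1, -1], 3)
X = np.column_stack([a, b, noise])

steps = forward_select(X, groups, ["a", "b", "noise"])
for name, partial_r2, lam in steps:
    print(f"{name}: partial R**2 = {partial_r2:.4f}, Wilks' lambda = {lam:.6f}")
```

In this toy case the loop enters "a" and then "b" and stops before "noise", because adding "noise" leaves Wilks' lambda essentially unchanged.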

The Principal Component Analysis above revealed that the Prairie du Chien samples as a whole cluster rather tightly geochemically, with much overlap between sample locations. This situation is expected to make discrimination between locations very difficult. Therefore, the Stepwise Discriminant Analysis was conducted twice: first with instructions to distinguish between each individual sample location, and then with instructions to distinguish between six geographic groups of sample locations (see Figure 8). The first analysis chose seven elements and three element ratios (see Table 7); the program did not consider any other variables additionally helpful. The second analysis chose four elements and one element ratio (see Table 8), only one of which (Sc) was not chosen in the first analysis. Together, the two analyses chose a total of eleven variables as useful for discrimination between locations. Three of the elements (Sc, Sm, and Ce) were essentially chosen twice, however: once as individual elements, and once as part of an element ratio. Therefore the variables chosen for Discriminant Analysis were Al, Cr, Cs, Fe, Eu, Ce/Sm, La/Sc, and La/Sm. Notably, all eight of these variables were also among those found to be most useful in the Principal Component Analysis.
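The bookkeeping behind the final variable list can be checked with a few lines of Python. The variable names are taken from Tables 7 and 8; the rule used to drop duplicates (discard a standalone element when it also appears in a chosen ratio) is one reading of the paragraph above and is an assumption of this sketch.

```python
# Variables entered in each stepwise run, in order of entry (Tables 7 and 8).
by_location = ["Fe", "Al", "La/Sm", "Cs", "La/Sc", "Ce/Sm", "Cr", "Sm", "Eu", "Ce"]
by_group = ["Fe", "Cr", "Al", "Sc", "La/Sm"]

# Union of both runs.
combined = sorted(set(by_location) | set(by_group))

# Elements that already appear inside one of the chosen ratios.
in_ratios = {part for v in combined if "/" in v for part in v.split("/")}

# Assumed rule: keep every ratio, and keep an element only if no chosen ratio contains it.
final = [v for v in combined if "/" in v or v not in in_ratios]

print(len(combined), combined)
print(len(final), final)
```

Under that rule the eleven combined variables reduce to the eight listed above.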

 

                                                                                  Average
                                                                                  Squared
      Variable  Variable  Number  Partial  F          Prob >  Wilks'      Prob <  Canonical    Prob >
Step  Entered   Removed   In      R**2     Statistic  F       Lambda      Lambda  Correlation  ASCC
   1  Fe                  1       0.4761   8.452      0.0001  0.52389641  0.0001  0.02380518   0.0001
   2  Al                  2       0.3775   5.61       0.0001  0.32612303  0.0001  0.04267943   0.0001
   3  La/Sm               3       0.3275   4.481      0.0001  0.21931392  0.0001  0.05865093   0.0001
   4  Cs                  4       0.2923   3.778      0.0001  0.15521867  0.0001  0.07279750   0.0001
   5  La/Sc               5       0.2823   3.579      0.0001  0.11140711  0.0001  0.08629080   0.0001
   6  Ce/Sm               6       0.2436   2.915      0.0001  0.08426557  0.0001  0.09752780   0.0001
   7  Cr                  7       0.2719   3.362      0.0001  0.06134996  0.0001  0.10847646   0.0001
   8  Sm                  8       0.204    2.293      0.0022  0.04883581  0.0001  0.11813657   0.0001
   9  Eu                  9       0.225    2.584      0.0005  0.03784672  0.0001  0.12716064   0.0001
  10  Ce                  10      0.1704   1.817      0.0218  0.03139942  0.0001  0.13508111   0.0001

Table 7: Results of Stepwise Discriminant Analysis. These ten variables were found to be helpful in discriminating between sample locations.
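The Partial R**2 and Wilks' Lambda columns of Table 7 are linked: when a variable enters, Wilks' Lambda is multiplied by (1 - partial R**2) for that variable. This relation is a standard property of Wilks' Lambda rather than something stated in the text, so the short Python check below is offered only as a consistency test on the tabulated values; the tolerance is an assumption that allows for the four-digit rounding of the Partial R**2 column.

```python
# Values copied from Table 7, in step order.
partial_r2 = [0.4761, 0.3775, 0.3275, 0.2923, 0.2823,
              0.2436, 0.2719, 0.204, 0.225, 0.1704]
wilks = [0.52389641, 0.32612303, 0.21931392, 0.15521867, 0.11140711,
         0.08426557, 0.06134996, 0.04883581, 0.03784672, 0.03139942]

previous = 1.0  # Wilks' lambda before any variable is entered
errors = []
for step, (r2, reported) in enumerate(zip(partial_r2, wilks), start=1):
    predicted = previous * (1.0 - r2)
    errors.append(abs(predicted - reported))
    print(f"step {step:2d}: predicted {predicted:.6f}, reported {reported:.6f}")
    previous = reported  # continue from the reported value to avoid compounding rounding

# Tolerance assumed from rounding of partial R**2 to four decimal places.
consistent = max(errors) < 5e-5
print("columns consistent:", consistent)
```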

 

 

                                                                                  Average
                                                                                  Squared
      Variable  Variable  Number  Partial  F          Prob >  Wilks'      Prob <  Canonical    Prob >
Step  Entered   Removed   In      R**2     Statistic  F       Lambda      Lambda  Correlation  ASCC
   1  Fe                  1       0.2485   13.292     0.0001  0.75151593  0.0001  0.04969681   0.0001
   2  Cr                  2       0.2135   10.857     0.0001  0.59107602  0.0001  0.09210010   0.0001
   3  Al                  3       0.1678   8.026      0.0001  0.49188396  0.0001  0.12125901   0.0001
   4  Sc                  4       0.1033   4.561      0.0006  0.44108327  0.0001  0.13995664   0.0001
   5  La/Sm               5       0.0962   4.196      0.0012  0.39863234  0.0001  0.15668807   0.0001

Table 8: Results of Stepwise Discriminant Analysis. These five variables were found to be helpful in discriminating between groups of sample locations.



