DATA ANALYSIS
This provenance study employed a number of statistical techniques to extract information from the geochemical data, including univariate Analysis of Variance, Correspondence Analysis, Canonical Correlation Analysis, Stepwise Discriminant Analysis, Discriminant Analysis, and Principal Component Analysis. All were useful, but the last two were the main statistical tools, providing the bulk of the information for the provenance investigation. Each offers a different means of describing, displaying, and interpreting the data set.
During neutron activation analysis, the Prairie du Chien chert samples were analyzed for 55 different trace elements. Only nineteen of these, however, were consistently present in concentrations high enough to be measured accurately (see Table 1). Where a peak was not detected, the analysis software calculated the maximum amount of the element that might be present but undetected relative to background noise. These upper limits varied from sample to sample, depending on the background level. Because the uncertainty inherent in such calculated values would seriously degrade the quality of the statistical analyses, any element not consistently detected across the samples was removed from consideration.
Of the nineteen elements consistently present at detectable levels, four (Cl, Au, Ru, and Sb) were not present in the standard. Because they were not detected in the standard, the calculated concentrations of these elements in the unknowns are not reliable (Richard Cashwell, personal communication). Therefore, although these four elements were present in the unknowns, they were excluded from further consideration. After their removal, the resulting data set is a matrix of 209 subjects (geologic samples) and 15 variables (elements) (see Appendix A for the full data set). Simple statistics for these fifteen elements are given in Table 2.
Element | Mean | Std. Dev. | Minimum | Maximum
Al | 2876.65 | 955.43 | 757.00 | 11637.00
Br | 0.7318 | 0.5647 | 0 | 6.746
Ce | 0.6731 | 1.8125 | 0.154 | 26.138
Co | 0.3747 | 0.2789 | 0.165 | 2.285
Cr | 9.8513 | 3.0982 | 1.390 | 25.360
Cs | 0.1518 | 0.0621 | 0.052 | 0.360
Eu | 0.0834 | 0.4722 | 0.022 | 6.871
Fe | 438.96 | 749.18 | 31.90 | 7146.80
Hf | 0.1159 | 0.0767 | 0.035 | 0.534
La | 0.5229 | 3.6556 | 0 | 52.563
Mn | 17.961 | 55.141 | 0 | 574.430
Na | 125.40 | 66.47 | 23.50 | 393.32
Sc | 0.0676 | 0.0532 | 0.003 | 0.332
Sm | 0.1995 | 2.1012 | 0 | 30.411
U | 0.9412 | 0.9207 | 0 | 7.022
Table 2: Simple statistics for the fifteen elements in the data set (209 samples).
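The element screening described before Table 2 applies two filters to each element. The Python sketch below shows that logic under stated assumptions. The input layout, the function name `screen_elements`, and the `min_fraction` parameter are hypothetical. "Consistently detected" is modeled here as detected in at least `min_fraction` of the samples (default: every sample), because the study does not state a numeric threshold. The sketch illustrates the reasoning only; it is not the procedure that was actually run on the INAA output.

```python
def screen_elements(detected, in_standard, min_fraction=1.0):
    """Return the elements retained for statistical analysis (hypothetical sketch).

    detected: dict mapping element symbol -> list of booleans, one per sample;
        True means a peak was measured, False means only an upper limit exists.
    in_standard: set of element symbols present in the comparison standard.
    min_fraction: assumed share of samples in which an element must be detected.
    """
    kept = []
    for element, flags in detected.items():
        detection_rate = sum(flags) / len(flags)
        # Filter 1: drop elements that are not consistently detected.
        if detection_rate < min_fraction:
            continue
        # Filter 2: drop elements absent from the standard, because their
        # quantitative concentrations cannot be calibrated reliably.
        if element not in in_standard:
            continue
        kept.append(element)
    return sorted(kept)


# Toy detection flags: Au is always detected but missing from the standard,
# and Ir is detected in only two of the three samples.
detected = {
    "Al": [True, True, True],
    "Au": [True, True, True],
    "Ir": [True, False, True],
}
print(screen_elements(detected, in_standard={"Al", "Ir"}))  # ['Al']
```

If the `min_fraction` requirement is relaxed, Ir passes the first filter. Au is still rejected because it is absent from the standard, which is the situation described for Cl, Au, Ru, and Sb.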
During INAA sample preparation, samples #14, #57, and #62 were split to create replicates for testing the consistency of the analyses. With the exception of the cesium (Cs) values for samples #14 and #57, the elemental concentrations of these replicates were within one standard deviation of the mean for their respective sample locations (see Appendix A for the elemental concentrations), indicating that the analyses are reasonably reproducible.
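The replicate check compares each replicate concentration with the mean and standard deviation of its sample location. A minimal Python sketch follows. The function name and data layout are hypothetical. It assumes the sample standard deviation (n − 1 denominator), since the text does not say which form was used. The toy numbers are invented for illustration and do not come from Appendix A.

```python
from statistics import mean, stdev


def inconsistent_elements(replicate, location_values):
    """List elements whose replicate value falls more than one standard
    deviation from its location's mean (hypothetical sketch).

    replicate: dict element -> concentration measured on the replicate split.
    location_values: dict element -> list of concentrations for all samples
        from the replicate's sample location (at least two values each).
    """
    flagged = []
    for element, value in replicate.items():
        values = location_values[element]
        center = mean(values)
        spread = stdev(values)  # sample standard deviation (n - 1 denominator)
        if abs(value - center) > spread:
            flagged.append(element)
    return sorted(flagged)


# Invented values: the Cs replicate is far from its location mean, Na is not.
location = {"Cs": [0.10, 0.20, 0.30], "Na": [100.0, 120.0, 140.0]}
replicate = {"Cs": 0.45, "Na": 115.0}
print(inconsistent_elements(replicate, location))  # ['Cs']
```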
With fifteen variables in the data set, no simple graphical representation of the data is possible. Two or three variables could be displayed in a two- or three-dimensional graph, but fifteen variables (elements) would require a fifteen-dimensional space for traditional graphing methods. Multivariate statistical methods provide a way around this difficulty. Such techniques either reduce the number of "dimensions" of a data set to a number that is easily displayed and understood, or supply summary statistics that describe the multidimensional data set in a more readily interpretable form. They also help determine which variables (in this case, elements) are most and least useful for revealing trends in the data matrix. In many cases, unhelpful or redundant variables can then be removed from consideration.
Numerous techniques have been developed to accomplish these purposes, including Cluster Analysis, Principal Component Analysis, and Discriminant Analysis. Each takes a different approach to interpreting a multivariate data set and can yield somewhat different types of information. Discriminant Analysis and Principal Component Analysis are the statistical tools most commonly used in provenance studies, and they are the primary techniques used in this analysis of the Prairie du Chien geochemical data.
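Principal Component Analysis, one of the techniques named above, reduces dimensions by finding the direction in which the samples vary most. The data can then be summarized by their scores along a few such directions. The following standard-library Python sketch computes only the first principal component, using power iteration on the sample covariance matrix. It is not SAS code and does not reproduce SAS's implementation. The text here also does not say whether the study's PCA used a covariance or a correlation matrix; standardizing each variable first would give the correlation-based version. A real analysis of the fifteen log-transformed elements would extract several components, usually with a linear-algebra library.

```python
from statistics import mean


def covariance_matrix(rows):
    """Sample covariance matrix of a list of observations (rows x variables)."""
    n, p = len(rows), len(rows[0])
    means = [mean(row[j] for row in rows) for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]


def first_principal_component(rows, iterations=200):
    """Leading eigenvector (loadings), its eigenvalue (variance), and the
    share of total variance it explains, found by power iteration."""
    cov = covariance_matrix(rows)
    p = len(cov)
    v = [1.0] * p  # starting guess; must not be orthogonal to the answer
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0:
            raise ValueError("starting vector collapsed; data may have no variance")
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * cov[i][j] * v[j] for i in range(p) for j in range(p))
    total_variance = sum(cov[i][i] for i in range(p))
    return v, eigenvalue, eigenvalue / total_variance


# Two perfectly correlated toy variables collapse onto a single component.
rows = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
loadings, variance, share = first_principal_component(rows)
print([round(x, 4) for x in loadings], round(variance, 4), round(share, 4))
# [0.4472, 0.8944] 8.3333 1.0
```

Here one component carries all of the variance, so two dimensions reduce to one without loss. With real element data the first component would normally explain only part of the total, and several components would be retained.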
The multivariate statistical analyses described above were carried out with the Unix version of the SAS statistical software package.
Normality of the Data Set
Multivariate statistical methods such as Principal Component Analysis (PCA), Discriminant Analysis (DA), and Cluster Analysis (CA) all carry certain requirements or assumptions concerning the input data. For PCA and DA, one of these is a multivariate normal distribution, which in turn requires each individual variable to be normally distributed. The multivariate analyses used in this study are in fact quite robust to non-normal data, but more reliable results are achieved with multivariate normal input.
Unfortunately, it is probably not valid to test the entire 209-subject data matrix for multivariate normality as a single body. Such a test would be valid only if the entire data set were a random sample from a single homogeneous source. The purpose of this study is to determine whether portions of the Prairie du Chien Group are geochemically distinguishable from others; if they are, the overall data set may be far from multivariate normal. In fact, that outcome would be the most useful one. Instead, the data from each of the twenty sample locations must be examined individually for multivariate normality.
To test the assumption of normality, the SAS UNIVARIATE procedure was used. This procedure produces simple descriptive statistics, such as the mean and standard deviation, for each variable in each group. It also produces a box-and-whisker plot for each, which offers a simple way to compare visually the central value and spread of the data for each sample location.
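Because normality is assessed location by location, the data must first be grouped by sample location. The summary behind each box-and-whisker plot is then computed separately for each group. The Python sketch below performs that grouping for one element and reports the kind of statistics a box plot is drawn from. The function name and input layout are hypothetical. Quartile definitions differ between statistical packages, so the quartiles from Python's `statistics.quantiles` (default "exclusive" method) will not necessarily match those SAS reports.

```python
from collections import defaultdict
from statistics import mean, median, quantiles, stdev


def summarize_by_location(records):
    """Per-location descriptive statistics for one element (hypothetical sketch).

    records: iterable of (location, concentration) pairs.
    Groups with fewer than two samples are skipped, because a standard
    deviation and quartiles need at least two values.
    """
    groups = defaultdict(list)
    for location, value in records:
        groups[location].append(value)

    summary = {}
    for location, values in groups.items():
        if len(values) < 2:
            continue
        q1, _, q3 = quantiles(values, n=4)  # Python's default "exclusive" method
        summary[location] = {
            "n": len(values),
            "mean": mean(values),
            "sd": stdev(values),
            "min": min(values),
            "q1": q1,
            "median": median(values),
            "q3": q3,
            "max": max(values),
        }
    return summary


# Invented concentrations for two locations.
records = [("A", 1), ("A", 2), ("A", 3), ("A", 4), ("A", 5), ("B", 10), ("B", 20)]
print(summarize_by_location(records)["A"])
```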
For the Prairie du Chien data, the element concentrations were generally skewed toward high values. This is not unusual for trace element data; in fact, it seems to be the norm (e.g., Luedtke 1979; Luedtke and Meyers 1984). Most researchers have found that trace element data require a transformation to make their distributions more nearly normal. Common transformations are the logarithm and the nth root, X^(1/n) (Rapp et al. 2000; de Bruin et al. 1976). The Prairie du Chien data were converted to log values, and the UNIVARIATE procedure was run again. This time the box-and-whisker plots showed much more nearly normal, non-skewed distributions for all the variables. Therefore, the log values of the data set were used for all subsequent multivariate tests.
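The effect of the log transformation can be quantified with a skewness coefficient. A value near zero indicates a symmetric distribution, and a large positive value indicates a long high-end tail. The Python sketch below computes the moment coefficient of skewness before and after a base-10 log transform, and also shows the nth-root alternative. The choice of base 10 and of this particular skewness formula are assumptions. Table 2 lists minima of 0 for Br, La, Mn, Sm, and U, and this section does not say how those values were handled before taking logs. The sketch therefore rejects non-positive values instead of guessing at a treatment.

```python
import math
from statistics import mean


def skewness(values):
    """Moment coefficient of skewness, g1 = m3 / m2**1.5."""
    center = mean(values)
    n = len(values)
    m2 = sum((x - center) ** 2 for x in values) / n
    m3 = sum((x - center) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


def log_transform(values):
    """Base-10 log transform; rejects zeros and negatives instead of guessing
    how they should be handled."""
    if any(v <= 0 for v in values):
        raise ValueError("log transform needs strictly positive concentrations")
    return [math.log10(v) for v in values]


def root_transform(values, n):
    """The alternative nth-root transform, X**(1/n)."""
    return [v ** (1.0 / n) for v in values]


# Invented right-skewed concentrations spanning several orders of magnitude.
raw = [1.0, 10.0, 100.0, 1000.0, 10000.0, 100000.0]
print(round(skewness(raw), 3), round(skewness(log_transform(raw)), 3))
```

For these invented values, the raw data have strong positive skew. The logs are evenly spaced, so their skewness is essentially zero.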
The tests for normality also revealed two samples, #151 from the Lake City location and #115 from the Stillwater location, that consistently appeared as major outliers. Sample #115 has elemental concentrations so different from those of the other samples that it is almost certainly the result of an error in the chemical analysis. Sample #151, however, is a much less obvious outlier. It might be the result of sample contamination, or perhaps of a foreign mineral inclusion in the nodule from which the sample was taken. The multivariate statistical techniques used in this study are fairly robust to non-normal distributions, but better results are achieved when there are no extreme outliers. Because retaining these two samples would likely do more harm than good to the statistical analyses, both were removed from all subsequent analyses. A few other samples have outlying values, but none is as consistently extreme across as many elements as #115 and #151.
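The distinction drawn above is between a sample that is extreme for many elements and one that is extreme for only a few. One way to make this concrete is to count, for each sample, the elements on which it lies far from the rest of its location. The Python sketch below does this with z-scores on log concentrations. The z-score rule and the cutoff of 3 standard deviations are hypothetical choices. The study identified its outliers from the box-and-whisker plots, not from this rule. The sample being checked is excluded from its own reference group so that it cannot inflate the spread and hide itself.

```python
from statistics import mean, stdev


def extreme_element_count(sample, reference, threshold=3.0):
    """Count elements for which `sample` lies more than `threshold`
    standard deviations from the mean of `reference` (hypothetical sketch).

    sample: dict element -> log concentration for the sample being checked.
    reference: list of dicts (same keys) for the other samples from the same
        location; the checked sample itself is not included.
    threshold: hypothetical cutoff, not a value taken from the study.
    """
    count = 0
    for element, value in sample.items():
        others = [r[element] for r in reference]
        center, spread = mean(others), stdev(others)
        if spread > 0 and abs(value - center) / spread > threshold:
            count += 1
    return count


# Invented log concentrations for three ordinary samples and one odd one.
reference = [
    {"Al": 1.0, "Fe": 2.0, "Na": 3.0},
    {"Al": 1.1, "Fe": 2.1, "Na": 3.1},
    {"Al": 0.9, "Fe": 1.9, "Na": 2.9},
]
odd_sample = {"Al": 2.0, "Fe": 3.5, "Na": 3.05}
print(extreme_element_count(odd_sample, reference))  # 2
```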
The objective of this geochemical provenance study is to discover lateral geographic variations within the Prairie du Chien. It is important to remember, however, that a geologic formation is not a two-dimensional body. Vertical, stratigraphic variations in geochemistry may also exist and could seriously complicate the analysis. This is particularly true because the Prairie du Chien consists of two formations: the lower Oneota and the upper Shakopee. The two formations are not separated by an unconformity, are treated as a single unit in archaeological studies, and are virtually identical in many respects. This project was undertaken with the hypothesis that the two are also virtually identical geochemically, so that they can be treated as a single unit for provenance studies. This hypothesis must be tested to confirm that it is reasonably valid.
Of the twenty sample collection locations, six (2, 4, 5, 6, 7, and 9) had good stratigraphic control. For each of these six locations, the concentration of each element was plotted against the stratigraphic position of the samples. Four of the 90 resulting plots are shown in Figure 5. None of the elements showed any change in concentration between the Shakopee and Oneota, or any other coherent stratigraphic trend.

Figure 5: Examples of element concentrations plotted versus relative stratigraphic positions of the samples. Plots are shown from Location 5 (near Winona, MN). Open circles indicate samples from the Oneota Formation; crossed circles indicate samples from the Shakopee Formation.
Outcrop vs. Stream Deposits
Similar to the question of differences between the Shakopee and Oneota, there is another potential source of dichotomy in the sample collection. Most samples came from outcrops, but some were collected from stream deposits (see Figure 3). This mix likely reflects how prehistoric people acquired lithic materials. It is therefore necessary to test whether stream and outcrop samples differ substantially in geochemical composition. The weathering processes that freed chert nodules from their carbonate host and deposited them in secondary contexts may have leached mobile elements such as sodium and iron from the nodules, or added such elements to them.
To address this question, ten samples were collected from both outcrop and stream-deposit sources at Location 6. This location consists of a deeply incised intermittent stream flanked by bedrock walls, with a streambed covered in chert cobbles. The SAS UNIVARIATE procedure was used to generate box-and-whisker plots of the log concentrations of each element in the two groups of samples (Figures 6a and 6b). Many of the elements (Al, Mn, Na, Br, Co, Cr, Cs, Eu, Fe) are equivalent in the two groups by a simple criterion: the mean of one group (or of both) falls within one standard deviation of the mean of the other group. Several others (Ce, Hf, La, Sc, Sm, U), however, show distinctly lower levels in the stream-deposit samples.
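The equivalence criterion above can be stated exactly. Two groups count as comparable when the difference between their means is no larger than the standard deviation of at least one of the groups. The Python sketch below encodes this rule and uses it to sort elements into comparable and differing sets. The function names are hypothetical, and the toy log concentrations are invented. This is a descriptive rule of thumb, not a formal significance test; a two-sample t-test, for example, would be a different and more formal technique.

```python
from statistics import mean, stdev


def comparable(group_a, group_b):
    """Criterion described in the text: the groups are treated as equivalent
    when the mean of at least one group lies within one standard deviation
    of the other group's mean."""
    difference = abs(mean(group_a) - mean(group_b))
    a_mean_within_b = difference <= stdev(group_b)
    b_mean_within_a = difference <= stdev(group_a)
    return a_mean_within_b or b_mean_within_a


def split_elements(outcrop, stream):
    """Partition elements into comparable and differing lists.

    outcrop, stream: dicts element -> list of log concentrations.
    """
    same, different = [], []
    for element in sorted(outcrop):
        target = same if comparable(outcrop[element], stream[element]) else different
        target.append(element)
    return same, different


# Invented log concentrations: Na overlaps between the groups, La does not.
outcrop = {"Na": [2.0, 2.2, 2.4], "La": [0.0, 0.1, 0.2]}
stream = {"Na": [2.1, 2.3, 2.5], "La": [-0.8, -0.7, -0.6]}
print(split_elements(outcrop, stream))  # (['Na'], ['La'])
```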
This result is somewhat surprising, given that most of the elements in the latter group are rare earth elements, which are generally assumed to be immobile during weathering and other low-temperature processes (Murray 1994; McLennan 1989). In contrast, mobile elements such as sodium, manganese, and iron show no depletion in the stream samples. One possibility is that the stream-deposited samples were transported from somewhere upstream, in which case their element concentrations would reflect a slightly different point of origin. This scenario is unlikely, however, because the head of the ephemeral stream in which they are found is less than a kilometer upstream, the stream has no tributaries, and the most distant stream-side outcrops are less than half a kilometer upstream. None of the elements differs drastically in concentration between the two types of sources. Still, the variations that do exist should be kept in mind throughout the rest of the data analysis and interpretation.

Figure 6a: Box-and-whisker plots from Location 6 (Wilson), comparing the concentrations of elements between outcrop and stream deposits. Crosses indicate sample means, dashed lines indicate sample medians, and circles indicate outliers.

Figure 6b: Box-and-whisker plots from Location 6 (Wilson), comparing the concentrations of elements between outcrop and stream deposits. Crosses indicate sample means, dashed lines indicate sample medians, and circles indicate outliers.