Hi Everyone-
Do you realize that we are now 2/3 of the way through the first semester and 1/3 of the way through the academic year? And wow, what an intense and dense lecture that was! There was a lot of material to learn, and we covered a wide variety of new equations and concepts. So let me summarize what you should have taken home from this lecture.
In this class we covered, in detail, the concept of the dispersion (or spread) of a distribution.
(1) We reviewed the range and discussed why it is a very rough descriptor of the dispersion of a distribution.
(2) We introduced the concept of the IQV - Index of Qualitative Variation as an alternative descriptor of variation that is particularly useful when one wishes to compare between groups whose categories are nominal descriptors.
(3) We next introduced the concepts of variance and standard deviation. We discussed the idea of deviation from the mean value and how summing the squares of the deviations of each value from the mean gives a useful measure of the total variation of the distribution. This gave us the concept of the sum of squared deviations, or SS. We then used this to demonstrate how to compute the variance and the standard deviation. We talked about the different notations used when one is computing the variance of the total population versus a sample of the population. And we also covered some alternative formulae for computing the mean and variance, because there are problems with using the definitional equations.
(4) We then discussed the meaning of the variance/standard deviation in terms of area under the "bell curve" and how that is useful. We examined, using the bell curve, what happens when the standard deviation is small versus when it is large.
(5) We then looked at how to compute the standard deviation and the variance from grouped data. We observed that the formula for doing this is methodologically similar to the way we computed the mean for grouped data.
(6) Next we introduced the concept of the IQR - Interquartile Range. We defined it and discussed why it is a more stable range value than the actual range. First we considered how to compute the IQR using the exact data. We then considered how to construct the IQR by revisiting the formula for computing the median using grouped data. We did some examples where we computed Q25 and Q75 using the grouped data, and we demonstrated how the formula for Q50 (the median) could easily be altered to compute the "tile" data for any desired "tile".
(7) We next covered boxplots and modified boxplots. We discussed how they are constructed and how to interpret them.
(8) We addressed the question of how one ethically decides when a data point or points can be considered to be an outlier. To do this, we defined the constructs of upper fence and lower fence. We then stated the rule that a data point that falls outside the fences may be considered to be an outlier once you have checked to make sure that you have entered the data correctly from the data sheets and that there is no problem with the data in the database.
(9) We then discussed the problem of scales. We observed that some data can be on a different scale of magnitude than other data. We saw this when we did clustered bar graphs and histograms and discovered that the frequency scales were quite different in our example. We talked about how one can "lie" with graphs and how to avoid this problem by scaling to percentages. We also introduced the concept of the CV - Coefficient of Variation as a way of scaling the variation so that distributions that are at different scales can be compared.
(10) We closed our discussion with a Halloween scare from Tarynn, who showed us the actual equations for calculating skewness and kurtosis. But we were reassured that we would not actually have to use them. We discussed how to interpret the skewness and kurtosis measures and what the standard errors of skewness and kurtosis mean.
(11) In SPSS we went over some new features of the bar graph and we discussed how to create a boxplot. We also addressed how to enter data into an SPSS spreadsheet in a way that makes it more amenable to plotting boxplots. We also went over the remainder of the measures of dispersion in the Analysis component of the SPSS modules we have previously covered.
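If it helps to see several of the measures from items (3), (6), (8) and (9) worked through on one small data set, here is a minimal sketch in Python. It is not the SPSS procedure from class. The scores are made up for illustration. The quartile rule (the "inclusive" method from Python's statistics module) and the 1.5 x IQR fence multiplier are assumptions; your textbook may use a different quartile convention, which can give slightly different fences.

```python
import math
import statistics

# Hypothetical example scores (made up for illustration, not from the lecture).
scores = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(scores)
mean = sum(scores) / n

# SS: the sum of the squared deviations of each value from the mean.
ss = sum((x - mean) ** 2 for x in scores)

# Population formulas divide SS by N; sample formulas divide by n - 1.
pop_variance = ss / n
sample_variance = ss / (n - 1)
pop_sd = math.sqrt(pop_variance)
sample_sd = math.sqrt(sample_variance)

# Coefficient of variation as a percentage.
# Assumption: the sample standard deviation is used in the numerator.
cv_percent = 100 * sample_sd / mean

# Quartiles and the interquartile range.
# Assumption: the "inclusive" interpolation rule; other rules exist.
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")
iqr = q3 - q1

# Fences for flagging possible outliers.
# Assumption: the common 1.5 x IQR multiplier.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outlier_candidates = [x for x in scores if x < lower_fence or x > upper_fence]

print("mean:", mean)
print("SS:", ss)
print("population variance and SD:", pop_variance, pop_sd)
print("sample variance and SD:", sample_variance, sample_sd)
print("CV (%):", cv_percent)
print("Q1, median, Q3:", q1, q2, q3)
print("IQR:", iqr)
print("fences:", lower_fence, upper_fence)
print("outlier candidates:", outlier_candidates)
```

For these made-up scores the mean is 5, SS is 32, the population variance is 4 and the population standard deviation is 2. The fences come out at 1.75 and 7.75, so the score of 9 is flagged as a possible outlier. As in item (8), a flagged point is only a candidate until you have checked the data entry.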
Don't forget to do your practice problems. Have a great week.
Tarynn