TSTD 6251 Fall 2014
SPSS Exercise and Assignment 1
20 Points
In this class, we are going to study descriptive summary statistics and learn how to construct a box plot. We are still working with a single variable (univariate data) for this exercise.
Practice Example:
Admission receipts (in millions of dollars) for a recent season are given below for the
n = 30 major league baseball teams:
19.4 26.6 22.9 44.5 24.4 19.0 27.5 19.9 22.8 19.0 16.9 15.2 25.7 19.0 15.5 17.1 15.6 10.6 16.2 15.6 15.4 18.2 15.5 14.2 9.5 9.9
10.7 11.9 26.7 17.5
Required:
a. Compute the mean, variance and standard deviation.
b. Find the sample median, first quartile, and third quartile.
c. Construct a boxplot and interpret the distribution of the data.
d. Discuss the distribution of this set of data by examining the kurtosis and skewness
statistics: is the distribution skewed to one side, and does it show a peaked/skinny
curve or a spread out/flat curve?
SPSS Procedures for Computing Summary Statistics:
- Enter the 30 data values in the first column of SPSS Data View
- Click the Variable View tab and name this variable receipts
- Adjust Decimals to 3 decimal points
- Type Admission Receipts ($ mn) in the Label column for output viewer
- Return to Data View and click Analyze on the menu bar
- Click the second menu Descriptive Statistics
- Click Frequencies …
- Move Admission Receipts to the Variable(s) list by clicking the arrow button
- Click Statistics … button at the top of the dialog box
- Now select the descriptive statistics that the question requires. This practice question calls for central tendency, dispersion, percentile and distribution statistics, so we check all the boxes except Percentile(s): and Values are group midpoints.
- Click Continue to return to the Frequencies dialog box
- Click OK to generate the descriptive statistics output, which is pasted below:
The first table provides summary statistics and the second table lists frequencies, relative frequencies and cumulative frequencies. The statistics required for solving this problem are highlighted in red.
Statistics
Admission Receipts
| Statistic | Value |
| --- | --- |
| N, Valid | 30 |
| N, Missing | 0 |
| Mean | 18.76333 |
| Std. Error of Mean | 1.278590 |
| Median | 17.30000 |
| Mode | 19.000 |
| Std. Deviation | 7.003127 |
| Variance | 49.043782 |
| Skewness | 1.734 |
| Std. Error of Skewness | .427 |
| Kurtosis | 5.160 |
| Std. Error of Kurtosis | .833 |
| Range | 35.000 |
| Minimum | 9.500 |
| Maximum | 44.500 |
| Sum | 562.900 |
| Percentile 10 | 10.61000 |
| Percentile 20 | 14.40000 |
| Percentile 25 | 15.35000 |
| Percentile 30 | 15.50000 |
| Percentile 40 | 15.84000 |
| Percentile 50 | 17.30000 |
| Percentile 60 | 19.00000 |
| Percentile 70 | 19.75000 |
| Percentile 75 | 22.82500 |
| Percentile 80 | 24.10000 |
| Percentile 90 | 26.69000 |
Admission Receipts
| Valid | Frequency | Percent | Valid Percent | Cumulative Percent |
| --- | --- | --- | --- | --- |
| 9.500 | 1 | 3.3 | 3.3 | 3.3 |
| 9.900 | 1 | 3.3 | 3.3 | 6.7 |
| 10.600 | 1 | 3.3 | 3.3 | 10.0 |
| 10.700 | 1 | 3.3 | 3.3 | 13.3 |
| 11.900 | 1 | 3.3 | 3.3 | 16.7 |
| 14.200 | 1 | 3.3 | 3.3 | 20.0 |
| 15.200 | 1 | 3.3 | 3.3 | 23.3 |
| 15.400 | 1 | 3.3 | 3.3 | 26.7 |
| 15.500 | 2 | 6.7 | 6.7 | 33.3 |
| 15.600 | 2 | 6.7 | 6.7 | 40.0 |
| 16.200 | 1 | 3.3 | 3.3 | 43.3 |
| 16.900 | 1 | 3.3 | 3.3 | 46.7 |
| 17.100 | 1 | 3.3 | 3.3 | 50.0 |
| 17.500 | 1 | 3.3 | 3.3 | 53.3 |
| 18.200 | 1 | 3.3 | 3.3 | 56.7 |
| 19.000 | 3 | 10.0 | 10.0 | 66.7 |
| 19.400 | 1 | 3.3 | 3.3 | 70.0 |
| 19.900 | 1 | 3.3 | 3.3 | 73.3 |
| 22.800 | 1 | 3.3 | 3.3 | 76.7 |
| 22.900 | 1 | 3.3 | 3.3 | 80.0 |
| 24.400 | 1 | 3.3 | 3.3 | 83.3 |
| 25.700 | 1 | 3.3 | 3.3 | 86.7 |
| 26.600 | 1 | 3.3 | 3.3 | 90.0 |
| 26.700 | 1 | 3.3 | 3.3 | 93.3 |
| 27.500 | 1 | 3.3 | 3.3 | 96.7 |
| 44.500 | 1 | 3.3 | 3.3 | 100.0 |
| Total | 30 | 100.0 | 100.0 | |
Interpretation:
The SPSS output table contains three measures of central tendency (mean, median and mode) along with the sum. The Mean (18.763) is computed by dividing the Sum (562.9) by the count (30).
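The Median (17.3) is a measurement of position in a ranked set of data. In a data set with an odd number of values, it is the middle number. In a data set with an even number of values, it is the value halfway between the two middle values. Here n = 30, so the median is halfway between the 15th and 16th ranked values: (17.1 + 17.5) / 2 = 17.3.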
The Mode (19) is a measurement of frequency: it is the most frequently occurring value. The mode is often used with grouped data, where the class interval with the highest number of occurrences is called the modal interval.
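The median and mode described above can be checked outside SPSS. The following is a minimal Python sketch, not part of the SPSS procedure; the variable names (receipts, ordered, and so on) are illustrative assumptions.

```python
import statistics

# The 30 admission receipts ($ millions) from the practice example
receipts = [
    19.4, 26.6, 22.9, 44.5, 24.4, 19.0, 27.5, 19.9, 22.8, 19.0,
    16.9, 15.2, 25.7, 19.0, 15.5, 17.1, 15.6, 10.6, 16.2, 15.6,
    15.4, 18.2, 15.5, 14.2, 9.5, 9.9, 10.7, 11.9, 26.7, 17.5,
]

ordered = sorted(receipts)
n = len(ordered)

# n is even, so the median is halfway between the two middle ranked values
middle_low, middle_high = ordered[n // 2 - 1], ordered[n // 2]
median = (middle_low + middle_high) / 2

# The mode is the most frequently occurring value
mode = statistics.mode(receipts)

print("Middle values:", middle_low, middle_high)
print("Median:", round(median, 3))
print("Mode:", mode)
```

The two middle values are 17.1 and 17.5, which gives the median of 17.3 shown in the output table; 19.0 occurs three times, more often than any other value.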
The output table contains several measures of variation. The Range (35) equals the Maximum value (44.5) minus the Minimum value (9.5). Note: with some data sets the range can be a misleading measure of variation, since it depends only on the two most extreme values.
The Standard Deviation (7.003) is the most common measure of variation or dispersion. In a normal, symmetrical set of data, about 68 percent of the data values will be within plus or minus one standard deviation of the mean, about 95 percent will be within plus or minus two standard deviations of the mean, and about 99.7 percent will be within plus or minus three standard deviations of the mean. However, this baseball admission data set is not normally distributed. We will find out why when we examine the Skewness statistic.
The Variance (49.044) is the standard deviation squared. The SPSS output table shows the sample standard deviation and variance, computed using n-1 in the denominator.
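As a cross-check on the mean, range, variance and standard deviation discussed above, here is a small Python sketch. It is an illustration rather than the SPSS method; it relies on Python's statistics module, whose variance and stdev functions also use n-1 in the denominator.

```python
import statistics

# The 30 admission receipts ($ millions) from the practice example
receipts = [
    19.4, 26.6, 22.9, 44.5, 24.4, 19.0, 27.5, 19.9, 22.8, 19.0,
    16.9, 15.2, 25.7, 19.0, 15.5, 17.1, 15.6, 10.6, 16.2, 15.6,
    15.4, 18.2, 15.5, 14.2, 9.5, 9.9, 10.7, 11.9, 26.7, 17.5,
]

mean = statistics.mean(receipts)          # sum divided by count
data_range = max(receipts) - min(receipts)
variance = statistics.variance(receipts)  # sample variance (n - 1 denominator)
std_dev = statistics.stdev(receipts)      # square root of the sample variance

print("Mean:", round(mean, 5))
print("Range:", round(data_range, 3))
print("Variance:", round(variance, 6))
print("Std. Deviation:", round(std_dev, 6))
```

The printed values should agree with the Mean, Range, Variance and Std. Deviation rows of the Statistics table.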
The Largest and the Smallest values are reported as the Maximum (44.5) and the Minimum (9.5) in the output table.
The Standard Error of Mean (1.279) is the standard deviation divided by the square root of the sample size. It is a measure of uncertainty about the mean, and is used for statistical inference (confidence intervals, regression belts, hypothesis tests, etc.). It will be used extensively later in our exercises.
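The standard error calculation described above can be sketched in a few lines of Python. This is an illustrative check only; the variable names are assumptions.

```python
import math
import statistics

# The 30 admission receipts ($ millions) from the practice example
receipts = [
    19.4, 26.6, 22.9, 44.5, 24.4, 19.0, 27.5, 19.9, 22.8, 19.0,
    16.9, 15.2, 25.7, 19.0, 15.5, 17.1, 15.6, 10.6, 16.2, 15.6,
    15.4, 18.2, 15.5, 14.2, 9.5, 9.9, 10.7, 11.9, 26.7, 17.5,
]

n = len(receipts)
std_dev = statistics.stdev(receipts)  # sample standard deviation (n - 1)

# Standard error of the mean = standard deviation / square root of n
std_error = std_dev / math.sqrt(n)

print("Std. Error of Mean:", round(std_error, 6))
```

With a standard deviation of about 7.003 and n = 30, this gives roughly 1.279, matching the output table.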
The two distribution statistics reported in the SPSS output table are skewness and kurtosis.
Skewness (1.734) is a measurement of the lack of symmetry in a distribution. If there are a few extremely small values and the tail of the distribution runs off to the left, we say the distribution is negatively skewed and the skewness value is negative. If there are a few extremely large values and the tail of the distribution runs off to the right, we say the distribution is positively skewed and the skewness value is positive. In our example, the distribution of admission data is positively skewed due to a few extremely large values in the data set. SPSS computes the skewness value using the third power of the deviations from the mean.
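To make the "third power of the deviations" concrete, the sketch below computes a bias-adjusted sample skewness and its standard error in Python. The formulas are the commonly documented adjusted sample versions and are assumed here to be the ones SPSS applies; the variable names are illustrative.

```python
import math
import statistics

# The 30 admission receipts ($ millions) from the practice example
receipts = [
    19.4, 26.6, 22.9, 44.5, 24.4, 19.0, 27.5, 19.9, 22.8, 19.0,
    16.9, 15.2, 25.7, 19.0, 15.5, 17.1, 15.6, 10.6, 16.2, 15.6,
    15.4, 18.2, 15.5, 14.2, 9.5, 9.9, 10.7, 11.9, 26.7, 17.5,
]

n = len(receipts)
mean = statistics.mean(receipts)
s = statistics.stdev(receipts)  # sample standard deviation (n - 1)

# Sum of cubed standardized deviations: large positive deviations dominate
cubed_sum = sum(((x - mean) / s) ** 3 for x in receipts)

# Bias-adjusted sample skewness (assumed to match the SPSS definition)
skewness = n / ((n - 1) * (n - 2)) * cubed_sum

# The standard error of skewness depends only on the sample size
se_skewness = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

print("Skewness:", round(skewness, 3))
print("Std. Error of Skewness:", round(se_skewness, 3))
```

Under these assumptions the results agree with the 1.734 and .427 in the output table. Almost all of the positive total comes from the single value 44.5, which is why one large observation is enough to produce a strong positive skew.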
Kurtosis (5.160) measures how peaked and heavy-tailed a distribution is compared with the normal distribution. SPSS reports kurtosis relative to the normal distribution, so a normal distribution has a kurtosis of 0. If a distribution has a sharper peak and heavier tails than the normal distribution, that is, if more values lie far from the mean than in a corresponding normal distribution, the kurtosis measure is positive. If the distribution is flatter than the normal distribution, with lighter tails, the kurtosis measure is negative. In our example, most data values are squeezed into the middle of the distribution (peaked), while the single large value of 44.5 stretches the right tail and accounts for much of the high kurtosis.
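The kurtosis value can be reproduced with the fourth power of the standardized deviations. The Python sketch below uses the commonly documented bias-adjusted excess kurtosis formula, which is assumed here to be the one SPSS applies; the variable names are illustrative.

```python
import math
import statistics

# The 30 admission receipts ($ millions) from the practice example
receipts = [
    19.4, 26.6, 22.9, 44.5, 24.4, 19.0, 27.5, 19.9, 22.8, 19.0,
    16.9, 15.2, 25.7, 19.0, 15.5, 17.1, 15.6, 10.6, 16.2, 15.6,
    15.4, 18.2, 15.5, 14.2, 9.5, 9.9, 10.7, 11.9, 26.7, 17.5,
]

n = len(receipts)
mean = statistics.mean(receipts)
s = statistics.stdev(receipts)  # sample standard deviation (n - 1)

# Sum of standardized deviations raised to the fourth power
fourth_sum = sum(((x - mean) / s) ** 4 for x in receipts)

# Bias-adjusted excess kurtosis: a normal distribution scores about 0
kurtosis = (
    n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth_sum
    - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
)

# Standard errors of skewness and kurtosis depend only on the sample size
se_skewness = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
se_kurtosis = math.sqrt(4 * (n ** 2 - 1) * se_skewness ** 2 / ((n - 3) * (n + 5)))

print("Kurtosis:", round(kurtosis, 3))
print("Std. Error of Kurtosis:", round(se_kurtosis, 3))
```

Under these assumptions the results agree with the 5.160 and .833 in the output table.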
Percentiles divide a set of observations into 100 equal parts and quartiles divide a set of observations into four equal parts. The first quartile, usually labeled Q1, is the value below which 25 percent of the observations occur, and the third quartile, usually labeled Q3, is the value below which 75 percent of the observations occur. Logically, Q2 is the median. Together, Q1, Q2, and Q3 divide a set of data into four equal parts. The SPSS output table reports percentile and quartile statistics in one list, so the first quartile is Q1 = 25th percentile = 15.35 and the third quartile is Q3 = 75th percentile = 22.83 (22.825 before rounding).
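The quartiles above can be reproduced in Python. This sketch assumes that the reported percentiles follow the (n + 1)p position rule with linear interpolation, which is what Python's statistics.quantiles does with its default "exclusive" method; other software and other methods can give slightly different quartiles.

```python
import statistics

# The 30 admission receipts ($ millions) from the practice example
receipts = [
    19.4, 26.6, 22.9, 44.5, 24.4, 19.0, 27.5, 19.9, 22.8, 19.0,
    16.9, 15.2, 25.7, 19.0, 15.5, 17.1, 15.6, 10.6, 16.2, 15.6,
    15.4, 18.2, 15.5, 14.2, 9.5, 9.9, 10.7, 11.9, 26.7, 17.5,
]

# n=4 asks for the three cut points that split the data into four parts
q1, q2, q3 = statistics.quantiles(receipts, n=4, method="exclusive")

# The interquartile range is the distance between Q3 and Q1
iqr = q3 - q1

print("Q1:", round(q1, 3))
print("Q2 (median):", round(q2, 3))
print("Q3:", round(q3, 3))
print("IQR:", round(iqr, 3))
```

For example, Q1 sits at position (30 + 1) x 0.25 = 7.75 in the ranked data, three quarters of the way from the 7th value (15.2) to the 8th value (15.4), which gives 15.35.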
SPSS Procedures for Creating Boxplot
A box plot is a graphic display of data, based on quartiles, that helps us picture a set of data.
- Open your Admission Receipts file
- Click Graphs menu
- Click Chart Builder then click OK
- In the Chart Builder window, choose Boxplot under the label of Gallery. Then drag the third graph to the preview area.
- Drag Admission Receipts ($ mn) to the X-Axis target.
- Click the Title/Footnotes label and select Title 1. Then type Admission Receipts for 30 Baseball Teams in Content, which is in the Element Properties window.
- Click Apply
- Click the Options tab to choose Chart Templates. (The procedure was explained in the SPSS practice assignment: the Template Files dialog box appears in the middle of the screen. To give your chart a different style, click Add… and then open Looks. There are seven chart templates to choose from. A Chart Look can be used to improve the appearance of charts for presentations and reports, or to standardize features such as color and type size.) We choose Classic Interactive Graphics for this boxplot chart.
- Click OK and the following boxplot chart appears. (To edit the chart, right-click any part of the chart. Click Edit Content. Then click In Separate Window.)
This is a vertical boxplot. The heavier line inside the box is the median, and the box contains the middle 50 percent of the admission receipts, from $15.35 to $22.83 million. The distance between the ends of the box, $7.48 million (22.83 - 15.35), is the interquartile range. So we can conclude that the middle 50 percent of the teams made between $15.35 and $22.83 million through admission receipts during the season.
The two short horizontal lines (whiskers) below and above the box mark the smallest and largest observed values that lie within 1.5 box lengths of the edges of the box: $9.5 and $27.5 million respectively in this data set. The circle at the top of the chart marks an outlier, the value of $44.5 million (the number 4 underneath the circle indicates that this outlier is the fourth value entered in SPSS Data View). Outliers are values that lie between 1.5 and 3 box lengths from the upper or lower edge of the box. Cases with values more than 3 box lengths from the upper or lower edge of the box are called extreme values, and they are designated with asterisks (*). There are no extreme values in this data set.
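The whisker, outlier and extreme-value rules above can be sketched in Python as follows. This is an illustration under one assumption: the box edges are taken to be the 25th and 75th percentiles from the output table. SPSS may use a slightly different definition of the box edges when drawing the chart, so the cut-offs here are approximate. All variable names are illustrative.

```python
import statistics

# The 30 admission receipts ($ millions) from the practice example
receipts = [
    19.4, 26.6, 22.9, 44.5, 24.4, 19.0, 27.5, 19.9, 22.8, 19.0,
    16.9, 15.2, 25.7, 19.0, 15.5, 17.1, 15.6, 10.6, 16.2, 15.6,
    15.4, 18.2, 15.5, 14.2, 9.5, 9.9, 10.7, 11.9, 26.7, 17.5,
]

# Box edges (assumed to be the 25th and 75th percentiles) and box length
q1, _, q3 = statistics.quantiles(receipts, n=4, method="exclusive")
box_length = q3 - q1

# Cut-offs at 1.5 and 3 box lengths from the edges of the box
inner_low, inner_high = q1 - 1.5 * box_length, q3 + 1.5 * box_length
outer_low, outer_high = q1 - 3 * box_length, q3 + 3 * box_length

# Whiskers end at the most extreme values still within 1.5 box lengths
whisker_low = min(x for x in receipts if x >= inner_low)
whisker_high = max(x for x in receipts if x <= inner_high)

# Outliers lie between 1.5 and 3 box lengths; extreme values lie beyond 3
outliers = [x for x in receipts
            if (outer_low <= x < inner_low) or (inner_high < x <= outer_high)]
extremes = [x for x in receipts if x < outer_low or x > outer_high]

print("Whiskers:", whisker_low, whisker_high)
print("Outliers:", outliers)
print("Extreme values:", extremes)
```

With these cut-offs the whiskers end at 9.5 and 27.5, the value 44.5 is flagged as an outlier, and no value is far enough from the box to count as an extreme value, which is consistent with the chart described above.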
The location of the median inside the box and the location of the box within the whiskers can also reveal the distribution of the data values. The location of the median line may suggest skewness in the data distribution if it is significantly shifted away from the center. If the box is moved noticeably to the low end, it provides a clue of positive skew. If the box is moved noticeably to the high end, the distribution of the data is negatively skewed. In this case, the positive skewness is also confirmed by the skewness statistic (1.734) reported in the SPSS output table. We can conclude that the distribution of the data values is influenced by the outlier of $44.5 million, which is shown as a circle at the top part of the chart.
The kurtosis statistic can also be read, roughly, from the shape of the box. If the box is very narrow relative to the overall spread of the data, this suggests a positive kurtosis: a high, thin, peaked distribution. If the box is wide relative to the overall spread, this suggests a negative kurtosis: a flat, wide distribution.
The following assignments will be submitted for grade:
1. A major airline wanted some information on those enrolled in their “frequent flyer” program. A sample of 48 members in the frequent flyer program revealed the following number of miles flown, to the nearest 1,000 miles, by each participant: (7 points)
22 29 32 38 39 41 42 43 43 43 44 44 45 45 46 46 46 47 50 51 52 54 54 55 56 57 58 59 60 61 61 63 63 64 64 67 69 70 70 70 71 71 72 73 74 76 78 90.3
Required:
a. Compute and report the mean, variance and standard deviation.
b. Report the sample median, first quartile, and third quartile.
c. Describe the shape of the data distribution by interpreting kurtosis and skewness
statistics.
d. Construct a box plot and write a paragraph to discuss data distribution as displayed by
the box plot.
2. Working with Data (7 points)
For this exercise, you are given the Baseball 2003 data set, which reports information
on the 30 Major League Baseball teams for the 2003 baseball season. The data file is
posted in the Outline section of Blackboard and it is in SPSS format. (Note: we will
also use this data set for SPSS and written assignments in later studies. A key
explaining the coding of variables is also posted.)
For the variable team salary:
1. Find the mean, median and standard deviation
2. Determine the skewness. Is the distribution positively or negatively skewed?
3. Construct a box plot to show team salary distribution.
4. Write a one-page summary of the distribution of baseball team salary in 2003.
3. Exercise.com (6 points)
For this exercise, we will visit http://quote.yahoo.com to find historical stock prices from June 3 to August 30, 2013 for two companies: World Wrestling Entertainment Inc. (a company that produces and markets wrestling TV programming and events, ticker symbol: WWE) and Marriott International (a diversified hotel chain with multiple brands around the world, ticker symbol: MAR). We will produce summary statistics for the stock performance of these two organizations and discuss the volatility of their financial performance during the study period.
- On the Yahoo Finance home page, type WWE (lower case is fine) in the Enter Symbol target.
- Click Look up. A basic stock performance chart appears.
- Click Historical Prices on the left of the quote table. A table containing stock prices for World Wrestling Entertainment Inc. from early June to the present opens up.
- Adjust the Start date to June 3, 2013 and the End date to August 30, 2013.
- Keep the default Daily price choice.
- Click Get Prices. A table containing daily stock prices from June 3 to August 30 opens up. Copy the Close price only for SPSS analysis. (Note: the latest prices are listed at the top, so be careful with the order of the data when you transfer the data to SPSS.)
Required:
1. Perform a descriptive analysis of the daily closing stock prices of World Wrestling Entertainment Inc. for these three months (June-August) to find the mean, standard deviation and variance.
2. Do the same for Marriott International stock prices for the same three months: find the mean, standard deviation and variance.
3. Write a paragraph comparing the financial performance of these two organizations. Explain which organization’s financial performance was more consistent in the three months studied, or whether the two stocks showed similar variation.

