
Histogram






Definition

Histogram
  • A graph consisting of a series of vertical columns, such that the area of each column represents observed frequencies in a class interval (also referred to as interval or bin).
  • Used to display the shape of a frequency distribution.
  • Particularly useful when there are a large number of observations.
  • A type of bar graph.
  • Also called histograph.




Tip: When creating a histogram...
  • Leave no space between bars when the class intervals represent continuous data; bar graphs, drawn with a gap between bars, are used to represent discrete data (i.e., different categories).
  • Graph the observed frequency in each class interval (also called the class frequency) for datasets with a small to medium number of observations; label the y-axis "Frequency." Graph the relative frequency in each class interval for datasets with a large number of observations; label the y-axis "Proportion" or "Relative Frequency."



Examples

A typical histogram

Pulse rates, in beats per minute, were calculated for 192 students enrolled in a statistics course at the University of Adelaide.[1] The first step in creating a histogram is to create a frequency table.

Pulse Rate for a Sample of Students from the University of Adelaide

Pulse Rate    Count
(34-41]           2
(41-48]           2
(48-55]           4
(55-62]          19
(62-69]          40
(69-76]          53
(76-83]          30
(83-90]          27
(90-97]          10
(97-104]          5
Total           192
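The tip about graphing relative frequency can be tried on this table. The following is a minimal Python sketch, not part of the original example; the variable names are illustrative, and the counts are copied from the frequency table above.

```python
# Class intervals and counts copied from the pulse-rate frequency table.
pulse_counts = {
    "(34-41]": 2,
    "(41-48]": 2,
    "(48-55]": 4,
    "(55-62]": 19,
    "(62-69]": 40,
    "(69-76]": 53,
    "(76-83]": 30,
    "(83-90]": 27,
    "(90-97]": 10,
    "(97-104]": 5,
}

# Total number of observations (the "Total" row of the table).
total = sum(pulse_counts.values())

# Relative frequency = class frequency / total number of observations.
relative = {interval: count / total for interval, count in pulse_counts.items()}

for interval, count in pulse_counts.items():
    print(f"{interval:>9}  {count:>3}  {relative[interval]:.3f}")
```

The relative frequencies sum to 1, so a histogram drawn from them has the same shape as one drawn from the counts; only the y-axis scale and label change.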



Using the class frequencies (the number of observations in each class interval) shown in the frequency table, the following histogram was created.[1]
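Where a chart is not at hand, the shape of the distribution can be previewed as text. This is a rough Python sketch under stated assumptions: the function name text_histogram is hypothetical, the bars run horizontally rather than as vertical columns, and the counts are copied from the frequency table.

```python
# (interval label, class frequency) pairs from the pulse-rate table.
pulse_counts = [
    ("(34-41]", 2), ("(41-48]", 2), ("(48-55]", 4), ("(55-62]", 19),
    ("(62-69]", 40), ("(69-76]", 53), ("(76-83]", 30), ("(83-90]", 27),
    ("(90-97]", 10), ("(97-104]", 5),
]

def text_histogram(rows, symbol="#"):
    """Return one line per class interval, with one symbol per observation."""
    # Right-align the labels so that all bars start in the same column.
    width = max(len(label) for label, _ in rows)
    return [f"{label:>{width}} | {symbol * count} {count}" for label, count in rows]

lines = text_histogram(pulse_counts)
print("\n".join(lines))
```

Because every class interval has the same width, bar length here is proportional to bar area, which is what a histogram uses to represent frequency.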

Exploring the effect of class interval size

See West's histogram applet for an opportunity to experiment with different class interval sizes.

Creating a histogram in an OpenOffice spreadsheet

Old Faithful is a geyser in Yellowstone National Park in Wyoming, USA. To better understand the timing of the eruptions, the duration of the eruptions and the time since the previous eruption were measured over 23 consecutive days. The Old Faithful dataset contains 222 observations, including date of observation, duration of the eruption and the time since the previous eruption. A histogram of the duration of eruption will provide a graphical display showing the distribution (shape) of the data. Use the following instructions to create such a histogram.

Prepare the data

Download the Old Faithful dataset by choosing the "red dot" (Excel) version. Open the dataset with scalc.exe (the OpenOffice spreadsheet application).

Define the histogram's class intervals

In unused cells to the right of the data, enter the formulas for min, max and count. These values are needed to determine the number and size of the classes.

The Rice Rule with 222 observations yields 12 classes. Choose a starting value that is equal to or slightly less than the lowest value.[1] In this example, the minimum duration is 1.7, so set the starting value just below it, at 1.6. Calculate the size of each class interval by dividing the difference between the max and min by the number of class intervals. In this case, $\frac{5.2-1.7}{12} \approx 0.29$, which rounds to $0.3$.
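The arithmetic in this step can be checked with a short Python sketch. It assumes the common statement of the Rice Rule, number of classes = 2 × (cube root of n), and rounds to the nearest whole number to match the 12 classes used here (some sources round up instead). The variable names are illustrative.

```python
n = 222                       # number of observations in the dataset
data_min, data_max = 1.7, 5.2 # min and max eruption duration from the text

# Rice Rule: number of classes is about twice the cube root of n.
# Rounding to the nearest integer is an assumption made to match the text.
num_classes = round(2 * n ** (1 / 3))

# Class size = range of the data divided by the number of classes.
raw_width = (data_max - data_min) / num_classes

# Round to one decimal place for convenient class limits.
class_width = round(raw_width, 1)

print(num_classes, round(raw_width, 2), class_width)
```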

With a starting value of 1.6, a max of 5.2, and a class size of 0.3, the upper limit for each class will be 1.9, 2.2, 2.5, ... 5.2. Lower limits are not needed when using the OpenOffice FREQUENCY function. Adding one extra class below the minimum and one above the maximum provides a check: both of these end classes should have a frequency of zero, which confirms that all of the sample data fall within the chosen classes.

  • In one of the columns near the data, enter these numbers: 1.6, 1.9, 2.2, 2.5, ... 5.2, 5.5.
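The list of class limits can also be generated rather than typed. This is a small Python sketch using the starting value, class size and number of classes from the text; the variable names are illustrative.

```python
start = 1.6        # starting value, just below the minimum duration
width = 0.3        # class size
num_classes = 12   # number of classes from the Rice Rule

# The starting value, 12 upper limits (1.9 ... 5.2), and one extra limit
# above the maximum (5.5). Rounding removes floating-point noise.
limits = [round(start + i * width, 1) for i in range(num_classes + 2)]

print(limits)
```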

Calculate the frequencies in each class interval

The frequency count in each class can be automatically counted using the FREQUENCY function in OpenOffice. The FREQUENCY function is an array function, returning values to a range of cells.

Highlight the range of cells in the column adjacent to the class intervals, labeled "Duration Freq." Choose Insert > Function. Scroll down the list of functions to select FREQUENCY. Click on Next>>. The Function Wizard will display. The FREQUENCY function requires two arguments: data and classes.

Click on the Shrink button next to the data field. The dialog box will collapse so that the relevant data range can be highlighted. Click the Maximize button to return to the full dialog box. Use the same procedure to fill in the range for the classes. The completed formula will display in the formula bar. Click OK.

The frequencies for each class interval display in the cells adjacent to the class limits.
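To see what this kind of counting does, the following Python sketch mimics it under stated assumptions: each value is counted in the first class whose upper limit is greater than or equal to it, and one extra count collects values above the highest limit. The function name frequency and the sample numbers are hypothetical; they are not the Old Faithful data, and the sketch is not the spreadsheet's own implementation.

```python
from bisect import bisect_left

def frequency(data, classes):
    """Count values per class, given ascending upper class limits.

    Returns len(classes) + 1 counts: one per upper limit, plus a final
    count for values greater than the highest limit.
    """
    counts = [0] * (len(classes) + 1)
    for value in data:
        # bisect_left gives the index of the first limit >= value, so a
        # value equal to an upper limit is counted in that class.
        counts[bisect_left(classes, value)] += 1
    return counts

# Hypothetical sample values and upper limits, for illustration only.
sample = [1.7, 2.0, 2.1, 3.5, 4.6, 4.0]
upper_limits = [2.0, 3.0, 4.0]

counts = frequency(sample, upper_limits)
print(counts)
```

Only upper limits are passed in, which mirrors the earlier remark that lower limits are not needed.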

Create the histogram

Choose Insert > Chart.... The Chart Wizard dialog box opens. For 1. Chart Type, choose Column. Click on Next>>. For 2. Data Range, click on the Shrink button and then highlight the full range of classes and frequencies. Check the box "First column as label" so the class limits are used as labels for the x-axis. If you include the header labels, check the box "First row as label."

Click on Next>>. No revisions are necessary on 3. Data Series. Click on Next>>. For 4. Chart Elements, enter a title, labels for the x and y axes, uncheck "Display grids" and uncheck "Display legend." Click Finish. The chart will be inserted into the open sheet.

To remove the space between the bars, as required for a histogram, double-click on the graph to enter edit mode, right-click on the bars and choose Object Properties.... Choose the Options tab and set the "Spacing" under "Settings" to 0%. Click OK. The histogram displays.

The histogram clearly indicates that the duration data are bimodal with one mode near 1.9 minutes and a second mode near 4.6 minutes.
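A bimodal shape can also be read off a list of class frequencies by looking for classes that are taller than both neighbors. The Python sketch below illustrates the idea only: the function name local_peaks is hypothetical, and the counts are made-up numbers chosen to show two peaks, not the actual Old Faithful frequencies.

```python
def local_peaks(counts):
    """Return indexes of classes whose count exceeds both neighbors."""
    peaks = []
    for i in range(1, len(counts) - 1):
        if counts[i] > counts[i - 1] and counts[i] > counts[i + 1]:
            peaks.append(i)
    return peaks

# Made-up class frequencies with two humps, for illustration only.
example_counts = [0, 5, 9, 4, 1, 3, 8, 12, 6, 0]

peaks = local_peaks(example_counts)
print(peaks)
```

Small random bumps can also register as peaks with this simple test, so the histogram itself remains the better guide to the overall shape.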





Notes
