Statisticians working in all areas of science know that you can describe a
population's characteristics by describing a sample drawn at random from
that population.
Foresters, for example, might want to know the amount of lumber in a large tract of land. They measure a random sample of trees in that tract to estimate the board feet for the entire area, so they don't have to measure millions of trees. Doctors take a sample of blood because they know they can test it for diseases without the drastic and fatal step of taking all of the patient's blood!
Survey researchers use the same techniques, but they sample people and measure opinions, attitudes, beliefs, and self-reported behavior. The Minnesota Poll can - within certain limits of precision called the "margin of sampling error" - project the information from its samples to the population it is measuring. The Poll routinely surveys Minnesotans, but it also conducts polls in Minneapolis or St. Paul, and it does national surveys on occasion.
The samples used to measure population characteristics have a lot in common
with other types of samples that aren't representative. A poll of one's
friends, even hundreds of them, isn't a representative sample of the general
population. Neither is a group of people responding to an Internet survey,
nor a political party calling a voter list on behalf of its candidates. All
of these have things in common with a scientific survey, but they lack at
least one thing, perhaps more, that is present in representative samples:
the element of randomness.
Deciding on the sample size
Before researchers can do a random-sample poll, they first must decide how
many interviews they will need. Each poll is different, but generally between
600 and 1,200 interviews are needed. The larger the sample size, the more
precise one can be with the sample's estimates of the population, and the
greater the ability one has to analyze smaller groups within the sample -
and population - such as men and women, or Democrats and Republicans. If the
Minnesota Poll needs to find and sample people who are relatively scarce in
the population, then it will need to interview a lot more than 1,000 to find
those people. The poll has surveyed such special populations as pastors (it
used a mail survey), people who are chronically sick, parents and teens in
households in which at least one teenager lives full time, and evangelical
Christians.
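To see why larger samples are more precise, here is a rough sketch of the standard 95 percent margin-of-sampling-error formula for a proportion. It assumes a simple random sample and the worst-case proportion of 50 percent, and it ignores the effects of stratification and weighting, so it is an illustration rather than the Minnesota Poll's own calculation. The function name and its defaults are chosen only for this example.

```python
import math


def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate margin of sampling error for a proportion.

    Assumes a simple random sample; z=1.96 corresponds to 95 percent
    confidence, and proportion=0.5 is the worst (widest) case.
    """
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


# Compare the sample sizes mentioned in the article.
for n in (600, 1000, 1200):
    points = margin_of_error(n) * 100
    print(f"n = {n}: +/- {points:.1f} percentage points")
```

Under these assumptions, going from 600 to 1,200 interviews narrows the margin from about 4 percentage points to a little under 3.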
Deciding what kind of sample
Once researchers have decided on the sample size, they must design the type
of sample that would best measure the population. For a statewide Minnesota
Poll, researchers generally use what sampling statisticians call a
"stratified area probability sample, proportionate to strata size." In
practical terms, that means dividing the state into strata - in most cases
counties - then drawing a random sample of potential residential telephone
numbers from each county. The poll also could use a simple random sample or
other, more complex sample designs, but this design is a commonly used one
that works well.
When the Star Tribune does a statewide poll, its researchers first ascertain
the adult population of each county. Interviewers then complete a number of
interviews in each county that is proportionate to its adult population. For
example, 26 percent of the state's adults live in Hennepin County, Minnesota's
most populous; consequently, a Minnesota Poll sample of 1,000 would contain
260 interviews in Hennepin County. For a poll with a total sample size of
800, there would be 208 interviews in Hennepin County. Thus, each county
represents a small, independent sample on its own.
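The proportionate allocation described above can be sketched in a few lines. Only Hennepin County's 26 percent share comes from the article; the "Rest of state" bucket is a hypothetical stand-in for the other counties, and the function name and the rounding rule (largest remainder) are assumptions made for this illustration.

```python
def allocate_interviews(total_interviews, percent_by_stratum):
    """Split a total sample across strata in proportion to population share.

    percent_by_stratum maps a stratum name to its whole-number percentage
    of the adult population; the percentages must add up to 100.
    """
    if sum(percent_by_stratum.values()) != 100:
        raise ValueError("percentages must add up to 100")

    # Integer quota and remainder for each stratum (avoids float rounding).
    quotas = {
        name: divmod(total_interviews * pct, 100)
        for name, pct in percent_by_stratum.items()
    }
    allocation = {name: whole for name, (whole, _) in quotas.items()}

    # Hand any interviews lost to rounding to the largest remainders.
    leftover = total_interviews - sum(allocation.values())
    by_remainder = sorted(quotas, key=lambda name: quotas[name][1], reverse=True)
    for name in by_remainder[:leftover]:
        allocation[name] += 1
    return allocation


# 26 percent is from the article; the second bucket is hypothetical.
shares = {"Hennepin": 26, "Rest of state": 74}
print(allocate_interviews(1000, shares))
print(allocate_interviews(800, shares))
```

With these shares, Hennepin County receives 260 of 1,000 interviews and 208 of 800, matching the figures in the text.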
Drawing a sample of telephone numbers
Once each county's sample size is determined, researchers must draw a
random sample of telephone numbers for interviewers to call. How's that
done?
North America has 10-digit phone numbers, which have an area code, a
prefix, and a suffix. Here's what a phone number looks like:
zzz-nxx-abcd
Area code-prefix-suffix
In this example, the zzz is the area code and the nxx is the prefix. Each
digit of the suffix also has a name - the thousands (a), hundreds (b), tens
(c) and ones (d). The poll's database contains all the residential area
codes and prefixes for all telephone exchanges in the state, and researchers
update it regularly to make sure that area code changes and new telephone
prefixes are included in the poll's samples. The database contains only the
banks of thousands that are working.
First, the poll's survey specialist instructs the computer to draw enough
telephone numbers at random to complete the number of interviews needed in
each county. Next, the computer enumerates the area codes and telephone
prefixes in the county and selects one area-code/prefix combination at
random. After that, it creates a three-digit random number to finish out the
phone number. This last step creates the "random-digit-dial" (RDD) telephone
number that interviewers actually can call. Finally, the computer takes all
the RDD telephone numbers and puts them in an electronic file.
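A minimal sketch of the random-digit-dial steps above might look like this. It assumes that each database entry supplies the area code, the prefix and the thousands digit of a working bank, so that three random digits complete the number; the article does not spell out that detail. The bank list, the fictional 555 prefixes and the function name are all hypothetical.

```python
import random

# Hypothetical working banks for one county:
# (area code, prefix, thousands digit of a working bank).
WORKING_BANKS = [
    ("612", "555", "0"),
    ("612", "555", "1"),
    ("952", "555", "4"),
]


def draw_rdd_numbers(banks, count, rng):
    """Draw `count` random-digit-dial numbers from a list of working banks."""
    numbers = []
    for _ in range(count):
        # Pick one area-code/prefix/bank combination at random.
        area_code, prefix, thousands = rng.choice(banks)
        # Finish the number with three random digits, 000 through 999.
        last_three = f"{rng.randrange(1000):03d}"
        numbers.append(f"{area_code}-{prefix}-{thousands}{last_three}")
    return numbers


rng = random.Random()  # pass a seed here for a repeatable draw
for number in draw_rdd_numbers(WORKING_BANKS, 5, rng):
    print(number)
```

Because the last digits are random, the draw can reach unlisted numbers as well as listed ones. This sketch does not remove duplicates, which a production system would likely need to do.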
Conducting interviews
Telephone interviewing is hard work. Interviewers have to read questions
exactly as the researcher has written them; otherwise, the people who are
interviewed (called respondents) would hear different versions of the
questions and might respond to them differently. That would introduce bias
into the poll, and it would make interpreting the responses difficult.
Consequently, good polling organizations take care to find, train and supervise
good interviewers. The Minnesota Poll uses interviewers at several market
research companies to conduct its polling, but the Market Solutions Group,
Inc., in Minneapolis, does most of the interviewing.
When a new poll starts, researchers provide the interviewing company with a
copy of the questionnaire, which programmers then load into a computer. This
computer (called a CATI system, for "computer-assisted telephone
interviewing") can dial the RDD telephone numbers for the interviewer and
supply the questions on the computer screen one at a time. It also records
the respondents' answers to the questions after the interview is completed.
But before that, interviewers are briefed on the questionnaire: Supervisors
explain when the poll will be conducted, what it is about and other details,
and they go over the questionnaire to show interviewers how the questions
should be read. Interviewers practice with each other until they are familiar
with the questions. Interviewers also are given scripts to read when
respondents raise frequently asked questions. As a result of this and other
training, interviewers are ready to call when they go to their CATI
stations.
Interviewers are trained to do everything they can to keep response rates
high. They make appointments to call a respondent back if it's inconvenient
to do the interview at the time of the initial call. They leave messages on
answering machines identifying who they are and why they're calling; they
even provide a toll-free number for respondents to return the call. When
respondents initially refuse, a senior interviewer calls them back to give
them another chance to be included in the poll's sample. But if people don't
want to be called back, interviewers respect those wishes.
Selecting respondents from within households
Interviewers call residential households at random, because randomness is
the basis of getting a representative sample of people. But randomness also
has to apply to selecting the person to be interviewed once the household is
included in the sample.
The Minnesota Poll uses the "most-recent-birthday" technique to choose one
adult from each household to be interviewed. The "informant," the person in
the household who answers the phone, hears this script from the interviewer:
"Hello, this is _________ calling for the Star Tribune Minnesota Poll. We
are not selling anything. Today we are asking Minnesotans some questions
about various issues. May I please speak with the person in this
household who had the most recent birthday and is 18 years of age or
older?"
There are other ways to choose respondents, but the poll's researchers have
found that they are too intrusive and result in too many people refusing to
be interviewed. Consequently, the Minnesota Poll has used this method
successfully for the past decade.
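In practice the interviewer simply asks for the right person, but the logic of the most-recent-birthday rule can be sketched as follows. The household records, field names and function names are hypothetical, and treating a Feb. 29 birthday as Feb. 28 in non-leap years is an assumption made only for this example.

```python
from datetime import date


def days_since_last_birthday(birth_month, birth_day, today):
    """Number of days between a person's most recent birthday and `today`."""

    def birthday_in(year):
        try:
            return date(year, birth_month, birth_day)
        except ValueError:
            # Feb. 29 birthday in a non-leap year: assume Feb. 28.
            return date(year, 2, 28)

    this_year = birthday_in(today.year)
    last = this_year if this_year <= today else birthday_in(today.year - 1)
    return (today - last).days


def pick_respondent(household, today):
    """Return the adult (18 or older) with the most recent birthday, or None."""
    adults = [person for person in household if person["age"] >= 18]
    if not adults:
        return None
    return min(
        adults,
        key=lambda p: days_since_last_birthday(p["month"], p["day"], today),
    )


# Hypothetical household reached on Jan. 15, 2003.
household = [
    {"name": "Adult A", "age": 44, "month": 3, "day": 10},
    {"name": "Adult B", "age": 41, "month": 11, "day": 2},
    {"name": "Teen C", "age": 16, "month": 1, "day": 5},
]
chosen = pick_respondent(household, date(2003, 1, 15))
print(chosen["name"])
```

Here the teenager had the most recent birthday but is under 18, so the rule selects Adult B, whose birthday fell in November.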
Data analysis
Once the interviewing is complete, the interviewing company e-mails the data
set containing the data for all the respondents to the Star Tribune's polling
unit.
The sample is weighted for age, gender and education, based on the 1996
Census estimates of the adult population. It also is weighted to take into
account factors contributing to unequal probability of selection, which are
the number of telephone lines going into the household and the number of
adults in the household. That way, the poll's estimates represent the
opinions or attitudes of the entire adult population - rather than those of
households. For election polls, the data also are weighted for likelihood to
vote.
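The adjustment for unequal probability of selection can be illustrated with a common textbook approach. A household with more telephone lines is more likely to be dialed, and an adult in a larger household is less likely to be the one chosen, so each respondent is weighted by adults divided by lines. The article does not give the poll's exact formula, so this is an assumption, and the function names are hypothetical.

```python
def selection_weight(adults_in_household, phone_lines):
    """Weight that offsets a respondent's unequal chance of selection."""
    if adults_in_household < 1 or phone_lines < 1:
        raise ValueError("need at least one adult and one phone line")
    return adults_in_household / phone_lines


def normalize(weights):
    """Rescale weights so they average 1 and the sample size is preserved."""
    mean = sum(weights) / len(weights)
    return [w / mean for w in weights]


# Hypothetical respondents: (adults in household, phone lines).
respondents = [(1, 1), (2, 1), (3, 1), (2, 2)]
raw = [selection_weight(adults, lines) for adults, lines in respondents]
print(raw)
print(normalize(raw))
```

A respondent from a two-adult, one-line household counts twice as much as one from a single-adult, one-line household before the weights are rescaled.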
How does weighting work? Let's look at an easy example - weighting only for
gender. We know that the adult population in Minnesota is about 50 percent
men and 50 percent women. But if a sample turns out to be 55 percent men
and 45 percent women, then it has to be weighted to make sure men count
for half of the responses and women count for half. That means, in this
example, each man would have to be counted slightly less (0.50 / 0.55 = 0.91,
to be precise) and each woman would have to be counted more (0.50 / 0.45 =
1.11) so that the weighted data would have half men and half women. (Do the
math and see how it works: 0.91 * 0.55 = about 50 percent men; 1.11 * 0.45 =
about 50 percent women.)
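The gender example can be checked with a short sketch. Each weight is the group's population share divided by its sample share; the function and variable names are chosen only for this illustration.

```python
def group_weights(sample_share, population_share):
    """Weight for each group: population share divided by sample share."""
    return {
        group: population_share[group] / sample_share[group]
        for group in sample_share
    }


sample = {"men": 0.55, "women": 0.45}      # shares in the example sample
population = {"men": 0.50, "women": 0.50}  # shares in the adult population

weights = group_weights(sample, population)
for group, weight in weights.items():
    weighted_share = weight * sample[group]
    print(f"{group}: weight {weight:.2f}, weighted share {weighted_share:.0%}")
```

The weights round to 0.91 for men and 1.11 for women, and multiplying each by its sample share brings both groups back to half of the weighted data.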
After the data are weighted, the analyst examines the results statistically
and writes a short report that he uses to brief reporters and editors about
the key findings. After the stories and graphics are written and proofed, the
poll's director and survey specialist scrutinize them again to make sure the
numbers and facts are right.
Then you see it in the newspaper, and on startribune.com.
© Copyright 2003 Star Tribune.