# How to Collect and Analyze Survey Data

## How to Collect Survey Data

Surveys are one of the key sources of quantitative data in sociology. There are three main survey types based on their format:

- Telephone survey
- Online survey
- Street survey

### Telephone Survey

This type of survey is essentially a phone interview. It can be conducted by calling both landlines and mobile phones. Such surveys require a high level of phone penetration among the population. Their time frames are quite tight, so interviewers face an increased workload. Specialized software on the market can automatically draw the sample and dial the chosen respondents.

#### Advantages of a telephone survey:

- Quick and cheap to carry out (less time and fewer interviewers needed)
- Reduced impact of the interviewer on the respondent, no third-party presence effect
- Easy interviewer monitoring
- Specialized software available

#### Limitations of a telephone survey:

- Visual materials (images, videos) cannot be displayed
- Behavioral reactions cannot be recorded
- The survey has to be short (no more than 10 minutes) and include simple questions with few hints
- The interviewers need to be experienced, as they only have their voice to keep the respondents focused
- Samples can be unrepresentative due to a high non-response rate
- The optimal call window is limited: Tuesday to Thursday, 5:30 p.m. to 9:00 p.m.
- Lack of up-to-date phone directories for sampling

### Online Survey

With this type of survey, respondents answer questions on their personal device: a computer, tablet, or smartphone. Survey invitations can be sent to an existing contact list or to members of a specially curated survey panel.

#### Advantages of an online survey:

- Low cost and low labor intensity
- Prompt results: more than 60% of responses come within the first two days
- Wide audience coverage
- No interviewer influence
- Visual materials (images, audio, videos) can be displayed
- No time restrictions on when respondents take the survey
- A preset survey panel, selected on specific criteria, can be used
- Sensitive topics (alcohol consumption, socially objectionable behavior) can be studied
- The incoming data can be processed automatically

#### Limitations of an online survey:

- Low response rate
- Random sampling is generally not available
- Sample correction tools (such as weighting) need to be used
- Samples mostly cover only specific categories of respondents
- Little control over how respondents answer
- Response quality can suffer because respondents are anonymous
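One common sample correction tool is post-stratification weighting: each respondent group gets a weight equal to its share in the population divided by its share in the sample. The snippet below is a minimal sketch of that idea. The age groups, sample sizes, and population shares are made up for illustration, and `poststratification_weights` is a hypothetical helper name.

```python
from collections import Counter

def poststratification_weights(sample_groups, population_shares):
    """Return one weight per group: population share / sample share."""
    counts = Counter(sample_groups)
    n = len(sample_groups)
    return {
        group: population_shares[group] / (counts[group] / n)
        for group in counts
    }

# Hypothetical online sample that over-represents younger respondents.
sample = ["18-34"] * 60 + ["35-54"] * 30 + ["55+"] * 10
# Hypothetical known population shares (for example, from census data).
population = {"18-34": 0.30, "35-54": 0.35, "55+": 0.35}

weights = poststratification_weights(sample, population)
for group, weight in sorted(weights.items()):
    print(group, round(weight, 2))
```

Over-represented groups get weights below 1 and under-represented groups get weights above 1, so the weighted sample matches the population structure on the chosen criterion.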

### Street Survey

A street survey is a type of personal interview conducted in crowded areas: on busy streets, in malls, etc. Respondents are recruited from passersby.

#### Advantages of a street survey:

- Fast to conduct
- Easy to target respondents in a precise location
- Visual materials (images, audio, videos) can be displayed
- Behavioral reactions can be recorded

#### Limitations of a street survey:

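- Haphazard (convenience) sampling: respondents are whoever happens to pass by, so the sample may not be representative
- The interview has to be short (no more than 10 minutes)
- Interviewers’ work is hard to monitor
- Dependence on weather conditions
- It is difficult to get a permit to conduct a survey in a mall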

## How to Analyze Survey Results

The scale used to measure a variable determines which statistical procedures are available. In general, there are **four types of scales used in sociology**:

- Nominal
- Ordinal
- Interval
- Ratio

### Non-Quantitative Scales

#### Nominal Scale

The nominal scale identifies whether an object has a certain attribute. On this scale, different attributes cannot be ranked or put in order, as they cannot be compared in terms of “bigger–smaller” or “better–worse”.

**Statistical procedures:**

- Frequency
- Mode
- Contingency table

#### Ordinal Scale

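The ordinal scale identifies which of several ordered attributes an object has. The categories can be ranked, but the distances between them are not defined (for example, education level, or a rating from “strongly disagree” to “strongly agree”).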

**Statistical procedures:**

- Frequency
- Mode
- Median

### Quantitative Scales

#### Interval Scale

On the interval scale, attribute values are ordered with equal intervals between them, and have an assigned measurement unit.

Interval scales do not have a true zero. The zero value is arbitrary and does not mean that the measured attribute is absent. Examples include calendars with different starting points, or temperature scales with different zero points (Celsius and Fahrenheit). The Kelvin scale, by contrast, starts at absolute zero and is therefore a ratio scale.

Since the attribute is measured in certain units, its values can be added or subtracted. Multiplication and division operations, however, do not make sense due to the lack of a true zero that would represent the absence of the attribute. For instance, we can say that it is 7 degrees hotter today than yesterday, but not that it is X times hotter.
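A short numeric check makes this concrete. The sketch below uses made-up temperatures and converts them from Celsius to Fahrenheit: the difference between the two days stays the same once units are converted, while the ratio changes with the choice of zero point.

```python
def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32

# Hypothetical temperatures for two days, in degrees Celsius.
yesterday_c, today_c = 10.0, 20.0
yesterday_f = celsius_to_fahrenheit(yesterday_c)
today_f = celsius_to_fahrenheit(today_c)

# Differences are meaningful: a 10 °C gap is the same gap as 18 °F.
diff_c = today_c - yesterday_c
diff_f = today_f - yesterday_f
print(diff_c, diff_f)

# Ratios are not: the result depends on where the arbitrary zero sits.
ratio_c = today_c / yesterday_c
ratio_f = today_f / yesterday_f
print(ratio_c, round(ratio_f, 2))
```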

#### Ratio Scale

The ratio scale has all the same properties as the other scales, plus a true zero that represents the complete absence of an attribute. Thanks to this feature, values on a ratio scale can be compared as ratios. For example, a respondent can be 1.5 times older, go to malls 2 times less often, or make several times more expensive purchases than another respondent.

When ratio data are collected in grouped intervals (for example, age brackets), the intervals do not have to be equal in width, but they must have distinct, non-overlapping boundaries. For instance, if one interval on an age scale ends at 30 years, the next one has to start at 31 years.

**Statistical procedures for quantitative scales:**

- Frequency
- Mode
- Median
- Mean
- Variance
- Standard deviation

### Descriptive Statistics

The purpose of descriptive statistics is to summarize the main features of the collected data.

#### Key indicators of descriptive statistics

**Absolute frequency** shows how many times a particular answer option has been chosen.

**Relative frequency** shows the share of a particular answer in the total number of responses.

The PeakPoll survey builder automatically calculates absolute and relative frequencies for each answer option and presents them as pie charts and bar charts.

**Cumulative frequency** shows the share of answers that do not exceed a certain value.
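The three frequency indicators can be computed together in a few lines. This is a minimal sketch using only the Python standard library; the answers are invented ratings on a 5-point scale.

```python
from collections import Counter

# Hypothetical answers to a 5-point satisfaction question (1 = lowest).
answers = [4, 5, 3, 4, 2, 5, 4, 1, 3, 4]

absolute = Counter(answers)   # absolute frequency per answer option
n = len(answers)

rows = []
running_count = 0
for value in sorted(absolute):
    running_count += absolute[value]
    relative = absolute[value] / n    # share of this answer
    cumulative = running_count / n    # share of answers <= value
    rows.append((value, absolute[value], relative, cumulative))

for value, count, rel, cum in rows:
    print(f"{value}: count={count}, relative={rel:.0%}, cumulative={cum:.0%}")
```

Cumulative frequency only makes sense when the answer options can be ordered, so it applies to ordinal and quantitative scales but not to nominal ones.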

A **mean** is the sum of all answers divided by their number. It is used to locate the center of a data series. It does not always reflect the structure of the data objectively, as it is easily influenced by outliers (uncharacteristically small or large values).

A **median** is the value that splits a data series sorted in ascending order into two halves. It is another way to find the center of a data series, and it is not as sensitive to outliers as the mean.

A **mode** is the most frequent answer or the most typical value in a data set.
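The effect of an outlier on these three measures is easy to see in a small example. The sketch below uses Python's standard `statistics` module; the salary figures are made up for illustration.

```python
import statistics

# Hypothetical monthly salaries; the last value is an outlier.
salaries = [2000, 2100, 2100, 2300, 2500, 13000]

mean = statistics.mean(salaries)      # pulled upward by the outlier
median = statistics.median(salaries)  # average of the two middle values
mode = statistics.mode(salaries)      # most frequent value

print(mean, median, mode)
```

Here the mean is far above what most people in the sample earn, while the median and the mode stay close to the typical values.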

**Variance** is the average squared deviation of the answers from the mean. It shows how widely the answers are spread.

**Standard deviation** is the square root of the variance. In other words, it shows how far a typical answer lies from the mean, in the same units as the data. It is an important statistical indicator that should be analyzed along with the mean and the median. For instance, an average salary may not reflect the real situation at a company if the pay disparity is too high. That’s when standard deviation comes in handy.
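To illustrate the salary case, the sketch below compares two made-up teams with the same mean salary but very different spread. It uses the population formulas from Python's `statistics` module (`pvariance`, `pstdev`), which divide by the number of values. For a sample drawn from a larger population, `statistics.variance` and `statistics.stdev` divide by n - 1 instead.

```python
import statistics

# Hypothetical salaries: both teams have the same mean.
team_a = [2900, 3000, 3000, 3100]
team_b = [1000, 1500, 3500, 6000]

for name, salaries in (("A", team_a), ("B", team_b)):
    mean = statistics.mean(salaries)
    variance = statistics.pvariance(salaries)  # mean squared deviation
    stdev = statistics.pstdev(salaries)        # square root of the variance
    print(f"Team {name}: mean={mean}, variance={variance}, stdev={stdev:.1f}")
```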

#### Contingency Tables

With **contingency tables**, you can visualize the joint distribution of two variables and explore the relationship between them. They are an effective way to examine links between two nominal variables (for example, gender and consumption of a certain product).
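A contingency table is simply a count of respondents for every combination of two answers. The following sketch builds one with the standard library; the respondents and their answers are invented for illustration.

```python
from collections import Counter

# Hypothetical respondents: (gender, buys the product?)
responses = [
    ("female", "yes"), ("female", "yes"), ("female", "no"),
    ("male", "yes"), ("male", "no"), ("male", "no"),
    ("female", "yes"), ("male", "no"),
]

cell_counts = Counter(responses)
row_labels = sorted({gender for gender, _ in responses})
col_labels = sorted({answer for _, answer in responses})

# Nested dict: table[row][column] -> number of respondents in that cell.
table = {r: {c: cell_counts[(r, c)] for c in col_labels} for r in row_labels}

print("".ljust(8) + "".join(c.ljust(6) for c in col_labels))
for r in row_labels:
    print(r.ljust(8) + "".join(str(table[r][c]).ljust(6) for c in col_labels))
```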

#### Correlation

When changes in the values of one variable coincide with changes in another one, it shows a correlation between them. However, it’s important to keep in mind that correlation does not mean causation.

When the change in one variable is proportional to the change in the other, the correlation is **linear**.

**Linear correlation** can be:

- Positive (both variables increase or decrease together)
- Negative (one variable increases while the other decreases)
- Strong (absolute value of the correlation coefficient over 0.7)
- Weak (absolute value of the correlation coefficient below 0.5)

The correlation coefficient ranges from -1 to +1. A correlation coefficient of zero means that the two variables are linearly uncorrelated.

#### Correlation strength types

- Very weak (absolute value of the correlation coefficient below 0.2)
- Weak (from 0.2 to 0.5)
- Average (from 0.5 to 0.7)
- Strong (from 0.7 to 0.9)
- Very strong (over 0.9)
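As an illustration, the sketch below computes the Pearson correlation coefficient by hand and labels its strength using the thresholds listed above. The function names and the data (respondent ages and monthly mall visits) are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

def strength(r):
    """Label the absolute value of r with the strength categories above."""
    magnitude = abs(r)
    if magnitude < 0.2:
        return "very weak"
    if magnitude < 0.5:
        return "weak"
    if magnitude < 0.7:
        return "average"
    if magnitude < 0.9:
        return "strong"
    return "very strong"

# Hypothetical data: respondent age and number of mall visits per month.
ages = [20, 30, 40, 50, 60]
visits = [9, 8, 6, 5, 2]

r = pearson_r(ages, visits)
direction = "negative" if r < 0 else "positive"
print(f"r = {r:.2f}: {strength(r)} {direction} correlation")
```

The sign of the coefficient gives the direction of the relationship, and its absolute value gives the strength.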

To test whether two attributes are correlated, different **statistical criteria** are used depending on the variable type. They include:

- Chi-squared test
- Contingency coefficient
- Kolmogorov–Smirnov (Lambda) test
- Spearman’s Rho
- Pearson correlation coefficient
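As an example of the first criterion in this list, the sketch below computes Pearson's chi-squared statistic for a contingency table of two nominal variables. It compares the observed counts with the counts expected if the variables were independent. The table values and the function name are made up for illustration, and the sketch stops at the statistic itself rather than computing a p-value.

```python
def chi_squared_statistic(table):
    """Chi-squared statistic for a contingency table given as rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    return statistic

# Hypothetical 2x2 table: rows are two respondent groups,
# columns are "buys the product" / "does not buy".
observed = [
    [30, 10],
    [20, 40],
]

chi2 = chi_squared_statistic(observed)
degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)
print(round(chi2, 2), degrees_of_freedom)
```

With one degree of freedom, the critical value at the 0.05 significance level is about 3.84. A statistic well above it, as in this example, suggests that the two variables are not independent.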

#### Multidimensional Analysis

**Multidimensional analysis** enables researchers to explore correlations among two or more variables and to test causal hypotheses.

In marketing, **factor** and **cluster analysis** procedures are the most common.

#### Factor Analysis

The key idea behind **factor analysis** is to reduce a large number of observed variables to a smaller number of factors that account for the differences between these variables.

##### The main objectives of factor analysis

- Reduce the number of variables
- Classify the data

For instance, suppose you need to examine an employer’s brand. To do that, you ask respondents to rate the company on more than 10 criteria. Then, using the factor analysis procedure, you group the criteria into several key factors, such as “remuneration”, “labor conditions”, and “corporate culture”.

Another example is creating a psychographic profile of a client. For that, you ask respondents to rate how much they agree with a set of statements on lifestyle. Then, you can apply the factor analysis procedure to find different client types, such as “innovative”, “progressive”, and “conservative”.

#### Cluster Analysis

The main goal of **cluster analysis** is to group objects into several clusters. The objects within one cluster should be as similar as possible, while the clusters themselves should be as different as possible.

##### Cluster analysis outline

- Select variables for clustering
- Calculate the similarities and differences between answers
- Select a clustering method (the rule for grouping objects)
- Determine the optimal number of clusters
- Run the cluster analysis procedure
- Based on the produced results, define the profiles of representatives of each cluster
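The outline above can be sketched with k-means, one of many possible clustering methods. It is used here only as an assumption for illustration. The example uses Euclidean distance as the similarity measure, fixes the number of clusters at two, and starts from two hand-picked centroids. The respondent data (age, mall visits per month) are invented.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign points to the nearest centroid, then recompute centroids."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(p, centroids[i]),
            )
            clusters[nearest].append(p)
        # New centroid = coordinate-wise mean; keep the old one if a cluster is empty.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical respondents: (age, mall visits per month).
respondents = [(22, 8), (25, 9), (24, 7), (51, 2), (55, 1), (49, 3)]

# Two clusters, starting from two respondents as initial centroids.
centroids, clusters = kmeans(respondents, [respondents[0], respondents[3]])

for centroid, cluster in zip(centroids, clusters):
    profile = ", ".join(f"{value:.1f}" for value in centroid)
    print(f"cluster profile ({profile}): {cluster}")
```

The centroid of each cluster serves as a simple profile of its members. In practice, variables measured in different units are usually standardized before clustering so that no single variable dominates the distance.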

##### Areas of cluster analysis application

- Sociology: building socio-demographic respondent groups
- Marketing: segmentation of the target audience, grouping of competitors based on their competitive factors
- HR: grouping employees based on their motivation drivers