### Scatter Diagrams

#### What does bivariate data mean?

- A lot of statistics is about looking at how different factors, or variables, change how data behaves
- Bivariate data is data collected on two variables, and it is used to look at how one of the variables affects the other
- Each data value from one variable will be paired with a data value from the other variable
- The two variables are often related, but do not have to be
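
The pairing of values can be sketched in a few lines of Python. The temperatures and sales figures below are made up purely for illustration and the variable names are assumptions, not part of these notes.

```python
# Hypothetical, made-up data: daily temperature (°C) and ice creams sold.
temperatures = [14, 17, 20, 23, 26]
ice_creams_sold = [30, 42, 55, 61, 80]

# Bivariate data pairs each value of one variable with a value of the other,
# so both lists must be the same length.
if len(temperatures) != len(ice_creams_sold):
    raise ValueError("each temperature needs a matching sales figure")

# Each pair is one point that could be plotted on a scatter diagram.
pairs = list(zip(temperatures, ice_creams_sold))

for temperature, sold in pairs:
    print(f"{temperature} °C -> {sold} ice creams")
```

Keeping the values as pairs, rather than as two unrelated lists, is what makes the data bivariate.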

#### What is a scatter diagram?

- A scatter diagram is a way of graphing bivariate data
- You may be asked to plot, or add to, a scatter diagram

- Scatter diagrams allow statisticians to look for relationships between the two variables
- Some scatter diagrams will show a clear relationship known as correlation (see below)
- Others will not display an obvious relationship
- If a scatter diagram shows a relationship you may be asked to identify outliers

#### Worked Example

The scatter diagram below shows the number of Save My Exams question packs completed by a group of students and the percentage score they received in their A-Level Statistics exam.

(i) State which of the variables is the explanatory variable and which is the response variable.

(iii) Another student completed 50 question packs and scored 80% on their A-Level Statistics exam. Add this data point to the scatter diagram.

#### Exam Tip

- Learn the vocabulary for the types of variables as you could be asked a question on this. Make sure you check the scales carefully when plotting any points.

### Correlation

#### What is correlation?

- Correlation is how the relationship between the two variables is described
- Perfect linear correlation means that the bivariate data will all lie on a straight line on a scatter diagram
- Linear correlation can be positive or negative and it can be strong or weak
- Positive correlation describes a data set where, as one variable increases, the other variable also increases
- Negative correlation describes a data set where, as one variable increases, the other variable decreases

- When describing correlation you should say whether it is positive or negative and also say whether it is strong or weak
- If correlation exists then there could be outliers; these will be data points that do not fit the pattern seen on the graph
- There will likely be a maximum of one or two outliers on any scatter diagram
- You may be asked to identify the outliers
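
One common way to put a number on the direction and strength of linear correlation is the product moment correlation coefficient, r, which is not covered in these notes. The sketch below is a minimal illustration under stated assumptions: the data is made up, the function names are hypothetical, and the cutoffs used for "strong" and "weak" are arbitrary choices rather than official boundaries.

```python
from math import sqrt


def correlation_coefficient(xs, ys):
    """Product moment correlation coefficient r for paired data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return s_xy / sqrt(s_xx * s_yy)


def describe_correlation(r, strong_cutoff=0.7, none_cutoff=0.2):
    """Describe r in words. The cutoffs are illustrative assumptions."""
    if abs(r) < none_cutoff:
        return "no obvious linear correlation"
    strength = "strong" if abs(r) >= strong_cutoff else "weak"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction} correlation"


# Hypothetical, made-up data: question packs completed and exam score (%).
packs = [5, 10, 15, 20, 25]
scores = [42, 50, 55, 67, 71]

r = correlation_coefficient(packs, scores)
print(round(r, 3), describe_correlation(r))
```

A value of r close to 1 or -1 corresponds to points lying close to a straight line, while a value close to 0 corresponds to no obvious linear pattern.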

#### What is the difference between correlation and causation?

- It is important to be aware that just because correlation exists, it does not mean that the change in one of the variables is causing the change in the other variable
- Correlation does not imply causation!

- If a change in one variable causes a change in the other then the two variables are said to have a causal relationship
- Observing correlation between two variables does not always mean that there is a causal relationship
- Look at the two variables in question and consider the context of the question to decide if there could be a causal relationship
- If the two variables are temperature and number of ice creams sold at a park then it is likely to be a causal relationship
- Correlation may exist between global temperatures and the number of monkeys kept as pets in the UK but they are unlikely to have a causal relationship

- Observing a relationship between two variables can allow you to create a hypothesis about those two variables

#### Worked Example

The scatter diagram below shows the number of Save My Exams question packs completed by a group of students and the percentage score they received in their A-Level Statistics exam.

(i)Describe the correlation shown in the scatter diagram.

(ii)Decide if you think there could be a causal relationship between the two variables and explain your reasoning.

### Linear Regression

#### What is linear regression?

- If strong linear correlation exists on a scatter diagram, then a line of best fit can be drawn
- This is a linear graph added to the scatter diagram that best approximates the relationship between the two variables
- At GCSE this will have been drawn by eye as a line that fits closest to the data values
- The data can be used to calculate the equation of the straight line that represents the best fit of the relationship between the two variables
- You do not need to know how to calculate it but you will need to be able to interpret one

- The least squares regression line is the line of best fit that minimises the sum of the squares of the distances between the line and each data value
- This is usually called the regression line and can be calculated either by looking at the vertical or the horizontal distances between the line and the data values
- If the regression line is calculated by looking at the vertical distances it is called the regression line of y on x
- If the regression line is calculated by looking at the horizontal distances it is called the regression line of x on y
- The regression line of x on y is rarely used and you are unlikely to come across it at this level
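
Although the calculation is not required at this level, a short sketch can show what "least squares" means for the regression line of y on x. It assumes the line is written in the form y = a + bx, uses the standard least squares formulas, and runs on made-up data; the function names are hypothetical.

```python
def regression_line_y_on_x(xs, ys):
    """Return (a, b) for the least squares line y = a + bx."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("all x values are equal, so no line of y on x exists")
    b = s_xy / s_xx            # gradient
    a = mean_y - b * mean_x    # y-intercept: the line passes through the mean point
    return a, b


def sum_of_squared_gaps(xs, ys, a, b):
    """Sum of squared vertical distances between the data and y = a + bx."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical, made-up data: question packs completed and exam score (%).
packs = [5, 10, 15, 20, 25]
scores = [42, 50, 55, 67, 71]

a, b = regression_line_y_on_x(packs, scores)
best = sum_of_squared_gaps(packs, scores, a, b)
# Shifting the line up by 1 makes the total squared distance larger.
nudged = sum_of_squared_gaps(packs, scores, a + 1, b)

print(f"y = {a:.1f} + {b:.1f}x")
print(best < nudged)
```

Only the vertical distances are measured here, which is what makes this the regression line of y on x rather than x on y.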

#### How to use a regression line?

- Drawing a regression line is done in the same way as drawing any straight line graph: substitute some values of the independent variable into the equation to find points to plot
- The regression line can be used to decide what type of correlation there is if there is no scatter diagram
- For a regression line written in the form y = a + bx, b is the gradient: if b is positive then the data set has positive correlation and if b is negative then the data set has negative correlation

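- The value of b can be used to interpret how the data is changing: it is the average change in the dependent variable for each increase of one unit in the independent variable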
- The regression line can also be used to predict the value of a dependent variable from an independent variable
- Predictions should only be made from values of the independent variable that are within the range of the given data
- Making a prediction within the range of the given data is called interpolation
- Making a prediction outside of the range of the given data is called extrapolation and is much less reliable
- The prediction will be more reliable if the number of data values in the original sample set is bigger
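
The difference between interpolation and extrapolation can be sketched as a simple range check. This is a minimal illustration, assuming a regression line of the form y = a + bx; the values of a and b, the data range, and the function name `predict` are all hypothetical.

```python
def predict(a, b, x, x_min, x_max):
    """Predict y from x using y = a + bx and label the kind of prediction."""
    inside_range = x_min <= x <= x_max
    label = "interpolation" if inside_range else "extrapolation (less reliable)"
    return a + b * x, label


# Hypothetical regression line: exam score (%) against question packs completed.
a, b = 34.5, 1.5
# Hypothetical range of the independent variable in the original data.
x_min, x_max = 5, 25

# The sign of the gradient b gives the type of correlation.
direction = "positive" if b > 0 else "negative" if b < 0 else "no"
print(f"b = {b}, so the data has {direction} correlation")

for packs_completed in (12, 40):
    score, label = predict(a, b, packs_completed, x_min, x_max)
    print(f"{packs_completed} packs -> {score:.1f}% ({label})")
```

In this sketch a prediction from 40 packs gives a score of 94.5%, but 40 is well outside the range of 5 to 25 packs, so there is no evidence that the linear pattern still holds there.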

#### Worked Example

#### Exam Tip

- Remember that the value of b is the gradient of the regression line; a greater value of b does not mean stronger correlation. When using a regression line to make a prediction, make sure that the value you are predicting from falls within the range of the data used to calculate the regression line.

Reproduced from savemyexams