Basic data and statistical analysis

When it comes to writing, mathematics and statistics probably aren’t the first things that come to mind. However, quite a number of us have had to use a research-based source in our writing. Whether it is for a literature review, a research paper, or anything in between, being unfamiliar with how to use data can be a nerve-wracking experience. Knowing how to use data to our advantage can strengthen our papers and help us explain things that, in some cases, words alone cannot.

Commonly used symbols

mu – μ, pronounced “mu,” represents the mean of the population
capital sigma – ∑, represents the sum
lowercase sigma – σ, represents the standard deviation of the population
s – denotes the standard deviation of the sample
x – an individual value of the independent variable
y – an individual value of the dependent variable
n – refers to the sample size
N – refers to the population size
n-1 – referred to as “degrees of freedom”; used in place of n when calculating the standard deviation of a sample

Standard deviation

The standard deviation measures roughly how far, on average, each data point lies from the mean (average). It is important because it shows how closely the data cluster around the average. For example, a small standard deviation means most of the values are close to the average, while a larger one means the values are farther from it and more spread out.
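As a rough illustration, the calculation can be sketched in a few lines of Python. The scores below are made-up numbers, not data from any study, and the variable names are chosen only for this example. The sketch computes both the population version (σ, dividing by N) and the sample version (s, dividing by n-1).

```python
import math
import statistics

# Hypothetical quiz scores, invented purely for illustration.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(scores)

# Squared distance of each value from the mean.
squared_distances = [(x - mean) ** 2 for x in scores]

# Population standard deviation (sigma): divide by N, then take the square root.
sigma = math.sqrt(sum(squared_distances) / len(scores))

# Sample standard deviation (s): divide by n - 1 (the degrees of freedom) instead.
s = math.sqrt(sum(squared_distances) / (len(scores) - 1))

print(f"mean = {mean}")
print(f"population standard deviation = {sigma:.2f}")
print(f"sample standard deviation = {s:.2f}")
```

Python’s built-in statistics module offers the same two calculations as statistics.pstdev and statistics.stdev, so the manual steps here are only meant to show what happens under the hood.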

Variance

The variance describes how spread out the data are from the average as a whole, rather than focusing on each individual value.

This is important to note because it shows how widely the data may differ, otherwise known as the spread.

To find the variance, simply square the standard deviation.
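The relationship between the two values can be checked with a short Python sketch. The scores are hypothetical, and the example assumes the data represent a whole population, so it uses the population versions of both calculations from Python’s statistics module.

```python
import statistics

# Hypothetical scores, invented purely for illustration.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

sigma = statistics.pstdev(scores)        # population standard deviation
variance = statistics.pvariance(scores)  # population variance

# Squaring the standard deviation gives the variance.
print(f"standard deviation = {sigma}")
print(f"standard deviation squared = {sigma ** 2}")
print(f"variance = {variance}")
```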

Pearson’s correlation coefficient (r)

This number represents the strength and direction of the linear relationship between two variables in a set of data points. The value of r ranges from -1 to +1. For example, -1 is a perfect negative correlation, +1 is a perfect positive correlation, and 0 means there is no linear correlation.

You can mention Pearson’s correlation coefficient if you want to attach a numerical value to a relationship shown by the data.
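For readers curious about where the number comes from, here is a minimal Python sketch of the usual formula for r. The function name pearson_r and the study-hours data are assumptions made up for this example.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How much x and y move together around their means.
    co_movement = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each variable moves on its own.
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return co_movement / (spread_x * spread_y)

# Hypothetical data: hours studied (x) and exam score (y).
hours_studied = [1, 2, 3, 4, 5]
exam_score = [52, 60, 71, 78, 90]

r = pearson_r(hours_studied, exam_score)
print(f"r = {r:.3f}")  # close to +1: a strong positive correlation
```

A result near +1, as here, could be reported as a strong positive correlation; a result near 0 would suggest little or no linear relationship.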

What is skew?

Skew describes how symmetrical a set of data is.

Data are usually considered “normal” when the majority of the data points lie in the middle of the range, with values tapering off evenly on both sides. Data are positively skewed when a long tail stretches toward the high values, so most of the data points sit at the lower end. Data are negatively skewed when a long tail stretches toward the low values, so most of the data points sit at the higher end.

You may want to mention the skew of the data in the same way you would use “r”: to describe where the majority of the data lies and in which direction the mean is pulled.
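One common way to put a number on skew is the average cubed distance from the mean, measured in standard deviations. The sketch below assumes that definition (other definitions exist), and the three small data sets are invented for illustration.

```python
import statistics

def skewness(values):
    """Moment-based skewness: the average cubed z-score."""
    mean = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return sum(((x - mean) / sigma) ** 3 for x in values) / len(values)

# Hypothetical data sets, invented purely for illustration.
symmetric = [1, 2, 3, 4, 5]
long_right_tail = [1, 1, 2, 2, 3, 10]   # one unusually high value
long_left_tail = [1, 8, 9, 9, 10, 10]   # one unusually low value

for name, data in [
    ("symmetric", symmetric),
    ("long right tail", long_right_tail),
    ("long left tail", long_left_tail),
]:
    print(
        f"{name}: skewness = {skewness(data):+.2f}, "
        f"mean = {statistics.mean(data):.2f}, "
        f"median = {statistics.median(data)}"
    )
```

In the data set with the long right tail, the skewness is positive and the mean is pulled above the median; the long left tail gives the mirror image.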

The hypotheses

The null hypothesis states that there is no significant difference between the two populations being measured.

The alternative hypothesis, on the other hand, claims that there is a difference or a relationship between the populations.

These hypotheses give a basis for the purpose of the experiment or research.

If our data show that a relationship is statistically significant, we reject the null hypothesis in favor of the alternative.

Types of errors

A Type I error occurs when there is a “false positive.” In other words, the null hypothesis is wrongly rejected when it is actually true. This would mean that a significant difference is claimed but there actually isn’t one.

A Type II error occurs when there is a “false negative.” This means that the null hypothesis is not rejected when it is actually false, or a significant difference is not claimed to be observed but there actually is one.

It is typically considered more important to avoid a Type I error than a Type II error. Consider a medication that is claimed to have a positive effect on patients when it really doesn’t. There could be drastic consequences for patients, who believe they are receiving proper treatment when they actually are not. In other cases, a Type II error is the more harmful one. For example, a patient may go without treatment for an ailment because of the belief that a particular treatment would have no effect, when in actuality it does.
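The two error types can be made concrete with a toy example. The sketch below assumes a made-up decision rule: flip a coin 10 times and reject the null hypothesis (“the coin is fair”) only if 0, 1, 9, or 10 heads come up. The coin, the rule, and the 0.8 bias are all hypothetical choices for illustration.

```python
from math import comb

def prob_heads(k, flips, p_heads):
    """Binomial probability of exactly k heads in `flips` flips."""
    return comb(flips, k) * p_heads**k * (1 - p_heads) ** (flips - k)

def prob_reject(p_heads, flips=10):
    """Chance that the rule 'reject if 0, 1, 9, or 10 heads' fires."""
    extreme_counts = [0, 1, flips - 1, flips]
    return sum(prob_heads(k, flips, p_heads) for k in extreme_counts)

# Type I error: the coin really is fair, but we reject the null anyway.
type_1_rate = prob_reject(0.5)

# Type II error: the coin is biased (80% heads), but we fail to reject the null.
type_2_rate = 1 - prob_reject(0.8)

print(f"Type I error rate  = {type_1_rate:.3f}")
print(f"Type II error rate = {type_2_rate:.3f}")
```

Under these assumptions the strict rule keeps false positives rare, but it misses the biased coin more often than not, which shows the trade-off between the two kinds of error.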

The p-value

The p-value represents the probability of getting a result at least as extreme as the one observed, assuming the null hypothesis is true.

Typically, we make sense of the p-value by comparing it with a significance level, alpha, which is usually (but not always) set at 0.05, or 5%. If we find that the calculated p-value is less than 0.05 (or whatever fixed significance level is set), then we can reject the null hypothesis in favor of the alternative.
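To see the comparison in action, consider a hypothetical coin that lands heads 9 times in 10 flips. The sketch below assumes the null hypothesis is “the coin is fair” and computes a one-sided p-value: the chance of seeing at least that many heads from a fair coin. The function name and the scenario are invented for this example.

```python
from math import comb

def p_value_at_least(heads, flips):
    """Probability of at least `heads` heads in `flips` flips of a fair coin."""
    total_outcomes = 2 ** flips
    favorable = sum(comb(flips, k) for k in range(heads, flips + 1))
    return favorable / total_outcomes

alpha = 0.05  # significance level

p = p_value_at_least(9, 10)
reject_null = p < alpha

print(f"p-value = {p:.4f}")
print("reject the null" if reject_null else "do not reject the null")
```

With 9 heads, the p-value falls below 0.05, so the null hypothesis would be rejected. Running p_value_at_least(6, 10) instead gives a much larger p-value, so 6 heads would not count as evidence against a fair coin.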

Correlation does not imply causation

One very important thing to keep in mind is that although the data may imply or state that there is a correlation between two variables, that doesn’t necessarily mean that one directly causes the other.

For example, there may be a positive correlation between money spent on ice cream and attendance at water parks in the summer. However, it’s not necessarily true that people are more likely to go to water parks because more money is spent on ice cream. Rather, there is probably another variable behind this relationship. In this example, that variable may be the season: because it’s summertime, both are likely to increase at the same time.
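A small simulation can show how a third variable produces a correlation on its own. In the sketch below, every number is invented: daily temperature drives both ice cream sales and water park visits, and neither of those two quantities appears in the other’s formula.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson's r for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_movement = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return co_movement / (spread_x * spread_y)

random.seed(0)  # fixed seed so the simulation is repeatable

# Hypothetical daily temperatures (degrees Celsius) for 200 days.
temperature = [random.uniform(15, 35) for _ in range(200)]

# Each quantity depends only on temperature plus random noise.
ice_cream_sales = [20 * t + random.gauss(0, 40) for t in temperature]
water_park_visits = [50 * t + random.gauss(0, 100) for t in temperature]

r = pearson_r(ice_cream_sales, water_park_visits)
print(f"r between ice cream sales and water park visits = {r:.2f}")
```

The two quantities come out strongly correlated even though, by construction, neither one causes the other.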

Remember that statistics are interpretive and you can attempt to use them to your advantage.

For the example above, if it would help your case, you can suggest that an increase in attendance at water parks can potentially increase the money spent on ice cream in the summer. This could be because the water parks have vendors who sell ice cream, so visitors have greater exposure and access to it. This is an argument that can be made to one’s advantage if needed and if worded carefully. Data do not allow one to make absolute claims. Rather, reasonable assumptions can be developed based on what the data show.
