Interpreting research literature: A comprehensive guide

While reading research literature, ask and answer these questions to better understand its reliability, generalizability, and significance.

Study design and level of evidence

Is the study qualitative or quantitative? Qualitative studies use subjective, non-numerical data to gather insight into the human experience (e.g., a survey asking MAPs about the stigma they have faced). Quantitative studies use numerical data to draw conclusions about a specific phenomenon (e.g., a study that compares suicide rates of MAPs to suicide rates of non-MAPs).
Is the study experimental? An experimental study compares a control group to an experimental group in order to establish a causal relationship between a variable and an outcome (e.g., comparing the outcomes from a new medication to the outcomes from a placebo). If a study is not experimental, then it cannot establish a causal relationship. Non-experimental studies can, however, establish correlational relationships.
What design does the study use, and what level of evidence does it provide? Different types of studies provide different levels of evidence (see image below). Meta-analyses and systematic reviews synthesize data from many different studies to draw strong conclusions. Randomized controlled trials are experimental studies that use randomization to assign participants to control and experimental groups. Quasi-experimental trials are experimental studies that do not use randomization to establish control and experimental groups. Non-experimental studies include many different designs that do not compare a control group and an experimental group. Case studies provide in-depth information about relatively few participants and do not provide results that are generalizable. Both qualitative study and expert opinion can be important influences on quantitative study, but neither provides objective evidence.

Sampling

How many participants are included in the sample? The more participants a study includes, the more precise its results will typically be. If a study includes only a small sample, its results should be interpreted cautiously.
Is the sample representative of the population? If the sample has the same demographics as the population, then it is representative. Representativeness increases generalizability of research results because it increases the likelihood that the same results will be true in the population as were true in the sample (e.g., a study on MAPs includes 94% non-offending[1] MAPs and 6% offending MAPs; according to some research,[2] this proportion is representative of the population of MAPs). Unrepresentativeness is a common issue in MAP studies because many MAP studies disproportionately include MAPs who have been convicted of illegal sexual activity with a minor. Since the large majority of MAPs have not been convicted of illegal sexual activity,[2] those samples are not representative of the population, and the results they find are not generalizable. Many other MAP studies only include anti-contact MAPs, which can also cause the results to be ungeneralizable since not all MAPs are anti-contact.
Is the sample randomized? Here, randomization means that the sample is randomly selected from the population (or from a larger pool of eligible participants); this is distinct from the random assignment of participants to control and experimental groups described above. Random selection increases representativeness in the sample by giving everyone in the population an equal likelihood of being included in the sample. It also reduces bias because the researcher is not able to choose who will be included in the sample. Randomization typically increases the reliability of a study's findings.
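
To make the points about sample size and random selection concrete, the following is a minimal simulation sketch, not part of the original guide. It assumes a hypothetical population of 10,000 people in which exactly 30% have some trait; the population, the sample sizes, and the function name simulate_sample_proportions are all illustrative assumptions. It repeatedly draws simple random samples and measures how much the sample estimates vary.

```python
import random
import statistics


def simulate_sample_proportions(population, sample_size, n_repeats, rng):
    """Draw repeated simple random samples (without replacement) and
    return the proportion of 1s found in each sample."""
    return [
        sum(rng.sample(population, sample_size)) / sample_size
        for _ in range(n_repeats)
    ]


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population: 30% have the trait (coded 1), 70% do not (coded 0).
population = [1] * 3000 + [0] * 7000

for size in (25, 100, 400):
    estimates = simulate_sample_proportions(population, size, 500, rng)
    center = statistics.mean(estimates)
    spread = statistics.pstdev(estimates)
    print(f"n={size:>3}: mean estimate={center:.3f}, spread={spread:.3f}")
```

In a run of this sketch, the average estimate should stay near the true 30% at every sample size, because random selection does not systematically favor anyone, while the spread of the estimates should shrink as the sample size grows. That shrinking spread is the sense in which larger samples give more precise results.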

Results and statistical significance

Do the results of the study support or reject the hypothesis? The hypothesis of a study is a testable prediction made by the researcher (e.g., if MAPs with depression are given 80 mg of Prozac per day, then they will report less suicidality than MAPs with depression who are given 40 mg of Prozac per day). If the hypothesis is supported by the results, then more study is typically needed to confirm the findings. If the hypothesis is rejected by the results, then the hypothesis may need to be adjusted, and more study is needed.
Are the results statistically significant? Statistical significance is usually assessed with a P value. The P value is the probability of obtaining results at least as extreme as those observed if there were truly no effect and only random chance were at work, so a lower P value is more statistically significant (“significance” is usually defined as P<0.05 or P<0.01). A P value of 0.50 would mean that results at least this extreme would be expected 50% of the time from random chance alone, making the results not statistically significant, whereas a P value of 0.01 would mean that such results would be expected only 1% of the time from random chance alone, making the results significant. There are a variety of different statistical calculations that will output a P value. Other statistics, such as confidence intervals, can also be used to judge how reliable a result is.
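
As one illustration of how a P value can be calculated, the sketch below uses a permutation test, which is only one of the many available calculations and is not taken from any study cited here. The scores for the two groups are invented for illustration, and the function name permutation_p_value is an assumption. The idea is to shuffle the group labels many times and count how often chance alone produces a difference in group averages at least as large as the one actually observed.

```python
import random


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Estimate a two-sided P value for the difference in group means
    by randomly reassigning participants to groups."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # pretend group membership was arbitrary
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        # Small tolerance guards against floating-point rounding.
        if diff >= observed - 1e-12:
            at_least_as_extreme += 1
    # Adding 1 to both counts keeps the estimate from being exactly 0.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Invented scores for a hypothetical control group and experimental group.
control = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]
experimental = [9, 11, 10, 8, 12, 10, 9, 11, 10, 9]

p = permutation_p_value(control, experimental)
print(f"Estimated P value: {p:.4f}")
print("Significant at P<0.05" if p < 0.05 else "Not significant at P<0.05")
```

Because the two invented groups barely overlap, very few shuffles reproduce a difference as large as the observed one, so the estimated P value is small. If the two groups had nearly identical scores, most shuffles would match or exceed the observed difference and the P value would be large.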

Notes

[1] “Offending” is used to refer to MAPs who have been convicted of illegal sexual activity with minors. This term is not meant to stigmatize anyone; rather, it is used only for clarity in context.

[2] Bailey, J. M., Bernhard, P. A., & Hsu, K. J. (2016). An internet study of men sexually attracted to children: Correlates of sexual offending against children. Journal of Abnormal Psychology, 125(7), 989-1000. https://doi.org/10.1037/abn0000213

CC0 Public Domain Work (only applies to this article)