Week 11: Reproduction

Ben Marwick and colleagues have argued that an extremely effective way to teach archaeological data science is to have students try to replicate findings from archaeologists’ published journal articles. (You can read the preprint here.)

Of course, this depends on archaeologists publishing both the code and the underlying data for their work, something that is unfortunately still comparatively rare. When the results of archaeological research can’t be replicated or examined, there is an argument to be made that the research is unethical.

In this exercise, we’re going to look at one student’s attempt at replicating research done by Marwick. We’re going to replicate the replication.

The student, Hope Loiselle, chose to replicate one aspect of Marwick’s published work detailed in this article. Loiselle used Marwick’s data, which he put online at the data-sharing site Figshare. (There are many places to find archaeological data online!)

We are going to redo the analysis and compare our result with Loiselle’s and Marwick’s.

Open Marwick’s paper, and read his conclusions. Loiselle chose to focus on the lithic flakes from Tham Lod Area 1 and Ban Rai Area 3. Make a note about what Marwick says about these sites and how they compare. Then, following Loiselle, do the comparison for yourself:

# Hope Loiselle's Code
# your task: annotate it with what you think
# she's trying to do; and compare it with
# Marwick's original paper: what aspect of his analysis
# does this bit of code address, and do your findings
# match up with his?
library(curl)
library(stringr)
library(dplyr)
library(tidyr)

library(tidyverse)
flakes_TL <- read.csv(curl("https://raw.githubusercontent.com/benmarwick/teaching-replication-in-archaeology/refs/heads/master/analysis/supplementary-materials/submitted-assigments/Hope-Loiselle/Tham_Lod_Area_1_lithics-1.csv"), header = TRUE )
flakes_BR <- read.csv(curl("https://raw.githubusercontent.com/benmarwick/teaching-replication-in-archaeology/refs/heads/master/analysis/supplementary-materials/submitted-assigments/Hope-Loiselle/Ban_Rai_Area_3_lithics-1.csv"), header = TRUE) 

##- new cell

# dorsal cortex and dorsal scars
TL_dorsal <-
  flakes_TL %>%
  select(DORSAL_COR, DORSAL_SCA, SITE, EXCAVATION)

BR_dorsal <-
  flakes_BR %>%
  select(DORSAL_COR, DORSAL_SCA, SITE, EXCAVATION)

TL_BR_dorsal <- bind_rows(TL_dorsal, BR_dorsal)

##- new cell

ggplot(TL_BR_dorsal, aes(SITE, DORSAL_COR)) +
  geom_boxplot()

dorsal_cortex_proportion <-
  TL_BR_dorsal %>%
  group_by(SITE, EXCAVATION, DORSAL_COR) %>%
  tally() %>%
  mutate(DORSAL_COR = ifelse(DORSAL_COR == 0, "zero", "not zero")) %>%
  group_by(SITE, EXCAVATION, DORSAL_COR) %>%
  tally() %>%
  filter(!is.na(DORSAL_COR)) %>%
  spread(DORSAL_COR, n) %>%
  mutate(dorsal_proportion = zero / (`not zero` + zero))

ggplot(dorsal_cortex_proportion, aes(SITE, dorsal_proportion)) +
  geom_boxplot()
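
One thing to watch for as you annotate: the pipeline above calls tally() twice, and what the second call does may depend on your version of dplyr. Older releases automatically summed an existing n column; as we read the dplyr changelog, versions 1.0.0 and later count rows instead unless you pass wt. The sketch below works on a small made-up table (the site names, unit labels, and cortex values are invented for illustration and are not Marwick’s data) and computes the same kind of zero-cortex proportion in a way that does not rely on either behaviour. Treat it as one possible cross-check on your numbers, not as Loiselle’s or Marwick’s method.

```r
library(dplyr)

# Made-up toy data, for illustration only (not Marwick's measurements).
# DORSAL_COR stands in for a dorsal cortex value; NA is a missing record.
toy_flakes <- data.frame(
  SITE       = c(rep("Site A", 4), rep("Site B", 4)),
  EXCAVATION = c(rep("unit 1", 4), rep("unit 2", 4)),
  DORSAL_COR = c(0, 0, 50, NA, 0, 10, 90, 100)
)

toy_proportion <-
  toy_flakes %>%
  # drop flakes with no cortex record before counting anything
  filter(!is.na(DORSAL_COR)) %>%
  group_by(SITE, EXCAVATION) %>%
  summarise(
    zero = sum(DORSAL_COR == 0),       # flakes with no dorsal cortex
    not_zero = sum(DORSAL_COR != 0),   # flakes with some dorsal cortex
    dorsal_proportion = zero / (zero + not_zero),
    .groups = "drop"
  )

print(toy_proportion)
```

If the dorsal_proportion values you get from the main pipeline look suspiciously regular (for example 0.5, 0.333, 0.25), try changing the second tally() to tally(wt = n) and see whether the boxplot changes. Make a note of which version of dplyr you are running when you write up your comparison.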

Having done that, read Loiselle’s conclusion (available in the repository here; it’s in Word format, so hit the ‘download’ icon to grab it then open it in Word).

Compare your results with Loiselle’s work and Marwick’s paper. Do you think Marwick’s conclusion stands up? How about Loiselle’s?

Marwick, B., L. Wang, R. Robinson, and H. Loiselle (2019). Compendium of R code and data for “How to use replication assignments for teaching integrity in empirical archaeology”. Online at https://doi.org/10.17605/OSF.IO/DBSW9