R in school: the HoD – The wayward teachR

Let’s suppose you are a head of department in a non-selective high school. Let’s also suppose you arrived at your post through years of hard work and a desire to create a fairer society for the children you teach.

One day, you decided to ask yourself:

is there an over-representation of disadvantaged children in the lower sets?

Years of teaching have taught you that disadvantaged does not mean lower ability, but somehow you have this niggling feeling that there are more disadvantaged children languishing in the lower sets.

You are beginning to warm to the idea of re-setting the children under your care next year by redistributing disadvantaged children to each set based on the school’s overall composition. A “niggling feeling”, though, is not enough and, being logically minded, you know you need something more tangible. You need compelling evidence before you shake things up.

You decide the best place to look for this phenomenon, if it exists, is the Y7 cohort, because they are untainted by years of high-schooling and the pervasively pernicious learned helplessness that comes with it.

You open up your school’s information management system (MIS) and export a list of pupil names, sets, and their disadvantaged status (pupil premium, PP) for one half of the Y7 cohort. It looks something like this:

Armed with this information, you do some quick back-of-the-envelope maths and find that there is in fact an apparent difference in the percentage composition of disadvantaged pupils in each set.
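If you would rather let R do the envelope maths, here is a minimal sketch. It assumes the counts reported in the crosstable further down this post, typed in by hand; the object names `obs`, `pct_by_set` and `pct_overall` are made up for illustration.

```r
# Counts of pupils by PP status (rows) and set (columns),
# copied from the crosstable shown later in this post
obs <- matrix(
  c(23, 20, 13,  5,
     7,  5,  7, 10),
  nrow = 2, byrow = TRUE,
  dimnames = list(PP = c("FALSE", "TRUE"), Set = paste("Set", 1:4))
)

# Percentage of each set that is (or is not) PP: proportions within columns
pct_by_set <- round(100 * prop.table(obs, margin = 2), 1)

# Percentage of PP pupils in this half of the cohort as a whole
pct_overall <- round(100 * sum(obs["TRUE", ]) / sum(obs), 1)

print(pct_by_set)
print(pct_overall)
```

Comparing the `TRUE` row of `pct_by_set` with `pct_overall` shows which sets sit above or below the cohort-wide share of PP pupils.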

Having done all this in a single PPA lesson, you are rightly proud of yourself. But you are not happy. Not yet. You can see there is a difference, but you are not sure whether this difference is significant. You dust off your trusty old copy of Sheskin and find that the chi-squared test is what you are after.

At first you are elated, but then it quickly dawns on you that doing the chi-squared test by hand is tedious, time-consuming, and … well … spending time with your family is more important.

That is where R comes in.

`R` and the chi-squared test

The chi-squared test can be used to determine whether or not there is a statistically significant relationship between two categorical variables — in this case, disadvantage designation and set.

Though tedious by hand, calculation of the chi-squared statistic, $χ^{2}$, is straightforward. It takes the form

$χ^{2} = \sum \frac{(f_{o} - f_{e})^{2}}{f_{e}}$

where $f_{o}$ is the observed frequency and $f_{e}$ is the expected frequency.
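To see what the formula is doing, here is a rough by-hand version in R. It is a sketch only: it assumes the observed counts from the crosstable shown later in this post, and the object names (`obs`, `f_e`, `chi_sq`) are illustrative. The expected frequency for each cell is taken as row total × column total ÷ grand total, which is the usual choice when testing for independence.

```r
# Observed frequencies f_o: PP status (rows) by set (columns)
obs <- matrix(
  c(23, 20, 13,  5,
     7,  5,  7, 10),
  nrow = 2, byrow = TRUE,
  dimnames = list(PP = c("FALSE", "TRUE"), Set = paste("Set", 1:4))
)

# Expected frequencies f_e under independence:
# (row total * column total) / grand total, for every cell
f_e <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Chi-squared statistic: sum over all cells of (f_o - f_e)^2 / f_e
chi_sq <- sum((obs - f_e)^2 / f_e)

# Degrees of freedom: (rows - 1) * (columns - 1)
deg_free <- (nrow(obs) - 1) * (ncol(obs) - 1)

print(round(f_e, 1))
print(chi_sq)
print(deg_free)
```

The squared difference is what stops positive and negative deviations from cancelling each other out, and dividing by $f_{e}$ scales each deviation by how many pupils you expected in that cell.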

If the probability of obtaining a $χ^{2}$ statistic at least as large as yours (the $p$-value, which you can find in a look-up table) is greater than your chosen significance level (commonly 0.05), there is no statistically significant relationship between the two categorical variables.
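The look-up table can be swapped for two base R functions. The sketch below assumes the statistic and degrees of freedom reported in the output later in this post ($χ^{2}$ = 11.0147 with 3 degrees of freedom) and a significance level of 0.05; the variable names are illustrative.

```r
# Values taken from the test output shown later in this post
chi_sq   <- 11.0147
deg_free <- 3      # (2 rows - 1) * (4 columns - 1)
alpha    <- 0.05   # assumed significance level

# p-value: probability of a statistic at least this large
# if PP status and set were unrelated
p_value <- pchisq(chi_sq, df = deg_free, lower.tail = FALSE)

# Critical value: the statistic you would need to exceed at this alpha,
# i.e. the number a printed look-up table would give you
critical <- qchisq(1 - alpha, df = deg_free)

print(p_value)
print(critical)
print(p_value < alpha)
```

Comparing `p_value` with `alpha` and comparing `chi_sq` with `critical` are two ways of asking the same question, so they always agree.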

The beauty of R is that this can all be done in just a few lines of R-code. The same code will also spit out the expected number of disadvantaged children in each set.

Let’s `R`

First, import the spreadsheet containing the information you exported from your school’s MIS. Assuming the spreadsheet is called my_exported_data.csv, type

```r
# Import your data into R
df_sets <- read.csv("my_exported_data.csv")
```
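If you want to follow along without an export of your own, one option is to build a stand-in `df_sets` from the counts in the crosstable shown further down. This is only a sketch: it assumes the export has one row per pupil with a logical `PP` column and a `Set` column (the column names used in the rest of this post), and it leaves out pupil names entirely. The helper object `counts` is made up for illustration.

```r
# One row per PP/Set combination, with the number of pupils in each,
# copied from the crosstable shown later in this post
counts <- data.frame(
  PP  = rep(c(FALSE, TRUE), each = 4),
  Set = rep(paste("Set", 1:4), times = 2),
  n   = c(23, 20, 13, 5, 7, 5, 7, 10)
)

# Expand to one row per pupil by repeating each combination n times
df_sets <- counts[rep(seq_len(nrow(counts)), counts$n), c("PP", "Set")]
rownames(df_sets) <- NULL

# Quick check: this should match the counts in the crosstable
print(table(PP = df_sets$PP, Set = df_sets$Set))
```

A real MIS export may code PP differently (for example "Y"/"N"), in which case the row labels in the output will differ but the counts should not.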

Second, use the {gmodels} package to do the heavy lifting:

```r
# Install gmodels
install.packages("gmodels")

# Use gmodels
gmodels::CrossTable(
  df_sets$PP,
  df_sets$Set,
  digits = 1,
  expected = TRUE,
  prop.t = FALSE,
  prop.r = TRUE,
  prop.c = TRUE,
  prop.chisq = FALSE,
  dnn = c("PP", "Set"),
  format = "SPSS"
)
```

This should give you the following crosstable:


   Cell Contents
|-------------------------|
|                   Count |
|         Expected Values |
|             Row Percent |
|          Column Percent |
|-------------------------|

Total Observations in Table:  90 

             | Set 
          PP |    Set 1  |    Set 2  |    Set 3  |    Set 4  | Row Total | 
-------------|-----------|-----------|-----------|-----------|-----------|
       FALSE |       23  |       20  |       13  |        5  |       61  | 
             |     20.3  |     16.9  |     13.6  |     10.2  |           | 
             |     37.7% |     32.8% |     21.3% |      8.2% |     67.8% | 
             |     76.7% |     80.0% |     65.0% |     33.3% |           | 
-------------|-----------|-----------|-----------|-----------|-----------|
        TRUE |        7  |        5  |        7  |       10  |       29  | 
             |      9.7  |      8.1  |      6.4  |      4.8  |           | 
             |     24.1% |     17.2% |     24.1% |     34.5% |     32.2% | 
             |     23.3% |     20.0% |     35.0% |     66.7% |           | 
-------------|-----------|-----------|-----------|-----------|-----------|
Column Total |       30  |       25  |       20  |       15  |       90  | 
             |     33.3% |     27.8% |     22.2% |     16.7% |           | 
-------------|-----------|-----------|-----------|-----------|-----------|

 
Statistics for All Table Factors


Pearson's Chi-squared test 
------------------------------------------------------------
Chi^2 =  11.0147     d.f. =  3     p =  0.01164667 


 
       Minimum expected frequency: 4.833333 
Cells with Expected Frequency < 5: 1 of 8 (12.5%)

NULL

And there it is.

The results indicate $p$ = 0.0116. Since this is smaller than 0.05, you can reject, at the 5% significance level, the idea that PP status and set are independent: the spread of disadvantaged children across the sets differs from what you would expect by chance.
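As a cross-check that does not need any extra packages, base R's `chisq.test()` can be run on the same counts. The sketch below types the counts in from the crosstable above; `obs` and `res` are illustrative names. It also pulls out the standardised residuals, which hint at which cells contribute most to the result.

```r
# Observed counts from the crosstable above
obs <- matrix(
  c(23, 20, 13,  5,
     7,  5,  7, 10),
  nrow = 2, byrow = TRUE,
  dimnames = list(PP = c("FALSE", "TRUE"), Set = paste("Set", 1:4))
)

# Pearson's chi-squared test of independence.
# R warns here because one expected count is below 5 (see the output
# above); suppressWarnings() only keeps the console tidy.
res <- suppressWarnings(chisq.test(obs))

print(res$statistic)          # chi-squared statistic
print(res$parameter)          # degrees of freedom
print(res$p.value)            # p-value
print(round(res$expected, 1)) # expected counts in each cell
print(round(res$stdres, 2))   # standardised residuals
```

Cells with large positive standardised residuals hold more pupils than independence would predict, and large negative ones hold fewer.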

Disclaimer

Before you send me hate mail for social engineering, please understand the following:

schooling itself is a big social engineering experiment;
for non-selective schools to selectively set children is Orwellian doublespeak at best;
this is a hypothetical school with a hypothetically egalitarian head of department. In most schools, deep forces will intervene to prevent a level playing field.

Corrections

If you spot any mistakes or want to suggest changes, please let me know through the usual channels.

Citation

BibTeX citation:

@online{teachr2021,
  author = {teachR, wayward},
  title = {R in School: The {HoD}},
  date = {2021-07-18},
  url = {https://thewaywardteachr.netlify.app/posts/2021-07-18-r-in-school-hod/r-in-school-hod.html},
  langid = {en}
}

For attribution, please cite this work as:

teachR, wayward. 2021. “R in School: The HoD.” July 18, 2021. https://thewaywardteachr.netlify.app/posts/2021-07-18-r-in-school-hod/r-in-school-hod.html.
