Simulating data from a two-parameter logistic (2PL) item response theory model in R

July 01, 2023

# Install and load the required package (the install step only needs to be run once)
install.packages("ltm")
library(ltm)

# Set the item parameters
item_difficulty <- c(-1, 0, 1) # Item difficulties
item_discrimination <- c(0.5, 1, 1.5) # Item discriminations

# Set the latent trait distribution
latent_mean <- 0 # Mean of the latent trait distribution
latent_sd <- 1 # Standard deviation of the latent trait distribution

# Set the sample size
sample_size <- 1000

# Simulate latent trait values
latent_trait <- rnorm(sample_size, latent_mean, latent_sd)

# Simulate item responses (rows = individuals, columns = items)
item_responses <- matrix(NA, nrow = sample_size, ncol = length(item_difficulty))

for (i in seq_len(sample_size)) {
  for (j in seq_along(item_difficulty)) {
    # Calculate the probability of a correct response using the 2PL model
    prob <- 1 / (1 + exp(-item_discrimination[j] * (latent_trait[i] - item_difficulty[j])))

    # Generate a binary response based on the probability
    item_responses[i, j] <- rbinom(1, 1, prob)
  }
}

# Create a data frame with the simulated data
simulated_data <- data.frame(latent_trait, item_responses)

# Print the first few rows of the simulated data
head(simulated_data)

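We start by installing and loading the "ltm" package, which provides functions for Item Response Theory analysis in R. The simulation itself uses only base R functions (rnorm and rbinom); ltm becomes useful afterwards, when fitting models to the simulated data.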

We set the item parameters: item_difficulty is the point on the latent trait scale at which an individual has a 50% chance of answering the item correctly, and item_discrimination indicates how sharply the item distinguishes between individuals at different levels of the latent trait.
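To make the roles of the two parameters concrete, here is a minimal sketch that evaluates the 2PL response probability for the three items above. The helper name p_2pl is hypothetical and introduced only for illustration; plogis() is base R's logistic function.

```r
# Hypothetical helper (not part of ltm): 2PL probability of a correct response
p_2pl <- function(theta, a, b) {
  plogis(a * (theta - b))  # same as 1 / (1 + exp(-a * (theta - b)))
}

item_difficulty <- c(-1, 0, 1)
item_discrimination <- c(0.5, 1, 1.5)

# When the latent trait equals an item's difficulty, the probability is 0.5
p_at_difficulty <- p_2pl(item_difficulty, item_discrimination, item_difficulty)
print(p_at_difficulty)

# One unit above the difficulty, more discriminating items give higher probabilities
p_one_above <- p_2pl(item_difficulty + 1, item_discrimination, item_difficulty)
print(round(p_one_above, 3))  # 0.622 0.731 0.818
```

The second result shows why discrimination matters: the same one-unit advantage in the latent trait moves the probability further from 0.5 for the item with the largest discrimination.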

Next, we define the distribution of the latent trait. In this example, we assume it follows a normal distribution with a mean of 0 and a standard deviation of 1.

We specify the sample size, which determines the number of individuals for whom we'll simulate data.

Using the parameters defined above, we generate random values for the latent trait. These values represent the underlying ability or proficiency of each individual in the sample.

We then create an empty matrix called item_responses to store the simulated item responses. The number of rows in the matrix corresponds to the sample size, and the number of columns matches the number of items.

We loop through each individual and each item to simulate their responses. For each combination, we calculate the probability of a correct response using the 2-parameter logistic model: for individual i and item j, this is 1 / (1 + exp(-a_j * (theta_i - b_j))), where theta_i is the individual's latent trait value, b_j is the item difficulty, and a_j is the item discrimination.

Based on the calculated probability, we generate a binary response (either 0 or 1) with rbinom(1, 1, prob), a single Bernoulli draw that returns 1 with probability prob and 0 otherwise.
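The double loop is easy to read, but the same two steps can also be written without loops. The following is a sketch of one possible vectorized version, not the only way to do it; the seed value is an assumption added purely so the sketch is reproducible.

```r
set.seed(42)  # assumed seed, only for reproducibility of this sketch

item_difficulty <- c(-1, 0, 1)
item_discrimination <- c(0.5, 1, 1.5)
sample_size <- 1000
latent_trait <- rnorm(sample_size, mean = 0, sd = 1)

# theta_minus_b[i, j] = latent_trait[i] - item_difficulty[j]
theta_minus_b <- outer(latent_trait, item_difficulty, "-")

# Multiply column j by item_discrimination[j], then apply the logistic function
prob <- plogis(sweep(theta_minus_b, 2, item_discrimination, "*"))

# One Bernoulli draw per cell; rbinom() walks the probabilities column by column,
# which matches the order in which matrix() fills its cells
item_responses <- matrix(rbinom(length(prob), size = 1, prob = prob),
                         nrow = sample_size)

dim(item_responses)
```

For a sample of this size the speed difference is negligible, so the choice between the loop and the vectorized form is mostly a matter of readability.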

Finally, we create a data frame called simulated_data to store the simulated data. It includes the latent trait values and the item responses. The head() function is used to display the first few rows of the simulated data.
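By default, data.frame() labels the unnamed matrix columns X1, X2, and X3. The sketch below shows one way to give the items explicit names and to run a quick plausibility check on the result. The item_ prefix and the seed are assumptions made for illustration, and the simulation is repeated in compact form only so that the snippet runs on its own.

```r
set.seed(2023)  # assumed seed, only for reproducibility of this sketch

item_difficulty <- c(-1, 0, 1)
item_discrimination <- c(0.5, 1, 1.5)
latent_trait <- rnorm(1000)

# Compact re-simulation: one column of 0/1 responses per item
item_responses <- sapply(seq_along(item_difficulty), function(j) {
  rbinom(length(latent_trait), 1,
         plogis(item_discrimination[j] * (latent_trait - item_difficulty[j])))
})

# Hypothetical column names, chosen only for readability
colnames(item_responses) <- paste0("item_", seq_along(item_difficulty))

simulated_data <- data.frame(latent_trait, item_responses)
str(simulated_data)

# Proportion correct per item; items with lower difficulty should score higher
prop_correct <- colMeans(simulated_data[, -1])
print(round(prop_correct, 2))
```

If the proportions do not decrease from the easiest to the hardest item, that is a hint that something went wrong in the simulation.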

In summary, the code generates simulated data based on Item Response Theory. It simulates item responses for a sample of individuals with known item parameters and a specified distribution of the latent trait. This simulated data can be used for various purposes, such as testing and evaluating statistical models and analysis methods.
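As one example of such a check, the sketch below simulates a single item and tries to recover its parameters with an ordinary logistic regression. This is a sanity check under a strong assumption, not IRT estimation: it uses the true latent trait values as a predictor, which are never observed in real data. A proper 2PL fit, for example with the ltm package loaded earlier, treats the trait as latent. The variable names, the seed, and the larger sample size are all choices made for this illustration.

```r
set.seed(7)  # assumed seed, only for reproducibility of this sketch

a_true <- 1.5  # discrimination of the simulated item
b_true <- 1    # difficulty of the simulated item

theta <- rnorm(5000)  # larger sample than above to reduce sampling noise
y <- rbinom(length(theta), 1, plogis(a_true * (theta - b_true)))

# Under the 2PL model, logit P(y = 1) = a * theta - a * b,
# so the regression slope estimates a and the intercept estimates -a * b
fit <- glm(y ~ theta, family = binomial)
a_hat <- unname(coef(fit)["theta"])
b_hat <- unname(-coef(fit)["(Intercept)"] / a_hat)

print(c(a_hat = a_hat, b_hat = b_hat))
```

The estimates should land close to 1.5 and 1, which suggests the simulated responses follow the intended model.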
