Simulating data from a two-parameter logistic (2PL) item response theory model in R

# Install the ltm package only if it is missing, then load it
if (!requireNamespace("ltm", quietly = TRUE)) install.packages("ltm")
library(ltm)

# Set the item parameters
item_difficulty <- c(-1, 0, 1)  # Item difficulties
item_discrimination <- c(0.5, 1, 1.5)  # Item discriminations

# Set the latent trait distribution
latent_mean <- 0  # Mean of the latent trait distribution
latent_sd <- 1  # Standard deviation of the latent trait distribution

# Set the sample size
sample_size <- 1000

# Simulate latent trait values (the seed is arbitrary; it makes the run reproducible)
set.seed(123)
latent_trait <- rnorm(sample_size, latent_mean, latent_sd)

# Simulate item responses
item_responses <- matrix(NA, nrow = sample_size, ncol = length(item_difficulty))
for (i in seq_len(sample_size)) {
  for (j in seq_along(item_difficulty)) {
    # Calculate the probability of a correct response using the 2PL model
    prob <- 1 / (1 + exp(-item_discrimination[j] * (latent_trait[i] - item_difficulty[j])))

    # Generate a binary response based on the probability
    item_responses[i, j] <- rbinom(1, 1, prob)
  }
}

# Create a data frame with the simulated data
simulated_data <- data.frame(latent_trait, item_responses)

# Print the first few rows of the simulated data
head(simulated_data)

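We start by installing and loading the "ltm" package, which provides functions for Item Response Theory analysis in R. The simulation itself uses only base R functions; ltm becomes useful if you later want to fit a model to the simulated responses.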

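We set the item parameters: item_difficulty is the point on the latent trait scale at which an individual has a 50% chance of answering the item correctly, and item_discrimination indicates how sharply the item distinguishes between individuals whose trait levels lie just below and just above that point.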
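
To make the roles of the two parameters concrete, here is a minimal sketch that evaluates the 2PL response probability for the three items on a coarse grid of trait values. The helper p_2pl and the grid theta_grid are illustrative names introduced here, not part of the original script.

```r
# Illustrative helper (hypothetical name): 2PL probability of a correct response
p_2pl <- function(theta, a, b) {
  1 / (1 + exp(-a * (theta - b)))
}

item_difficulty <- c(-1, 0, 1)
item_discrimination <- c(0.5, 1, 1.5)

# Evaluate each item's response curve on a coarse grid of trait values
theta_grid <- seq(-3, 3, by = 1)
icc <- sapply(seq_along(item_difficulty), function(j) {
  p_2pl(theta_grid, item_discrimination[j], item_difficulty[j])
})
dimnames(icc) <- list(paste0("theta=", theta_grid),
                      paste0("item", seq_along(item_difficulty)))
print(round(icc, 3))

# When theta equals an item's difficulty, the probability is 0.5
at_difficulty <- p_2pl(item_difficulty, item_discrimination, item_difficulty)
print(at_difficulty)
```

Each column rises with the trait value, and a larger discrimination gives a steeper rise around the item's difficulty.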

Next, we define the distribution of the latent trait. In this example, we assume it follows a normal distribution with a mean of 0 and a standard deviation of 1.

We specify the sample size, which determines the number of individuals for whom we'll simulate data.

Using the parameters defined above, we generate random values for the latent trait. These values represent the underlying ability or proficiency of each individual in the sample.

We then create an empty matrix called item_responses to store the simulated item responses. The number of rows in the matrix corresponds to the sample size, and the number of columns matches the number of items.

We loop through each individual and each item to simulate their responses. For each combination, we calculate the probability of a correct response using the 2-parameter logistic model. The model takes into account the item parameters and the latent trait value for that individual.
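
Written out, the probability computed inside the loop is the expression below. The symbols are introduced here for illustration: \theta_i stands for latent_trait[i], a_j for item_discrimination[j], and b_j for item_difficulty[j].

```latex
P(X_{ij} = 1 \mid \theta_i) = \frac{1}{1 + \exp\bigl(-a_j(\theta_i - b_j)\bigr)}
```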

Based on the calculated probability, we generate a binary response (either 0 or 1) by drawing once from a Bernoulli distribution with that success probability, which is what rbinom(1, 1, prob) does.
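
The double loop is easy to read, but the same two steps can also be done on whole matrices. The following is one possible vectorized sketch, not the approach used in the script above; the seed value and the intermediate names (n_items, trait_minus_difficulty, logit) are assumptions made for this example.

```r
set.seed(42)  # arbitrary seed, used only to make this sketch reproducible

item_difficulty <- c(-1, 0, 1)
item_discrimination <- c(0.5, 1, 1.5)
sample_size <- 1000
n_items <- length(item_difficulty)

latent_trait <- rnorm(sample_size, mean = 0, sd = 1)

# theta_i - b_j for every person/item pair (sample_size x n_items matrix)
trait_minus_difficulty <- outer(latent_trait, item_difficulty, "-")

# Multiply column j by a_j, then apply the logistic function
logit <- sweep(trait_minus_difficulty, 2, item_discrimination, "*")
prob <- plogis(logit)

# One Bernoulli draw per cell; the draws come back in column-major order,
# which matches how matrix() fills its result
item_responses <- matrix(rbinom(length(prob), size = 1, prob = prob),
                         nrow = sample_size, ncol = n_items)

head(item_responses)
```

Here plogis(x) is base R's logistic function, 1 / (1 + exp(-x)), so prob[i, j] matches the value computed inside the loop for individual i and item j.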

Finally, we create a data frame called simulated_data to store the simulated data. It includes the latent trait values and the item responses. The head() function is used to display the first few rows of the simulated data.
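
As a quick sanity check on the result, one option is to name the item columns and compare the observed proportion of correct responses per item with the average model probability. This is a sketch under its own assumptions: the seed, the item1..item3 column names, and the check object are illustrative choices, and the data are re-simulated so the block runs on its own.

```r
set.seed(1)  # arbitrary seed for a reproducible sketch

item_difficulty <- c(-1, 0, 1)
item_discrimination <- c(0.5, 1, 1.5)
sample_size <- 1000

latent_trait <- rnorm(sample_size)
prob <- plogis(sweep(outer(latent_trait, item_difficulty, "-"),
                     2, item_discrimination, "*"))
item_responses <- matrix(rbinom(length(prob), 1, prob), nrow = sample_size)

# Without column names, data.frame() would label the items X1, X2, X3
colnames(item_responses) <- paste0("item", seq_len(ncol(item_responses)))
simulated_data <- data.frame(latent_trait, item_responses)

# Observed proportion correct vs. average model probability, per item
check <- rbind(observed = colMeans(item_responses),
               expected = colMeans(prob))
print(round(check, 3))
```

The two rows should be close to each other, and the proportion correct should drop from item1 to item3 as difficulty increases.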

In summary, the code generates simulated data based on Item Response Theory. It simulates item responses for a sample of individuals with known item parameters and a specified distribution of the latent trait. This simulated data can be used for various purposes, such as testing and evaluating statistical models and analysis methods.
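
One such use is a parameter-recovery check: fit a 2PL model to the simulated responses and compare the estimates with the values used to generate the data. The sketch below assumes the ltm package is installed and uses its ltm() function with the formula interface (responses ~ z1) and IRT.param = TRUE; the seed and the column names are illustrative, and the data are re-simulated so the block is self-contained.

```r
set.seed(2024)  # arbitrary seed for a reproducible sketch

item_difficulty <- c(-1, 0, 1)
item_discrimination <- c(0.5, 1, 1.5)
sample_size <- 1000

latent_trait <- rnorm(sample_size)
prob <- plogis(sweep(outer(latent_trait, item_difficulty, "-"),
                     2, item_discrimination, "*"))
item_responses <- matrix(rbinom(length(prob), 1, prob), nrow = sample_size)
colnames(item_responses) <- paste0("item", seq_len(ncol(item_responses)))

# Fit only if ltm is available, so the rest of the sketch still runs without it
if (requireNamespace("ltm", quietly = TRUE)) {
  # z1 denotes the single latent variable in ltm's formula interface
  fit <- ltm::ltm(item_responses ~ z1, IRT.param = TRUE)
  estimates <- coef(fit)  # one row per item: difficulty and discrimination

  true_values <- cbind(difficulty = item_difficulty,
                       discrimination = item_discrimination)
  print(round(estimates, 2))
  print(true_values)
}
```

With only three items, one of them weakly discriminating, the estimates can be noisy, so expect rough rather than exact agreement with the true values.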
