Bayesian Model Assessment for Jointly Modeling Multidimensional Response Data with Application to Computerized Testing.


Fang Liu, Xiaojing Wang, Roeland Hancock, Ming-Hui Chen

Author Information

Fang Liu: Northeast Normal University, Changchun, China.
Xiaojing Wang: University of Connecticut, Storrs, CT, 06250, USA. xiaojing.wang@uconn.edu.
Roeland Hancock: University of Connecticut, Storrs, CT, 06250, USA.
Ming-Hui Chen: University of Connecticut, Storrs, CT, 06250, USA.

PMID: 35349031 DOI: 10.1007/s11336-022-09845-x

Computerized assessment provides rich multidimensional data, including trial-by-trial accuracy and response time (RT) measures. A key question in modeling this type of data is how to incorporate the RT data, for example, to aid ability estimation in item response theory (IRT) models. To address this, we propose a joint model consisting of a two-parameter IRT model for the dichotomous item response data, a log-normal model for the continuous RT data, and a normal model for the corresponding paper-and-pencil scores. We then reformulate and reparameterize the model to capture the relationship between the model parameters, to facilitate prior specification, and to make the Bayesian computation more efficient. Further, we propose several new model assessment criteria based on decompositions of the deviance information criterion (DIC) and the logarithm of the pseudo-marginal likelihood (LPML). The proposed criteria quantify the improvement in the fit of one part of the multidimensional data given the other parts. Finally, we conduct several simulation studies to examine the empirical performance of the proposed model assessment criteria and illustrate their application using a real dataset from a computerized educational assessment program.
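To make the ingredients named in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a common parameterization that the abstract does not specify: a 2PL success probability of sigmoid(a_j(theta_i - b_j)) and log RT distributed as N(beta_j - tau_i, sigma_j^2). It also uses the standard textbook forms of DIC (posterior mean deviance plus effective number of parameters) and LPML (sum of log conditional predictive ordinates estimated from MCMC draws), rather than the decomposed criteria proposed in the paper. All function and argument names are hypothetical.

```python
import numpy as np


def loglik_2pl(y, theta, a, b):
    """Pointwise log-likelihood (N x J) of a 2PL model.

    Assumed form: P(y_ij = 1) = sigmoid(a_j * (theta_i - b_j)).
    """
    eta = a[None, :] * (theta[:, None] - b[None, :])
    # log p = -log(1 + exp(-eta)); log(1 - p) = -log(1 + exp(eta))
    return -(y * np.logaddexp(0.0, -eta) + (1 - y) * np.logaddexp(0.0, eta))


def loglik_lognormal_rt(t, tau, beta, sigma):
    """Pointwise log-density (N x J) of log-normal response times.

    Assumed form: log t_ij ~ Normal(beta_j - tau_i, sigma_j**2).
    """
    mu = beta[None, :] - tau[:, None]
    z = (np.log(t) - mu) / sigma[None, :]
    return (-np.log(t) - np.log(sigma[None, :])
            - 0.5 * np.log(2.0 * np.pi) - 0.5 * z ** 2)


def lpml(loglik_draws):
    """LPML from an S x n matrix of pointwise log-likelihoods (S MCMC draws).

    Uses the harmonic-mean estimator CPO_i = 1 / mean_s(1 / L_i(theta_s)),
    computed on the log scale for numerical stability.
    """
    n_draws = loglik_draws.shape[0]
    neg = -loglik_draws
    shift = neg.max(axis=0)
    log_mean_inv = shift + np.log(np.exp(neg - shift).sum(axis=0)) - np.log(n_draws)
    return float(-log_mean_inv.sum())


def dic(total_loglik_draws, loglik_at_posterior_mean):
    """Standard DIC = D_bar + p_D, with p_D = D_bar - D(theta_bar).

    total_loglik_draws: length-S array of total log-likelihoods per draw.
    loglik_at_posterior_mean: total log-likelihood at the posterior mean.
    """
    d_bar = -2.0 * float(np.mean(total_loglik_draws))
    d_hat = -2.0 * float(loglik_at_posterior_mean)
    p_d = d_bar - d_hat
    return d_bar + p_d, p_d
```

If the accuracy and RT parts are assumed conditionally independent given the person parameters, the joint pointwise log-likelihood is simply the sum of the two matrices above, so `lpml` and `dic` can be evaluated on one part alone or on the sum to compare how much each part contributes to fit.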

DIC decomposition; IRT models; LPML decomposition; computerized tests; paper-and-pencil tests; response times


Bayes Theorem

Psychometrics

Models, Statistical

Computer Simulation

Probability

Journal Article; Research Support, U.S. Gov't, Non-P.H.S.; Research Support, Non-U.S. Gov't
