SONG Lihong,WANG Wenyi.The Quality Evaluation Index for Score Reporting in Multidimensional Criterion-Referenced Tests[J].Journal of Jiangxi Normal University:Natural Science Edition,2019,(04):368-375.[doi:10.16357/j.cnki.issn1000-5862.2019.04.07]

The Quality Evaluation Index for Score Reporting in Multidimensional Criterion-Referenced Tests

Journal of Jiangxi Normal University (Natural Science Edition) [ISSN: 1000-5862 / CN: 36-1092/N]

Issue:
2019, No. 04
Pages:
368-375
Section:
Psychological and Educational Measurement
Publication date:
2019-08-10

Article Info

Title:
The Quality Evaluation Index for Score Reporting in Multidimensional Criterion-Referenced Tests
Article number:
1000-5862(2019)04-0368-08
Author(s):
SONG Lihong¹, WANG Wenyi²
1. Elementary Education College, Jiangxi Normal University, Nanchang, Jiangxi 330022, China; 2. College of Computer Information Engineering, Jiangxi Normal University, Nanchang, Jiangxi 330022, China
Keywords:
multidimensional item response theory; score reporting; decision rule; classification accuracy; classification consistency
CLC number:
B 841.7
DOI:
10.16357/j.cnki.issn1000-5862.2019.04.07
Document code:
A
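Chinese abstract (translated):
Criterion-referenced tests focus mainly on students' mastery and performance levels in specific content, knowledge, or skills. The classification reliability and validity of the performance levels in a score report are usually evaluated with classification consistency and classification accuracy. This paper first introduces classification decision rules for multidimensional tests. It then introduces three classes of classification consistency and classification accuracy indices under multidimensional item response theory (MIRT) models: one class is based on the total-score scale, and the other two are defined on the ability scale, based on the likelihood function and on the information matrix, respectively. The uses of these indices are also described. Finally, the paper points out that classification consistency and classification accuracy can be used to evaluate the classification reliability and validity of subscores in criterion-referenced tests, and can also guide item selection and test assembly in computerized classification testing.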
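The decision rules for multidimensional tests are not spelled out on this page. As an illustration only, the Python sketch below implements three rules that are common in the multiple-measures literature: conjunctive (every dimension meets its cut), complementary or disjunctive (at least one dimension meets its cut), and compensatory (a weighted composite meets a single cut). The function names, weights, and cut scores are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def _check(a: Sequence[float], b: Sequence[float]) -> None:
    # zip() would silently truncate mismatched vectors, so fail loudly instead.
    if len(a) != len(b):
        raise ValueError("ability profile and cut/weight vectors differ in length")


def conjunctive(theta: Sequence[float], cuts: Sequence[float]) -> bool:
    """Proficient only if every dimension reaches its own cut score."""
    _check(theta, cuts)
    return all(t >= c for t, c in zip(theta, cuts))


def complementary(theta: Sequence[float], cuts: Sequence[float]) -> bool:
    """Proficient if at least one dimension reaches its cut score."""
    _check(theta, cuts)
    return any(t >= c for t, c in zip(theta, cuts))


def compensatory(theta: Sequence[float], weights: Sequence[float],
                 composite_cut: float) -> bool:
    """Classify on a weighted composite; strength on one dimension
    can offset weakness on another."""
    _check(theta, weights)
    composite = sum(w * t for w, t in zip(weights, theta))
    return composite >= composite_cut


if __name__ == "__main__":
    theta = [0.8, -0.3]  # hypothetical two-dimensional ability estimates
    cuts = [0.0, 0.0]    # hypothetical per-dimension cut scores
    print(conjunctive(theta, cuts))              # False: dimension 2 is below its cut
    print(complementary(theta, cuts))            # True: dimension 1 reaches its cut
    print(compensatory(theta, [0.5, 0.5], 0.0))  # True: composite is about 0.25
```

The same ability profile passes under two of these rules and fails under the third, so classification accuracy and consistency have to be evaluated for whichever rule is actually used to report results.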
Abstract:
For criterion-referenced tests, classification consistency and classification accuracy are important indicators for evaluating the reliability and validity of classification results in score reporting. Numerous procedures have been proposed to estimate these indices within the framework of unidimensional item response theory (UIRT). Multidimensional item response theory (MIRT) comprises models with more than one latent trait, which account for the multidimensional nature of complex constructs, and it has been successfully employed to analyze many criterion-referenced tests. Because MIRT has grown rapidly, this study briefly reviews decision rules and three types of classification consistency and accuracy indices: the first is based on total (summed) scores, the second is likelihood-based, and the third is information-based. Finally, two practical implications are identified. First, classification consistency and accuracy indices can easily be estimated for subscores or composite scores in each knowledge, content, or skill area when the true cut scores are set on the total-score or latent-ability scale. Second, these indices may be useful for developing test-construction methods for multistage testing, a form of computerized adaptive classification testing used to make classification decisions.
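To make the information-based indices concrete, here is a minimal Python sketch of a Rudner-style expected accuracy and consistency calculation carried over to several dimensions. It is not the estimator from the paper. It assumes a compensatory multidimensional two-parameter logistic (M2PL) model with simple-structure items (each item loads on one dimension), so the test information matrix is diagonal; a normal approximation to each dimension's ability estimate with standard error 1/sqrt(I_kk); and a conjunctive pass/fail rule. The names `Item`, `norm_cdf`, `info_diagonal`, `pass_probability`, and `accuracy_consistency`, the item format, and the example numbers are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical item format for a compensatory M2PL model: (a, d), where
# P(correct | theta) = 1 / (1 + exp(-(a . theta + d))).
Item = Tuple[Sequence[float], float]


def norm_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def info_diagonal(theta: Sequence[float], items: Sequence[Item]) -> List[float]:
    """Diagonal of the M2PL test information matrix at theta.

    Assumes simple structure, in which case the matrix is diagonal and
    1/sqrt(I_kk) is the asymptotic standard error on dimension k.
    """
    info = [0.0] * len(theta)
    for a, d in items:
        z = sum(ak * tk for ak, tk in zip(a, theta)) + d
        p = 1.0 / (1.0 + math.exp(-z))
        for k, ak in enumerate(a):
            info[k] += ak * ak * p * (1.0 - p)
    return info


def pass_probability(theta: Sequence[float], items: Sequence[Item],
                     cuts: Sequence[float]) -> float:
    """P(estimate passes every dimension | theta) under the conjunctive rule,
    treating each dimension's estimate as independent Normal(theta_k, 1/I_kk)."""
    prob = 1.0
    for t_k, i_k, c_k in zip(theta, info_diagonal(theta, items), cuts):
        se_k = 1.0 / math.sqrt(i_k)
        prob *= norm_cdf((t_k - c_k) / se_k)
    return prob


def accuracy_consistency(thetas: Sequence[Sequence[float]], items: Sequence[Item],
                         cuts: Sequence[float]) -> Tuple[float, float]:
    """Expected accuracy and consistency averaged over examinees.

    As in Rudner's approach, the ability estimates in `thetas` stand in
    for the unknown true abilities.
    """
    acc = con = 0.0
    for theta in thetas:
        p = pass_probability(theta, items, cuts)
        truly_pass = all(t >= c for t, c in zip(theta, cuts))
        # Accuracy: the estimate lands in the same category as theta.
        acc += p if truly_pass else 1.0 - p
        # Consistency: two independent parallel administrations agree.
        con += p * p + (1.0 - p) * (1.0 - p)
    n = len(thetas)
    return acc / n, con / n


if __name__ == "__main__":
    # Hypothetical two-dimensional test with one item per dimension.
    items = [([1.0, 0.0], 0.0), ([0.0, 1.0], 0.0)]
    print(accuracy_consistency([[0.0, 0.0]], items, [0.0, 0.0]))  # (0.25, 0.625)
```

An examinee sitting exactly on both cut scores is the worst case: each dimension is a coin flip, so the conjunctive pass probability is 0.25. Adding more informative items shrinks the standard errors and pushes both indices toward 1 for examinees away from the cuts. A likelihood-based variant would replace the normal approximation with probabilities computed from the response likelihood; that is not shown here.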


Memo

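Received: 2019-02-17
Funding: Supported by the General Project of the Jiangxi Provincial 12th Five-Year Plan for Educational Science (13YB032).
About the author: SONG Lihong (born 1981), female, from Xingan, Jiangxi; associate professor, PhD; research interest: educational measurement. E-mail: viviansong1981@163.com
Last Update: 2019-08-10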