References:
[1] 戴海琦.心理测量学[M].北京:高等教育出版社,2010.
[2] 甘良梅,余嘉元.标准参照测验分数体系的探讨研究[J].心理学探新,2006,26(3):79-83.
[3] 辛涛,李勉,任晓琼.基础教育质量监测报告撰写与结果应用[M].北京:北京师范大学出版集团,2015.
[4] Jiang Yu,Zhang Jiahui,Xin Tao.Toward education quality improvement in China:a brief overview of the national assessment of education quality[J].Journal of Educational and Behavioral Statistics,2019,44(6):733-751.
[5] Carroll P E,Bailey A L.Do decision rules matter?A descriptive study of English language proficiency assessment classifications for English-language learners and native English speakers in fifth grade[J].Language Testing,2016,33(1):23-52.
[6] Douglas K M,Mislevy R J.Estimating classification accuracy for complex decision rules based on multiple scores[J].Journal of Educational and Behavioral Statistics,2010,35(3):280-306.
[7] Fein M.Test development:fundamentals for certification and evaluation[M].Danvers:ASTD Press,2012.
[8] 汪存友,余嘉元.标准参照测验及格线设定研究中的模拟实验法[J].心理学探新,2009,29(2):81-85.
[9] Bock R D,Thissen D,Zimowski M F.IRT estimation of domain scores[J].Journal of Educational Measurement,1997,34(3):197-211.
[10] Pommerich M,Nicewander W A.Estimating average domain scores[J].Journal of Educational Measurement,1999,36(3):199-216.
[11] Fu Jianbin,Qu Yanxuan.A review of subscore estimation methods[EB/OL].[2019-10-13].https://onlinelibrary.wiley.com/doi/pdf/10.1002/ets2.12203.
[12] Wainer H,Sheehan K M,Wang Xiaohui.Some paths toward making Praxis scores more useful[J].Journal of Educational Measurement,2000,37(2):113-140.
[13] Welborn C A,Lester D,Parnell J.Using ACT subscores to identify at risk students in business statistics and principles of management courses[J].Journal of Education for Business,2015,90(6):328-334.
[14] Reckase M D,Xu Jingru.The evidence for a subscore structure in a test of English language competency for English language learners[J].Educational and Psychological Measurement,2015,75(5):805-825.
[15] Yao Lihua.Reporting valid and reliable overall scores and domain scores[J].Journal of Educational Measurement,2010,47(3):339-360.
[16] de la Torre J,Song Hao,Hong Yuan.A comparison of four methods of IRT subscoring[J].Applied Psychological Measurement,2011,35(4):296-316.
[17] Sinharay S,Puhan G,Haberman S J.An NCME instructional module on subscores[J].Educational Measurement: Issues and Practice,2011,30(3):29-40.
[18] Sinharay S.Added value of subscores and hypothesis testing[J].Journal of Educational and Behavioral Statistics,2019,44(1):25-44.
[19] Yen W M.A Bayesian/IRT index of objective performance[EB/OL].[2019-10-13].http://www.ets.org/Media/Research/pdf/Yen_OPI_1987.pdf.
[20] Wainer H,Vevea J L,Camacho F,et al.Augmented scores-"borrowing strength" to compute scores based on small numbers of items[M]∥Thissen D,Wainer H.Test scoring.Mahwah,NJ:Lawrence Erlbaum Associates,Inc,2001:343-387.
[21] Haberman S J.When can subscores have value?[J].Journal of Educational and Behavioral Statistics,2008,33(2):204-229.
[22] Liu Yue,Li Zhen,Liu Hongyun.Reporting valid and reliable overall scores and domain scores using bi-factor model[J].Applied Psychological Measurement,2018,43(7):1-15.
[23] Reckase M D.Multidimensional item response theory[M].New York:Springer,2009.
[24] de la Torre J,Song Hao.Simultaneous estimation of overall and domain abilities:a higher-order IRT model approach[J].Applied Psychological Measurement,2009,33(8):620-639.
[25] Yao Lihua,Boughton K A.A multidimensional item response modeling approach for improving subscale proficiency estimation and classification[J].Applied Psychological Measurement,2007,31(2):1-23.
[26] 马世晔,章建石.基于考试结果挖掘的教育评价:理论与实践[J].心理学探新,2012,32(5):461-465.
[27] Liu Ren,Qian Hong,Luo Xiao,et al.Relative diagnostic profile:a subscore reporting framework[J].Educational and Psychological Measurement,2018,78(6):1072-1088.
[28] 康春花,杨亚坤,曾平飞.海明距离判别法分类准确率的影响因素[J].江西师范大学学报:自然科学版,2017,41(4):394-400.
[29] 罗慧,熊建华,王晓庆,等.基于加权距离的一种认知诊断方法[J].江西师范大学学报:自然科学版,2018,42(1):74-81,88.
[30] Thissen D,Wainer H.Test scoring[M].Mahwah,NJ:Lawrence Erlbaum Associates,Inc,2001.
[31] 张尧庭,方开泰.多元统计分析引论[M].武汉:武汉大学出版社,2013.
[32] de la Torre J,Patz R J.Making the most of what we have:a practical application of multidimensional item response theory in test scoring[J].Journal of Educational and Behavioral Statistics,2005,30(3):295-311.
[33] Haberman S J,Sinharay S.Reporting of subscores using multidimensional item response theory[J].Psychometrika,2010,75(2):209-227.
[34] Haberman S,Sinharay S,Puhan G.Reporting subscores for institutions[J].The British Journal of Mathematical and Statistical Psychology,2009,62(1):79-95.
[35] 辛涛,谢敏.群体水平领域分数及其估计方法[J].心理发展与教育,2010,26(4):416-422.
[36] 辛涛,谢敏.矩阵取样设计中群体水平领域分数估计方法的精确性比较研究初探[J].中国考试:评价与测量,2011(5):3-12.
[37] 姚建欣,郭玉英.为学生认知发展建模:学习进阶十年研究回顾及展望[J].教育学报,2014,10(5):35-42.
[38] DeCarlo L T.On the analysis of fraction subtraction data:the DINA model,classification,latent class sizes,and the Q-matrix[J].Applied Psychological Measurement,2011,35(1):8-26.
[39] 王孟成,毕向阳.回归混合模型:方法进展与软件实现[J].心理科学进展,2018,26(12):2272-2280.
[40] Briggs D C,Alonzo A C.The psychometric modeling of ordered multiple-choice item responses for diagnostic assessment with a learning progression[C]∥Alonzo A C,Gotwals A W.Learning progressions in science: Current challenges and future directions.Rotterdam,The Netherlands:Sense Publishers,2012:293-316.
[41] 高一珠,陈孚,辛涛,等.心理测量学模型在学习进阶中的应用:理论、途径和突破[J].心理科学进展,2017,25(9):1623-1630.
[42] 宋丽红,汪文义,戴海琦,等.基于贝叶斯网的认知诊断模型构建[J].心理科学,2016,39(4):783-789.
[43] 喻晓锋,丁树良,秦春影,等.贝叶斯网在认知诊断属性层级结构确定中的应用[J].心理学报,2011,43(3):338-346.
[44] Zhan Peida,Ma Wenchao,Jiao Hong,et al.A sequential higher order latent structural model for hierarchical attributes in cognitive diagnostic assessments[J].Applied Psychological Measurement,2019,44(1):1-19.
[45] Wainer H,Feinberg R.For want of a nail:why unnecessarily long tests may be impeding the progress of Western civilisation?[J].Significance,2015,12(1):16-21.
[46] Feinberg R A,Wainer H.When can we improve subscores by making them shorter?The case against subscores with overlapping items[J].Educational Measurement:Issues and Practice,2014,33(3):47-54.