 WANG Wenyi,CHANG Hua-hua.A Practical View of Test Fairness to Improve Equity in Education from Statistical Measurement[J].Journal of Jiangxi Normal University:Natural Science Edition,2017,(04):383-393.





A Practical View of Test Fairness to Improve Equity in Education from Statistical Measurement
WANG Wenyi,CHANG Hua-hua
1.College of Computer Information Engineering,Jiangxi Normal University,Nanchang Jiangxi 330022,China; 2.Department of Psychology,University of Illinois at Urbana-Champaign,Champaign,IL 61820,USA; 3.Faculty of Education,East China Normal University,Shanghai 200062,China
the fairness of testing; equity in education; differential item functioning; statistical measurement; the national college entrance examination
B 841.7
If the result of a test is unfair, it will affect the fairness of educational opportunity and social fairness. Statistical analyses of test fairness in China have long been neglected or even ignored. The purpose of this paper is to review key aspects of differential item/test functioning from the perspective of statistical measurement. Finally, regarding the fairness of testing in the context of high-stakes test use, some detailed and practical suggestions for test fairness are presented for readers' reference.



