High-Stakes Testing and Mathematics Performance of Fourth Graders in North Cyprus

Author(s): Osman Cankoy and Mehmet Ali Tut
Source: The Journal of Educational Research, Vol. 98, No. 4 (Mar. - Apr., 2005), pp. 234-243
Published by: Taylor & Francis, Ltd.
Stable URL: http://www.jstor.org/stable/27548083

OSMAN CANKOY, Atatürk Teacher Training Academy, North Cyprus
MEHMET ALI TUT, Eastern Mediterranean University, North Cyprus

Address correspondence to Osman Cankoy, Gelibolu 12. Sok. No:24, Lefkosa, North Cyprus, via Mersin 10, Turkey. (E-mail: cankoy@kktc.net)

ABSTRACT The authors attempted to determine the effects of a high-stakes standardized testing-driven instructional approach on mathematical performance. The authors developed a multiple-choice mathematics performance test. Analysis of the performance of 1,006 Grade 4 students in 28 North Cyprus schools revealed that the students who spent more time on test-taking skills performed better, especially in routine mathematics items, than did students who spent less time on test-taking skills. There was no difference observed in test results from nonroutine story problems. However, analysis did indicate that spending too much time on test-taking skills led to memorizing rules and procedures and cuing on surface attributes of a problem.

Key words: high-stakes standardized tests, routine and nonroutine mathematics items, test-taking skills

With the progress of globalization and the growth of competition between communities, the education of future generations is attracting attention. That scrutiny has led to the rethinking of important aspects of education systems. One of the important elements of the teaching and learning process is assessment, such as testing. Researchers have found that testing, when poorly prepared, can have detrimental effects on students, teachers, and the education system as a whole (Amrein & Berliner, 2002; Paris, 1995; Popham, 1999, 2001). Although tests can give objective and reliable indicators of performance, it is easy to overlook the unexpected consequences and negative influences that testing can have on learners, which ultimately affect the quality of education.

A crucial characteristic of standardized testing in education is that all students answer the same questions under the same conditions, and the answers are scored in the same manner. The testing is high-stakes in the sense that the tests are used for making major decisions about the students (Marcus, 1994; Popham, 1999). Students usually complete high-stakes tests once or twice a year, in comparison with classroom tests or quizzes that usually are taken weekly, end-of-unit, or monthly.

Many educators and publications emphasize the importance of mathematics problem solving (e.g., National Council of Teachers of Mathematics [NCTM], 1989, 1991, 2000), so we analyzed the ability of fourth graders to solve routine and nonroutine mathematics problems. In this study, for routine items, a student had only to implement a limited number of steps or a standard procedure. However, for nonroutine items, students did not have any formal algorithms to follow. If there were an algorithm that could be used, the student had to concentrate on the mathematics problem and apply the algorithm flexibly. Contrary to the nonroutine problems, routine mathematics problems were represented in a textbook-like manner and are the problems most often solved in mathematics classrooms and cited in mathematics textbooks.

We attempted to determine how a high-stakes, standardized test-driven instructional approach affected the mathematics performance of fourth graders. We considered that test-taking skills consisted of several classroom practices, such as (a) working on answering questions from a current or prior test, (b) giving students test questions for drill, and (c) teaching students standard algorithms and procedures (especially for a standardized multiple-choice problem), rule memorizing, and cue seeking. We considered the time spent on those test-taking skills to be an important linkage to the test-driven instructional approach.

In this study, we sought answers to the following questions.

1. Does the performance of the students in answering routine and nonroutine items and solving routine and nonroutine mathematics problems differ?
2. Is there a gender effect on the mathematics performance of students when they solve routine and nonroutine problems?
3. Is there any difference among the groups of students (those who spent more and those who spent less time on test-taking skills) in terms of mean scores on routine and nonroutine number items?
4. Is there any difference among the groups of students (those who spent more and those who spent less time on test-taking skills) in terms of mean scores on routine and nonroutine operation items?
5. Is there any difference among the groups of students (those who spent more and those who spent less time on test-taking skills) in terms of mean scores on routine and nonroutine story problems?
6. Is the choice of distractors in mathematical items, which could reveal rote memorization, surface understanding, or searches for only cue words, group dependent?

Related Literature

Problems With High-Stakes Standardized Testing

High-stakes standardized testing systems are commonplace around the world, especially in those education systems in which tests are the most fundamental form of assessment. Overstressing examinations in such examination-oriented education systems might subvert students' motivation and abilities, as well as the learning strategies taught every day in the classroom (Paris, 1995). Educators primarily use high-stakes standardized testing to sort large numbers of students in as efficient a manner as possible; however, this narrow testing usually results in short-answer or multiple-choice questions. Also, important skills such as writing, speaking, acting, and creating, which are inconsistent with objective test construction, are relegated to second-class status in schools (Bowers, 1989).

A heavy reliance on an examination-oriented system with high-stakes standardized testing also might distort teaching activities. In one of the nationally representative surveys conducted in the United States, 79% of teachers said that they spent "a great deal" of their time instructing students in test-taking skills (Quality Counts, 2001). Teaching to the test not only produces unproductive and uncritical teaching but also is likely to give a misleading picture of students' performance. When teachers concentrate instruction on specific questions on a test, the resulting scores can be inflated without reflecting an understanding of the broader domain (Kober, 2002). A teacher who is familiar with a state English test could prepare students by drilling them in a few dozen vocabulary words that have often appeared on earlier tests, out of the hundreds of vocabulary words that students are expected to learn (Popham, 2001).

An examination-oriented system also might distort students' learning by overemphasizing the importance of examination scores as outcomes and measures of learning, so that examination-taking strategies are taught rather than strategies for lifelong success (Dais, 1993). With an examination system that is badly prepared, educators might have difficulty producing critical and reflective students. Those students who spend too much time on high-stakes standardized tests might have problems integrating what they are learning or applying their knowledge and skills to real situations.

Another problem is the validity of standardized tests, which continue to be used for prediction and inferences about students' performances. Minorities and students with disabilities, in particular, may suffer as a result of tests that are not a valid and reliable measure of their knowledge and skills (Thurlow, Quenemoen, Thompson, & Lehr, 2001). Even if one can guarantee validity, other problems might exist.

High-stakes standardized testing is not only a barrier for good teaching practices and the resulting higher order student skills but also might be a waste of money. Haney, Madaus, and Lyons (1993) estimated that "taxpayers in the USA are devoting as much as $20 billion annually in direct payments to testing companies" (p. 95).

Good Testing Practices

Positive features of high-stakes standardized testing might be attained through replacement by performance testing or portfolios of work samples in which assessment is linked to decision making and is part of an ongoing process in which students monitor their personal progress (Corono, 1992). Although replacement of high-stakes standardized tests by other more positive tests may be difficult, educators can change at least some aspects of curriculum and teaching. For example, teachers could increase the positive effects of testing on student knowledge and skills in the subject being tested by (a) teaching the most important knowledge, skills, and concepts contained in the standards for a particular subject; (b) addressing standards for basic and higher order skills; (c) using test data to diagnose areas in which students are weak and focusing instruction in those areas; and (d) giving diverse students opportunities to connect and apply what they learn (Kober, 2002). In a study of New Jersey teachers, Rutgers University researchers Firestone, Monfils, Camilli, and Mayrowetz (2001) found that the state's elementary school assessments in mathematics and science, which included a mix of test-item formats, encouraged teachers to place greater emphasis than they did previously on writing, problem solving, explanation, discussion, use of hands-on and other materials, and student thinking.

High-Stakes Standardized Testing-Oriented Education in North Cyprus

In North Cyprus, education at the elementary and secondary school level is highly centralized and under the control of the Ministry of Education. Students enter elementary school at age 6 and leave at age 11 (from first grade to fifth grade). Most of the people in North Cyprus value education and educated people. As a result of that attitude, people tend to overemphasize testing and entrance examinations, which can be called high-stakes standardized tests. In addition, North Cypriots value knowing mathematics and obtaining good grades in mathematics.

At the elementary school level, in the first 3 years, one can observe that the general instructional approaches are consonant with constructivist learning perspectives; the activities in the mathematics classrooms also are aligned with constructivist learning theories. At the end of each school year, most fifth graders (nearly one third of all elementary school graduates) in North Cyprus take the Entrance Examination for the Middle Schools (EEMS), for which the general medium of instruction is English. The examination is considered by the majority of families in North Cyprus as the most important key in the future academic life of students. The EEMS is prepared and administered once a year by the Ministry of Education.
Because of this high-stakes standardized testing, preparation for which usually begins at the fourth through the fifth grade, the instructional techniques used in the classrooms at that level are congruent with behaviorist approaches rather than with thought-provoking activities: teachers give students actual test questions and teach standard algorithms and rules. Each year, many families of North Cyprus spend a large amount of their family budget on private lessons, mostly geared to teaching to the test, to prepare their children for the test.

Although the EEMS is considered the most important high-stakes standardized test in North Cyprus, the Ministry of Education has not gathered empirical evidence about its reliability and validity for the last 25 years. That lack of evidence could be problematic and misleading for the overall education system in North Cyprus.

In the academic year 2000-2001, data collected with interviews and questionnaires during inservice training activities with fourth- and fifth-grade elementary school teachers showed that (a) 65% of the teachers spent at least 70% of their class time working on actual test questions from a current or prior test, (b) 85% of the teachers gave their students actual test questions for drill, and (c) 75% of the teachers taught their students test-taking skills and had them practice with tests from prior years. The data also showed that nearly 70% of fourth- and fifth-grade teachers spent most of their semester time on mathematics and language because the EEMS consisted of two batteries of tests, for mathematics and for language skills.

Method

Participants

We randomly selected 28 schools out of 83 schools in North Cyprus. Then, from each selected school, we chose a number (ranging from 1 to 4) of fourth-grade classes (n = 1,006 students). We trained 28 volunteer preservice elementary teachers to observe the classrooms and to determine the percentage of class time that was spent on test-taking skills. Using a structured observation sheet, the preservice teachers coded which one of the listed categories had occurred in each 1-min period. The test-taking skills categories listed on the observation sheet were (a) working on answering questions from a current or prior test, (b) giving students actual test questions for drill, (c) teaching students standard algorithms and procedures to be applied in answering questions, especially multiple-choice questions, and (d) memorizing rules. Each class was observed for at least 6 class hr (each class hour was nearly 40 min).

The students from schools in which nearly 70% of class time was spent on test-taking skills formed the high-emphasis group (HEG; n = 351). That group spent the rest of its time on instructional activities without a textbook; worksheets and current and prior test items generally were used as the main source of activities. The students from schools in which nearly 50% of the class time was spent on test-taking skills formed the moderate-emphasis group (MEG; n = 207). That group spent most of the remainder of its time on instructional activities, along with the textbook recommended by the Ministry of Education. The students who attended schools in which nearly 30% of the class time was spent on test-taking skills formed the low-emphasis group (LEG; n = 448). That group spent the remainder of its time on instructional and noninstructional activities guided by the textbook recommended by the Ministry of Education for regular classroom practices. That group spent more time on noninstructional activities compared with the MEG.

According to the observations, for all the groups, the category "teaching students standard algorithms and procedures to be applied in answering specifically multiple-choice-type questions" was used most, and the category "rule memorizing" was used least. The textbook recommended by the Ministry of Education was based primarily on traditional instructional approaches and had few indirect relations with conceptual understanding and nonroutine problem solving.

Instrument

We developed a 36-item, multiple-choice Mathematical Performance Test (MPT) and adapted it for this study to measure the mathematical performance of fourth graders. The test included three subtests (Number Skills, Operations With Numbers, and Story Problems) and six dimensions: (a) Routine Number Items (RNI), (b) Nonroutine Number Items (NRNI), (c) Routine Operation Items (ROI), (d) Nonroutine Operation Items (NROI), (e) Routine Story Problems (RSP), and (f) Nonroutine Story Problems in Nonroutine Contexts (NRSP). There were seven, three, six, seven, eight, and five items, respectively, in each dimension (see Table 1). We previously gave the items to 13 experienced elementary school teachers and 5 inspectors from the Ministry of Education and asked them to categorize the items as routine and nonroutine by considering the teaching and learning practices in North Cyprus elementary schools.

TABLE 1. Selected Mathematics Test Problems

Number skills
  Routine. Item 1: How is "three hundred thousand and fifteen" represented by numerals?
    (A) 315   (B) 3,015   (C) 30,015   (D) 300,015
  Nonroutine. Item 9: Which one of the following numbers cannot be written in the place of "a" if "13a" is a three-digit even number?
    (A) 4   (B) 3   (C) 2   (D) 0

Operations with numbers
  Routine. Item 15: What is the result of "2,500 x 26"?
    (A) 65,000   (B) 62,000   (C) 55,500   (D) 6,500
  Nonroutine. Item 14: Which operation can be used in order to find the value of "a" with the shortest way in "524 + a = 816?"
    (A) Addition   (B) Subtraction   (C) Multiplication   (D) Division

Story problems
  Routine. Item 24: A factory can produce 2,007 toys daily. How many toys can be produced in 2 weeks?
    (A) 4,004   (B) 4,014   (C) 10,035   (D) 28,098
  Nonroutine. Item 36: Ali, Mehmet, Oya, and Cem completed a race in 10, 8, 9, and 7 seconds respectively. Who is the first?
    (A) Ali   (B) Mehmet   (C) Oya   (D) Cem

The fourth graders had 1 hr to answer the items on the MPT. Alpha reliability of the subtests (Number Skills, Operations With Numbers, and Story Problems) were .75, .71, and .61, respectively. Alpha reliability for the test, including the 36 items, was .86. We asked two mathematics educators and a measurement and evaluation expert in the Middle East Technical University in Turkey to judge the content validity of the test. We provided each of the experts with information that included the definition of what we intended to be measured, the instrument, and the detailed description of the intended sample. We also asked the experts to judge whether the distractors of the items could reveal rote memorization, surface understanding, or a search for only cue words. All the experts concluded that the content and format of the items were consistent with the definition of the variable and the study participants.

Procedures

Before we conducted this study, we sought permission from the Elementary School Education Division of the Ministry of Education to develop and administer the EEMS within a high-stakes context for the 2000-2001 academic year. The Elementary School Education Division noted that permission would be given only for the data that could be collected from the EEMS. Unfortunately, because of the lack of evidence for the reliability and validity of the EEMS, conducting the research with that requirement became impossible. However, after we agreed to provide detailed feedback to the Ministry of Education, the Elementary Education Division offered permission and assistance to administer the MPT in the 28 schools simultaneously. We trained 28 preservice elementary teachers for a week so they could conduct the live classroom observations. We divided the participants into three groups (HEG, MEG, LEG) according to the observation results, and the observers administered the MPT.

Data Analysis

We based the data analysis mainly on quantitative methods. We used chi-square analysis to determine whether the choice of the distractors in answering the items was dependent on the group (HEG, MEG, LEG). We used analysis of variance (ANOVA) and multivariate analysis of variance (MANOVA) procedures to observe the differences among the three groups in terms of performance in mathematics skills related to numbers, operations with numbers, and solving routine and nonroutine story problems. We based the interpretations of the effect sizes on the suggestions of Cohen (1988); the level of significance used throughout the study was .05.
Results

Table 2 shows that five of the six dimensions of the MPT were intercorrelated. The one exception was the dimension related to nonroutine story problems. Because of the difference, we used MANOVA procedures to analyze the first five dimensions (RNI, NRNI, ROI, NROI, RSP). When we analyzed the performances of the three groups (HEG, MEG, LEG) in solving nonroutine story problems, we used one-way ANOVA procedures (see Table 3).

TABLE 2. Intercorrelations Between Subdimensions of the MPT for Fourth Graders (N = 1,006)

[The correlation matrix cannot be recovered cell by cell from the source. The surviving values show correlations of .41 to .68 (all *p < .05) among RNI, NRNI, ROI, NROI, and RSP, and correlations of -.05 to -.02 between NRSP and the other five dimensions.]

Note. MPT = Mathematical Performance Test; RNI = routine number items; NRNI = nonroutine number items; ROI = routine operation items; NROI = nonroutine operation items; RSP = routine story problems; NRSP = nonroutine story problems in nonroutine contexts.
*p < .05.

A paired samples t test showed that mean score on routine mathematics items was significantly higher than mean score on nonroutine mathematics items, t(1,005) = 43.285, p < .05. That finding also was confirmed by a large effect size (d = 1.36). Later analysis showed significant but small group-dependent main effects (considering small effect sizes) on mean scores of routine, F(2, 1,006) = 35.239, p < .01, η² = .07, and on nonroutine mathematics problems, F(2, 1,006) = 24.644, p < .05, η² = .05, favoring the HEG and MEG compared with the LEG.

MANOVA results in Table 4 revealed that group and gender and their interaction had significant effects on the linear combination that resulted from RNI, NRNI, ROI, NROI, and RSP with small effect sizes.

TABLE 3. Descriptive Statistics for Routine and Nonroutine Number Skills Problems

                          Routine          Nonroutine
Group                     M      SD        M      SD        n
High emphasis
  Girls                   4.56   1.61      1.39   0.85      172
  Boys                    4.27   1.66      1.23   0.90      179
  Total                   4.41   1.63      1.30   0.88      351
Moderate emphasis
  Girls                   4.32   1.77      1.21   0.83      108
  Boys                    4.45   1.59      1.36   0.83       99
  Total                   4.38   1.68      1.29   0.83      207
Low emphasis
  Girls                   3.69   1.73      1.06   0.79      233
  Boys                    3.56   1.85      0.94   0.89      215
  Total                   3.63   1.79      1.00   0.84      448
Total
  Girls                   4.11   1.74      1.20   0.83      513
  Boys                    4.00   1.77      1.13   0.89      493
  Girls and boys          4.06   1.76      1.17   0.86    1,006

TABLE 4. Wilks's Lambda Test of Significance for Effects of Level of Preparing for High-Stakes Standardized Test on Fourth Graders' Mathematics Performances

Effect          Wilks's Λ    F       Hypothesis df    Error df     p        η²
Group (A)       .93          7.53    10               1,992.00     .000*    .036
Gender (B)      .98          3.43     5                 996.00     .004*    .017
A × B           .02          1.99    10               1,994.00     .031*    .010

Note. Fourth graders' performances involved answering routine and nonroutine number and operation problems and solving routine story problems.
*p < .05.

Number Skills

Table 5 shows that group had significant effects on mean scores of RNI and NRNI with small effect sizes. However, group was more effective on RNI compared with NRNI (see effect sizes in Table 5). Bonferroni post-hoc tests (p < .05) revealed no significant difference between the HEG and the MEG in terms of mean scores on RNI and NRNI. However, both groups had significantly higher mean scores than did the LEG on RNI and NRNI. Results also revealed that gender had no significant main effect on mean scores on RNI and NRNI (see Table 5).

TABLE 5. Tests of Between-Subjects Effects for Performance in Answering Routine and Nonroutine Number and Operation Items and Solving Routine Story Problems

Variable        SS          df       MS       F        p        η²
Group (A)
  RNI             147.11        2    73.56    25.02    .000*    .048
  NRNI             21.77        2    10.88    15.10    .000*    .029
  ROI             141.94        2    70.97    31.15    .000*    .059
  NROI            143.04        2    71.52    20.21    .000*    .039
  RSP             151.06        2    75.53    21.47    .000*    .041
Gender (B)
  RNI               2.13        1     2.13     0.72    .395     .001
  NRNI              0.46        1     0.46     0.63    .427     .001
  ROI               5.82        1     5.82     2.56    .110     .003
  NROI             27.92        1    27.92     7.89    .005*    .008
  RSP               0.49        1     0.49     0.14    .709     .000
A × B
  RNI               5.17        2     2.58     0.88    .416     .002
  NRNI              3.54        2     1.77     2.45    .086     .005
  ROI               4.62        2     2.31     1.01    .363     .002
  NROI             10.41        2     5.20     1.47    .230     .003
  RSP              30.82        2    15.41     4.38    .013*    .009
Error
  RNI           2,940.13    1,000     2.94
  NRNI            721.00    1,000     0.72
  ROI           2,278.38    1,000     2.28
  NROI          3,538.54    1,000     3.54
  RSP           3,517.86    1,000     3.52

Note. RNI = routine number items; NRNI = nonroutine number items; ROI = routine operation items; NROI = nonroutine operation items; RSP = routine story problems.
*p < .05.

Operations With Numbers

Results showed that group had significant effects on mean scores of ROI and NROI with small effect sizes (see Table 5). Bonferroni post-hoc tests (p < .05) revealed no significant difference between the HEG and the MEG in terms of mean scores on ROI and NROI. However, both of those groups had significantly higher mean scores than did the LEG on ROI and NROI. Results also revealed that gender had no significant effect on mean scores on ROI (see Table 6). However, results showed that girls had a significantly higher mean score than did boys on NROI, t(1,004) = 3.33, p < .05, with an effect size near to a medium degree (d = 0.3).

TABLE 6. Descriptive Statistics for Routine and Nonroutine Operation Problems

                          Routine          Nonroutine
Group                     M      SD        M      SD        n
High emphasis
  Girls                   3.50   1.46      4.37   1.79      172
  Boys                    3.35   1.48      3.97   1.81      179
  Total                   3.42   1.47      4.16   1.81      351
Moderate emphasis
  Girls                   3.45   1.46      4.17   1.96      108
  Boys                    3.46   1.59      4.11   1.98       99
  Total                   3.46   1.52      4.14   1.97      207
Low emphasis
  Girls                   2.85   1.59      3.70   1.87      233
  Boys                    2.51   1.47      3.10   1.94      215
  Total                   2.68   1.54      3.41   1.92      448
Total
  Girls                   3.19   1.55      4.02   1.88      513
  Boys                    3.00   1.56      3.62   1.95      493
  Girls and boys          3.10   1.56      3.83   1.93    1,006
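The F ratios and effect sizes reported for the group effect in Table 5 can be checked against the sums of squares in the same table. The sketch below is only an illustration: it assumes the η² column is partial eta squared, SS_effect / (SS_effect + SS_error), which the article does not state explicitly, and the function and variable names are made up for this example.

```python
# Sums of squares for the group effect and the error term, as reported in Table 5.
GROUP_SS = {"RNI": 147.11, "NRNI": 21.77, "ROI": 141.94, "NROI": 143.04, "RSP": 151.06}
ERROR_SS = {"RNI": 2940.13, "NRNI": 721.00, "ROI": 2278.38, "NROI": 3538.54, "RSP": 3517.86}
DF_GROUP = 2      # three emphasis groups
DF_ERROR = 1000   # error df reported in Table 5


def f_ratio(ss_effect, df_effect, ss_error, df_error):
    """F = mean square of the effect divided by mean square error."""
    return (ss_effect / df_effect) / (ss_error / df_error)


def partial_eta_squared(ss_effect, ss_error):
    """Assumed effect-size formula: SS_effect / (SS_effect + SS_error)."""
    return ss_effect / (ss_effect + ss_error)


def recompute_group_effects():
    """Return {dimension: (F, partial eta squared)} rounded like the table."""
    results = {}
    for dim, ss_effect in GROUP_SS.items():
        ss_error = ERROR_SS[dim]
        f_value = f_ratio(ss_effect, DF_GROUP, ss_error, DF_ERROR)
        eta_sq = partial_eta_squared(ss_effect, ss_error)
        results[dim] = (round(f_value, 2), round(eta_sq, 3))
    return results


if __name__ == "__main__":
    for dim, (f_value, eta_sq) in recompute_group_effects().items():
        print(f"{dim}: F = {f_value}, partial eta squared = {eta_sq}")
```

Under that assumption, the recomputed values for RNI and RSP agree with the tabled F and η² to the reported precision, which supports the row and column assignment used in the reconstructed table.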
Story Problems

Table 5 shows that mean scores of RSP of the groups were significantly different with a small effect size. Bonferroni post hoc tests (p < .05) revealed that no significant difference existed between the HEG and the MEG in terms of mean scores on RSP. Both groups had significantly higher mean scores on RSP than did the LEG (see Table 7). Results also revealed that gender and group had significant interaction effects on mean scores on RSP with a small effect size. In the LEG, there was no significant difference between girls and boys in terms of mean scores on RSP. In the HEG, girls had significantly higher mean scores than did boys on RSP, t(349) = 2.02, p < .05, but the difference was insignificant because of the small effect size (d = .01). In the MEG, however, boys had a significantly higher mean score than did girls on RSP, t(205) = -2.067, p < .05. The effect size for the MEG (d = .2) was not as small as for the HEG. Results also showed that group had no significant effect on mean scores on NRSP (see Table 8). In other words, mean scores of the three groups were not significantly different on NRSP.

TABLE 7. Descriptive Statistics for Routine and Nonroutine Story Problems

                          Routine          Nonroutine
Group                     M      SD        M      SD        n
High emphasis
  Girls                   3.99   1.90      0.65   0.70      172
  Boys                    3.56   2.02      0.75   0.71      179
  Total                   3.77   1.97      0.70   0.71      351
Moderate emphasis
  Girls                   3.30   1.86      0.85   0.71      108
  Boys                    3.84   1.91      0.79   0.80       99
  Total                   3.56   1.90      0.82   0.75      207
Low emphasis
  Girls                   2.92   1.81      0.70   0.69      233
  Boys                    2.94   1.79      0.74   0.70      215
  Total                   2.93   1.80      0.72   0.70      448
Total
  Girls                   3.36   1.90      0.71   0.70      513
  Boys                    3.35   1.93      0.75   0.72      493
  Girls and boys          3.35   1.92      0.73   0.71    1,006

TABLE 8. Tests of Between-Subjects Effects for Nonroutine Story Problems

Source          SS         df       MS       F        p       η²
Group (A)         2.0932       2    1.046    2.070    .127    .004
Gender (B)        0.1961       1    0.196    0.387    .534    .000
A × B             0.913        2    0.456    0.903    .406    .002
Error           505.524    1,000    0.506

We also wanted to determine whether the choice of the distractors of the items, which could reveal rote memorization, surface understanding, or search for only cue words, were group dependent. Chi-square statistics for each item (see Table 1) showed that in 25 items (16 of 25 were in nonroutine contexts) out of 36 items, the choice of the distractors reported in the Instruments section was dependent on the group (see Table 2). For example, in Item 14, the typical distractor "A" was selected by fewer students from LEG (44.2%) compared with students from HEG (62.8%) and MEG (57.3%), χ²(8, N = 1,006) = 32.114, p < .01. There was a similar finding in Item 36 (see Table 1) in which the typical distractor was A. For Item 36, fewer students from the LEG (60%) compared with students from the HEG (67%) and MEG (64%), χ²(8, N = 1,006) = 16.183, p < .01, selected the typical distractor, A.
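The gender comparisons within a group can be approximated from the descriptive statistics alone. The sketch below is a hedged illustration: it assumes a pooled-standard-deviation Cohen's d and an equal-variance independent-samples t statistic, neither of which the article names as its method, and it works from the rounded means and standard deviations in Table 7, so small discrepancies from the reported values are expected. The function names are made up for this example.

```python
import math


def pooled_sd(sd1, n1, sd2, n2):
    """Pooled standard deviation of two independent groups."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return math.sqrt(pooled_var)


def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Standardized mean difference using the pooled SD (an assumed formula)."""
    return (m1 - m2) / pooled_sd(sd1, n1, sd2, n2)


def t_from_summary(m1, sd1, n1, m2, sd2, n2):
    """Equal-variance t statistic computed from summary statistics."""
    standard_error = pooled_sd(sd1, n1, sd2, n2) * math.sqrt(1 / n1 + 1 / n2)
    return (m1 - m2) / standard_error


if __name__ == "__main__":
    # High-emphasis group, routine story problems (Table 7): girls vs. boys.
    girls = (3.99, 1.90, 172)
    boys = (3.56, 2.02, 179)
    print("d =", round(cohens_d(*girls, *boys), 3))
    print("t =", round(t_from_summary(*girls, *boys), 3), "with df =", 172 + 179 - 2)
```

With these inputs the t statistic comes out close to the reported t(349) = 2.02, while the pooled-SD d is roughly 0.2 rather than the reported d = .01. The article does not say which effect-size formula produced its value, so the difference may reflect a different index rather than an error.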
as group in the group in which standardized than Carpenter, & De Lisi, performance, to nonroutine in mathemat differences gender emphasis group, boys outperformed 1.046 1,000 to prob instruction. For should be emphasized during mathematics in the NCTM released 2000, example, Principles and Stan a revised version of the dards for School Mathematics, better a 2 observed study important story problems, Effects 2.0932 0.1961 0.913 of types guarantee nonroutine includes that we emphasized this high-stakes and with teaching to use when which research, questions results boys students operation, of perfor ized tests used at the elementary school level in North we for the last 10 found that only 5% of the years; Cyprus dfSS Source Error the classroom, was to be emphasized mathematics stan high-stakes in their factor differences a for not it does to determine able further ics performances cially those of problems, are generally emphasized (David the amount of son, Deuser, 6k Sternberg, 1994). Although time terms in groups edge, and through this process, they will often develop new mathematical (NCTM, 2000, p. 51). understandings" overall Cyprus three learning algorithms might in the occasions order items. the in North system and procedures considers How a test-driven that standard lem solve. Although some "the perfor better mathematics one when graders' were items in nonroutine performances examination-oriented which that mathematics indicates Therefore, story problems. to solve different procedures not to teach is obviously the way them students 1989 overall the among defines problem The performed items. operation in nonroutine routine Discussion mances and sizes effect also approach number instructional approach has little to do with improving story problems higher order skills like solving nonroutine (e.g., Kenney ck Silver, 1997; Lindquist, 1989). 
There were ever, ing word the dent on the group (see Table 2). For example, in Item 14, the typical distractor "A" was selected by fewer students from LEG (44.2%) compared with students from HEG (62.8%) and MEG (57.3%), χ²(8, N = 1,006) = 32.114, p < .01. There was a similar finding in Item 36 (see Table 1) in which the typical distractor was A. For Item 36, fewer students from the LEG (60%) compared with HEG (67%) and MEG (64%), χ²(8, N = 1,006) = 16.183, p < .01, selected A. on dures and how to use them correctly. Kantowski (1981) stated that problem solving means different things to different people; the majority of people believe that it is solv rote memo enhance could understanding, item (see Table in nonroutine which test-driven better problems of the three groups were not significantly different on NRSP. We also wanted to determine whether the choice of the distractors more Cyprus, families have stricter rules for girls than for boys because of the country's social and ethical perspectives. That rigidity might cause girls to follow certain rules and prescriptions more precisely than boys do. Therefore, it is likely that in a more examination-oriented environment, more benefit girls boys We the elementary found that skills did not We school level. more spending considered and conceptual of nonroutine story in a nonconstructivist dents solving. stu educating not guarantee may perspective test-driven standardized, high-stakes lead to higher educational too much spending instruction The tend may that groups to one with not should the conclude in which group that emphasized that no observations showed that was there use limited nontraditional Therefore, practices. teaching that foster selection of instructional approaches is mathematics skills essential. higher order likely careful Moreover, groups that ly performed better on a test-driven used small effect sizes throughout groups on were test-taking skills.
Small instructional test-driven to better much from that those who experts all understanding. of not contribute interesting findings of this study was the dependency of items that could or understanding, rote enlighten search aspects about offer change the overall and education. recommendations following to eliminate 3. Tests and its con and knowledge assessed. instruction classroom skills, problem-solving should especially should be and emphasize are that those nonroutine. 4. Students should be trained and encouraged to become skilled problem solvers with the ability to conduct qualitative analyses tative solutions. 5. Preservice and to view tunity structivist assessments Sur and MEG were distracted more than prisingly, were low-emphasis groups. That finding implies that more time spent focusing on procedural skills such as drills, test standardized high-stakes of mathematics be might surface in group. words decisions we of problems inservice before and teach teachers they should mathematics quanti perform have the oppor in a more con way. should use open-ended age the use and integration of the distractors memorization, cue for only that implied a doubt views the sources of assessment 2. Multiple information used in making high-stakes decisions. 6. Teachers One attempt is no there Also, to time less the most did approaches make summarize, but that these spent sizes also effect large problems, the study showed so different not approach mathematics many should to maintain of mathematics knowledge perspective. at the especially teachers prepare to need beliefs (in line with changing positive approaches in the education field) of society as a whole, beginning with the foster structivist, and routine and Researchers changed. and pedagogical educators nections con of that tests, solely because we found no differences among in higher order mathematics skills. Classroom emphasized the groups rules, understanding computation based activities level 1.
If it is not possible problem was least instruction test-driven per algorithms, was there require in light of our findings and current practice. scores. inflated be conceptually school as school a constructivist with To test-driven instruction test-driven formed better on the problems but some only produce used more or skills test-taking not does problem strong content to the test or quality. Teaching on time education in elementary must solving elementary skills that are improvement in higher order mathematics for scientific important thinking and human life. Kenney and Silver (1997) and Lindquist (1989) have shown that of basic discipline do not that algorithms a only and reasoning (B. J. Reys & R. E. Reys, 1998). Therefore, Reys and Nohda suggested that the narrow view of mathe emphasize aspects reasoning that found as mathematics and formulas, matics test-taking story problem qualitative and problems on time class influence nonroutine view might than boys. For example, Cai (1995) and Hyde, Fennema, and Lamon (1990) stated that girls performed better than at 2001; Sowder, 1992). If the instructional on is emphasis mainly procedural skills (rote memorization) and not conceptually based, then the students not but also only suffer from (meta)cognitive disadvantages Tchoshanov, and classroom problems of conceptual that encour knowledge in practices. the HEG or taking, with reasoning, can memorize tests with practice connection students to and this Unfortunately, that elementary introducing, ciency without 2000, with written total algorithms E. Reys little qualitative single as current up and and & Nohda, a to path 1989; practice, in instruction to 80%) establishing solving understanding to them 1997; Lindquist, range practicing, conceptual 1991; R. a of mathematics (estimates developing, for search with encourage as well study, the majority schools and and single answer (Kenney & Silver, Mestre, 1991; Sacks, 2000).
indicates years, prior understanding conceptual distract procedures from is spent profi computations (e.g., America 1992; Pape &
REFERENCES
America 2000. (1991). An educational strategy to move the American educational system ahead to meet the needs of the 21st century. Washington, DC: U.S. Department of Education.
Amrein, A. L., & Berliner, D. C. (2002). High-stakes testing, uncertainty, and student learning. Education Policy Archives, 10(18), 1-70.
Bowers, B. C. (1989). Alternatives to standardized educational assessment. Eugene, OR: ERIC Clearinghouse on Educational Management. (ERIC Document Reproduction Service No. ED312773)
Cai, J. (1995). A cognitive analysis of U.S. and Chinese students' mathematical performance on tasks involving computation, simple problem solving, and complex problem solving. Journal for Research in Mathematics Education (Monograph Series 7). Reston, VA: National Council of Teachers of Mathematics.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
Corono, L. (1992). Encouraging students to take responsibility for learning and performance. Elementary School Journal, 93, 69-83.
Dais, T. A. (1993). An analysis of transition assessment practices: Do they recognize cultural differences? Washington, DC: Vol. 2, pp. 1-19. (ERIC Document Reproduction Service No. ED372519)
Davidson, J. E., Deuser, R., & Sternberg, R. J. (1994). The role of metacognition in problem solving. In J. Metcalfe & A. P. Shimamura (Eds.), Metacognition (pp. 207-226). Cambridge, MA: MIT Press.
Fennema, E., Carpenter, T. P., Jacobs, V. R., Franke, M. L., & Levi, L. W. (1998). A longitudinal study of gender differences in young children's mathematical thinking. Educational Researcher, 27, 6-11.
Firestone, W. A., Monfil, L., Mayrowetz, D., & Camilli, G. (2001, April). The ambiguity of teaching to the test: A multiple method analysis. Paper presented at the annual meeting of the American Educational Research Association, Seattle, WA.
Gallagher, A. M., & De Lisi, R. (1994). Gender differences in scholastic aptitude test-mathematics problem solving among high-ability students. Journal of Educational Psychology, 86, 204-211.
Haney, W., Madaus, G., & Lyons, R. (1993). The fractured marketplace for standardized testing. Boston: Kluwer.
Hyde, J. S., Fennema, E., & Lamon, S. J. (1990). Gender differences in mathematics performance: A meta-analysis. Psychological Bulletin, 107(2), 139-155.
Kantowski, M. G. (1981). Problem solving. In E. Fennema (Ed.), Mathematics education research (pp. 111-126). Reston, VA: National Council of Teachers of Mathematics.
Kenney, P. A., & Silver, E. A. (1997). Results from the seventh mathematics assessment of the NAEP. Reston, VA: National Council of Teachers of Mathematics.
Kober, N. (2002). Teaching to the test: The good, the bad, and who's responsible. Test talk for leaders, no. 1. Center on Education Policy. Retrieved January 20, 2003, from http://www.ctredpol.org/pubs/testtalkjune2002.html
Lindquist, M. M. (1989). Results from the fourth mathematics assessment of the NAEP. Reston, VA: National Council of Teachers of Mathematics.
Marcus, E. (1994). Rite of passage: French teachers demand reasoning. Educational Vision, 2(2), 18-19.
Mestre, J. P. (1991). Learning and instruction in pre-college physical science. Physics Today, 44(9), 56-62.
National Council of Teachers of Mathematics. (1989). Curriculum and evaluation standards for school mathematics. Reston, VA: Author.
National Council of Teachers of Mathematics. (1991). Professional standards for teaching mathematics. Reston, VA: Author.
National Council of Teachers of Mathematics. (2000). Principles and standards for school mathematics. Reston, VA: Author.
Pape, S. J., & Tchoshanov, M. A. (2001). The role of representation(s) in developing mathematical understanding. Theory Into Practice, 40(2), 118-127.
Paris, S. G. (1995). Why learner-centered assessment is better than high-stakes testing. In N. M. Lambert & B. L. McComb (Eds.), How students learn: Reforming schools through learner-centered education (pp. 189-210). Washington, DC: American Psychological Association.
Popham, W. J. (1999). Why standardized tests don't measure educational quality. Educational Leadership, 56(6), 8-15.
Popham, W. J. (2001). The truth about testing. Alexandria, VA: Association for Supervision and Curriculum Development.
Quality Counts. (2001). A better balance. Education Week, 20(17), 36.
Reys, B. J., & Reys, R. E. (1998). Computation in the elementary curriculum: Shifting the emphasis. Teaching Children Mathematics, 5, 236-241.
Reys, R. E., & Nohda, N. (1992). Computational alternatives for the twenty-first century. Reston, VA: National Council of Teachers of Mathematics.
Sacks, P. (2000). Standardized minds: The high price of America's testing culture and what we can do to enhance it. Cambridge, MA: Perseus Books.
Sowder, J. T. (1992). Estimation and number sense. In D. C. Grouws (Ed.), Handbook of research on mathematics teaching and learning (pp. 371-389). New York: Macmillan.
Thurlow, M., Quenemoen, R., Thompson, S., & Lehr, C. (2001). Principles and characteristics of inclusive assessment and accountability systems (Synthesis Rep. No. 40). Minneapolis, MN: National Center on Educational Outcomes.