The fallacy of testing in education
For the last several years education reformers have been preaching the religion of testing as the lynchpin to improving education (meanwhile offering no meaningful evidence that education is failing in the first place). Last year, the PARCC test (Partnership for Assessment of Readiness for College and Careers) made its maiden voyage in Illinois. Now teachers and school districts are scrambling to implement phase II of the overhaul of the teacher evaluation system begun two years before by incorporating student testing results into the assessment of teachers’ effectiveness (see the Guidebook on Student Learning Objectives for Type III Assessments). Essentially, school districts have to develop tests, kindergarten through twelfth grade, that will provide data which will be used as a significant part of a teacher’s evaluation (possibly constituting up to 50 percent of the overall rating).
To the public at large — that is, to non-educators — this emphasis on results may seem reasonable. Teachers are paid to teach kids, so what’s wrong with seeing if taxpayers are getting their money’s worth by administering a series of tests at every grade level? Moreover, if these tests reveal that a teacher isn’t teaching effectively, then what’s wrong with using recently weakened tenure and seniority laws to remove “bad teachers” from the classroom?
Again, on the surface, it all sounds reasonable.
But here’s the rub: The data generated by PARCC — and every other assessment — is all but pointless. To begin with, the public at large makes certain tacit assumptions: (1) The tests are valid assessments of the skills and knowledge they claim to measure; (2) the testing circumstances are ideal; and (3) students always take the tests seriously and try to do their best.
But none of these assumptions are true most of the time — and I would go so far as to say that all of them being true for every student, for every test, practically never happens. In other words, when an assessment is given, the assessment itself is invalid, and/or the testing circumstances are less than ideal, and/or nothing is at stake for students so they don’t try their best (in fact, it’s not unusual for students to deliberately sabotage their results).
For simplicity’s sake, let’s look primarily at the PARCC test in terms of these three assumptions, and let’s restrict our discussion mainly to validity. There have been numerous critiques of the test itself that point out its many flaws (see, for example, here; or here; or here). But let’s just assume PARCC is beautifully designed and actually measures the things it claims to measure. There are still major problems with the validity of its data. Chief among them is the fact that there are too many factors beyond a district’s — and especially a classroom teacher’s — control for the data to be meaningful.
For the results of a test — any test — to be meaningful, the test’s administrator must be able to control the testing circumstances to eliminate (or at least greatly reduce) factors which could influence and hence skew the results. Think about when you need to have your blood or urine tested — to check things like blood sugar or cholesterol levels — and you’re required to fast for several hours beforehand to help ensure accurate results. Even a cup of tea or a glass of orange juice could throw off the process.
That’s an example that most people can relate to. If you’ve had any experience with scientific testing, you know the lengths researchers must go to in hopes of garnering unsullied results, including establishing a control group — that is, a group that isn’t subjected to whatever is being studied, to see how it fares compared to the group that is. In drug trials, for instance, one group will receive the drug being tested, while the control group receives a placebo.
Educational tests rarely have control groups — a group of children from whom instruction, or a type of instruction, is withheld to see how they do compared to a group that’s received the instructional practices intended to improve their knowledge and skills. But the lack of a control group is only the beginning of testing’s problems. School is a wild and woolly place filled with human beings who have complicated lives and countless needs and desires. Stuff happens every day, all the time, that affects learning. Class size affects learning; class make-up (who’s in the class) affects learning; the caprices of technology affect learning; the physical health of the student affects learning; the mental health of the student affects learning; the health of the teacher affects learning (and in upper grades, each child has several teachers); the health and circumstances of the student’s parents and siblings affect learning; weather affects learning (think “snow days” and natural disasters); sports affects learning (athletes can miss a lot of school, and try teaching when the school’s football or basketball team is advancing toward the state championship); ____________ affects learning (feel free to fill in the blank because this is only a very partial list).
And let me say what no one ever seems to want to say: Some kids are just plain brighter than other kids. We would never assume a child whose DNA renders them five-foot-two could be taught to play in the NBA; or one whose DNA makes them six-foot-five and 300 pounds could learn to jockey a horse to the Triple Crown. Those statements are, well, no-brainers. Yet society seems to believe that every child can be taught to write a beautifully crafted research paper, or solve calculus problems, or comprehend the principles of physics, or grasp the metaphors of Shakespeare. And if a child can’t, then it must be the lazy teacher’s fault.
What is more, let’s look at that previous sentence: the lazy teacher’s fault. Therein lies another problem with the reformers’ argument. The idea is that if a student underachieves on an exam, it must be the fault of the one teacher who taught that subject matter most recently (i.e., that school year). But learning is synergistic. Every teacher who has taught that child previously has contributed to their learning, as have their parents, presumably, and the other people in their lives, and the media, and on and on. But let’s just stay within the framework of school. What if a teacher receives a crop of students who’d been taught the previous year by a first-year teacher (or a student teacher, or a substitute standing in for someone on maternity or extended-illness leave), versus a crop of students taught by a master teacher with an advanced degree in their subject area?
Surely — if we accept that teaching experience and education contribute to teacher effectiveness — we would expect the students taught by a master teacher to have a leg up on the students who happened to get a newer, less seasoned, less educated teacher. So, from the teacher’s perspective, students are entering their class more or less adept in the subject depending on the teacher(s) they’ve had before. When I taught in southern Illinois, I was in a high school that received students from thirteen separate, curricularly disconnected districts, some small and rural, some larger and more urban — so the freshman teachers, especially, had an extremely diverse group, in terms of past educational experiences, on their hands.
For several years I’ve been an adjunct lecturer at the University of Illinois Springfield, teaching in the first-year writing program. UIS attracts students from all over the state, including from places like Chicago and Peoria, in addition to students from nearby rural schools, and everything in between (plus a significant number of international students, especially from India and China). In the first class session I have students write a little about themselves — just answer a few questions on an index card. Leafing through those cards, I can quickly get a sense of the quality of their educational backgrounds. Some students are coming from schools with smaller classes and more rigorous writing instruction, some from schools with larger classes and perhaps no writing instruction. The differences are obvious. Yet the expectation is that I will guide them all to be competent college-level writers by the end of the semester.
The point here, of course, is that when one administers a test, the results can provide a snapshot of the student’s abilities — but it’s a snapshot of abilities shaped by uncountable and largely uncontrollable factors. How, then, does it make sense (or, how, then, is it fair) to hang the results around an individual teacher’s neck — whether like an Olympic medal or like an albatross, depending?
As I mentioned earlier, validity is only one issue. Others include the circumstances of the test, and the student’s motivation to do well (or their motivation to do poorly, which is sometimes the case). I don’t want to turn this into the War and Peace of blog posts, but I think one can see how the setting of the exam (the time of day, the physical space, the comfort level of the room, the noise around the test-taker, the performance of the technology [if it’s a computer-based exam like the PARCC is supposed to be]) can impact the results. Then toss in the fact that most of the many exams kids are (now) subjected to have no bearing on their lives — and you have a recipe for data that has little to do with how effectively students have been taught.
So, are all assessments completely worthless? Of course not — but their results have to be examined within the complex context in which they were produced. I give my students assessments all the time (papers, projects, tests, quizzes), but I know how I’ve taught them, how the assessment was intended to work, what the circumstances were during the assessment, and to some degree what’s been going on in the lives of the test-takers. I can look at their results within this web of complexities and draw some working hypotheses about what’s going on in their brains — then adjust my teaching accordingly, from day to day, or semester to semester, or year to year. Some adjustments seem to work fairly well for most students, some not — but everything is within a context. I know to take some results seriously, and I know to disregard others altogether.
Mass testing doesn’t take these contexts into account. Even tests like the ACT and SAT, which have been administered for decades, are considered only one piece of the whole picture when colleges evaluate applicants. Other factors are weighed too, like GPA, class rank, teacher recommendations, portfolios, interviews, and so on.
What does all this mean? One of the things it means is that teachers and administrators are frustrated with having to spend more and more time testing, and more and more time prepping their students for the tests — and less and less time actually teaching. It’s no exaggeration to say that several weeks per year, depending on the grade level and an individual school’s zeal for results, are devoted to assessment.
The purported goal of assessment is to improve education, but the true goals are to make school reform big business for exploitative companies like Pearson and for the consultants who latch onto the movement remora-like (for example, Charlotte Danielson and the Danielson Group), and to implement the self-fulfilling prophecy of school and teacher failure.
(Note that I have sacrificed grammatical correctness in favor of non-gendered pronouns.)