Learning in Mind

Rethinking the Purpose of Education

Standards and Assessment: Part 1

Bubble Test Answers


What do you picture when you hear the word "assessment" in relation to public education? The picture that pops into most people's minds involves a paper-and-pencil test of some kind. In recent years, that picture has looked more and more like the one above.

This is not a picture of a test, but of an answer sheet. Correct answers on bubble sheets such as this have become the ultimate goal of educational policy makers. This is not because learners actually care about the answers, but because they have been brainwashed, along with their parents and a good portion of the public, into believing that correct answers and high test scores lead to good grades, which lead to acceptance into a top college, which leads to a high-paying job, which supposedly leads to success (as defined by a small but powerful segment of society). Once again, NCLB and its successors are largely responsible for this perception and this myth.

Types of Standardized Tests

Although there are dozens of ways to assess learning, the "high-stakes" tests that now have the greatest influence and the most distressing repercussions fall into two major types. The first type, norm-referenced tests (NRTs), report whether test takers performed better or worse than a "standard" or hypothetical average student.

The second major type of test is a criterion-referenced test (CRT). In this case, the "standard" refers to a body of information that students are expected to have acquired. For example, the PARCC test is one of the "high-stakes" tests required of children in public schools around the country. The standards (or criteria) on which these tests are based are the Common Core Standards in language arts and math. For more information on Common Core and PARCC, you may wish to read this research review posted by ParentsAcrossAmerica.org. Other relevant information can be found here, here, and here.

A Deeper Dive into Norm-Referenced Tests

Bell Curve

Because the results of a norm-referenced test are based on the score earned by a hypothetical average student, that student's score is defined as "average." Anyone who scores higher than the average is considered "above average" and anyone who scores below the average is "below average." Statistically, test proponents assume that the scores on standardized tests will follow a "normal distribution," known as the Bell Curve. In practice, the "average" or mean score may not be identified or selected until after students have taken the test, based on the mean of the actual test results!

What would happen if, in the above diagram, the average or mean score last year was 50, but this year, the average is 75? Also assume that the test in both years is the same, or as nearly the same as possible. Wouldn't that suggest that student achievement had improved? Well, no…because half of the scores would still be above average and half below average (not counting the "average" scores themselves). In other words, every student in a given class could raise his or her score into what had previously been an "above average" range between test 1 and test 2…and roughly half of them would still be "below average." Norm-referenced (standardized) assessments, by design, group students into the "smart/gifted" ones (those who scored above average) and the "slow/remedial" ones (those who scored below average). Those who score in the middle are labeled "average," which in the minds of many translates to "mediocre." And those labels all too often determine the educational opportunities students are given.
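For readers who like to check the arithmetic, here is a minimal sketch of that thought experiment. The ten scores are made-up, illustrative numbers (not real test data), and the helper name share_below_mean is hypothetical. It assumes every student gains exactly 25 points on an equivalent test.

```python
from statistics import mean

def share_below_mean(scores):
    """Return the fraction of scores strictly below the group's own mean."""
    avg = mean(scores)
    return sum(1 for s in scores if s < avg) / len(scores)

# Hypothetical class of ten students (illustrative numbers only).
last_year = [30, 35, 40, 45, 50, 50, 55, 60, 65, 70]

# Assume every student gains 25 points on an equivalent test.
this_year = [s + 25 for s in last_year]

# The mean rises from 50 to 75, but the share labeled "below average"
# is identical in both years, because the label depends only on rank.
print(mean(last_year), share_below_mean(last_year))
print(mean(this_year), share_below_mean(this_year))

# Every one of this year's scores beats last year's average of 50.
print(min(this_year) > mean(last_year))
```

In this made-up class, the lowest score this year is higher than last year's average, yet the same students still sit below the new average.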

The concepts of "average" and "normal distribution" are useful in statistical analysis of data. In reality, a normal distribution only occurs when there is a relatively small range of data that measures a single factor under the same conditions. For example, let's say you measured the time that it takes you to get to work each day for a month. You plot the data on a graph and it takes the shape of a normal curve. You might then use that data to decide how early to leave home to make sure you got to work on time. But could you still do that if you left at a different time each day? If you took a different route? If you drove a different vehicle? The minute you add a second factor to your data, the results of your measurements become all but useless.
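The commute example can be sketched with a small simulation. Everything here is an assumption made for illustration: route A is taken to average 30 minutes, route B 50 minutes, each with a few minutes of day-to-day variation, and the sketch simulates 1,000 trips rather than one month so the pattern is easy to see. The helper name share_near_mean is hypothetical.

```python
import random
from statistics import mean, stdev

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Single factor: always the same route (assumed mean 30 min, spread 3 min).
one_route = [rng.gauss(30, 3) for _ in range(1000)]

# Second factor added: each day the route is A or B with equal chance
# (route B assumed to average 50 min).
two_routes = [rng.gauss(30, 3) if rng.random() < 0.5 else rng.gauss(50, 3)
              for _ in range(1000)]

def share_near_mean(times, window=3):
    """Fraction of trips that land within `window` minutes of the mean."""
    avg = mean(times)
    return sum(1 for t in times if abs(t - avg) <= window) / len(times)

# With one route, most trips cluster near the average.
print(round(stdev(one_route), 1), round(share_near_mean(one_route), 2))

# With two routes mixed together, the spread grows and very few trips
# land anywhere near the "average" trip.
print(round(stdev(two_routes), 1), round(share_near_mean(two_routes), 2))
```

Under these assumptions, the "average" commute for the mixed data falls around 40 minutes, a time that almost no actual trip takes, so it is of little use for deciding when to leave home.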

How then, can a test based on normal distribution be applied to learning, which clearly depends on a huge number of factors? People are so accustomed to thinking in terms of average, and the concept of normal distribution has become so universally accepted, that many assume it can be applied to anything. "However, most real life data does not exhibit normal distribution. A normal distribution is more of an exception than a rule. Real world data shows variations (high and low) that are far more frequent than what the bell curve predicts."(1)

Grade Distribution Graph

Think about the implications of this design. Let's look at just a couple of ways the concept is used in education.

You may have heard of "grading on the curve." In this model, the normal distribution is assumed, even though it does not exist! To be consistent with the Bell Curve, in a class of 100 students, 10 will get an A, 20 will get a B, 40 will get a C, 20 will get a D, and 10 will get a failing grade (generally F, because E is sometimes mistaken for Excellent!).

How likely is it that these grades reflect the learning of these students? Have all the students who earned an A learned the same things? Did they all get the same questions correct? Have all the students who earned an F learned nothing at all? What if your son or daughter had the 11th highest grade in the class…and that grade was only 0.1 point lower than the 10th highest grade? The grade automatically drops from an A to a B. Once those grades are assigned, what do you know about the learning of any student in that class? Keep in mind that those same grades would have been distributed if the range of scores was between 1 and 10, between 30 and 50, or between 0 and 100 on a 100 point test!
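The point about score ranges can be checked with a short sketch. It assumes the 10/20/40/20/10 split described above and three made-up classes of 100 students whose scores are evenly spaced across very different ranges. The function name grade_on_curve and all of the scores are hypothetical, chosen only for illustration.

```python
from collections import Counter

# Assumed quotas, in percent of the class, from highest rank to lowest.
QUOTAS = (("A", 10), ("B", 20), ("C", 40), ("D", 20), ("F", 10))

def grade_on_curve(scores, quotas=QUOTAS):
    """Assign letter grades purely by rank, using fixed quotas per grade."""
    n = len(scores)
    # Student positions ordered from highest score to lowest.
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    grades = [None] * n
    start = 0
    cumulative_pct = 0
    for letter, pct in quotas:
        cumulative_pct += pct
        end = cumulative_pct * n // 100  # rank cutoff for this grade
        for i in ranked[start:end]:
            grades[i] = letter
        start = end
    return grades

# Three hypothetical classes of 100 students, same rank order,
# very different score ranges.
low = [1 + 9 * k / 99 for k in range(100)]      # scores from 1 to 10
narrow = [30 + 20 * k / 99 for k in range(100)]  # scores from 30 to 50
wide = [100 * k / 99 for k in range(100)]        # scores from 0 to 100

for scores in (low, narrow, wide):
    print(Counter(grade_on_curve(scores)))

# The 10th-highest and 11th-highest students land on either side of the
# A/B cutoff, no matter how small the gap between their scores.
grades = grade_on_curve(narrow)
print(grades[90], grades[89], narrow[90] - narrow[89])
```

Because the grades depend only on rank, all three classes receive exactly the same letters, and the gap between the last A and the first B can be a fraction of a point.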

Standards were implemented as a potential cure for what was seen as an "achievement gap" between poor minority students and their more economically advantaged counterparts. Proponents of NCLB claimed that setting the same standards for all students would necessarily mean that all students, regardless of economic or social status, would have the same opportunities to succeed because teachers would be forced to hold higher expectations for all students. (See this article for an explanation of why this fundamental assumption was based on a false premise.)

Standardized testing was mandated so that schools could prove they were holding every student to the same standards. Through this method, NCLB promised to provide "equal access" for all to a meaningful education. But how, if schools were not equally funded, could access to the same opportunities be "equal"? Unfortunately, many people seem to believe that just the existence of standards implies that something good must be happening!

In addition, the questions on these standardized tests are based on a set of externally defined things that a group of adults has decided all students of a given age should "know and be able to do." Because the only experience most Americans have is of "age-graded" schools, they accept this premise…even though there is no evidence that it is true! [To learn more about the error of age-graded standards, you may wish to read this article.]

The False Assumption of Fairness

Presently, thanks to Common Core, the tests used to assess both students and teachers are limited to Language Arts and Math. Because of the test design itself, the "results" of these tests are skewed toward a certain type of student. They are largely written for students with the circled learning styles in the diagram below.

Learning Styles

Research has shown that these types of learners are more often found in families with higher socioeconomic status! Any student unfortunate enough to learn more efficiently in other ways starts the test at a huge disadvantage and is at greater risk of falling into the below average part of the curve. How can proponents of high-stakes tests claim that the tests are needed to give "equal access" to all students when the tests themselves are skewed toward learners from wealthier families? In fact, when researchers look for correlations between the scores on standardized tests, such as the SAT, and other factors, the ONLY correlation they find is with the socioeconomic status of the families!(3)

Recently, the idea of "learning styles" has been questioned. Some researchers claim that, because everyone uses the same parts of the brain to process information, there are no "styles." But you need only think about how you personally learn the most effectively to realize that, if we don't have styles, we certainly have preferences based on what does or doesn't work for us. Further, the "learning style" chart is consistent with Howard Gardner's "multiple intelligences" theory, which has been widely accepted for years.

To make matters worse, CRTs and NRTs are not written by teachers based on what learners have actually studied. In fact, the reverse is true. Because of the importance given to these tests, their content has become the curriculum of the schools. And because the focus is on language arts and math, other subjects such as science, social studies, the arts, and job skills have been greatly diminished or eliminated from the curriculum in many schools.

Multiple Intelligences

Because the government demands hard data it can use to compare schools, the same tests must be given to all children of a given age. Therefore, the construction of the tests is handed over to publishing companies. The costs of these tests pull huge amounts of money out of school budgets. The tests the companies construct are based on "generic" questions that meet the Common Core Standards in Language Arts and Math. I can tell you from personal experience that those hired to write these questions are largely freelance "writers"…many of whom are "content experts" who have never taught school! They may have majored in a field relevant to the test, but without having been in a classroom with real children, they have only the vaguest idea of what children are or are not capable of doing. So not only are the age/grade-level standards themselves flawed, but the questions written to those standards reflect the skewed perceptions of individual freelancers, and certainly not any scientific basis that ensures validity.

To underscore how bad these questions can be, here's an article that recently appeared. As you read it, keep in mind that the teacher who had to give the test contacted the author because she couldn't answer the questions herself! In one case, it was because the publisher had printed the poem incorrectly in the test! And in another, the question, which made sense only in the mind of the test question writer, had no "correct" answer from the point of view of the poem's author! 

In addition, the publishers force teachers to sign confidentiality agreements in what they claim is an effort to prevent other students from gaining access to the questions. In effect, it means that everyone using the tests has to take the publishers' word that the tests are 1) valid (they test what they claim to test) and 2) reliable (they produce stable and consistent results).

The key things to understand about norm-referenced tests are that:

Case in point: a few years ago, a writer in Education Week recalled a conversation with the director of testing for a state's education system who "agreed that being able to make a public presentation was likely to be a more important skill for adults than knowing how to factor a polynomial. 'But,' he added, 'I know how to test the ability to factor a polynomial.'" Only the latter, therefore, would be assessed—and taught—simply because it was easily tested.

If policy makers know that these tests are statistically invalid, why do they keep mandating their use? And if they truly don't know, why are they in a position of making educational policy decisions? In Part 2 of this article, we'll look at how these high-stakes tests are used and why they fail to do what they promise—assess learning.

  1. Vohra, Gaurav (2013) The Curse of the Bell Curve http://www.datasciencecentral.com/profiles/blogs/the-curse-of-the-bell-curve-part-2
  2. https://www.psychologytoday.com/blog/freedom-learn/201002/children-teach-themselves-read
  3. http://blogs.wsj.com/economics/2014/10/07/sat-scores-and-income-inequality-how-wealthier-kids-rank-higher/
