29 June 2009

Bow Data Day

Just completed an awesome day of work with the Bow School District. We had about 25 people from the high school, middle school and elementary school looking at data all day.

They said they were in the novice category in terms of data, but I found out early that they were being too hard on themselves. They asked great questions and showed the same kind of insight that I have seen with so many teachers with whom I have had the pleasure to work.

Highlights for me included the incredible dedication of the teachers and the hard work they did.

As we move to our second day together we will focus on:
  • Are we asking the right questions?
  • Are the questions big enough?
  • Can we really get to conclusions? Or even inferences?
  • What is really going on?

Looking forward to it.

19 June 2009

Grading and Reporting at CHS

This got a little longer than I anticipated. The introduction explains some of my thinking and the research section on page two includes quotes and summaries from leading researchers on assessment and grading.

Introduction

  • This weekend I tried to think back to why I began doing 1-5 grading in the first place. It actually had nothing to do with percent grades. I knew from the outset that the conversion would cause problems. My purpose was to clearly communicate to students exactly where they were on a learning continuum.
  • I was convinced by Bob Marzano that 5-9 categories were about the most anyone could reliably use to judge students.
  • Rick Stiggins and Anne Davies convinced me that assessment should be FOR learning: it should not communicate an end point, but should let students know where they are and what they can do to get to the next level.
  • Tom Guskey convinced me that I really couldn’t reliably sort students into the 101 categories available on the 0-100 scale. (And I certainly couldn’t sort them into the 1001 categories available on the 0.0-100 scale.)
  • Grant Wiggins made me reconsider averages. Why, he asked, would we give students the average of their scores when they could do the work by the end of the course?
  • Dylan Wiliam gave me the example of the driver’s license: no matter how many times you fail, everyone gets the same driver’s license once they finally pass.

So putting this all together I began to use a system where I scored each part of each assignment using the 1-5 scale. I stressed early and often that even if you had a 1 you were still a good person—we are all at the beginning at some point in our lives. I stopped giving zeros and gave incompletes. I would say to students that I couldn’t give them a score until I had that piece of work. I stopped talking about percent grades at all—the only thing I would talk about with students was 1, 2, 3, 4 or 5.

It seemed to work. Conversations with students shifted from being about their grade to being about what they could do to improve their understanding of a specific topic. The only time it didn’t work was at each of the 8 reporting periods, when I was limited to giving students a single two-digit grade to summarize their performance. That is where I was 4 years ago and where I am today…

So…

Here is an attempt to compile the thoughts of leading researchers on grading. First of all, I think it is important to remember that the experts below all recommend a systematic approach to rethinking grading. Their approaches are pretty similar, so I will use the steps that Tom Guskey and Jane Bailey advise. (Steps 1-5 would be taken in order; steps 6-9 provide information that supports steps 1-5.)

  1. Purpose—what is the purpose of grades?
  2. Define the impetus for change—why are we re-examining our grading and reporting system?
  3. Exploring the history of grading and reporting—what is the history of grading?
  4. Laying a foundation for change—what is the research that surrounds grading?
  5. Building a Grading and Reporting System
  6. Grading and Reporting Methods I: Letter grades, percentage grades, and other categorical grading
  7. Grading and Reporting Methods II: Standards-based, pass/fail, mastery grading, and narratives
  8. Grading and reporting for students with special needs
  9. Special problems in grading and reporting

Marzano recommends the following order:

Phase 1—have a vanguard team of teachers experiment with competency based assessment and record keeping.
Phase 2—identify the competencies and the software that will be used.
Phase 3—implement the system in stages.

Research
Anne Davies
Dr. Davies does not mention grading scales in her works or presentations. She does strongly and continuously stress that a change in grading should be a change from “assessment of learning,” to “assessment FOR learning.” Her take is that assessment is changing from something that happens to students to something that is done to help students.

“Research shows that when students are involved in the learning process—learning to articulate what they have learned and what they still need to work on—achievement improves.”[1]

Ken O’Connor
“Impreciseness is the main point of those who argue for letter grades rather than percentage grades; they believe that dividing student achievement into a limited number of categories is all that we can ever hope to do with any pretense of real meaning. According to this argument, using a 101 point scale gives a false sense of precision and, therefore, detracts from the main purpose of grades—meaningful communication of student achievement.

This argument has a great deal of merit, especially for elementary and middle schools, where grades are not involved in high stakes decisions, except pass/fail. However, where grades are involved in high stakes decisions about students’ educational future—such as college entrance, graduate school acceptance, and employment opportunities, numbers may be preferable to letters because there are more scale points available.”[2]

O’Connor then gives the example of one student who gets all 89s and receives a B, and another who gets all 90s and receives an A. The five-point A-F scale amplifies differences between students that really aren’t that great. BUT O’Connor is not against a five-point system. He goes on to say,

“If all the guidelines and principles described in chapters 1-8 [Nearly the entire book] are applied then letter grades based on teachers’ professional judgments using a detailed descriptive scale will produce the best grades. But if teachers crunch numbers to arrive at grades, especially in high school and college, then percentage grades are probably fairer, and therefore better, than letter grades.”[3]

Again, the second sentence here might seem to be a rejection of a 5 point system. But go back and read the first sentence again. What O’Connor advocates throughout the book, and when he presents, is for teachers to use scores, professional judgment, conversations with students, and multiple measures to determine grades. He is the man who told me about the following simple way of explaining the 5 point scale.

5=Wow!
4=Great!
3=Got it!
2=Nearly there!
1=Oops!

O’Connor would look at the 2 students described above and, using professional judgment, determine whether all 89s should merit an A.

Guskey
“Letter grades offer a brief description of students’ achievement and level of performance, along with some idea of the adequacy of that performance (Payne, 1974). Because most parents experienced letter grades during their school years, they also have a general sense of what letter grades mean. For this reason, parents often prefer letter grades to newer, less traditional reporting methods (Libit, 1999).

Despite their simplicity, however, letter grades also have their shortcomings. First and probably most important, their use requires the combination of lots of different forms of evidence into a single symbol (Stiggins, 2001). As described in Chapter Three, many teachers combine product, process, and progress evidence in a single grade. This makes the grade a confusing hodgepodge that’s impossible to interpret, rather than a meaningful summary of students’ achievement and performance (Brookhart, 1991; Cross and Frary, 1996).

Second, despite educators’ best efforts, many parents interpret letter grades in strictly norm-referenced terms. Probably because the letter grades they received as students reflected their standing in comparison to classmates, parents frequently assume the same is true for their children. To them, a C doesn’t represent achievement at the third level of a five point scale, similar to a middle level belt in a karate class. Instead, a C means “average” or “in the middle of the class.”

A third shortcoming of letter grades is that the cutoffs between grade categories are always arbitrary and difficult to justify. If the teacher decides that the scores for a grade of B will range from 80 to 89, for example, the student with a score of 80 will receive the same grade as the student with a score of 89, even though there is a nine-point difference in their scores. But the student with a score of 79 (a one point difference) receives a grade of C. Why? Because the teacher set the cutoff for a B grade at 80. Although cutoffs are absolutely necessary in any multilevel grading method, where they are set is always arbitrary.

Finally, letter grades lack the richness of other, more detailed reporting methods, such as standards-based grading or narratives. Although they offer a brief description of adequacy of students’ achievement and performance, letter grades provide no information that can be used to identify students’ unique accomplishments, their particular learning strengths, or their specific areas of weakness.”[4]

Guskey goes on to say, “letter grades should always be based on clearly stated learning criteria, not on norm-referenced criteria.”

Guskey and Bailey
“The seriousness of arguments over plus and minus grades contrasts sharply with the simplicity of the issue involved. Basically, the issue comes down to whether it is better to have a 5-category grade system (A, B, C, D, and F), or a 12 category system (A, A-, B+, B, B-, and so on). But if more categories are better, one might ask, ‘Why stop at 12?’ There’s nothing sacred or particularly special about using 12 categories. Instead, we might consider a scale similar to the one used to express grade point average: 0.0-4.0.”[5]

On a 0.0-4.0 scale that means 41 categories if you stick to the tenths place. Move to whole-number percents and you get 101 categories, and even more if you add decimals to the percents (92.34, for example).
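
To double-check those counts, here is a minimal Python sketch. The function name `count_categories` is hypothetical, not from any of the books; it simply counts every value from the bottom of a scale to the top in steps of a given size, using `Decimal` so that steps like 0.1 are not thrown off by floating-point rounding.

```python
from decimal import Decimal


def count_categories(low: str, high: str, step: str) -> int:
    """Count the distinct values from low to high (inclusive) in increments of step.

    Hypothetical helper; values are passed as strings so Decimal keeps them exact.
    """
    steps = (Decimal(high) - Decimal(low)) / Decimal(step)
    return int(steps) + 1  # +1 because both endpoints are possible grades


scales = {
    "A-F letter grades": 5,
    "A-F with plus/minus": 12,
    "GPA scale, 0.0-4.0 in tenths": count_categories("0.0", "4.0", "0.1"),
    "Percent, 0-100": count_categories("0", "100", "1"),
    "Percent, 0.0-100.0": count_categories("0.0", "100.0", "0.1"),
    "Percent, 0.00-100.00 (e.g. 92.34)": count_categories("0.00", "100.00", "0.01"),
}

for name, n in scales.items():
    print(f"{name}: {n} categories")
```

The GPA scale comes out to 41 categories, whole-number percents to 101, one decimal place to 1,001, and two decimal places to 10,001, all far beyond the 5 to 9 categories the research below says a judge can reliably tell apart.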

Guskey and Bailey go on to say, “Research on rating scales shows that increasing the number of rating categories from 4 to just 6 generally lowers both the reliability and validity of the measures (Chang, 1993, 1994). Other studies indicate that scales of 5 to possibly 9 categories are about as many as any qualified judge can reliably distinguish (Hargis, 1990, p. 14). Moreover, as the number of potential grades or grade categories increases, especially beyond 5 or 6, the reliability of grade assignments decreases. This means that the chance of two equally competent judges looking at the same collection of evidence and coming up with exactly the same grade is drastically reduced.”

Guskey and Bailey’s Recommendation
“Although to our knowledge no research evidence to date confirms that more affirming grade-category labels reduce stigma attached to low grades, we remain optimistic that this may be true. Certainly the connotation of Novice or Beginning is far less negative than that of Failing….

At the more advanced grade levels, we also believe that it is much more advantageous to assign a grade of I or Incomplete to students’ work and expect additional effort than it is to assign a letter grade of F (see the discussion of “Grades as Punishments” in Chapter 3).”[6]

Marzano
Marzano spends most of his book on grading theory and only reluctantly gets to conversions about seven-eighths of the way in. Having seen him speak, I think it is clear that in his mind converting scores to anything is of little use. But I have also heard him say that grades in their traditional sense will probably always be necessary, at least in grades 10-12.

In essence Marzano explains that giving a score for each of the measurement topics (competencies) in a class would be preferable. But if a “district or school…wishes to use the traditional A, B, C, D, and F grading protocol,” it would need “a translation, such as the following:

3.00-4.00=A
2.50-2.99=B
2.00-2.49=C
1.50-1.99=D
Below 1.50=F”[7]
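
Marzano’s translation is just a set of cutoffs, so it is easy to sketch in code. Below is a minimal Python version, assuming the input is one overall score on Marzano’s 0-4 scale (how that score is built from the individual topic scores is left open here); `marzano_letter_grade` is a hypothetical name, not something from the book. Each range is checked as “at least the lower cutoff,” which also covers the small gaps in the printed table (a score of 2.995, for example, falls between 2.99 and 3.00).

```python
def marzano_letter_grade(score: float) -> str:
    """Translate a 0-4 score to A-F using the cutoffs quoted above (hypothetical helper)."""
    if not 0.0 <= score <= 4.0:
        raise ValueError("score must be on the 0-4 scale")
    # Check cutoffs from highest to lowest; the first one met decides the grade.
    if score >= 3.00:
        return "A"
    if score >= 2.50:
        return "B"
    if score >= 2.00:
        return "C"
    if score >= 1.50:
        return "D"
    return "F"  # below 1.50


for s in (3.2, 2.75, 2.0, 1.49):
    print(s, marzano_letter_grade(s))
```

With these cutoffs, scores of 3.2, 2.75, 2.0 and 1.49 come out as A, B, C and F. Note how wide the A band is: everything from 3.00 to 4.00.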

“Summary and Conclusions
Various techniques can be used for computing final scores for topics and translating these scores to grades. Computer software that is suited to the system described in this book has three characteristics. First, the software should allow teachers to easily enter multiple scores for an assessment. Second, it should provide for the most accurate estimate of a student’s final score for each topic. Third, it should provide graphs depicting student progress.”[8]

Stiggins
Rick Stiggins and Anne Davies go hand in hand when they talk about Assessment FOR Learning. He wrote the original paper in which he began writing the FOR in all caps. They both focus most of their work on the idea that assessment (from the Latin for “to sit beside”) should be something that helps a student learn. In terms of conversions, Stiggins suggests a “Decision Rule.”

“[If] at least 50% of the ratings are 5s and the rest are 4s, the grade is an A, [if] at least 75% of the ratings are 4s or better and the other 25% are not lower than 3, then the grade is a B, and [if] 40% of the ratings are 3s or better and the other 60% are not lower than two then the [grade] is a C.”[9]
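
The decision rule can be read as three checks made in order. The Python sketch below is one reading of the quoted rule, not code from Stiggins or Wormeli: it assumes the ratings are on the 1-5 scale, reads the C condition as “at least 40%” to match the A and B conditions, and returns `None` when no condition holds, since the quote does not say what happens below a C. The function names are hypothetical.

```python
from typing import Optional, Sequence


def share_at_or_above(ratings: Sequence[int], level: int) -> float:
    """Fraction of ratings at or above `level` (hypothetical helper)."""
    return sum(1 for r in ratings if r >= level) / len(ratings)


def decision_rule_grade(ratings: Sequence[int]) -> Optional[str]:
    """Apply one reading of the quoted decision rule to 1-5 ratings."""
    if not ratings:
        raise ValueError("at least one rating is needed")
    lowest = min(ratings)
    # A: at least 50% are 5s and the rest are 4s, so nothing is below 4.
    if lowest >= 4 and share_at_or_above(ratings, 5) >= 0.50:
        return "A"
    # B: at least 75% are 4 or better and none of the rest is below 3.
    if lowest >= 3 and share_at_or_above(ratings, 4) >= 0.75:
        return "B"
    # C: at least 40% are 3 or better and none of the rest is below 2.
    if lowest >= 2 and share_at_or_above(ratings, 3) >= 0.40:
        return "C"
    # The quoted rule stops at C, so anything else is left undecided.
    return None
```

Under this reading, ratings of [5, 5, 4, 4] give an A, [5, 4, 4, 3] a B, and [4, 3, 2, 2, 2] a C, while [3, 2, 2, 2, 1] gets no grade from the rule because a single 1 falls below the floor for every category. That floor is the interesting design choice: unlike an average, one very low rating cannot be hidden by high ones.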


“What about the situation in which a student receives a B, but it’s a high B or a low B? Over the course of an entire year, the difference will not be significant in terms of mastery, and mastery is what grades are based on, not averages. This isn’t being dismissive, but the reality is that the difference in learning (mastery) between the high and low versions of one particular grade is not that much. In larger grading scales, for example, the difference between a B and a B+ is just a few points. How exact can we be when identifying a student’s true mastery of something? Does a 0.01 (1 percent) difference in a grade-point average really mean a discernable, significant difference in mastery? No. It’s splitting hairs.

There are some teachers who disagree with this. They claim that there are a large number of mastery points wrapped into each percentage point due to multiple and influential assessments over a long period of time, and that the difference of one percentage point can describe mastery or lack of mastery of a significant amount of material. If this is the case, then whittling grades down to their exact and relative values (offering 2.75s for example) may be necessary. Each time we are tempted to do this however, let’s remember how elusive declarative mastery is, as well as how subjective we are in the micro-moment of grading each product from each student, and how we make it even more subjective when we aggregate a variety of data for a summative grade. And let’s wonder whether having done this, even justifiably, will have any lasting impact ten years down the road.”[11]

“One caution: If we primarily use a 4-point scale, many students and their parents will equate the highest numerical value (4.0) [or 5 in our case] with an A,…They will wonder why we just don’t write A, B, C, D and F if that’s what they really are.”[12]

Thanks for reading—if you made it this far.

Tom Crumrine
[1] Research of Black and Wiliam (1998) and Stiggins (2001), as reported in Gregory, Cameron and Davies. Conferencing and Reporting.
[2] Ken O’Connor. How to Grade for Learning. Page 200.
[3] Ken O’Connor. How to Grade for Learning. Page 200.
[4] Tom Guskey. How’s My Kid Doing? Pages 45-46.
[5] Tom Guskey and Jane Bailey. Developing Grading and Reporting Systems for Student Learning. Page 70.
[6] Tom Guskey and Jane Bailey. Developing Grading and Reporting Systems for Student Learning. Page 77.
[7] Robert J. Marzano. Classroom Assessment and Grading that Work. Page 122.
[8] Robert J. Marzano. Classroom Assessment and Grading that Work. Page 124.
[9] Rick Wormeli. Fair Isn’t Always Equal. Page 154. Wormeli quotes Stiggins here.
[10] Rick Wormeli. Fair Isn’t Always Equal. Page 154.
[11] Rick Wormeli. Fair Isn’t Always Equal. Pages 154-155.
[12] Rick Wormeli. Fair Isn’t Always Equal. Page 157.

Cell phones in class

  • Mr. Crumrine’s Electronic Device Experiment
    Semester Two
    2008-2009 School Year
    Chemistry and Its Applications and Anatomy and Physiology Classes


    There has been much discussion this year about electronic devices and their place in the Concord High School community. While some would limit or outlaw electronic devices, I come down firmly on the other side. Electronic devices are powerful tools that can connect us to each other and to our work. The iPod touch, to highlight one, has thousands of educational applications. Some of them are as simple as the built-in calculator, while others provide students with interactive x-rays of the human body.

    Electronic devices can be used improperly by both students and adults. I become pretty upset when I am giving a presentation and see people tapping away on their email accounts. And I really don’t like it when someone is ostensibly talking to me but is actually looking at their BlackBerry the whole time. But both students and adults can use these powerful tools in responsible ways. When I was in high school, teachers had to teach us that our graphing calculators were appropriate only in certain settings. In math class: OK. Playing Tetris in English: not OK. We can do the same kind of teaching with electronic devices.

    During the second semester of this year my students and I came up with the following plan. We talked first about the value of electronic devices and then we talked about ways they could be used improperly.

    Here is the plan:


    Why would a student be allowed to use a cell phone/electronic device?

    • Effective communication.
    • Active, self-directed learner.

    In order to be successful after high school students need to know how to use modern technological tools. These include cell phones and other pocket electronic devices.

    The use of these tools however must be done in an appropriate way. Just as it would not be OK for an adult to text while a colleague is explaining something to them—it is not OK for a student to do the same.

    What do we do?
    • Use the calculator on your phone—OK at appropriate times.
    • Use the agenda feature—OK at appropriate times.
    • No texting at any time, because it could disturb others outside of our class.

    Please ask for permission to:
    --use the calculator feature.
    --use the agenda feature in the last 5 minutes of class.

    In certain cases, with permission, you may be allowed to:
    --play games—work must be done.
    --listen to music—work must be done.
    --use the internet—you must use it for something related to class.

    Penalties
    • First offense—teacher takes cell phone—student gets it at end of class.
    • Second offense—teacher takes device—student gets it at end of school day.
    • Third offense—teacher takes device and turns it in to administration.


    I agree to this revised cell phone policy for our classroom only.

    Print Name:_______________________


    Sign Name:________________________

Reflections

The plan worked extremely well with the Anatomy and Physiology group.

  • I warned one student one time about improper use.
  • Nine or ten students had iPod touches (iPods touch?). Several of them used the anatomy flash card applications that are available for free or $0.99. One of them purchased a ten dollar application which was basically a digital textbook. They said they found it very helpful.
  • During our end of the year project several students used text messaging in an appropriate way to communicate with partners. An example is a group where 3 people were dissecting a cat and one person was in the library researching cat dissection.

The plan worked better than I expected with Chemistry and Its Applications.
In a very cute and funny way, students would always ask if they could use the calculator functions on their phones. I found this to be effective, and it saved me money because I usually buy about 10 calculators per year with my own money.

  • A higher percentage of students in ChemApps had iPod touches when compared with Anatomy. But they did not take advantage of flash cards or other applications to my knowledge.
  • One notable exception was a student who had, on his own, downloaded a ten-dollar spreadsheet application. We were doing a lab where the students had to record the temperature of a substance every 30 seconds. He asked if he could enter the data in his phone. Not only did he input the data, but he was also able to quickly create a graph when he was done. While other students took about 10-15 minutes to create the graph, he was able to start answering the post-lab questions immediately. The most important part of this lab was interpreting the graph, not making it, so this was a plus.
  • There were ten first-offense violations during the semester, but students surrendered their phones for the period, and I did not see repeat offenses except for the two students in the next bullet.
  • I encountered major issues with 2 students. They did not follow the cell phone policy at all even though they had signed the sheet. They would not surrender their phones when asked and used them pretty much whenever they wanted. I should have done a better job working with administration with these two students. I will say that these two students were the same ones who did not follow any of the other rules of our classroom. They walked out of class without asking, swore at me and other students, used racial slurs, and sometimes screamed. I don’t think any cell phone policy would cure them of their other ills.

Quick Conclusion:

I would be willing to work hard on a cell phone/electronic device policy that emphasizes proper use of these powerful tools. For our students, these are the tools that they have used for communication since they first learned to communicate. Teaching proper use will not be easy, but I would rather work on that than tell students they can never use the computer that is right there in their pocket.