
Faculty Senate


Faculty Forum Papers

May 1978 - The Uses and Misuses of Student Evaluations

By Charles F. Warnath
Psychology

April 26, 1978


     Student evaluations of college courses have a long tradition. In the 40's, the faculty of the university which I attended accepted them as a routine part of each course. These evaluations were designed and analyzed by students and published in a special issue of the school newspaper. The main purpose of the evaluations was to tell students very specifically what they might expect when they signed up for a course and, secondarily, to tell instructors how students reacted to specific aspects of the course. The items were developed by students to answer the questions which students have about a course: quantity and quality of the reading; tests and papers; value of lectures or discussions; willingness of the instructor to meet with individual students; and the like.

     Now, I am not trying to sell this format as the best of all possible approaches to student evaluations but rather to contrast this format with the system which seems to predominate at OSU in order to raise some questions about what we are doing. The most basic difference between the two systems is that the one described was student sponsored and operated. It had no official sanction except that faculty members gave up one of their class periods. Although it was assumed that faculty and administrators could read the results as well as the students, there was no implication that administrative decisions would be affected by those results. Students evaluated aspects of the course, including the instructor's participation, primarily with the intent of telling other students what they might anticipate in the course and not with the intent of "sending a message" to the administration. By including items on the reading, papers, homework assignments, and tests, students were made aware that the course was a total learning experience and not restricted to aspects of the course in which the instructor was personally involved.

     Secondly, the purpose of the evaluation was clear: students were evaluating courses for the benefit of their fellow students. Evaluations which are supplied through an administrative office can carry mixed messages to students, particularly when no effort is made to feed back to students any of the information which has been collected. This is particularly true where, as in the case of the official OSU evaluation, all items relate to some function or characteristic of the instructor. Perhaps it is a fine point, but I believe there is a difference in the set with which a student responds when the focus of the evaluation seems to be entirely on the instructor, with that evaluation going to the administration, and when the focus is on the course, with the feedback going to the instructor or to other students.

     Third, the specificity and baseline of the questions encourage a significant difference in the types of responses which students will make to the items. In the evaluation described, the items were designed to elicit responses to specific aspects of the course based on individual quantitative or qualitative judgments, not on a scale requiring comparisons with other courses or other instructors. With generalized items and an unspecified baseline for responses, the probability exists that students will respond with different item interpretations and react with different comparison scales. To test this possibility, I gave the items appearing on the OSU punch card to two of my classes and asked them to respond in terms of what the items meant to them and how they made decisions about their ratings. It was obvious from their responses that there was little agreement on the meaning of the items and that their judgments covered a wide range of expectations and individual experiences. For instance, "Clarity of Presentation" was responded to by some students in terms of whether the instructor speaks in a loud, clear voice.

     So far as the baseline for responses is concerned, the OSU punch card implies some sort of comparison, but the student is left with the task of deciding the criteria by which he/she will judge "average" and the extremes. As might be expected, students differed in terms of responding from their personal experience with the particular instructor, comparing the instructor to others (whether all instructors, others in the department, or those most clearly remembered is not clear), or making judgments against some sort of "ideal expectation." For example, on the "Concern for the Student" item, some apparently rate on the basis of such specific factors as whether the instructor remains in the classroom at the end of the class period to talk with students, while others rate on some general concept such as whether the instructor is a "human person." Ironically, a characteristic of the instructor which results in a high rating on one item can result in a low rating on another item. "Acting as an authority" apparently impresses some students with the instructor's lack of concern for them, while this same characteristic indicates to other students the instructor's mastery of the subject. The baseline used by students for judging an instructor's "Availability" shows almost no consistency. While some students rate on the basis of the number of posted office hours, others rate on the basis of whether or not the instructor was in his/her office when they personally wanted to talk to the instructor.

     Fourth, where the student is encouraged to select a rating for every item by the omission of a "No Opinion" or "No Basis for Judgment" choice, some students are obviously making judgments on the basis of second-hand information or a "halo effect" carry-over from other items. This is likely to occur when a student has not had the personal experience which he/she generally uses as the basis for rating a particular item. The clearest example would be an instructor who posts ample office hours but is rated below average on "Availability" by a student who has never made an attempt to meet with the instructor. Being forced to make some rating, the student may well use one item to reinforce his/her feelings, good or bad, about some other aspect of the course. One of my colleagues teaches a large lecture course for lower division students during the same term that he teaches a small, upper division class. While the lower division students give him low ratings on "Availability," his upper division students give him very high ratings on the same item. It seems logical that he is equally available to both groups except for immediate after-class contact.

     Moreover, in classes where attendance is not mandatory and test items are not keyed specifically to class presentations, some students may spend few hours in class and, yet, in the OSU evaluation, they are required to make judgments about the instructor which they could only reasonably make through on-going contact with the instructor in the classroom.

     As reported in a recent issue (December, 1977) of Teaching Psychology, numerous studies indicate a consistency of ratings for an instructor teaching the same course at different times; however, the prediction of ratings for an instructor from one course to another can drop almost to zero. The point which the authors of this research project were attempting to make was that, because of the lack of consistency of ratings for instructors teaching different courses, students could not predict the quality of a course or the characteristics of an instructor in another course. The results of this research also indicate that only a fraction of the variance in the ratings seems to be due to the characteristics of the instructor or to the usually identifiable factors within the course. I have a feeling that all of us know that there are some among us who are "stars" and scintillate in all their classes, and that there are some at the other end of the scale who mumble along or confuse students in whatever class they appear; but most of us have some good courses and some poor ones, and the "goodness" or "badness" of a particular course is only partially within the control of the instructor. We like to think that every instructor could, with the proper instruction, become the complete Mr. Chips, loved by all students, beautifully organized in all class presentations, and able to stimulate the most reluctant student. But let's face it, Mr. Chips is a myth, along with the idea that the instructor is necessarily the cause of all the problems assigned to education by politicians and students. As one faculty member has remarked, "Each of us is the best instructor some students have had." Whether this is true or not, I have serious doubts that our impact as instructors can be neatly summarized in a set of three-digit averages.

     And this brings me to my final point. The effectiveness of a particular course is not simply a matter of the instructor's "teaching well." What goes on in a class is a complex transaction between the instructor and a number of individuals with a variety of needs, expectations, and personal characteristics. The very instructor style and class design which excite some students can (and do) turn off other students completely. For those students who are passive listeners and do only what is specifically outlined for them to do, the class which requires their participation and initiative may be perceived as disorganized and the instructor as not doing his/her job. This variation in response extends even to the course details such as the source of reading materials. In classes where I have no text, some students complain about having to spend time withdrawing books from the Reserve Room while others are enthusiastic about the choices they have in their reading.

     From the comments made by student representatives on a committee to draw up guidelines for student participation in administrative reviews of faculty as well as comments I have heard from students in general, it would appear that students assume that there exists a set of judgments about an instructor held by all students and that the criteria for those judgments are the ones which they personally apply.

     The purpose of the above discussion is not to build a case for abolishing student evaluations. That would be a futile gesture at this point since they have become an integral part of consumer politics and a staple in the public relations "concern for students" approach to potential students and their families. Student evaluations of faculty are now too much identified with "accountability" to be eliminated. Beyond that, I do feel that direct student input should be considered in administrative decisions about faculty, since the alternative is to rely on hearsay and the gripes of disgruntled students who complain to chairmen, deans, and the president. Moreover, instructors can learn from good student feedback about aspects of a course which would improve the learning possibilities for some of the students.

     My hope is that the points I have raised will sensitize faculty to some of the problems involved in student evaluations and to the fact that poor evaluations with ambiguous items can result only in misinformation. The purpose and goals of an evaluation must be absolutely clear to the students; the items must be specific; and the baselines for making judgments must be defined. Otherwise, the evaluation becomes a projective test which allows the student to respond from individual, often idiosyncratic, interpretations. With ambiguous questions to which students respond with only their own personal baselines as a guide, neither the instructor nor the administration can expect to receive helpful information. Since everyone seems to be taking the results of our present evaluation forms seriously, it would seem that faculty should become more concerned about the instruments which generate the information which administrators are using to make judgments related to salary, tenure, and promotion.