Faculty Senate
Student Evaluation of Teaching (SET):
Pilot Test Report and Motions for Faculty Senate
By: Advancement of Teaching Committee
Fall, 2002

Background and charge: In fall 2000, the Faculty Senate Executive Council asked the Advancement of Teaching Committee (AOT) to consider adding a "diversity" question to the existing OSU student evaluation of teaching (SET) form. AOT members searched for suitable questions from other universities or from examples within OSU. We discovered that several universities had implemented comprehensive "campus climate" surveys, but none had added diversity questions to SET forms. Reasons included the extreme complexity of, and multiple meanings associated with, the concept of diversity. During review of existing SET questions, AOT and survey specialists determined that three questions measured two variables and that validation data for these questions could not be found. Additionally, AOT learned that the scanning program at Milne Computer Center cannot be modified because both the programmers and the programming language (Fortran) retired several years ago. In 2001, AOT was charged with revising the questions and form, along with searching for a way to include diversity questions.

Student Evaluation of Teaching

Process: AOT members first defined the primary goal of SET as helping faculty or instructors improve teaching in the classroom, Extension educational events, and Extended Campus. Student ratings of instruction measure general instructional skill, which is a composite of three subskills: delivering instruction, facilitating interactions, and evaluating student learning (d'Apollonia and Abrami, 1997). Other factors that contribute to effective teaching, such as assessing learning, facilities, or activities beyond the classroom, were considered important but outside the scope of SET. Although the sciences of survey research and teaching evaluation continue to evolve, our attempt has been to improve the SET form and questions used at OSU and to provide guidelines for the consistent and appropriate interpretation and use of data within existing policy and university accreditation domains.

AOT members consulted 12 universities, more than 450 faculty, several OSU colleagues with evaluation expertise or experience, and the scientific literature associated with SET analysis and interpretation (see Cuseo, 2000 for overview). OSU faculty and instructors described the need to modernize several questions on the existing form, to add space for specific questions or accreditation requirements, and to revise the narrative questions. Using a single page was recommended to save paper (except for small classes, where confidentiality might be compromised). The proposed SET questions and form were pilot tested in spring, 2002. Results were analyzed for reliability and validity (see below).

Ten SET questions were selected after basic teaching functions were identified from University of Washington (UW), Kansas State University (KSU), and 10 other university SET forms, including the OSU form, and from published papers. Based on our review of SET literature and consultation with survey consultants, we selected a 6-point Likert scale anchored with equally spaced descriptors beginning with "very poor" at one end and "excellent" at the other end. An "unable to rate" column was added. Questions were selected with permission of the authoring institutions or modified to accommodate the equally distributed scale. Two questions (http://www.washington.edu/oea/describe.htm) selected from and validated by UW will standardize comparisons among faculty for P&T, awards, or merit. Questions were modified for Extension events, while Extended Campus plans to adapt questions and forms to electronic evaluation upon Senate approval. Existing forms are available for fall quarter and part of winter quarter. The Milne Computer Center awaits approval by the Senate to update the form and scan program, purchase revised forms, and transition to new forms and reports during winter quarter.

Pilot test of SET: During spring quarter, 20 faculty from the Colleges of Liberal Arts and Engineering, and Extension volunteered various teaching events for evaluation using the proposed questions. One class was team-taught by 7 faculty from various disciplines.

A total of 458 students (Table 1) and 62 people participating in 4 Extension events (Table 2) completed the pilot. Half of the faculty chose to add questions about specific teaching interests or accreditation requirements on the back of the form (Table 1). AOT asked students to complete an "exit interview" about the SET form within the narrative box on the back of the form (Appendix 1).

Responses on the 6-point Likert scale were converted to a 1 to 6 numeric scale for calculating means, developing reports for faculty, and considering results in Tables 1 and 2. Evidence suggests that students did differentiate between questions, faculty instruction, and different faculty within a team-taught course. Slightly lower means for one or more teaching functions might indicate a topic for improvement (see each column). Students enrolled in lower division Engineering courses rated faculty lower than did students in upper division courses. The literature commonly shows a correlation between lower division students and lower evaluation scores (see criterion-related validity below). Students in the team-taught course clearly differentiated between faculty when names were printed on the SET form and instructions were clearly communicated.
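As an illustration of the conversion described above, the following minimal sketch maps the six scale labels to 1 through 6 and averages one question's responses. The function name and sample ratings are hypothetical, and dropping "unable to rate" responses before averaging is an assumption; the report does not say how those responses were handled.

```python
# Labels follow the 6-point scale printed on the proposed form.
SCALE = {
    "very poor": 1, "poor": 2, "fair": 3,
    "good": 4, "very good": 5, "excellent": 6,
}
UNABLE_TO_RATE = "unable to rate"


def question_mean(responses):
    """Mean of one question's responses on the 1 to 6 scale, to one decimal.

    Assumption: "unable to rate" responses are dropped before averaging.
    Returns None when no response can be scored.
    """
    labels = [r.strip().lower() for r in responses]
    scores = [SCALE[label] for label in labels if label != UNABLE_TO_RATE]
    if not scores:
        return None
    return round(sum(scores) / len(scores), 1)


# Made-up responses for a single question in a single class.
ratings = ["Excellent", "Good", "Very Good", "Unable to Rate"]
print(question_mean(ratings))  # 5.0
```

Reporting to one decimal place matches the precision used in Tables 1 and 2.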

The pilot test was analyzed for statistical reliability and validity. Reliability coefficients represent the level of agreement among students on the ratings of individual classes relative to mean differences across classes. Values can range from 0.0 (no agreement) to 1.0 (perfect agreement). Coefficients for all questions are considered excellent as summarized in the following table:
 
Reliability coefficients for resident (n = 458) and Extension (n = 60) faculty

                     Questions 1 & 2    Questions 3 - 12    All questions
Resident faculty     .8554              .9376               .9514
Extension faculty    .8916              .9000               .9253
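
The report does not state which formula produced these coefficients. One common statistic that fits the description (agreement among students within a class relative to differences between class means) is the reliability of class means from a one-way analysis of variance, (MSB - MSW) / MSB. The sketch below is only an illustration under that assumption. The function name and ratings are made up, and the formula is exact only when classes are the same size.

```python
def class_mean_reliability(classes):
    """(MSB - MSW) / MSB from a one-way ANOVA with classes as groups.

    `classes` is a list of lists; each inner list holds one class's ratings.
    Assumes at least two classes and that class means are not all identical.
    """
    all_ratings = [x for ratings in classes for x in ratings]
    grand_mean = sum(all_ratings) / len(all_ratings)
    class_means = [sum(r) / len(r) for r in classes]

    # Between-class and within-class sums of squares.
    ss_between = sum(
        len(r) * (m - grand_mean) ** 2 for r, m in zip(classes, class_means)
    )
    ss_within = sum(
        (x - m) ** 2 for r, m in zip(classes, class_means) for x in r
    )

    ms_between = ss_between / (len(classes) - 1)
    ms_within = ss_within / (len(all_ratings) - len(classes))
    return (ms_between - ms_within) / ms_between


# Three made-up classes of three students each; not pilot data.
example = [[5, 5, 6], [2, 3, 2], [4, 4, 5]]
print(round(class_mean_reliability(example), 4))  # 0.9524
```

Students in the example agree closely within each class while the class means differ widely, so the coefficient is near 1.0.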
 
Validity determines whether the questions measure what was intended, in this case, the quality of teaching. Validity was established in three ways: content, criterion-related, and construct.

  • Content validity was reviewed by more than 450 faculty, several evaluation specialists, and about 2/3 of the students who answered "did the course evaluation allow you to adequately express your feedback on the course and the instructor?" (see Q#21 in Appendix 1). Nearly all responses were favorable.
  • Criterion-related validity correlates factors such as large class size, entry level students, required courses outside the major, etc., with lower SET results than those for upper division courses within the major. Associative interpretation suggests that faculty and supervisors may interpret SET results based on research literature that confirms this relationship unless other data such as peer reviews or narrative comments fail to support this relationship.
  • Construct validity tests a hypothesis about the relationships among questions. In this case, we tested the hypothesis that questions for P&T differed from the ten questions designed to assess quality of teaching. Factor analysis (20 resident teaching events) provided evidence that all questions measure the same construct with the first question accounting for 64% of variance. Although questions on the Extension form were similar, sample size was too small for factor analysis.
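
The report does not name the software or extraction method used for the factor analysis. As a rough illustration of what "share of variance" means, the sketch below computes the share of total variance carried by the first principal component of the question-by-question correlation matrix. This is a simpler stand-in, not the committee's factor analysis. The function names and data are hypothetical, and the power iteration assumes the questions are not constant and are positively correlated overall.

```python
from statistics import mean, pstdev


def correlation_matrix(columns):
    """Pearson correlation matrix for equal-length columns of ratings."""
    n = len(columns[0])
    standardized = []
    for col in columns:
        m, s = mean(col), pstdev(col)  # s must be non-zero
        standardized.append([(x - m) / s for x in col])
    return [
        [sum(a * b for a, b in zip(zi, zj)) / n for zj in standardized]
        for zi in standardized
    ]


def first_component_share(columns, iterations=200):
    """Largest eigenvalue of the correlation matrix divided by the item count."""
    r = correlation_matrix(columns)
    p = len(r)
    v = [1.0] * p
    # Power iteration converges to the dominant eigenvector.
    for _ in range(iterations):
        w = [sum(r[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    eigenvalue = sum(
        v[i] * sum(r[i][j] * v[j] for j in range(p)) for i in range(p)
    )
    return eigenvalue / p


# Two made-up questions answered by four students; not pilot data.
q1 = [1, 2, 1, 2]
q2 = [1, 1, 2, 2]
print(first_component_share([q1, q2]))  # uncorrelated items: share is 0.5
```

A share near 1.0 would mean the questions move together almost perfectly, while a share near 1/p (p being the number of questions) would mean they are essentially unrelated.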


Overall, students responded positively (89%) to the question, "did the course evaluation allow you to adequately express your feedback on the course and the instructor?" About 7% wrote comments about the form and improved questions (Appendix 1). A majority of students (77%) indicated that the SET questions were clear, although 7 identified question #7 and 5 described question #12 as confusing. AOT noted that the purpose of both questions was to link, or show relationships between, two basic teaching functions. For example, question #7 asks whether the instructor used various instructional techniques to accommodate differences in learning styles among students, while #12 asks whether the instructor's evaluation of student performance was in accordance with course objectives. AOT recommends that the questions be maintained because they reflect important relationships associated with quality teaching at OSU. Of course, faculty and/or supervisors also need to interpret the relevance of all questions for the discipline or individual instructors.

Interpreting and Reporting SET results: Teaching improvement is achieved with self-assessment, SET data, and peer review, both in the classroom and from others teaching similar topics at comparable universities (England et al., 1996; Marsh and Roche, 1997). SET literature clearly distinguishes questions involving "overall" satisfaction with the course or instructor as valid for comparing faculty and courses between departments and colleges. Thus, questions 1 and 2 are intended for reporting P&T, awards, or merit while questions 3 through 12 are designed to improve the quality of teaching. Reports of the 12 questions are currently sent to instructors and supervisors for review and improvement as needed. All questions selected by the faculty should remain confidential. Consistency for individual instructors is improved with a minimum of 7 course evaluations (Gillmore, 2000). As McKeachie (1997) advises, AOT proposes to conduct workshops to ensure more valid use of student ratings. Also, our charge of clarifying P&T guidelines with Faculty Senate P&T and Awards committees will be accomplished following approval of the form and questions.

AOT Recommends: Adoption of the revised form and questions as follows:

  • Adopt the modified student assessment form and 12 questions as proposed.
  • Adopt proposed policy of reporting means for questions 1&2 for P&T, awards, and merit; reporting means of questions 3-12 to faculty and supervisors; and reporting all data from questions added to the back of the form to faculty only (except accreditation requirements).

Diversity Questions within SET

As AOT explored diversity, we soon discovered that diversity is defined from one or more of the following perspectives: 1) equality versus discrimination, 2) behaviors or attitudes, and 3) disciplinary hypotheses or viewpoints. In addition, we discovered that several universities have completed campus climate surveys addressing broad aspects of diversity in the classroom, campus, and community, but none has included diversity questions within the SET forms. In spring 2002, Nana Lowell, Director of the Office of Educational Assessment at UW, led a Faculty Forum at OSU to consider diversity within the framework of teaching. Following the Forum, AOT recommends a cautious approach to adding diversity questions to the SET form. We propose a pilot to test questions, frequency, and learning (utility of adding diversity questions to SET form). The pilot would complement a proposed campus climate survey.

Diversity pilot test proposed: AOT proposes 6 questions be tested to assess the quality of teaching associated with possible unintended attitudes or habits related to diversity in the classroom (Appendix 2). These questions were selected or modified from several campus climate surveys. Questions would be printed on a separate sheet for scanning using the format from the back of the SET form. Means will be summarized and sent only to faculty with data aggregated at the College level. Pilot analysis will be coded per IRB confidentiality requirements. Faculty may choose to include diversity questions in at least one course or teaching event or to sample courses such as one lower, one upper, and one graduate division course during the pilot year. AOT would summarize results and report to the Senate in winter/spring, 2004.

AOT recommends endorsement of:
  • A pilot test of diversity questions per IRB guidelines beginning winter quarter, 2003 for 1 year and reported to Senate in 2004. The pilot would complement the proposed campus climate survey.

Advancement of Teaching Committee Members
2000-2001                   2001-2002             2002-2003
Hans van der Mars, chair    Ray William, chair    Ray William, chair
Laura Rice                  Elaine Pedersen       Ken Krane
Elaine Pedersen             Ken Krane             Paula McMillen
Ken Krane                   Paula McMillen        Margie Haak
Ray William                 Elizabeth Thompson    Molly Engle
 
References

Cuseo, J. 2000. Evaluating new-student seminars and other first-year courses via course-evaluation surveys: Research-based recommendations regarding instrument construction & administration, data analysis, data summary, & reporting results. Marymount College. http://www.brevard.edu/fyc/fya/CuseoLink.htm

d'Apollonia, S. and P.C. Abrami. 1997. Navigating student ratings of instruction. American Psychologist. 52:1198-1208.

England, J., P. Hutchings, and W.J. McKeachie. 1996. The professional evaluation of teaching. American Council of Learned Societies Occasional Paper No. 33. http://www.acls.org/op33.htm

Gillmore, G.M. 2000. Drawing inferences about instructors: The inter-class reliability of student ratings of instructors. OEA Research Reports 00-02. http://www.washington.edu/oea/0002.htm

Marsh, H.W. and L.A. Roche. 1997. Making students' evaluations of teaching effectiveness effective. American Psychologist. 52:1187-1197.

McKeachie, W.J. 1997. Student ratings: The validity of use. American Psychologist. 52:1218-1225.

Table 1. Means of SET (student evaluation of teaching) pilot test for resident instruction, spring quarter, 2002. Ratings are based on a 6-point scale.

Question
    Liberal Arts                                                    Engineering                     Team Taught Course
Section 1
    Anthro  PS      PS      PS      PS      Psy     Psy     Anthro  ECE     ECE     ECE     CS      Soc     PS      F&W     FOR     ANS     HORT    XXX
1. The Course as a whole was
    4.3     4.9     5       5       5.4     5       5       4.7     3.3     4.2     5.1     3.5     4       4.4     3.5     2.7     5.5     4.3     4.1
2. The instructor's contribution to the course was
    4.8     5.2     5.4     5.2     5.8     5.4     5.3     5.2     3.7     4.6     5       3.7     4.7     5.2     3.7     3.7     5.2     4.2     4.5
3. Clarity of course objectives or outcomes was
    4.5     4.9     5.4     4.9     5.5     5.2     5.3     4.8     3.5     4.6     4.9     3.7     2.8     3.6     3       2.6     3.7     3.2     3.4
4. Clarity of student responsibilities and requirements was
    4.7     4.9     5.5     5.2     5.8     5.5     5.4     4.9     3.6     4.7     5       4.1     2.8     3.7     2.8     2.7     4.2     3.3     3.4
5. Course organization was
    4.2     5       4.9     4.9     5.3     5.1     5.2     5       3       4.6     4.8     3.9     3.5     3.6     3.3     2.7     4.2     3.5     3.6
6. Availability of extra help when needed was
    4.4     5.1     5.6     5.6     5.5     4.9     5.5     4.7     4       4.9     5.4     4.3     4.2     5.4     4       4       5.4     3.8     5.1
7. Instructor's use of various instructional techniques to accommodate differences in learning styles among students was
    4.5     3.7     5.2     5.2     4.7     4.8     4.5     4.9     3.6     4.4     5.1     3.6     4.4     5       3.8     3.3     5       4.2     4.3
8. Instructor's interest in my learning was
    4.5     5.3     5.2     5.8     5.6     4.9     4.8     4.9     4       4.9     5.3     3.9     4       5.6     4.5     3.9     5.7     4.7     4.8
9. Instructor's ability to stimulate my thinking more deeply about the subject was
    4.3     5.2     5.2     5.2     5.3     5       4.9     5.2     3.6     4.6     5.1     3.5     4.8     5.7     4.3     3.1     5.5     4.2     4.8
10. Instructor's timely feedback to tests and other work was
    4.4     5.2     5       4.6     5.4     5.6     5.8     4.8     3       4       5       3.7     5       4.8     5       3       5.7     4.5     5.4
11. Instructor's ability to develop a welcoming classroom environment for all participants was
    4.8     4.3     5.5     5.3     4.9     5.5     5.5     5       4       5       5.7     4       4.3     5.6     5       3.7     5.8     4.3     4.5
12. Instructor's evaluation of student performance in accordance with course objectives was
    4.5     4.8     5.2     5.4     5.4     5       5.1     4.9     3.7     4.6     5.3     4       4.5     5.4     4       3.3     5.8     4       4.1
Individual questions by faculty or for accreditation
    3.8     4.8     -       -       5.5     -       -       4.6/4.9 -       4.7     4.9     -       4.4/4.4 4.9/4.0 4.2/4.0 3.3/3.7 5.5/4.7 4.7/3.7 4.4/4.1
    4.2     5.4     -       -       5.6     -       -       4.8/4.9 -       4.9     4.4     -       4.4/4.4 3.9/4.7 3.8/3.5 3.4/2.9 4.8/5.2 3.8/4.7 4.0/3.6
    4.2     5.1     -       -       5.5     -       -       4.3/4.7 -       4.5     4.7     -       3.0/3.4 3.3/4.4 3.3/4.2 3.0/4.1 4.3/5.3 2.8/4.2 3.2/3.5
    -       -       -       -       -       -       -       4.9/4.8 -       4.5     4.9     -       4.2/4.6 5.2/4.8 4.0/3.8 4.3/3.0 5.8/5.8 4.7/3.5 4.6/4.4
    -       -       -       -       -       -       -       4.8/4.4 -       4.2     5.1     -       -       -       -       -       -       -       -
    -       -       -       -       -       -       -       4.9/4.8 -       4.8     -       -       -       -       -       -       -       -       -
Number of respondents (n=)
    47      47      8       12      16      19      29      133     24      39      14      19      6       9       6       7       6       6       8
 
Table 2. Means of SET (student evaluation of teaching) pilot test for Extension, spring quarter, 2002. Ratings are based on a 6-point scale.

Section 1
    Ext. training #1    Ext. training #2    Ext. training #3    County evaluation
1. The training as a whole was
    4.9                 4.9                 5                   5.5
2. The instructor's contribution to the training was
    4.8                 4.9                 5.1                 5.5
3. Clarity of training objectives was
    4.4                 4.4                 4.9                 5.3
4. Clarity of how you might use this training was
    5.1                 5.1                 4.7                 5.1
5. Teaching organization was
    4.1                 4.8                 4.4                 5.3
6. Instructor's use of examples and illustrations was
    4.6                 4.4                 4.4                 5.6
7. Instructor's use of teaching aids (slides, overheads, charts, etc.) was
    4                   4.4                 4.6                 5.2
8. Instructor's ability to stimulate my thinking more deeply about the subject was
    4.6                 4.7                 4.4                 5.6
9. Instructor's responsiveness to questions was
    5.3                 5.4                 5.1                 5.8
10. Instructor's use of participant discussion to enhance my learning was
    5                   5.4                 5                   5.7
11. Instructor's ability to develop a welcoming classroom environment for all participants was
    4.7                 4.6                 4.7                 5.8
12. Instructor's skill in making the information useful to me was
    5                   4.8                 4.4                 5.3
 
Appendix 1. Written Comments by Students during pilot SET, spring quarter, 2002. (Q#21, 22, and 23 were "survey exit" questions about the form providing feedback about the course and instructor, about clarity of survey questions, and additional comments.)
Course - Fac # Survey # Q# Comment
Psy - #18 167 22 #12
Psy- #18 170 23 I liked the question about returning our tests & grades. I also like having the ability to rate prof on things other than ?____?
Psy- #18 175 22 Q#7 on the front page was a little confusing, most teachers stay consistent with the techniques they use throughout the course. The additional information took me by surprise and I couldn't think of anything to add.
Psy- #18 176 22 Q#7 - wordy
Psy- #18 176 22 Q#9 - vane
Psy- #18 176 23 Questions were much better in this evaluation than old one, more concrete and easy to choose answer.
Psy- #18 182 21 Questions 7, 10, 11 & 12 are great questions that are very relevant to the learning process.
Psy- #18 183 21 The new questions are annoying as the previous evaluation forms.
Psy- #18 183 22 Q#7, 8, 9, 3
Psy- #18 184 21 yes, this form was very complete. The last section (III) was very helpful in that it allowed for other comments.
Psy- #18 185 21 Yes. I like the opportunity to rate additional subjects or skills that may be more relevant to the class.
Psy- #18 185 22 Q#7 seemed irrelevant.
Psy- #18 186 21 It could have used some of the issues I stated in Q#22.
Psy- #18 186 22 Keeps interest and attention to class was good. Receptiveness to students was good. Variety of teacher's techniques was good. Involved class in ?___? was fair to good.
Psy- #18 187 23 I liked Q#7 about techniques.
Psy- #18 188 22 Rating the course as a whole at the beginning of the evaluation may not be good. I think it would be better at the end.
Psy- #18 191 21 Evaluation of instructor and course should be in clearly different areas.
Psy- #18 192 23 I like Section I - it makes it a lot clearer.
Psy- #18 195 23 This form is more comprehensive and allows full expression.
Psy- #17 149 21 There is discretion over what each rating means to the individual.
Psy- #17 149 22 Q#7 - I don't think this means anything to the individual - how do I know if his style accommodates others - I just know what I got out of the class.
Psy- #17 149 23 Many of these questions asked the same things the other survey did - I liked the way the other one asked things better - it didn't sound so formal and fancy.
Psy- #17 151 21 I liked the format of the first page.
Psy- #17 153 23 I feel that this form was more clear and better suited to the class evaluation. The style of response options made it much easier for me to express myself.
Psy- #17 165 23 The instructor's interest in my learning shouldn't be a question because of the large class sizes - no teacher knows me personally.
ECE- #13 059 22 Some of the questions are too general.
ECE- #13 060 22 Q#3
ECE- #13 060 23 This is better than the normal evaluation.
ECE- #13 063 23 This evaluation is more thorough than the normal evaluation.
ECE- #13 067 21 No questions about how the instructor taught the class.
ECE- #13 069 23 Better than the old one!
ECE- #13 072 23 Finally - a new form!
ECE- #13 075 22 Everything was good.
ECE- #12 023 23 I like the extended scale.
ECE- #12 028 23 Good form!
ECE- #12 035 23 This evaluation form is much more meaningful than the standard one.
ECE- #12 047 21 Maybe a little better.
ECE- #10 002 22 I don't think I could rate Q#7 on the other side, it didn't seem to be something I personally could judge.
ECE- #10 008 22 All the questions were clear.
XXX- #27 428 22 Although some of the questions seemed a little repetitive.
Psy- #23 240 21 …but a direct question about whether tests were relevant would be a welcome addition.
Psy- #23 240 22 Section 3, Q#3 - for a math course, the instructor can have almost no impact, as the material can be learned from the book. It's a result of the coursework, not the instructor that I learned the math.
Psy- #23 244 22 10 - I though was unclear.
Psy- #23 246 22 Section II was a little confusing.
Psy- #23 257 21 It had no questions about the work the instructor gave or resources they used and what was thought about that.
Psy- #23 266 23 Approach ideas in different ways for different learning styles.
Psy- #23 274 22 Q#8 was kinda of ambiguous. Q#2 was kinda unclear became what level of contribution?
Anthr-#15 098 21 I would like to see something to effect of "Instructor showed enthusiasm."
Anthr-#15 098 22 Q#12 on front is a bit unclear, could be worded better.
Anthr-#15 099 21 I think there should be more questions. But I love how it makes you think more!!
Anthr-#15 099 23 I like the new format. It would be nice if the teacher's name course & section was pre-filled out.
Anthr-#15 101 22 Is #12 talking about test questions or the actual grades received for tests?
Anthr-#15 110 21 No different from the other past forms.
Anthr-#15 111 22 Section I #12
Anthr-#15 111 23 I like that the evaluations are being improved.
Anthr-#15 112 23 I think this is a more effective way to evaluate teachers, but I don't think these evaluations ever do anything as far as change teachers' ways.
Anthr-#15 113 23 Much better than before! Can you add questions to Section 3?
Anthr-#15 116 22 Q#6 was confusing.
Anthr-#15 118 23 Would prefer additional question relating to in-class work/group work.
Anthr-#15 123 23 The ?____? to rate was a great option!
Art- #24 326 21 This one is better than the standard one.
Art- #24 333 22 Q#7 doesn't make sense to me.
Art- #24 350 22 Q#12 in Section 1.
Art- #24 393 21 I am extremely impressed with this evaluation form, it is a huge improvement.
XXX represents cross-listed team taught course.
 
Appendix 2: Proposed diversity questions for pilot testing.

SECTION III: - Additional information to improve teaching.
  PLEASE FILL-IN THE APPROPRIATE RESPONSE.
MARK ONLY ONE CIRCLE PER QUESTION.
VERY POOR   POOR   FAIR   GOOD   VERY GOOD   EXCELLENT   UNABLE TO RATE
1. The instructor's ability to treat all students fairly was ○ ○ ○ ○ ○ ○   ○
2. The instructor's respect for all students was ○ ○ ○ ○ ○ ○   ○
3. The instructor's ability to constructively deal with conflicting perspectives was ○ ○ ○ ○ ○ ○   ○
4. The instructor's skill in dealing with disparaging or insensitive remarks during class was ○ ○ ○ ○ ○ ○   ○
5. The contribution this course made to explore different perspectives within the discipline was ○ ○ ○ ○ ○ ○   ○
6. The instructor's response to disruptive behavior was ○ ○ ○ ○ ○ ○   ○
7.   ○ ○ ○ ○ ○ ○   ○
8.   ○ ○ ○ ○ ○ ○   ○
9.   ○ ○ ○ ○ ○ ○   ○
10.   ○ ○ ○ ○ ○ ○   ○
11.   ○ ○ ○ ○ ○ ○   ○
 
Your handwritten comments in response to the following questions will be returned to the instructor after grades are turned in. We encourage you to respond to all questions as thoughtfully and constructively as possible. Your comments will be used by the instructor to improve the course. However, you are not required to answer any questions.
 
 
If you would like written comments to be placed in the Instructor's personnel file, you need to write and sign a letter to the appropriate Chair, Head, or Dean.
 
