Reliability in Self-Assessment

SELF-ASSESSMENT in LANGUAGE TESTING:
RELIABILITY and VALIDITY ISSUES
Christine Coombe

Self-assessment is not an alien concept to human behavior. All human beings are involved, either consciously or subconsciously, in an ongoing process of self-evaluation. Until recently, however, the value of this human process was largely ignored in pedagogy. Learners were rarely asked to assess their performance, much less to have a say in the construction of evaluation instruments. Pedagogically, the term self-assessment was rendered all but oxymoronic.

In the last decade, with increased attention to learner-centered curricula, needs analysis, and learner autonomy, self-assessment has become a topic of particular interest in testing and evaluation (Blanche 1988; Oscarsson 1998). It is now recognized that learners are able to provide meaningful input into the assessment of their performance, and that this assessment can be valid. In fact, in second and foreign language learning, research reveals an emerging pattern of consistently high correlations between self-assessment results and ratings based on a variety of external criteria (Blanche 1988; Oscarsson 1984, 1997, 1998; Coombe 1992). In spite of these results, however, issues concerning the validity and reliability of language self-assessment need to be addressed.
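The correlations reported in such studies are typically computed between learners' self-ratings and an external criterion such as a test score. The following is a minimal sketch of that calculation, assuming a Pearson coefficient; the function name `pearson_r` and all of the ratings are hypothetical illustrations, not data from the studies cited above.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list has no variance")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: self-ratings on a 1-5 scale and external test scores (0-100)
# for the same seven learners, listed in the same order.
self_ratings = [2, 3, 3, 4, 5, 1, 4]
test_scores = [48, 61, 55, 72, 90, 35, 70]

r = pearson_r(self_ratings, test_scores)
print(f"r = {r:.2f}")
```

A coefficient near 1 would indicate that learners who rate themselves higher also tend to score higher on the external measure. It says nothing about whether the two measures tap the same construct, which is the validity question raised in this article.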

Whereas formal or standardized tests have already established construct, predictive, and concurrent validity and reliability indices, the validity and reliability of learners' estimates remain open to debate. Because of the complex nature of the language learning process, constructs of what is being measured need to be clarified. To assess their behavior validly, learners need to know, in non-linguistic, simplified and practical terms, exactly what it is that they are trying to assess. Many language constructs, such as proficiency and communicative competence, are elusive and must be clearly and concisely operationalized and communicated if learners' assessments are to be valid. The criterion by which learners are to assess themselves may be opaque and thus pose an additional threat to validity. Language learners in EFL contexts may find self-assessment particularly difficult if no comparisons to a native speaker are available to them. They may be able to judge their own fluency and understanding fairly accurately, but may find it more difficult to assess the accuracy of their speech and pronunciation.

An additional consideration of validity is whether different language skills are comparable for assessment. They probably are not, and learners must be made aware of this. Linguistic analyses may require a different focus than communication does. Receptive skills may demand different attention than productive skills. The degree to which language learners are able to carry out valid self-assessments will depend on the nature of the skills being assessed and the relative accuracy with which learners can define and use, in concrete, behavioral terms, the skills they are to assess.

The reliability of learners' judgement is subject to variables whose influence on the learner is difficult to establish. Extraneous factors, such as parental expectations, career aspirations, amount of exposure to foreign languages, age, past academic record and lack of training in self-assessment, affect the accuracy of self-estimates and must, in some way, be accounted for. Furthermore, because reliability, like validity, depends on systematic analysis, the question is raised as to whether short-term self-assessments lend themselves to consistency. They most likely do not. Learners need to be asked to assess their performance on a regular basis. Their performance must be carefully and closely linked with the particular skills that they are working on. Learners' ability to self-assess language performance accurately is not automatic. Therefore, constant feedback within a formative as well as a summative framework is a crucial factor in obtaining reliable self-assessment results.
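One simple way to monitor whether regular self-assessment with feedback is becoming more accurate is to track, session by session, the gap between learners' self-ratings and an external rating of the same skill. The sketch below is an illustration only: the function name `mean_absolute_gap`, the shared 1-5 scale, the use of teacher ratings as the external criterion, and all of the figures are hypothetical assumptions.

```python
def mean_absolute_gap(self_ratings, teacher_ratings):
    """Average absolute difference between paired ratings on the same scale."""
    if len(self_ratings) != len(teacher_ratings) or not self_ratings:
        raise ValueError("need two non-empty, equal-length lists")
    gaps = [abs(s - t) for s, t in zip(self_ratings, teacher_ratings)]
    return sum(gaps) / len(gaps)


# Hypothetical ratings of one skill for four learners across three sessions.
sessions = [
    {"self": [5, 4, 2, 5], "teacher": [3, 3, 3, 4]},
    {"self": [4, 4, 3, 5], "teacher": [3, 3, 3, 4]},
    {"self": [3, 3, 3, 4], "teacher": [3, 4, 3, 4]},
]

gap_by_session = [mean_absolute_gap(s["self"], s["teacher"]) for s in sessions]
for number, gap in enumerate(gap_by_session, start=1):
    print(f"session {number}: mean gap = {gap:.2f}")
```

A gap that shrinks over successive sessions would suggest that self-estimates are converging on the external rating. It does not by itself establish reliability, since the extraneous factors listed above remain uncontrolled.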

As previously stated, there is strong evidence that self-assessments yield consistent and homogeneous results; indeed, research indicates that learner self-assessment is working in situations that were traditionally reserved for standardized tests (i.e. placement) (LeBlanc & Painchaud 1985). Nevertheless, self-assessment is not a panacea for all testing problems, and the field is fraught with problematic issues, a few of which have been addressed in this article. Further research is needed, not only to investigate the many validity and reliability issues involved, but also to help establish the place of self-assessment in the complete measurement and evaluation process.

References
Blanche, P. (1988). The FLIFLC Study. Monterey: The US Department of Defense Language Institute.
Coombe, C. (1992). The Relationship Between Self-assessment Estimates of Functional Literacy Skills and Basic English Skills Test Results in Adult Refugee ESL Learners. Ph.D. Diss. The Ohio State University.
LeBlanc, R. & Painchaud, G. (1985). Self-Assessment as a Second Language Placement Instrument. TESOL Quarterly, 19(4), 673-687.
Oscarsson, M. (1984). Self-Assessment of Foreign Language Skills: A Survey of Research and Development Work. Strasbourg: Council of Europe.
Oscarsson, M. (1997). "Self-Assessment of Foreign and Second Language Proficiency". In The Encyclopedia of Language and Education, Vol. 7. Kluwer Academic Publishers, pp. 175-187.
Oscarsson, M. (1998). "Learner Self-Assessment of Language Skills". IATEFL TEA SIG Newsletter, Nov. 1998.