Thesis: Evaluating the Reliability and Validity of an English Achievement Test for Third-Year Non-major Students at the University of Technology, Ho Chi Minh National University, and Some Suggestions for Changes

A high internal consistency coefficient for a test indicates that the items of the test are very similar to each other in content. It is important to note that the length of a test can affect internal consistency reliability.

Split-half reliability is one variety of internal consistency method. The test may be split in a variety of ways; the two halves are then scored separately and correlated with each other. A formula for the split-half method may be expressed as follows:

(3). Rtt = 2 rA,B / (1 + rA,B) (Henning, 1987)

(In which: Rtt: reliability estimated by the split-half method; rA,B: the correlation of the scores from one half of the test with those from the other half.)

Rational equivalence is another method, which provides a coefficient of internal consistency without having to compute reliability estimates for every possible split-half combination. This method focuses on the degree to which the individual items are correlated with each other:

(4). Rtt = (k / (k − 1)) × (1 − Σpq / σ²)  Kuder-Richardson Formula 20 (Henning, 1987)

(In which: k: the number of items; p: the proportion of correct responses to an item; q = 1 − p; σ²: the variance of the total test scores.)

Test-retest reliability indicates the repeatability of test scores with the passage of time. This estimate also reflects the stability of the characteristics or constructs being measured by the test. The formula for this method is as follows:

(5). Rtt = r1,2 (Henning, 1987)

(In which: Rtt: the reliability coefficient using this method; r1,2: the correlation of the scores at time one with those at time two for the same test used with the same persons.)

Inter-rater reliability is used when scores on the test are independent estimates by two or more judges or raters. In this case reliability is estimated as the correlation of the ratings of one judge with those of another. This method is summarized in the following formula:

(6). Rtt = N rA,B / (1 + (N − 1) rA,B)

(In which: Rtt: inter-rater reliability; N: the number of raters whose combined estimates form the final mark for the examinees; rA,B: the correlation between the raters, or the average correlation among the raters if there are more than two.)
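As a minimal sketch, the reliability estimates above can be computed for dichotomously scored (0/1) items as follows. The function names are illustrative, not taken from any testing package:

```python
from statistics import pvariance


def split_half_reliability(r_ab: float) -> float:
    """Spearman-Brown corrected split-half estimate (Formula 3):
    Rtt = 2 * rA,B / (1 + rA,B)."""
    return 2 * r_ab / (1 + r_ab)


def inter_rater_reliability(r_ab: float, n_raters: int) -> float:
    """Inter-rater reliability (Formula 6):
    Rtt = N * rA,B / (1 + (N - 1) * rA,B)."""
    return n_raters * r_ab / (1 + (n_raters - 1) * r_ab)


def kr20(responses: list[list[int]]) -> float:
    """Kuder-Richardson Formula 20 (Formula 4) for a matrix of
    dichotomous item scores: one row per examinee, one column per item."""
    k = len(responses[0])                     # number of items
    n = len(responses)                        # number of examinees
    totals = [sum(row) for row in responses]  # total score per examinee
    var_total = pvariance(totals)             # variance of total scores
    # Sum of p*q over items, where p is the proportion answering correctly.
    sum_pq = sum(
        (p := sum(row[i] for row in responses) / n) * (1 - p)
        for i in range(k)
    )
    return (k / (k - 1)) * (1 - sum_pq / var_total)
```

For example, a half-test correlation of 0.5 yields a corrected split-half reliability of about 0.67, and adding a third rater with the same average inter-rater correlation raises the combined estimate to 0.75.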
One way to improve the reliability of a test is to become aware of the test characteristics that may affect reliability. Among these characteristics are test difficulty, discriminability, item quality, etc.

Test difficulty is calculated by the following formula:

(7). p = Cr / N

(In which: p: difficulty; Cr: sum of correct responses; N: number of examinees.)

According to Heaton (1988: 175), the scale for test difficulty is as follows:
p: 0.81-1.0: very easy (the percentage of correct responses is 81%-100%)
p: 0.61-0.8: easy (the percentage of correct responses is 61%-80%)
p: 0.41-0.6: acceptable (the percentage of correct responses is 41%-60%)
p: 0.21-0.4: difficult (the percentage of correct responses is 21%-40%)
p: 0-0.2: very difficult (the percentage of correct responses is 0%-20%)

Discriminability
The formula for item discriminability is given as follows:

(8). D = (Hc − Lc) / n

(In which: D: discriminability; Hc: number of correct responses in the high group; Lc: number of correct responses in the low group; n: number of examinees in each group.) The range of discriminability is from 0 to 1; the greater the D index, the better the discriminability.

The item properties of a test can be shown visually in a table as below:

Table 2.2 Item properties
Property          Index       Interpretation
Difficulty        0.00-0.33   Difficult
                  0.33-0.67   Acceptable
                  0.67-1.00   Easy
Discriminability  0.00-0.30   Very poor
                  0.30-0.67   Low
                  0.67-1.00   Acceptable
(Henning, G., 1987)

These indices set the ground for remarks on the difficulty and discriminability of the final achievement test chosen by the author.

2.3.2 Test Validity
It should be noted that different scholars think of validity in different ways. Heaton (1988: 159) provides a simple but complete definition: "the validity of a test is the extent to which it measures what it is supposed to measure". Hughes (1989: 22) claimed that "A test is said to be valid if it measures accurately what it is intended to measure".
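The difficulty and discriminability indices above can be sketched in Python. Note that the denominator in the discriminability function is an assumption: the original formula image is not reproduced in the source, so the common form with equal-sized high and low scoring groups is used here:

```python
def item_difficulty(correct_responses: int, n_examinees: int) -> float:
    """Formula 7: p = Cr / N."""
    return correct_responses / n_examinees


def heaton_label(p: float) -> str:
    """Heaton's (1988: 175) interpretation scale for item difficulty p."""
    if p > 0.8:
        return "very easy"
    if p > 0.6:
        return "easy"
    if p > 0.4:
        return "acceptable"
    if p > 0.2:
        return "difficult"
    return "very difficult"


def item_discriminability(high_correct: int, low_correct: int,
                          group_size: int) -> float:
    """Formula 8: D = (Hc - Lc) / n, assuming high and low scoring
    groups of equal size n (an assumed, commonly used form)."""
    return (high_correct - low_correct) / group_size
```

For example, an item answered correctly by 45 of 100 examinees has p = 0.45 and falls in Heaton's "acceptable" band.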
It is taken from the Standards for Educational and Psychological Testing (1985: 9) that "Validity is the most important consideration in test evaluation. The concept refers to the appropriateness, meaningfulness, and usefulness of the specific inferences from the test scores. Test validation is the process of accumulating evidence to support such inferences". Thus, to be valid, a test needs to assess learners' ability in the specific area proposed on the basis of the aim of the test. For instance, a listening test with written multiple-choice options may lack validity if the printed choices are so difficult to read that the exam actually measures reading comprehension as much as it does listening comprehension.

Validity is classified into the following subtypes:

Content validity
This is a non-statistical type of validity that involves "the systematic examination of the test content to determine whether it covers a representative sample of the behavior domain to be measured" (Anastasi & Urbina, 1997: 114). A test has content validity built into it by careful selection of which items to include. Items are chosen so that they comply with the test specification, which is drawn up through a thorough examination of the subject domain. Content validity is very important in evaluating the validity of a test in that "the greater a test's content validity, the more likely it is to be an accurate measure of what it is supposed to measure" (Hughes, 1989: 22).

Construct validity
A test has construct validity if it demonstrates an association between the test scores and the prediction of a theoretical trait. Intelligence tests are one example of measurement instruments that should have construct validity. Construct validity is viewed from a purely statistical perspective in much of the recent American literature (Bachman and Palmer, 1981a).
It is seen principally as a matter of a posteriori statistical validation of whether a test has measured a construct that has a reality independent of other constructs. To understand whether a piece of research has construct validity, three steps should be followed. First, the theoretical relationships must be specified. Second, the empirical relationships between the measures of the concepts must be examined. Third, the empirical evidence must be interpreted in terms of how it clarifies the construct validity of the particular measure being tested (Carmines & Zeller, 1991: 23).

Face validity
A test is said to have face validity if it looks as if it measures what it is supposed to measure. Anastasi (1982: 136) pointed out that face validity is not validity in the technical sense; it refers not to what the test actually measures, but to what it appears superficially to measure. Face validity is very closely related to content validity. While content validity depends on a theoretical basis for judging whether a test assesses all domains of a certain criterion, face validity relates to whether a test appears to be a good measure or not.

Criterion-related validity
Criterion-related validity is used to demonstrate the accuracy of a measure or procedure by comparing it with another measure or procedure which has been demonstrated to be valid. In other words, the concept is concerned with the extent to which test scores correlate with a suitable external criterion of performance. Criterion-related validity consists of two types (Davies, 1977): concurrent validity, where the test scores are correlated with another measure of performance, usually an older established test, taken at the same time (Kelly, 1978; Davies, 1983), and predictive validity, where test scores are correlated with some future criterion of performance (Bachman and Palmer, 1981).

2.3.3 Reliability and Validity
Reliability and validity are the two most vital characteristics that constitute a good test.
However, the relationship between reliability and validity is rather complex. On the one hand, it is possible for a test to be reliable without being valid: a test can give the same result time after time yet not measure what it was intended to measure. For example, an MCQ test could be highly reliable in the sense of testing individual vocabulary items, but it would not be valid if it were taken to indicate the students' ability to use the words productively. Bachman (1990: 25) says, "While reliability is a quality of test scores themselves, validity is a quality of test interpretation and use". On the other hand, if a test is not reliable, it cannot be valid at all. To be valid, according to Hughes (1988: 42), "a test must provide consistently accurate measurements. It must therefore be reliable. A reliable test, however, may not be valid at all". For example, in a writing test, candidates may be required to translate a text of 500 words into their native language. This could well be a reliable test, but it cannot be a valid test of writing. Thus, there will always be some tension between reliability and validity, and the tester has to balance gains in one against losses in the other.

2.4 Achievement tests
Achievement tests play an important role in school programs, especially in evaluating the language knowledge and skills students have acquired during a course, and they are widely used at different school levels. Achievement tests are also known as attainment or summative tests. According to Henning (1987: 6), "achievement tests are used to measure the extent of learning in a prescribed content domain, often in accordance with explicitly stated objectives of a learning program". These tests may be used for program evaluation as well as for certification of learned competence. It follows that such tests normally come directly after a program of instruction.
Davies (1999: 2) also shares the idea that "achievement refers to the mastery of what has been learnt, what has been taught or what is in the syllabus, textbook, materials, etc. An achievement test therefore is an instrument designed to measure what a person has learnt within or up to a given time". Similarly, Hughes (1989: 10) said that achievement tests are directly related to language courses, their purpose being to establish how successful individual students, groups of students, or the courses themselves have been in achieving objectives. Achievement tests are usually administered after a course to the group of learners who took it. Sharing the same idea about achievement tests as Hughes, Brown (1994: 259) suggests: "An achievement test is related directly to classroom lessons, units or even total curriculum". Achievement tests, in his opinion, "are limited to a particular material covered in a curriculum within a particular time frame".

There are two kinds of achievement tests: final achievement tests and progress achievement tests. Final achievement tests are those administered at the end of a course of study. They may be written and administered by ministries of education, official examining boards, or members of teaching institutions. Clearly, the content of these tests must be related to the courses with which they are concerned, but the nature of this relationship is a matter of disagreement among language testers. According to some testing experts, the content of a final achievement test should be based directly on a detailed course syllabus or on the books and other materials used. This has been referred to as the syllabus-content approach. It has an obvious appeal, since the test only contains what the pupils are thought to have actually encountered, and thus can be considered, in this respect at least, a fair test.
The disadvantage of this approach is that if the syllabus is badly designed, or the books and other materials are badly chosen, then the results of the test can be very misleading: successful performance on the test may not truly indicate successful achievement of course objectives. The alternative approach is to base the test content directly on the objectives of the course, which has a number of advantages. Firstly, it forces course designers to be explicit about course objectives. Secondly, pupils' performance on the test shows how far they have achieved those objectives. This in turn puts pressure on those responsible for the syllabus and for the selection of books and materials to ensure that these are consistent with the course objectives. Tests based on course objectives work against the perpetuation of poor teaching practice, which course-content-based tests, almost as if part of a conspiracy, fail to do. It is the author's belief that test content based on course objectives is much preferable: it provides more accurate information about individual and group achievement, and is likely to promote a more beneficial backwash effect on teaching.

Progress achievement tests, as the name suggests, are intended to measure the progress that learners are making. Since "progress" means progress towards the achievement of course objectives, these tests too should be related to objectives. They should make a clear progression towards the final achievement tests based on course objectives. If the syllabus and teaching methods are appropriate to those objectives, progress tests based on short-term objectives will fit well with what has been taught. If not, there will be pressure to create a better fit. If it is the syllabus that is at fault, it is the tester's responsibility to make clear that it is there that change is needed, not in the tests.
In addition, while more formal achievement tests need careful preparation, teachers should feel free to set their own informal tests to make a rough check on students' progress and to keep students on their toes. Since such tests will not form part of formal assessment procedures, their construction and scoring need not be geared solely towards the intermediate objectives on which more formal progress achievement tests are based; they can instead reflect the particular "route" that an individual teacher is taking towards the achievement of objectives.

Summary
In this chapter, the writer has presented a brief literature review that sets the ground for the thesis. Due to the limited time and volume of this thesis, the writer wishes to focus only on evaluating the reliability and validity of a chosen achievement test. Therefore, this chapter deals only with those points on which the thesis is based.

CHAPTER 3: THE STUDY
This chapter is the main part of the study. It provides the practical background for the study and an overview of English teaching, learning and testing at the University of Technology, Ho Chi Minh National University. More importantly, it presents the data analysis of the chosen test and the findings drawn from that analysis.

3.1 English learning and teaching at the University of Technology, Ho Chi Minh National University
3.1.1 Students and their backgrounds
Students at the University of Technology are of different levels of English because of their different backgrounds. It is common that those who come from big cities and towns have a greater ability in English than those from rural areas, where foreign language learning is not paid much attention. In addition, some students have had over ten years of learning English before entering university, some have studied it for only a few years, and others have never learned English before.
Moreover, the entry level at the University of Technology is rather low because applicants do not have to take any entrance exams; instead, they only submit their dossiers to be considered and evaluated. As a result, their attitude towards learning English in particular, and other subjects in general, is not very positive.

3.1.2 The English teaching staff
The English section of the University of Technology is a small section with only five teachers. They take charge of teaching both Basic English and English for Specific Purposes (ESP) for Computing. All the English teachers here were trained in Vietnam and none of them has studied abroad. One of them has obtained a Master's degree in English; three are doing an MA course. They prefer using Vietnamese in class, as they find it easier to explain lessons in Vietnamese given the limitations of the students' English ability. Furthermore, they are fully aware of adopting methods suitable for teaching mixed-level classes, and they have been applying technology in their ESP teaching. This results in the students' high involvement in the lessons.

3.1.3 Syllabus and its objectives
The English syllabus for Information Technology students was designed by teachers of the English section of the University of Technology and has been applied for over five years. It is divided into two phases: Basic English (Phase 1) and ESP (Phase 2). Phase 1, which lasts the first three semesters with 99 forty-minute periods, is covered by the Lifelines series, in which the students pay attention only to reading skills and grammar. Phase 2, comprising the three final semesters with 93 forty-minute periods in total, is wholly devoted to ESP. It should be noted that the notion of ESP in this context is simply the combination of the English language and content for Information Technology. In Phase 2, the students work with Basic English for Computing, which consists of twenty-eight units providing background knowledge and vocabulary for computing.
This book covers the four skills of listening, speaking, reading and writing, plus a language focus. The reading texts in the course book are meaningful and useful to the students because they first revise the students' knowledge and language items and then supply background knowledge and a source of vocabulary relating to their major, Information Technology. Table 3.1 illustrates how the syllabus is allocated to each semester.

Table 3.1 Syllabus content allocation
Semester  45-minute periods  Teaching content                 Course book
1         33                 Reading and grammar              Lifelines Elementary
2         33                 Reading and grammar              Lifelines Elementary
3         33                 Reading and grammar              Lifelines Pre-Intermediate
4         39                 Reading, grammar and vocabulary  Basic English for Computing
5         27                 Reading, grammar and vocabulary  Basic English for Computing
6         27                 Reading, grammar and vocabulary  Basic English for Computing

The two course books used over the six semesters cover the four skills of reading, writing, listening and speaking, but reading and grammar receive more attention because of the objectives of the course. A clear goal has been set up by the teaching staff for the whole syllabus, and this goal is realized through specific objectives for each semester, as shown in Table 3.2.

Table 3.2 Syllabus goal and objectives
COMMON GOAL: To equip the students with basic English grammar and the general background of computing English necessary for their future career.
OBJECTIVES
Semesters 1+2+3: To revise the students' grammar knowledge and help them use that knowledge fluently in order to serve the following semesters.
Semesters 4+5+6: To supply the students with basic knowledge and vocabulary of computing; to consolidate the students' reading skills and instruct them in translation; in addition, to help the students read, comprehend and translate English materials in computing.

Applying these teaching methods encounters a variety of difficulties, such as the students' habits of passive learning, low motivation, big classes, etc.
3.1.4 The course book: Basic English for Computing
The book was written by Glendinning, E. H. and McEwan, J. and published in 2003 by Oxford University Press, with the following key features:
* A topic-centred course that covers key computing functions and develops learners' competence in all four skills.
* Graded specialist content combined with key grammar, functional language, and subject-specific lexis.
* Simple, authentic texts and diagrams that present up-to-date computing content in an accessible way.
* Tasks that encourage learners to combine their subject knowledge with their growing knowledge of English.
* A glossary of current computing terms, abbreviations, and symbols.
* A Teacher's Book that provides full support for the non-specialist, with background information on computing content and an answer key.

The book was designed to cover all four skills, each followed by a language focus. However, because of the objectives of the ESP course taught at the University of Technology, only reading skills and grammar are focused on, as mentioned above. The detailed contents of the book can be found in Appendix 2. The book appears good, with authentic and meaningful texts. The final achievement tests are often based closely on the content of the course book.

3.2 English testing at the University of Technology
3.2.1 Testing situation
English tests for students at the University of Technology are designed by the staff of the English section. Each teacher on the staff is responsible for the test items for one semester, and all the materials are then fed into a common item bank controlled by software on a server. Before the examinations, the person in charge of preparing the tests uses the software to mix the test items in the item bank and print out the tests. All the tests are designed in the light of the syllabus-content approach. All in all, the students are required to take six formal tests throughout their courses.
Within the limited scope of the study, the writer focuses on the third-year final test, i.e. the test of the sixth semester, which is the last test the students have to take. The current English testing situation at the University of Technology has several noteworthy points:
- Students are often instructed in the test format long before the actual test, which leads to test-oriented learning.
- Students do not have their test papers returned with feedback and corrections, so they hardly know what their strong and weak points are.
- Students can copy answers from one another during the tests in spite of examiner supervision, so their true abilities are not always reflected.
- Some tests still contain basic errors such as spelling errors, extremely easy or difficult items, badly designed layout, etc.
- Test items are not pre-tested before live tests.

3.2.2 The current final third-year achievement test (English 6)
General information:
* Final Achievement Test, Semester 6, English 6
* Time allowance: 90 minutes
* Testees: most of the third-year students at the University of Technology
* Supervisors: teachers from the University of Technology

English Test 6 is a syllabus-based achievement test whose content is taken from the teaching points delivered in the three last semesters (4, 5 and 6). The test covers a wide range of knowledge in computing, vocabulary, grammar, reading, writing and translation skills. Table 3.3 describes English Test 6, with its seven parts and marking scale, as below:

Table 3.3 Specification of Test 6
Part   Language items/skills           Input                                                         Task types                     Marks
I      Vocabulary and grammar          Sentences                                                     10 x 4-option multiple choice  15
II     Reading comprehension           Narrative text relating to computing, approx. 300-400 words  5 x 4-option multiple choice   25
III    Reading and vocabulary          Narrative text relating to computing, approx. 150-200 words  10 x open cloze                15
IV     Writing                         Incomplete sentences                                          5 x sentence building          15
V      Writing                         Incomplete sentences                                          5 x sentence transformation    15
VI     English-Vietnamese translation  Sentences in English                                          2 sentences                    10
VII    Vietnamese-English translation  Sentences in Vietnamese                                       2 sentences                    5
Total                                                                                                                               100
(For the specific test, see Appendix 3)

As explained above, the students are supposed to apply their reading skills, grammar and vocabulary in preparation for the final examination, so the test is aimed at assessing both knowledge and skills. In the first part of the test, the students have to perform tasks drawing on background knowledge, vocabulary and language items relating to computing. Part 2 requires the students to read an ESP passage and then choose the best option for each question. In Part 3, the students have to choose among given words to complete a text; this also tests the students' reading comprehension and vocabulary. Parts 4 and 5 require the students to use their knowledge of grammar to make meaningful and correct sentences. Finally, the two last parts, on translation, are aimed at assessing the students' general understanding in terms of their vocabulary and their use of language and terminology.

3.3 Research method
3.3.1 Data collection instruments
To analyse the data and evaluate the reliability and validity of the final achievement test, the author combines the following instruments:
- Formula 1 (as shown in Chapter 2) to compute the reliability coefficient.
- Software: Item and Test Analysis Program (ITEMAN) for Windows, Version 3.50, to analyze item difficulty and item discrimination, and to evaluate construct validity.
3.3.2 Participants
The study is heavily based on the scores obtained from 127 test papers, whi
