MA Thesis: Evaluating the Reliability of the First End-of-Semester-4 Computer-based Multiple-choice Test for Second-year Non-English-major Students at Hanoi University of Business and Technology (HUBT)

TABLE OF CONTENTS

 

CANDIDATE’S STATEMENT i

ACKNOWLEDGEMENT ii

ABSTRACT iii

LIST OF ABBREVIATIONS iv

LIST OF TABLES AND CHARTS v

TABLE OF CONTENTS vi

Chapter 1: INTRODUCTION 1

1.1. Rationale for the study 1

1.2. Aims and research questions 2

1.3. Theoretical and practical significance of the study 2

1.4. Scope of the study 2

1.5. Method of the study 2

1.6. Organization of the paper 3

Chapter 2: LITERATURE REVIEW 4

2.1. Language testing 4

2.1.1. What is a language test? 4

2.1.2. The purposes of language tests 4

2.1.3. Types of language tests 5

2.1.4. Criteria of a good language test 5

2.2. Achievement test 6

2.2.1. Definition 6

2.2.2. Types of achievement test 6

2.2.3. Considerations in final achievement test construction 7

2.3. MCQs test 7

2.3.1. Definition 7

2.3.2. Benefits of MCQs test 8

2.3.3. Limitations of MCQs test 10

2.3.4. Principles on designing a good MCQs test 11

2.4. Reliability of a test 11

2.4.1. Definition 11

2.4.2. Methods for test reliability estimate 12

2.4.3. Measures to improve test reliability 15

2.5. Summary 15

Chapter 3: THE CONTEXT OF THE STUDY 16

3.1. The current English learning, teaching and testing situation at HUBT 16

3.2. The course objectives, syllabus and materials used for second-year non-English majors in Semester 4 17

3.2.1. The course objectives 17

3.2.2. Business English syllabus 17

3.2.3. The course book 19

3.2.4. Specification grid and scoring scale for the final achievement computer-based MCQs test 1 in Semester 4 19

Chapter 4: METHODOLOGY 21

4.1. Participants 21

4.2. Data collection instruments 21

4.3. Data collection procedure 21

4.4. Data analysis procedure 22

Chapter 5: RESULTS AND DISCUSSIONS 23

5.1. The compatibility of the objectives, content and skill weight format of the final achievement computer-based MCQs test 1 for the 4th semester with the course objectives and the syllabus 23

5.1.1. The test objectives and the course objectives 23

5.1.2. The test item content in four sections and the syllabus content 24

5.1.3. The skill weight format in the test and the syllabus 26

5.2. The reliability of the final achievement test 27

5.2.1. Reliability coefficient 27

5.2.2. Item difficulty and discrimination value 27

5.3. The attitude of students towards the MCQs test 1 29

5.4. Pedagogical implications and suggestions for improvement of the existing final achievement computer-based MCQs test 1 for the non-English majors at HUBT 34

5.5. Summary 38

Chapter 6: CONCLUSION 39

6.1. Summary of the findings 39

6.2. Limitations of the study 40

6.3. Suggestions for further study 40

REFERENCES 41

APPENDICES I

APPENDIX 1

Grammar, Reading, Vocabulary and Functional language checklist II

APPENDIX 2

Survey questionnaire (for students at HUBT) IV

APPENDIX 3

Students’ test scores VII

APPENDIX 4

Item analysis of the final achievement computer-based MCQs test 1- 150 items, 349 examinees XII

APPENDIX 5

Item indices of the final achievement computer-based MCQs test 1 XVII

 

 

 

 

 

| 220 | 8 | Starting up; Vocabulary | C.B. 70-71; P.F. 32
4 | 220 | 8 | Listening; Reading | C.B. 72-73
4 | 220 | 8 | Language review; Skills | C.B. 73-75; P.F. 33
5 | 220 | 8 | Case study | C.B. 76-77; P.F. 34-35
5 | 220 | 8 | Text bank; Talk business | T.B. 128-129; P.F. 68-69 | Grammar review correction
6 | 220 | 9 | Starting up; Listening | C.B. 78-79
6 | 220 | 9 | Vocabulary; Reading | C.B. 80-81; P.F. 36
7 | 220 | 9 | Language review; Skills | C.B. 82-83
7 | 220 | 9 | Case study | C.B. 84-85; P.F. 38-39
8 | 220 | 9 | Text bank; Talk business | T.B. 130-131; P.F. 70-71 | Grammar review correction
8 | 220 | C | Revision | C.B. 86-89
9 | 220 | | Written test |

Note: C.B: Course book; P.F: Practice file; T.B: Teacher's book

Table 3: The syllabus for the 4th semester (for non-English majors)

Time allocation for the language skills and sections is illustrated as follows:

Skills | Class numbers (periods) | Percentage (%)
Listening | 16.5 | 22%
Speaking | 19.5 | 26% (13% for practicing functional language)
Reading | 13.5 | 18%
Writing | 6 | 8%
Grammar | 10.5 | 14%
Vocabulary | 9 | 12%

Table 4: Time allocation for language skills and sections

3.2.3. The course book

The course book in Semester 4 for the second-year students at HUBT is Market Leader Pre-intermediate, written by David Cotton, David Falvey and Simon Kent and published in 2002 by Longman. The book mainly focuses on three skills (speaking, listening and reading) and does not put great emphasis on grammar. It is divided into 12 closely interrelated units, each with a slightly different emphasis. The pattern Starting up - Vocabulary - Listening - Reading - Language review - Skills - Case study is the same for all units. In the fourth semester, students study the last six units of the book (Unit 7 to Unit 12).

The course book checklists used to examine the tasks and content drawn on in constructing the final achievement computer-based MCQs test 1 are given in Appendix 1.

3.2.4. Specification grid and scoring scale for the final achievement computer-based MCQs test 1 in Semester 4

In order to evaluate students' achievement, the following grid is used to design achievement test 1:

Part | Main skill | Input | Item type | Marks | Skill weighting
1 | Vocabulary | Incomplete sentences, approx. 18 words | 50 items; 4-option multiple choice | 4 | 33%
2 | Grammar | Incomplete sentences, approx. 18 words | 50 items; 4-option multiple choice | 3 | 33%
3 | Reading | Narrative or factual text, approx. 60 words | 30 items; 4-option multiple choice | 1.67 | 20%
4 | Functional language | Short sentences, approx. 16 words | 20 items; 4-option multiple choice | 1.33 | 14%

Table 5: Specification grid for the final computer-based MCQs test 1

The scoring scale for the test was designed by the teachers at HUBT and comprises two levels:

- Pass: students who score 50% of the whole test or more
- Fail: students who score below 50% of the whole test

Chapter 4: METHODOLOGY

4.1. Participants

The first group of subjects in this study comprises 349 second-year students from 14 classes. Their test scores were collected for the purpose of analyzing and computing internal consistency reliability, item difficulty and item discriminability. The second group comprises 236 second-year non-English majors who answered a questionnaire. Their responses to its 14 questions were analyzed in order to investigate the students' attitudes towards the final achievement MCQs test 1.

4.2. Data collection instruments

The following instruments were adopted to obtain information for the study:

- Kuder-Richardson Formula 20 (KR-20) for the internal consistency reliability estimate (a computational sketch is given after this list)
- the item difficulty and item discrimination formulae mentioned in Section 2.4.2
- a questionnaire survey for students (see Appendix 2)
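For reference, KR-20 estimates internal consistency as KR-20 = (k / (k - 1)) x (1 - ∑pq / σ²), where k is the number of items, p is the proportion of examinees answering an item correctly, q = 1 - p, and σ² is the variance of the total test scores. The sketch below is a minimal illustration of this computation, assuming a response matrix scored 1 for a correct answer and 0 otherwise; the function and variable names are illustrative, not part of the study's actual tooling.

```python
import numpy as np

def kr20(responses: np.ndarray) -> float:
    """Kuder-Richardson Formula 20 for a 0/1 response matrix
    of shape (n_examinees, n_items)."""
    k = responses.shape[1]           # number of items (150 in this test)
    p = responses.mean(axis=0)       # proportion answering each item correctly
    q = 1.0 - p                      # proportion answering each item incorrectly
    totals = responses.sum(axis=1)   # each examinee's raw total score
    variance = totals.var()          # variance of the total scores
    return (k / (k - 1)) * (1.0 - (p * q).sum() / variance)
```

Values near 1.0 indicate high internal consistency; as noted in Chapter 2, a coefficient of at least 0.8 is typically expected of an MCQs test.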
The questionnaire was designed on the basis of Henning's (1987) list of threats to the reliability of a test. Its objective was to find out students' attitudes towards the reliability of the current achievement MCQs test 1 in the 4th term. The questionnaire comprises 14 items and was written in Vietnamese to make sure the informants understood the questions properly (see Appendix 2). The items focus on the characteristics of the test, the test administration and the test-takers.

4.3. Data collection procedure

The data on the test objectives and the course objectives were elicited from the English Department Bulletin, HUBT, enacted in 2003. The data on the syllabus content were collected from the syllabus for the second-year students. The data on the test content and test format were obtained from a copy of the official current test provided by the English Department. The data on the students' test scores and item responses were obtained from a file, provided by the Informatics Department, HUBT, containing both the students' scores and their responses on the test. The questionnaire data were collected from 236 second-year students who were randomly selected one week after they had finished the final achievement test 1.

4.4. Data analysis procedure

First, comparisons were made between the test objectives and the course objectives, between the test content and the syllabus content, and between the skill weights in the test format and in the syllabus, in order to determine whether they are compatible with each other. Second, the reliability coefficient and the item difficulty and item discrimination indices of the MCQs test 1 were analyzed in order to determine the extent to which the final achievement test 1 is reliable. Finally, the students' responses to the questionnaire were analyzed in order to find out their attitudes towards the MCQs test given to them.

Chapter 5: RESULTS AND DISCUSSIONS

5.1. The compatibility of the objectives, content and skill weight format of the final achievement computer-based MCQs test 1 for the 4th semester with the course objectives and the syllabus

5.1.1. The test objectives and the course objectives

As mentioned in Section 3.2.1, the course is mainly targeted at further developing students' essential business speaking skills, such as making presentations, taking part in meetings, negotiating, telephoning and using English in social situations. Through a variety of discussion activities, students build up their confidence in using English and improve their fluency. The course also aims to develop students' listening skills, such as listening for information and note-taking. In addition, it provides students with important new words and phrases and increases their business vocabulary. Students' reading skill is also built up through authentic articles on a variety of business topics. The course further helps students to revise and consolidate basic grammar, to improve their pronunciation and to perform some writing tasks on business letters and memoranda.

The MCQs test 1 is designed to check what students have learnt about vocabulary, grammar, reading topics and functional language in Units 7, 8 and 9 of Market Leader Pre-intermediate. It is also constructed to assess students' achievement at the end of the course, in particular to evaluate their results after completing these three units.
In particular, the vocabulary and grammar sections, which together comprise 100 items, are aimed at examining the vocabulary and grammar that students have been taught. The reading section of 30 items measures students' reading skill on business topics such as marketing, planning and managing. The functional language section of 20 items measures students' ability to communicate in daily business situations.

Obviously, the objectives of the course and of the MCQs test 1 are partially compatible: the course provides students with knowledge of vocabulary, grammar and functional language and develops their reading skills, and the MCQs test 1 is designed to measure students' command of this knowledge and these skills. The difference, however, is that the course objectives target both receptive and productive skills, whereas the test focuses merely on the receptive skill of reading and examines students' ability to recognize knowledge rather than to produce language.

5.1.2. The test item content in four sections and the syllabus content

* Grammar section

The grammar items in the test are shown clearly and specifically in the table below.

No. | Grammar item | Number of tested items | Percentage of tested items
1 | Future expressions | 15 | 30%
2 | Prepositions | 10 | 20%
3 | Wh-question forms | 8 | 16%
4 | Verb tense and verb form | 8 | 16%
5 | Reported speech | 5 | 10%
6 | Connectors | 2 | 4%
7 | Adjective comparatives | 2 | 4%

Table 6: Main points in the grammar section

Compared with the grammar checklist (see Appendix 1), the test items in this section generally cover the grammar points in the course book, such as question forms, future time expressions and reported speech. However, these items account for only 56% of the section, only a little more than the 44% of items that test points not targeted in the grammar part of the syllabus, such as prepositions, connectors, comparatives, and verb tenses and forms.

* Vocabulary section

The table below shows the allocation of test items according to the vocabulary topics included in the textbook.

No. | Vocabulary | Number of tested items | Percentage of tested items
1 | Noun-noun collocations (marketing terms) | 15 | 30%
2 | Verb-noun collocations (ways to plan) | 13 | 26%
3 | Verb-preposition collocations (ways to manage) | 12 | 24%
4 | Other economic term definitions | 3 | 6%
5 | Verbs showing trends | 3 | 6%
6 | Multi-word verbs | 3 | 6%
7 | Adjectives related to profits | 1 | 2%

Table 7: Main points in the vocabulary section

In comparison with the vocabulary checklist (see Appendix 1), 80% of the test items in this section match the vocabulary items in the course book. That is to say, the test items stick to what students have learnt, such as noun-noun collocations relating to marketing terms, verb-noun collocations relating to ways to plan, and verb-preposition collocations relating to ways to manage. Nevertheless, there are also items, such as verbs showing trends, multi-word verbs and adjectives related to profits, which are not included in the vocabulary part of the syllabus but appear in the reading articles of Units 7, 8 and 9.
* Reading comprehension section

This section contains 30 items on extracts whose main topics are shown as follows:

Extract | Topic | Number of tested items | Percentage of tested items
1 | Coaching new employees | 6 | 20%
2 | Company profile | 5 | 16.7%
3 | Company song | 4 | 13.3%
4 | Town planning | 4 | 13.3%
5 | Time managing | 3 | 10%
6 | Planning for tourism | 3 | 10%
7 | The role of the Public Relations department | 3 | 10%
8 | The role of the Marketing department | 2 | 6.7%

Table 8: Topics in the reading section

Comparing the reading section with the reading checklist (see Appendix 1) shows that the topics in the MCQs test 1, such as managing, marketing and planning, are highly relevant to the ones the students have already studied.

* Functional language section

This section includes 20 items on business situations. The language functions tested in these situations are presented in the following table:

No. | Function | Number of tested items | Percentage of tested items
1 | Clarifying | 5 | 25%
2 | Making suggestions | 4 | 20%
3 | Checking information | 3 | 15%
4 | Asking for opinions | 3 | 15%
5 | Finishing a conversation | 2 | 10%
6 | Giving opinions | 2 | 10%
7 | Saying goodbye | 1 | 5%

Table 9: Items in the functional language section

Comparing Table 9 with the functional language checklist (see Appendix 1), it is clear that the test items broadly cover what the students have been taught for business situations (for example, telephoning, meetings, and socializing and entertaining). However, the test lacks items on interrupting and making excuses, although these are focal points in the syllabus.

To sum up, with regard to content, the items in the four sections of the MCQs test 1 are, to a large extent, relevant to the course book.

5.1.3. The skill weight format in the test and the syllabus

According to the skill weights in the syllabus illustrated in Table 4 (Section 3.2.2), among the four tested parts (reading, vocabulary, grammar and functional language), reading has the highest proportion of skill weight (18%) and ranks first. Grammar ranks second with 14%. Functional language ranks third with 13%, and vocabulary is at the bottom with 12%. However, in the test specification grid, the skill weighting of the four sections does not follow the same ranking as the syllabus: the vocabulary and grammar sections, with 50 test items each, share the first rank, whereas reading (30 test items) and functional language (20 test items) rank third and fourth respectively. Thus, in the MCQs test 1, the reading section drops from first to third rank, while the vocabulary section rises from fourth to first.

From the detailed findings presented above, we can see that the objectives of the MCQs test 1 are partially compatible with the course objectives. Likewise, the skill weight format of the MCQs test 1 is only partially similar to that of the syllabus. Only the content of the MCQs test 1 reflects nearly all of the course book content. It might thus be concluded that the MCQs test 1 is, to a certain degree, related to the teaching content and objectives.

5.2. The reliability of the final achievement test
5.2.1. Reliability coefficient

The results obtained from the test scores are as follows:

Mean | 6.59
Variance of the test scores | 2.12
Standard deviation (s.d.) | 1.46
Sum of the item variances (∑pq) | 33
Reliability coefficient | -14.6

Table 10: Test reliability coefficient

As stated in Chapter 2, the typical reliability coefficient for MCQs tests is at least 0.8, and the closer it gets to 1.0, the better. The reliability coefficient of the MCQs test 1, however, is far below this desirable level.

5.2.2. Item difficulty and discrimination value

The difficulty and discrimination values for each of the 150 tested items are given in Appendix 5.

* Item difficulty value

Among the 150 items, there are 54 items whose p-value is bigger than 0.67, making up 36% of the total test items, while there are no items with a p-value smaller than 0.33 (see Appendix 5). That means 64% of the test items have an acceptable difficulty level, 36% are too easy, and none is too difficult. In addition, the MCQs test 1 obtained an average p-value of 0.55 (see Appendix 3).

The following table gives the p-values of the items in the four sections of the MCQs test 1:

Section | Items with acceptable p-value | Items without acceptable p-value
Vocabulary | 41 | 9
Grammar | 25 | 25
Reading | 29 | 1
Functional language | 1 | 19

Table 11: p-values of items in the four sections

Table 11 shows that half of the test items in the grammar section and, notably, 95% of the test items in the functional language section are too easy. It appears that the MCQs test 1 includes too many overly easy items, especially in the functional language section. Besides, the test as a whole does not have a well-spread range of item difficulties around the desirable average p-value of 0.55. Accordingly, the items with undesirable difficulty indices might reduce the test's reliability.

* Item discrimination value

Among the 150 items, 76 items have acceptable discrimination values (>= 0.67). The others are non-discriminating (see Appendix 5). The following table gives the discrimination values of the items in the four sections of the MCQs test 1:

Section | Items with acceptable discrimination value | Items without acceptable discrimination value
Vocabulary | 21 | 29
Grammar | 29 | 21
Reading | 25 | 5
Functional language | 1 | 19

Table 12: Discrimination values of items in the four sections

Table 12 shows that 95% of the items in the functional language section are non-discriminating. Roughly half of the items in the vocabulary and grammar sections also lack good discrimination values. Only the items in the reading section discriminate well among students. It can thus be inferred that the item discriminability of the MCQs test 1 is not as good as expected.

* The number of items with both an acceptable p-value and an acceptable discrimination value is 68, making up 45.3% of the whole test (see Appendix 5). This can be taken to mean that only 45.3% of the test items are of good quality. The numbers of such items in the four sections of the MCQs test 1 are as follows:

Section | Items with acceptable p-value and discrimination value
Vocabulary | 27
Grammar | 15
Reading | 25
Functional language | 1

Table 13: Number of test items with acceptable p-value and discrimination value in the four sections

From this table we can see that the items in the reading section have the best quality, as they satisfy the requirements for both p-value and discrimination value. Next come the items in the vocabulary section. The items in the grammar section, and especially the functional language section, are undesirable, since they are too easy and non-discriminating.
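As a numerical check, plugging the Table 10 values into the KR-20 formula sketched in Section 4.2 gives (150/149) x (1 - 33/2.12) ≈ -14.66, consistent with the reported coefficient of -14.6. The sketch below shows one way the item indices of this section might be tallied from the same 0/1 response matrix. It assumes the common upper-minus-lower-group discrimination index (the study's own formula is the one referred to in Section 2.4.2, which may differ) and uses the acceptability cut-offs stated above (0.33 <= p <= 0.67 for difficulty; D >= 0.67 for discrimination); the names and the grouping fraction are illustrative.

```python
import numpy as np

def item_indices(responses: np.ndarray, group_frac: float = 0.27):
    """Per-item difficulty (p) and discrimination (D) for a 0/1
    response matrix of shape (n_examinees, n_items)."""
    n = responses.shape[0]
    g = max(1, int(n * group_frac))              # size of the upper and lower groups
    order = responses.sum(axis=1).argsort()      # examinees ranked by total score
    lower, upper = responses[order[:g]], responses[order[-g:]]
    p = responses.mean(axis=0)                   # difficulty: proportion answering correctly
    d = upper.mean(axis=0) - lower.mean(axis=0)  # upper-minus-lower discrimination index
    return p, d

def tally_quality(p: np.ndarray, d: np.ndarray) -> tuple[int, int, int]:
    """Counts of acceptable items under the cut-offs used in this study."""
    ok_p = (p >= 0.33) & (p <= 0.67)   # acceptable difficulty range
    ok_d = d >= 0.67                   # acceptable discrimination (this study's cut-off)
    return int(ok_p.sum()), int(ok_d.sum()), int((ok_p & ok_d).sum())
```

Run over the 349 x 150 response matrix, a tally of this kind produces per-section counts of the sort reported in Tables 11, 12 and 13.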
In brief, the findings show that the MCQs test 1 to a large extent lacks reliability, for two reasons. First, the reliability coefficient of the test is very far from the desirable reliability coefficient for an MCQs test. Second, more than half of the test items (54.7%) do not have good p-values and discrimination values.

5.3. The attitude of students towards the MCQs test 1

The survey questionnaires were delivered to 236 second-year non-English majors, but only 218 papers were collected. The following are the results.

In order to find out students' perceptions of the test content, the author asked students whether the content of the four sections of the test was relevant to what they had learned. The result is shown in the chart below:

Chart 1: Students' responses on test content

Among the four sections, functional language was perceived as the most relevant, with a total proportion of 65%. The reading section was claimed to be the least relevant (only 31%). Vocabulary was said to be a little more relevant to the syllabus than grammar (59% compared with 52%).

Giving opinions on the test length, three quarters of the students (75%) found the total of 150 multiple-choice items reasonable for them; 25% thought it was too many.

Answering the question whether the test as a whole had the power to discriminate among students on the ability being measured, approximately 36% of the students stated that the test items actually discriminated between students' levels of English. The remaining 64% claimed that the level of discrimination was not remarkable. The result can be seen clearly in the following pie chart:

Chart 2: Students' responses on item discrimination value (high: 36%; low: 64%)

In the fourth question, students were asked if they had had enough time to fulfill the tasks given in the achievement test 1. The following chart illustrates the result:

Chart 3: Students' responses on time length (enough: 84%; not enough: 9%; too much: 7%)

As Chart 3 shows, roughly 84% of the students answered that time management was not a problem for them. 7% of the responses indicated that the time allowance was too generous, while 9% said that they needed more time to finish the tasks.

Regarding the clarity of the test instructions, 90% of the students stated that the instructions were clear. Only 10% of them perceived them as rather unclear.

When asked about the influence of test supervision on the test results, 98% of the students commented that the test supervisors were strict. Only 2% of them reported that the supervision was not very strict.

Students were also asked whether the testing room affected their performance. 40% of them claimed that the testing room did have an impact on their test performance; 60% stated that they were not affected.

Responding to the question whether they experienced a computer breakdown while doing the test and whether their test results were affected, a third of the informants stated that they did and had to do the test again. Of these, 77% found that it had a very negative influence on their test performance, while the remaining 23% saw no impact.

When asked if they suffered from physical or emotional pressure while performing the tasks, 45% of the students admitted that they did, while 55% did not.

With reference to test-taking behavior, 56% of the informants responded that they selected answers arbitrarily, whereas 44% did not.
The result is illustrated in the chart below:

Chart 4: Students' response arbitrariness (yes: 56%; no: 44%)

Answering the question about prior exposure to the test format and content, 97% of the students said that they were familiar with this type of test; only 3% were not. This can be explained by the fact that they were second-year students and had already done a number of such tests.

Concerning students' computer skills, 61% of the students claimed that they were good at using a computer to do the test, 38% thought their skill was average, and only 1% stated that they were not good at it.

When asked whether there was any difference between doing the test on paper (hard copy) and on a computer screen (soft copy), surprisingly half of the participants found it different and half did not, even though they were second-year students and had already taken computer-based MCQs English tests four times.

In the last question, students were asked whether the test scores reflected their actual achievement during the 4th semester. The result is presented in the following pie chart:

Chart 5: Students' responses on the relation between test scores and their achievement (exactly: 66%; not exactly: 34%)

As can be seen from Chart 5, 66% of the students acknowledged that the test score actually reflected their achievement, while 34% of them did not get the scores they expected.

From these results we can draw the following points:

- Factors which do not affect students' scores include students' computer skills, students' familiarity with the test format and content, test supervision, clarity of the test instructions, and time allowance.

- Factors affecting students' test performance involve test characteristics, testee characteristics and test administration characteristics. The test characteristics include a large number of test items, low content relevance to the course book and low discrimination power. The testee characteristics consist of response arbitrariness, suffering from pressure and poor ability to read texts on screen. The test administration characteristic is computer breakdown.

Clearly, when performing the tasks, students were heavily influenced by both objective and subjective factors, and therefore the results they got did not reflect their true ability, as 34% of them claimed. In short, the test scores do not seem reliable from the students' perspective, because students' performances on the test were affected by a number of both objective and subjective factors.

All of the findings for the three research questions mentioned above lead to the conclusion that the MCQs test 1 does not yield a reliable result. The unreliability of the test resulted from the performance of both test-takers and test-designers. As for the test designers, they made a test of low quality: the allocation of item difficulty among the four sections was not reasonable, and the items were not really discriminating. As for the test-takers, they did not perform the tasks well.

Notably, according to the findings obtained from the comparison and analysis of the test item content, there is high relevance between the test and the course book, especially in the reading section. However, the findings from the questionnaire survey show that, in the students' view, the test content is not actually relevant to what they have been taught, especially the reading part.
It is likely that fluctuations in students' state while doing the test, such as pressure, difficulty in reading texts on computer screens and response arbitrariness, made them believe that the content of the test was only about 50% relevant to what they had learnt and that their test scores did not reflect their true ability.

All things considered, the MCQs test 1 has one good point: it is valid in terms of content. Nevertheless, this is not enough to conclude that it is a good test, since it lacks reliability.

5.4. Pedagogical implications and suggestions for improvement of the existing final achievement computer-based MCQs test 1 for the non-English majors at HUBT

In this section, some suggestions are offered to test designers to improve the quality of the final achievement MCQs test 1. A good achievement test must be valid and reliable. In order to make a more valid achievement test, test des…
