A Guide to Testing in English Language Teaching

English Language Testing

When we train as English language teachers, testing is often one of the parts of the job that doesn’t get much attention. So here’s a guide to how, why and when we test our students, and how to do it well!

Why and how do we test our students?

There are four main types of test in English Language Teaching.

1. Informal testing

Whether we realise it or not, we test our students informally all the time in the classroom. Every time we ask our students to produce or understand English, we are testing how well they can do this.

For example, you might ask your students to understand some details from a reading text. The activity you choose for this “tests” their understanding. Often, this type of test is not our main objective, but is necessary in order to progress to the next stage of the lesson, which may be extracting some grammar from the text or something else that requires them to have understood the text.

Another example of informal testing in the classroom is when we use practice activities. Our aim with these is to give our students an opportunity to practise what they have just learned. But we are also informally testing how effectively they have understood whatever language was the focus of our lesson. The results of this “test” may then help us decide how much more time we need to spend on that language point in a future lesson.

2. Progress tests

We might give our students a weekly or monthly progress test, reviewing what’s been taught over that period. This is still usually informal, but can again inform our future teaching.

The language school where we’re working might also impose its own progress tests, for example at the end of a period of 8 weeks’ teaching, to see if a student is ready to “move up” to the next level in whatever system of levels it has in place.

3. Diagnostic and placement tests

Whereas progress tests look back, seeing how much of what’s been covered has been taken on board, diagnostic and placement tests look forward, testing a student’s strengths and weaknesses with English, normally before a course starts.

Diagnostic tests allow us to decide what we need to cover with a student on a course by “diagnosing” weaker areas.

Placement tests have the aim of deciding where on some pre-determined scale a student should be put. A language school might have classes at levels 1-10, and a placement test decides which of these levels the student will be in.

4. Proficiency and selection tests

All of the types of test we’ve seen so far – informal, progress, diagnostic and placement tests – are for teachers (and language schools), to help the students they have in front of them in the best way possible.

Proficiency and selection tests, on the other hand, are for selectors. They’re all about deciding whether a particular person is selected – or not – for a particular job, course or task, or is selected to hold a certain certificate.

Proficiency tests have a commonly agreed level which is independent of any course that the test takers might be studying. Everyone takes the test at the same level of difficulty. If they gain a sufficient level, they “pass”, and if not, they don’t. Or they might be given a score on a sliding scale depending on their performance in the test.

People whose job it is to decide who gets a job, a task or a place on a course might use these proficiency tests, or another form of selection test whose aim is to divide candidates into “yes” and “no”. One example is university staff deciding who to admit to a course in an English-speaking country.

Examples of this type of test are TOEFL, TOEIC, and Cambridge English exams.

Should we test? What are the pros and cons of testing?

We now know why we test students, but does this automatically mean that we should test them? Let’s look at some of the pros and cons of testing.

Some advantages of testing

  1. Tests can be motivating for students
    Progress tests, for example, can give students a sense of achievement, motivating them to keep on learning and take their English to the next level. A looming selection test can keep students motivated with the promise of acceptance onto a course (and the threat of non-acceptance). There’s a flip side to this as well, though – see under “Some disadvantages of testing” below.
  2. Tests can be useful for the teacher
    Both informal testing in the classroom and more formal tests can help to measure the effectiveness of a teaching course, telling us what we need to teach or revise.
  3. Tests can be useful for language schools
    …and as such are usually required by them in one form or another, to bring some order to the organisation of students into classes and levels.
  4. Tests can be useful for selectors
    Selectors need some way to decide who is going to fill their 50 places on a course, or take one of 5 jobs, out of thousands of applicants.
  5. There may be cultural or educational expectations of students
    (…and in the case of young learners, their parents) to have some form of test. Testing helps meet this expectation.

Some disadvantages of testing

On the other hand…

  1. Tests can be demotivating for students
    Tests can also be demotivating, particularly if they are not perceived to be fair (more on this later on) or if they don’t reflect the needs or objectives of the test takers.
  2. More formal tests may not be “necessary” for some learners; informal testing in the classroom may be sufficient to meet their objectives.
  3. The existence of the test can affect the course being taught
    “Teaching to the test” out of a desire for our students to pass, rather than teaching to the needs of the learners, is called the “backwash” effect.
  4. Tests are very often invalid or unreliable
    More on what this means below…

How can we ensure that we’re doing it right?

The advantages of testing that we saw above make it inevitable in one form or another in most learning settings. So how can we ensure that when we do test, we’re doing it right? How can we make sure that we’re being fair to our students when we test them?

Well, there are four things to think about:

  • Validity
  • Reliability
  • Administration
  • Testing approach / technique

Test validity

This is probably the most important thing to think about when it comes to making a test.

A test is valid if it meets two criteria:

Firstly, it is valid if it tests what we want to test and not something else. This may sound like an easy thing to get right, but sometimes when we aim to test grammar, we instead end up testing vocabulary. Or when we want to test phonology, we end up testing grammar. Sometimes, our tests end up testing memory, or something equally unnecessary, like personality.

Secondly, to be valid, the means of testing – the way we carry out the test – need to be appropriate for the aims of the test.

So what we’re really asking with validity is whether or not the test is fair. If it tests what we want to test in a way that is appropriate for the aims of the test, it is valid and fair.

Let’s have a look at a few examples to see what we mean by all this:

You’ve been using a coursebook which contains only graded reading material. You’ve used this for, among other things, extracting the gist from a reading text. You want to test your students on the progress they’ve made in extracting gist. So you decide to use an authentic text like a newspaper article, of a similar length to the graded texts in the coursebook.

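This test is not valid, because the way you’ve decided to test their progress does not match what you are testing. Your students have only practised extracting gist from graded texts, so an authentic text tests their ability to cope with ungraded language as well as the progress they’ve made.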

You want to find out if your students can discriminate between the two sounds /ɜː/ and /ɔː/ as in “work” and “walk”. You’re trying to decide between the following two ways to test this:

A. Dictate the following sentences:
“I like to walk in the mornings.”
“I like to work in the mornings.”

Students write down 1 or 2 according to a previously established numbering.

B. Dictate this sentence:
“I walked to work yesterday.”

Students write down the whole sentence.

Option A is a valid test, whereas option B is invalid. Why?

Well, in option B, students can arrive at the answer by their knowledge of grammar and vocabulary. “I worked to walk yesterday” doesn’t make sense, and so they can deduce that the correct answer must be “I walked to work yesterday”. So option B is invalid because it doesn’t test what we want to test.

With option A, though, there is no way that the students can use their knowledge of grammar or vocabulary to arrive at the correct answer, because both sentences are correct grammatically. This means that the only way they can arrive at the correct answer is by listening for the different sounds.

You want to test your students’ ability to produce a question in the simple past using “How…?” You’re trying to decide between the following questions to test this:

  1. Make a question from this sentence:
    “He went to France by car.”
  2. Make a question for the answer below:
    “… to France last year?” “I went by car.”
  3. Make a question for the answer below:
    “By car.”
  4. Make a question about the phrase underlined.
    “I went to France by car.” (“by car” is underlined)
  5. Write the question out in full:
    How / go / France?

Only one of these tests is valid. All of the others have the possibility of testing something other than the grammar we want to test. Let’s have a look at them in turn.

  • In number 1, we could ask about anything: “Where’s France?” / “How often does he go to France?” / “When did he go to France?” and so on.
  • Number 2, at first glance, looks to be valid, but in fact there is another question we could ask which doesn’t use “How”: “Who went to France?”
  • In number 3, we could ask a question in just about any tense, for example: “How are you getting to France?”
  • Number 4 is the only valid test. It’s the only one in which we can be absolutely sure that we are testing what we want to test.
  • Number 5 gives away the word “How”.

Here’s a type of test that often gets used as a practice activity during a class:

Complete the sentences in past continuous with a verb from the list:

  walk   play   work   sleep   eat
  While I ________ tennis, it started raining.
  She arrived when I _____ breakfast.

This looks valid at first, but it is in fact a test of vocabulary. Once the student has the form of the past continuous, the grammar is just a mechanical exercise, and what we’re really testing is their vocabulary – which verb fits in the context of each sentence. Very often we see worksheets or tests like this with the first answer filled in as an example. This then becomes even less valid, as the students just have to copy the example.

Face validity

Just as important as validity is what we call face validity. Face validity asks whether the test is perceived to be fair by the students. A test may be valid, but if it is perceived as being unfair, then it doesn’t have face validity.

Let’s say you have a detailed text about a very specific subject, like R&B music or Cristiano Ronaldo. You have a set of questions related to the text which only require knowledge of a particular language point to answer (the language point you’re testing), so you know it is a valid test. However, because the subject matter is so specific, students who know nothing about R&B music, or who have no interest in or knowledge of football, may perceive it as unfair – they may perceive that students who do have this knowledge or interest have an unfair advantage.

Test reliability

The second thing to think about when we make a test is reliability. In other words, do the results vary (or could they vary) if the test is marked by different people or at different times, or if it is taken again? This is especially important if the test is taken for official purposes, for example as a selection test.

So how can we ensure that a test is reliable? Well, there are a number of different things we can do:

  1. We can make sure that the test is not too short and that it tests the same area in different ways.
  2. We can ask a few native speakers to do the test – their answers should be the same.
  3. We can pilot the test, assess what is not clear, what questions students have and how long the test takes.
  4. We can have the test marked by several different people.
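
Point 4 above involves a little arithmetic, so here is a minimal sketch of one way to compare several markers. The marker names, the marks out of 20 and the tolerance of 2 marks are all made up for illustration; the idea is simply to find the scripts where markers disagree most, so that they can be discussed and re-marked.

```python
def marker_spread(scores_by_marker):
    """For each script, return the gap between the highest and lowest mark given."""
    # zip(*...) groups the marks script by script rather than marker by marker
    scripts = zip(*scores_by_marker.values())
    return [max(marks) - min(marks) for marks in scripts]


def flag_for_remarking(scores_by_marker, tolerance):
    """Return the positions of scripts whose marks differ by more than the tolerance."""
    spreads = marker_spread(scores_by_marker)
    return [position for position, gap in enumerate(spreads) if gap > tolerance]


# Hypothetical marks out of 20: three markers, the same four scripts in the same order
marks = {
    "marker_a": [14, 9, 17, 12],
    "marker_b": [15, 13, 17, 11],
    "marker_c": [14, 8, 16, 12],
}

print(marker_spread(marks))                    # [1, 5, 1, 1]
print(flag_for_remarking(marks, tolerance=2))  # [1]
```

Here the second script (position 1, counting from 0) is the one the markers disagree on, which suggests either an ambiguous task or marking criteria that need tightening.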

We can also choose different types of test, and we’ll see more on this under “Testing approach/technique” below.

Test administration

The third thing to think about when making a test is how easy it is to administer. By “administer” we mean to set up and actually run the test. Do you have adequate sound in the room for a listening test? If you are giving a speaking test, do you have enough time for everyone to take it individually with you? Do you have the resources required to mark the test?

So in other words, we need to make sure that the test is suitable for the context in which it will be used. The “cost” of administering a test (in terms of time and organisation) increases the more complex you make the test, so we normally need to find some kind of balance.

Testing approach / technique

The fourth thing to think about when making a test is the approach or technique that we choose. There are two main approaches to tests: objective and subjective.

Objective and subjective tests

Every test that we give our students is either objective, subjective or somewhere in between.

  • Fully objective tests allow for only one correct answer (we call these discrete item tests and we’ll see some examples later).
  • Subjective tests are those which are more communicative, for example writing a paragraph or an essay about a subject, or speaking about a subject. Communicative tests like these are subjective because they allow for a wide range of language to be used. They say “show me what you can do with the language”.

Let’s have a look at some of the advantages and disadvantages of objective and subjective tests, bearing in mind what we said about validity, reliability and administration above.

Have a look at these examples which are designed to test language for giving advice:

Test A:
Complete the sentence with a suitable word or phrase.
A: “Adam failed his driving test, you know.”
B: “Well, it’s his own fault. He ………………………… harder.”
Test B:
Read the extract from a letter below and follow the instructions.

“My English friend, Michael, had his first date with a Spanish girl, Carla, last Saturday. He played football in the afternoon and then went to a pub, so he didn’t have time to take a shower. He just brushed his teeth, changed his shirt and rushed to the cinema. He was 20 minutes late. Michael doesn’t speak Spanish and the film he had chosen turned out to be a violent war movie. After twenty minutes Carla asked to leave and Michael had an argument with her in the cinema. He made so much noise that the manager came and asked them to leave. Anyway, it was still early and Carla was very patient so they went to a restaurant, and at the end of the meal, Michael asked if she would pay half the bill. When they got outside, he said he wanted to go to the pub to meet his friends, and could she find her own way home? He called her yesterday but she said she never wanted to see him again.”

Although it’s too late now, what advice would you give Michael about last Saturday?

Test A is more objective (although not completely – several different answers are possible: “He should have worked harder” / “He’d better work harder” / “He could have worked harder”…).

Test B is more subjective.

So which is better? Well, there are pros and cons to each.

  1. Test A is quicker, both to take and to mark, and it tests one particular structure. But it doesn’t really give the student the opportunity to show what she is capable of.
  2. Test B says “show me what you can do with the language” and so is closer to what happens with language in real life, but the student may be able to avoid using the language that you “want” them to use.
  3. It would be possible for a student to pass Test A through knowing a structure for giving advice, but to fail Test B, because Test A doesn’t necessarily show understanding of a real life language situation, whereas Test B requires this.
  4. Test B could be less reliable (more on this below).
  5. Test B is more time-consuming and expensive to mark so poses more problems in terms of administration.

With these pros and cons of each, the choice of which to use comes down to the reason for testing and whether it’s possible to meet the challenges of validity, reliability and administration that we talked about above. Let’s look at one of these challenges with subjective tests.

Subjective tests and reliability

If we use a subjective test, reliability becomes more of a challenge. The range of language used is more open to the interpretation of the marker.

So how can we make subjective tests more reliable?

Well, we need to have clearly defined criteria for marking them. Very often, the criteria we use to evaluate communicative tests are quite limited and systemic. For example, when marking a piece of writing, without clear criteria, a marker might just focus on accuracy of grammar, or spelling, or correct choice of vocabulary. But if we just use these very limited systemic criteria, it can defeat the object of doing a communicative test in the first place.

So we can normally do better with the criteria we choose. Below are some possible marking criteria you can use to help with reliability of subjective tests. Some of these criteria may be more or less appropriate to use in different tests and contexts.

Possible criteria for subjective tests

  1. Accuracy
    As we just mentioned – to what extent has the student mastered correct usage, and how correct is the information she takes from a text or presents in a task?
  2. Appropriacy
    Has the student used language appropriate to the context you’re asking them to communicate in, meeting the expectations of the listener or reader?
  3. Independence
    How independent is the student from any reference sources or questioning of the tester?
  4. Repetition
    How much does the student need to re-read a text, or ask for something spoken to be repeated?
  5. Hesitation
    How much of a delay is there in a student starting a task and how much hesitation is there when performing it?
  6. Flexibility
    How well does the student adapt to switches in the features of a task?
  7. Speed
    How quickly does the student perform the task?
  8. Range and complexity
    Does the student use a variety of skills, functions, tones and styles of presentation in a speaking or writing task? How well does the student cope with this variety in a reading or listening task?
  9. Size
    How large is the text that the student produces or is asked to comprehend?
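
To show how criteria like these might be turned into a single mark, here is a minimal sketch. Everything specific in it is an assumption for illustration: the choice of three criteria, the 0 to 5 band scale and the weights are not a recommended scheme, and you would choose your own to suit the test and context.

```python
def rubric_score(bands, weights, max_band=5):
    """Combine per-criterion bands into a percentage, using the given weights."""
    missing = set(weights) - set(bands)
    if missing:
        # Every weighted criterion needs a band, otherwise the mark is not comparable
        raise ValueError(f"No band given for: {sorted(missing)}")
    total_weight = sum(weights.values())
    weighted_sum = sum(bands[criterion] * weight for criterion, weight in weights.items())
    return round(100 * weighted_sum / (max_band * total_weight), 1)


# Hypothetical weights: accuracy and appropriacy count double
weights = {"accuracy": 2, "appropriacy": 2, "range_and_complexity": 1}

# Hypothetical bands (0 to 5) that a marker gave one piece of writing
bands = {"accuracy": 3, "appropriacy": 4, "range_and_complexity": 5}

print(rubric_score(bands, weights))  # 76.0
```

Agreeing the criteria and weights before marking starts, and sharing them with every marker, is what helps reliability; the calculation itself is the easy part.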

Types of objective test

If we choose an objective test, either for a formal test or for a practice activity in the classroom, we have quite a few different types we can choose from.

Here are some that you can use, depending on the language point that you’re testing. Most of these are discrete item tests (tests which have only one correct answer) and so are fully objective. With the last two, different answers are possible, but they are still fairly objective.

  1. Transformation
    These types of test ask the test taker to transform language into another form.

    He’s a slow swimmer.
    He swims ……….

    Change the word in capitals to fit the sentence given.
    There were a lot of ……………….. for the job. (APPLY)

    Complete the second sentence so that it has the same meaning as the first.
    “I’m thirsty”, Julie said.
    Julie said ………………..

  2. Insertion
    These ask the test taker to add a word or phrase into a sentence.

    Put the word in capitals into the right place in the sentence:
    She lives in an old farmhouse. (HUGE)
  3. Combination
    Join the sentences using the word in capitals:
    He had a cold. He went swimming. (ALTHOUGH)
  4. Rearranging
    Jumbled words, sentences or paragraphs that students have to put in the correct order.
  5. Matching functions
    Match the sentence with when you use it:

    Could I come in there?          Persuading
    Is there any way I could…?      Generalising
    They tend to be short.          Interrupting
  6. Split-sentence matching
    Match the halves of sentences below:

    Would you mind my       close the window?
    Would you mind if       closing the window?
    Could I                 I close the window?
  7. Skeleton sentences
    Make a sentence from the words below:
    This picture / paint / Picasso / long time ago
  8. Error recognition
    Which part of the sentence is wrong?

    I’m worried that he’ll feel angry to me.
    A B C D
  9. Sentence completion
    My bedroom would be all right if ………………..
    He’ll be a good tennis player if ………………..
  10. Situations
    You want a day off. How would you ask:
    a) your boss?
    b) a colleague who works with you in the office?

Depending on where and whom you teach, testing may make up a very large or only a small part of your job. However much testing you do, though, it’s important, for our students’ sake, to do it right. I hope this article helps point you in the right direction.

If you’re looking for more guidance or information about English language testing and assessment, the English Language Testing Society is working to develop understanding of guidelines and best practices, and can be a valuable resource.

Written by Keith Taylor
Keith is the founder of Eslbase. He has been an English teacher and teacher trainer for over 15 years.
