As defined earlier, traditional assessment generally refers to written testing, such as multiple choice, matching, true/false, fill in the blank, etc. Learners typically complete written assessments within a specified time. There is a single correct response for each item. The assessment, or test, assumes that all learners should learn the same thing, and relies on rote memorization of facts. Responses are often machine scored and offer little opportunity for a demonstration of the thought processes characteristic of critical thinking skills.

Traditional assessment lends itself to instructor-centered teaching styles. The instructor teaches the material at a low level, and the measure of performance is limited. Traditional assessment uses fairly simple grading matrices, such as the one shown in Figure 1. As a result, a satisfactory grade for one lesson may not reflect a learner’s ability to apply that knowledge in a different situation.

Figure 1. Traditional grading

Still, tests of this nature do have a place in the assessment hierarchy. Multiple choice, supply type, and other such tests are useful in assessing the learner’s grasp of information, concepts, terms, processes, and rules—factual knowledge that forms the foundation needed for the learner to advance to higher levels of learning.

Characteristics of a Good Written Assessment (Test)

Whether an instructor designs his or her own tests or uses commercially available test banks, it is important to know the components of an effective test. (Note: This section is intended to introduce basic concepts of written-test design. Please see the Developing a Test Item Bank post for testing and test-writing publications.)

A test is a set of questions, problems, or exercises intended to determine whether the learner possesses a particular knowledge or skill. A test can consist of just one test item, but it usually consists of a number of test items. A test item measures a single objective and calls for a single response. A test could be as simple as the correct answer to an essay question or as complex as a complete knowledge or practical test. Regardless of the underlying purpose, effective tests share certain characteristics. [Figure 2]

Figure 2. Effective tests have six primary characteristics

Reliability is the degree to which test results are consistent with repeated measurements. If identical measurements are obtained every time a certain instrument is applied to a certain dimension, the instrument is considered reliable. The reliability of a written test is judged by whether it gives consistent measurement to a particular individual or group. Keep in mind, though, that knowledge, skills, and understanding can improve with subsequent attempts at taking the same test, because the first test serves as a learning device.

Validity is the extent to which a test measures what it is supposed to measure, and it is the most important consideration in test evaluation. The instructor must carefully consider whether the test actually measures what it is supposed to measure. To estimate validity, several instructors read the test critically and consider its content relative to the stated objectives of the instruction. Items that do not pertain directly to the objectives of the course should be modified or eliminated.

Usability refers to the functionality of tests. A usable written test is easy to give if it is printed in a type size large enough for learners to read easily. The wording of both the directions for taking the test and of the test items needs to be clear and concise. Graphics, charts, and illustrations appropriate to the test items must be clearly drawn, and the test should be easily graded.

Objectivity describes singleness of scoring of a test: any qualified grader should arrive at the same score. Essay questions illustrate how difficult this can be to achieve. It is nearly impossible to prevent an instructor’s own knowledge and experience in the subject area, writing style, or grammar from affecting the grade awarded. Selection-type test items, such as true/false or multiple choice, are much easier to grade objectively.

Comprehensiveness is the degree to which a test measures the overall objectives. Suppose, for example, an AMT wants to measure the compression of an aircraft engine. Measuring compression on a single cylinder would not provide an indication of the entire engine. Similarly, a written test must sample an appropriate cross-section of the objectives of instruction. The instructor makes certain the evaluation includes a representative and comprehensive sampling of the objectives of the course.
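The comprehensiveness check described above can be made mechanical by tagging each test item with the objective it measures and flagging objectives the test never samples. A minimal sketch with hypothetical objective names and item tags:

```python
# Sketch: verifying that a test samples every course objective.
# The objective names and item-to-objective mapping are hypothetical.
course_objectives = {"regulations", "weather", "navigation", "aerodynamics"}

# Each test item (keyed by item number) is tagged with the single
# objective it measures, per the one-item-one-objective principle.
test_items = {
    1: "regulations",
    2: "weather",
    3: "weather",
    4: "navigation",
}

uncovered = course_objectives - set(test_items.values())
if uncovered:
    print(f"Objectives not sampled by the test: {sorted(uncovered)}")
# → Objectives not sampled by the test: ['aerodynamics']
```

Here the test over-samples weather and never touches aerodynamics, so it is not a representative cross-section of the course objectives.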

Discrimination is the degree to which a test distinguishes differences in achievement between learners, and it may be appropriate for assessment of academic achievement. However, minimum standards are far more important in assessments leading to pilot certification. When needed for classroom evaluation of academic achievement, a test must measure small differences in achievement in relation to the objectives of the course. A test designed for discrimination contains:

  1. A wide range of scores
  2. All levels of difficulty
  3. Items that distinguish between learners with differing levels of achievement of the course objectives
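One common way to quantify the third property, an item's ability to distinguish between high and low achievers, is a discrimination index: the proportion of top scorers who answered the item correctly minus the proportion of bottom scorers who did. A minimal sketch with hypothetical scores; the upper/lower 27% split is a common convention in classical item analysis, not something prescribed by this text:

```python
# Sketch: a simple item discrimination index for one test item.
# Scores and responses are hypothetical.

def discrimination_index(total_scores, item_correct, fraction=0.27):
    """D = p(correct in upper group) - p(correct in lower group)."""
    # Rank learners by total test score, highest first.
    ranked = sorted(range(len(total_scores)),
                    key=lambda i: total_scores[i], reverse=True)
    n = max(1, round(len(ranked) * fraction))
    upper, lower = ranked[:n], ranked[-n:]
    p_upper = sum(item_correct[i] for i in upper) / n
    p_lower = sum(item_correct[i] for i in lower) / n
    return p_upper - p_lower

scores = [95, 90, 88, 75, 70, 65, 60, 55, 50, 40]  # total test scores
item_correct = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]      # 1 = item answered right

print(f"Discrimination index: {discrimination_index(scores, item_correct):+.2f}")
# → Discrimination index: +1.00
```

An index near +1 means the item separates strong and weak learners well; an index near 0 (or negative) suggests the item does not discriminate and should be reviewed.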

Please see Developing a Test Item Bank for information on the advantages and disadvantages of multiple choice, supply type, and other written assessment instruments, as well as guidance on creating effective test items.