How Do You Marry Language Arts Instruction Best Practices With State Testing?
Abstract
This chapter focuses on key ideas for understanding literacy assessment to assist with educational decisions. Included is an overview of different literacy assessments, along with common assessment procedures used in schools and applications of assessment practices to support effective teaching. Readers of the chapter will gain an understanding of different types of assessments, how assessment techniques are used in schools, and how assessment results can inform teaching.
Learning Objectives
After reading this chapter, readers will be able to
- explain how testing fits into the larger category of assessment;
- describe different literacy assessments and how they are commonly used in schools;
- discuss why assessment findings are judged based on their validity for answering educational questions and making decisions;
- explain the importance of reliability and validity of test scores and why psychometric properties are important for interpreting certain types of assessment results;
- critique literacy assessments in terms of how they can be used or misused.
Introduction
When the topic of educational assessment is brought up, most educators immediately think of high-stakes tests used to gauge students' progress in meeting a set of educational standards. It makes sense that much of the dialogue concerning educational assessment centers on high-stakes testing because it is this kind of assessment that is most controversial in the American education system, especially since the vast majority of states have adopted the Common Core State Standards for English Language Arts & Literacy in History/Social Studies, Science, and Technical Subjects (CCSS; National Governors Association Center for Best Practices & Council of Chief State School Officers [NGA & CCSSO], 2010), along with high-stakes tests intended to assess students' proficiency in meeting them. But high-stakes tests are actually just a fraction of the assessment procedures used in schools, and many other assessments are as important in influencing instructional decisions. This chapter discusses a wide scope of literacy assessments commonly used in kindergarten through twelfth grade classrooms, along with ways to use results to make educational decisions.
Literacy Assessment
To understand literacy assessment, we first need to think about the term "literacy," which is discussed throughout the chapters in this textbook. Literacy has traditionally been regarded as having to do with the ability to read and write. More recently, literacy has evolved to encompass multidimensional abilities such as listening, speaking, viewing, and performing (NGA & CCSSO, 2010), along with cultural and societal factors (Snow, 2002) that can facilitate or constrain literacy development. This multidimensional definition of literacy requires educators and policy makers to think about literacy in complex ways. Controversies arise when the richness of literacy is overly simplified by assessments that are not multidimensional or authentic, such as the overuse of multiple-choice questions. Educators may find the lack of authenticity of these assessments frustrating when results do not appear to represent what their students know and can do. On the other hand, more authentic assessment methods, such as observing students who are deliberating the meaning of texts during group discussions, do not precisely measure literacy skills, which can limit the kinds of decisions that can be made.
Even though assessing literacy with multiple-choice items and assessing it with more authentic procedures may seem like opposites, they do have an important feature in common: they both can provide answers to educational questions. Whether one approach is more valuable than the other, or whether both are needed, depends entirely on the kind of questions being asked. So if someone asks you if a multiple-choice test is a good test or if observing a student's reading is a better assessment procedure, your answer will depend on many different factors, such as the purpose of the assessment, along with the quality of the assessment tool, the skills of the person who is using it, and the educational decisions needing to be made. This chapter will help you learn more about how to make decisions about using literacy assessments and how to use them to improve teaching and learning.
Taxonomy of Literacy Assessments
To understand the purposes of different types of literacy assessment, it is helpful to categorize them based on their purposes. It should be noted that there is much more research on the assessment of reading compared to assessment of other literacy skills, making examples in the chapter somewhat weighted toward reading assessments. Examples of assessments not limited to reading have also been included, where appropriate, as a reminder that literacy includes reading, writing, listening, speaking, viewing, and performing, consistent with the definition of literacy provided in Chapter 1 of this textbook.
Formal Assessments
One way to categorize literacy assessments is whether they are formal or informal. Formal literacy assessments usually involve the use of some kind of standardized procedures that require administering and scoring the assessment in the same way for all students. An example of formal assessments is state tests, which evaluate proficiency in one or more literacy domains, such as reading, writing, and listening. During the administration of state tests, students are all given the same test at their given grade levels, teachers read the same directions in the same way to all students, the students are given the same amount of time to complete the test (unless a student receives test accommodations due to a disability), and the tests are scored and reported using the same procedures. Standardization allows control over factors that can unintentionally influence students' scores, such as how directions are given, how teachers respond to students' questions, and how teachers score students' responses. Certain state test scores are also usually classified as criterion-referenced because they measure how students achieve in reference to "a fixed set of predetermined criteria or learning standards" (edglossary.org, 2014). Each state specifies standards students should meet at each grade level, and state test scores reflect how well students achieved in relation to these standards. For example, on a scale of 1 to 4, if a student achieved a score of "2," this score would typically reflect that the student is not yet meeting the standards for his or her grade, and he or she may be eligible for extra help toward meeting them.
Another example of a criterion-referenced score is the score achieved on a permit exam to drive a car. A predetermined cut score is used to decide who is ready to get behind the wheel of a car, and it is possible for all test takers to meet the criterion (e.g., 80% of items correct or higher). Criterion-referenced test scores are contrasted with normatively referenced (i.e., norm-referenced) test scores, such as an SAT score. How a student does depends on how the other students who take the test score, so there is no criterion score to meet or exceed. To score high, all a student has to do is do better than most everyone else. Norm-referenced scores are frequently associated with diagnostic tests, which will be described in further detail in the section of this chapter under the heading "Diagnostic Literacy Assessments."
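To make the distinction concrete, the short Python sketch below (not part of the original chapter; the 80% cut score and the peer scores are hypothetical) contrasts judging a score against a fixed criterion with ranking it against a norm sample.

```python
# A minimal sketch (not from the chapter) contrasting criterion-referenced and
# norm-referenced score interpretation. The 80% cut score and the peer scores
# below are hypothetical.

def criterion_referenced(score, cut_score=80):
    """Judge a score against a fixed criterion, like a permit exam cutoff."""
    return "meets criterion" if score >= cut_score else "does not meet criterion"

def percentile_rank(score, norm_sample):
    """Percent of the norm sample scoring at or below the student."""
    at_or_below = sum(1 for s in norm_sample if s <= score)
    return 100 * at_or_below / len(norm_sample)

norm_sample = [55, 60, 62, 68, 70, 74, 75, 78, 82, 90]  # hypothetical peer scores
print(criterion_referenced(84))            # meets criterion
print(percentile_rank(84, norm_sample))    # 90.0 -> better than most peers
```

Note that in the criterion-referenced case every test taker could pass, while in the norm-referenced case a score only has meaning relative to how the comparison group performed.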
Informal Assessments
Informal literacy assessments are more flexible than formal assessments because they can be adapted according to the student being assessed or a particular assessment context. Teachers make decisions regarding with whom informal assessments are used, how the assessments are done, and how to interpret findings. Informal literacy assessments can easily incorporate all areas of literacy, such as speaking, listening, viewing, and performing, rather than focusing more exclusively on reading and writing. For example, a teacher who observes and records behaviors of a group of students who view and discuss a video is likely engaging in informal assessment of the students' reading, writing, speaking, listening, and/or performing behaviors.
Teachers engage in a multitude of informal assessments each time they interact with their students. Asking students to write down something they learned during an English language arts (ELA) class or something they are confused about is a form of informal assessment. Observing students engaging in cooperative learning group discussions, taking notes while they plan a project, and even observing the expressions on students' faces during a group activity are all types of informal assessment. Likewise, observing students' level of engagement during literacy tasks is informal assessment when procedures are flexible and individualized. Informal classroom-based self-assessments and student inventories used to determine students' attitudes about reading may be useful toward planning and adjusting instruction as well (Afflerbach & Cho, 2011).
Methods for assessing literacy that fall somewhere between informal and formal include reading inventories, such as the Qualitative Reading Inventory-5 (QRI-5; Leslie & Caldwell, 2010). Reading inventories require students to read word lists and passages and answer questions, and although there are specific directions for how to administer and score them, they offer flexibility in observing how students engage in literacy tasks. Reading inventories are frequently used to record observations of reading behaviors rather than to only measure reading achievement.
Formative Assessments
Another useful way to categorize literacy assessments is whether they are formative or summative. Formative assessments are used to "form" a plan to improve learning. An example of formative literacy assessment might involve a classroom teacher checking how many letters and sounds her students know as she plans decoding lessons. Students knowing only a few letter sounds could be given texts that do not include letters and words they cannot decode, to prevent them from guessing at words. Students who know most of their letter sounds could be given texts that contain more letters and letter combinations that they can practice sounding out (e.g., the words in their texts might include all the short vowels and some digraphs they have learned, such as sh, th, ck). In this example, using a formative letter-sound assessment helped the teacher select what to teach rather than simply evaluate what the student knows. Formative assessment is intended to provide teachers with information to improve students' learning, based on what students need.
Summative Assessments
Summative assessments are used to "sum up" whether students have met a specified level of proficiency or learning objective. State tests fall under the category of summative assessments because they are generally given to see which students have met a critical level of proficiency, as defined by standards adopted by a particular state. Unit tests are also summative when they sum up how students did in meeting particular literacy objectives by using their knowledge related to reading, writing, listening, speaking, viewing, and performing. A spelling test can be both formative and summative. It is formative when the teacher is using the data to plan lessons, such as what to reteach, and it is summative if used to determine whether students showed mastery of a spelling rule such as "dropping the 'e' and adding '-ing.'" So the goal of formative assessment is mostly to inform teaching, whereas the goal of summative assessment is to summarize the extent to which students surpass a certain level of proficiency at an end-point of instruction, such as at the end of an instructional unit or at the end of a school year.
Literacy Screenings
Another way to categorize assessments is whether they are used for screening or diagnostic purposes. Literacy screenings share characteristics with medical screenings, such as hearing and vision checks in the nurse's office or when a patient's blood pressure is checked at the start of a visit to the doctor's office. Screenings are typically quick and given to all members of a population (e.g., all students, all patients) to identify potential problems that may not be recognized during day-to-day interactions. See Table 1 for examples of commonly used universal literacy screeners, along with links to information about their use.
| Universal Literacy Screeners | Links to additional information |
| --- | --- |
| AIMSweb | http://www.aimsweb.com/ |
| Dynamic Indicators of Basic Early Literacy Skills—Next | https://dibels.uoregon.edu/ |
| STAR Reading | http://www.renaissance.com/assess |
| Phonological Awareness Literacy Screening (PALS) | https://pals.virginia.edu/ |
Among the most popular literacy screeners used in schools are the Dynamic Indicators of Basic Early Literacy Skills—Next Edition (DIBELS Next; Good & Kaminski, 2011) and AIMSweb (Pearson, 2012). These screeners include sets of items administered to all children at certain grade levels (which is why they are often called "universal" literacy screeners) to do quick checks of their literacy development and identify potential problems that may not be visible using less formal means. Literacy screenings require young children to complete one-minute tasks such as naming sounds they hear in spoken words (e.g., "cat" has the sounds /c/ /a/ /t/), naming the sounds of letters they see (e.g., letter "p" says /p/), and, starting in first grade, reading words in brief passages. Universal literacy screenings such as DIBELS Next and AIMSweb are often characterized as "fluency" assessments because they measure both accuracy and efficiency in completing tasks. For these assessments, the correct number of sounds, letters, or words is recorded and compared to a research-established cut point (i.e., benchmark) to determine which students are not likely to be successful in developing literacy skills without extra help. If a student scores below the benchmark, it indicates that the task was too hard, and detection of this difficulty can signal a need for intervention to prevent future academic problems. Intervention typically involves more intensive forms of instruction, such as extra instruction delivered to small groups of students.
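As a simple illustration of how a benchmark cut point can be applied across a class roster, the sketch below (not from the chapter; the benchmark value and the scores are invented) flags students whose one-minute scores fall below a cut point so they can be considered for extra help.

```python
# Hypothetical universal-screening check: flag students whose one-minute scores
# fall below a research-established benchmark. The benchmark value and the
# class roster below are invented for illustration.

BENCHMARK = 52  # hypothetical words-correct-per-minute cut point

class_scores = {"Ana": 61, "Ben": 48, "Chris": 52, "Dara": 35}

below_benchmark = [name for name, wcpm in class_scores.items() if wcpm < BENCHMARK]
print("Consider for extra help:", below_benchmark)  # ['Ben', 'Dara']
```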
To learn more about commercially available screenings such as DIBELS Next and AIMSweb, or to learn how to create your own personalized screenings, please visit http://interventioncentral.org. This site enables teachers to create their own individualized screening probes to assess a variety of basic literacy skills, such as identifying letters and sounds, segmenting sounds in spoken words, sounding out nonsense words, reading real words in connected text, and filling in blanks in reading passages (called "maze" procedures). Teachers can select the letters, words, and passages to be included on these individualized assessments. Probes to assess students' math and writing skills can also be created; however, any customized screening probes should be used with caution, since they do not share the same measurement properties as well-researched screenings such as DIBELS Next and AIMSweb.
Diagnostic Literacy Assessments
The purposes of universal literacy screenings can be contrasted with those of diagnostic literacy assessments. Unlike literacy screeners, diagnostic tests are generally not administered to all students but are reserved for students whose learning needs continue to be unmet, despite their receiving intensive intervention. Diagnostic literacy assessments typically involve the use of standardized tests administered individually to students by highly trained educational specialists, such as reading teachers, special educators, speech and language pathologists, and school psychologists. Diagnostic literacy assessments include subtests focusing on specific components of literacy, such as word recognition, decoding, reading comprehension, and both spoken and written language. Results from diagnostic assessments may be used formatively to help plan more targeted interventions for students who do not appear to be responding adequately, or results can be combined with those from other assessments to determine whether students may have an educational disability requiring special education services.
An example of a widely used diagnostic literacy test is the Wechsler Individual Achievement Test-Third Edition (WIAT-III; Wechsler, 2009). The WIAT-III is typically used to assess the achievement of students experiencing academic difficulties who have not responded to research-based interventions. The WIAT-III includes reading, math, and language items administered according to the age of the student and his or her current skill level. The number of items the student gets right (the raw score) is converted to a standard score, which is then interpreted according to where the student's score falls on a bell curve (see Figure 1) among other students at the same age and grade level who took the same test (i.e., the normative or "norm" sample).
Figure 1. Bell curve showing the percentage of students who fall above and below the average score of 100 on a diagnostic achievement test.
Most students will score in the middle of the distribution, but some students will achieve extreme scores, either higher or lower than most other students. This is why the "tails" at either side of the bell curve slope downward from the big hump in the center: the shape illustrates the decreasing frequency of scores that are especially low or high. In other words, the more extreme the score, the fewer students are likely to achieve it. When students achieve at either extreme, it can indicate the need for more specialized teaching related to the individual needs of the student (e.g., intervention or gifted services).
Diagnostic achievement tests are frequently referred to as "norm-referenced" (edglossary.org, 2013) because their scores are compared to scores of students from a norm sample. A norm sample is a group of individuals who were administered the same test items in the same way (i.e., using standardized procedures) while the test was being developed. Students who take the test have their performance compared to that of students from the norm sample to make meaning of the score. For instance, if a student were given a diagnostic assessment and the score fell within the same range as most of the students in the norm sample, then his or her score would be considered "average." If the student's score fell much higher or lower than other students in the norm sample, then the score would not be considered average or typical because most of the other students did not score at either of these extremes.
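To illustrate how a standard score relates to the bell curve in Figure 1, the sketch below (an illustration only; a mean of 100 and a standard deviation of 15 are assumed here as a common convention for norm-referenced tests, not values quoted from the chapter) converts a standard score to an approximate percentile rank using the normal distribution.

```python
# Hypothetical illustration of where a standard score falls on the bell curve
# in Figure 1. A mean of 100 and standard deviation of 15 are assumed as a
# common convention for norm-referenced tests, not values stated in the chapter.
from statistics import NormalDist  # Python 3.8+

MEAN, SD = 100, 15

def percentile_rank(standard_score):
    """Approximate percent of the norm sample scoring at or below this score."""
    z = (standard_score - MEAN) / SD
    return round(100 * NormalDist().cdf(z), 1)

print(percentile_rank(100))  # 50.0 -> the middle of the distribution
print(percentile_rank(85))   # ~15.9 -> one standard deviation below the mean
print(percentile_rank(130))  # ~97.7 -> an extreme, high score
```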
Comparing students' scores to a norm sample helps identify strengths and needs. Then again, just knowing where students' scores fall on a bell curve does nothing to explain why they scored that way. An extremely low score may indicate a learning problem, or it may indicate a lack of motivation on the part of the student while taking the test. Perhaps a low score could even be due to a scoring error made by the tester. Even though a score from a diagnostic assessment may be quite precise, understanding why a student scored at a particular level requires additional data. Did observations during testing show that the student was distracted, uncooperative, or squinting at items? It is often a combination of assessment data that helps identify why a student may have scored a certain way, which is why testers often use their observations during testing to interpret the meaning of scores.
Group achievement tests that include literacy subtests, such as the Iowa Test of Basic Skills (ITBS; Hoover, Dunbar, & Frisbie, 2003), have properties that make them function somewhat like a screening and somewhat like a diagnostic test. Like screeners, they are administered to all students at a particular grade level, but unlike most screeners, they take more time to complete and are administered to entire classrooms rather than having at least some sections administered individually. Like diagnostic tests, they tend to produce scores that are norm-referenced. Students' performance is compared to a norm group to see how they compare among peers, but unlike diagnostic tests, the tester is not able to discern how well scores represent students' abilities because testers are not able to observe all of the students' testing behaviors that may impact the interpretation of scores (e.g., levels of engagement, motivation).
For many diagnostic literacy tests, reviews are available through sources such as the Mental Measurements Yearbook (MMY). Versions of the MMY are available in hard copy at many libraries, as well as online for free to students at colleges and universities whose libraries pay a fee for access. Reviews are typically completed by experts in various fields, including literacy and measurement experts. Reviews also include complete descriptions of the test or assessment procedure, who publishes it, how long it takes to administer and score, a review of psychometric properties, and a critique of the test in reference to decisions people plan to make based on findings. It is important for teachers and other educators who use tests to understand the benefits and problems associated with selecting one test over another, and resources such as the MMY offer reviews that are quick to locate, relatively easy to comprehend (when one has some background knowledge in assessment), and written by people who do not profit from the publication and sale of the assessment.
Single Point Estimates
Literacy assessments that are completed only one time provide a single point estimate of a student's abilities. An example of a single point estimate is a student's word identification score from a diagnostic achievement test. If the student's score is far below what is expected for his or her age or grade level, then the score signals a need to determine what is at the root of the low performance. Alternatively, a single low score does not necessarily indicate a lack of ability to learn, since with a change in instruction, the student might begin to progress much faster and eventually catch up to his or her typical age-based peers. To assess a student's rate of learning, progress-monitoring assessments are needed.
Progress-Monitoring Literacy Assessments
To monitor a student's progress in literacy, assessments are needed that actually measure growth. Rather than just taking a snapshot of the student's achievement at a single point in time, progress-monitoring assessments provide a baseline (i.e., the starting point) of a student's achievement, along with periodic reassessment as he or she is progressing toward learning outcomes. Such outcomes might include achieving a benchmark score of correctly reading 52 words per minute on oral reading fluency passages or a goal of learning to "ask and answer key details in a text" (CCSS.ELA-Literacy.RL.1.2) when prompted, with 85% accuracy. The first outcome of correctly reading 52 words per minute would likely be measured using progress-monitoring assessments, such as DIBELS Next and AIMSweb. These screeners are designed not only to measure the extent to which students are at risk for future literacy-related problems at the beginning of the school year but also to monitor changes in progress over time, sometimes as often as every one or two weeks, depending on individual student factors. The second outcome of being able to "ask and answer key details in a text" could be monitored over time using assessments such as state tests or responses on a qualitative reading inventory. Being able to work with key details in a text could also be informally assessed by observing students engaged in classroom activities where this task is practiced.
Unlike assessments that are completed only one time, progress-monitoring assessments such as DIBELS Next and AIMSweb feature multiple, equivalent versions of the same tasks, such as having 20 oral reading fluency passages that can be used for reassessments. Using different but equivalent passages prevents artificial increases in scores that would result from students rereading the same passage. Progress-monitoring assessments can be contrasted with diagnostic assessments, which are not designed to be administered often. Administering the same subtests repeatedly would not be an effective way to monitor progress. Some diagnostic tests have two equivalent versions of subtests to monitor progress infrequently, perhaps on a yearly basis, but they are simply not designed for frequent reassessments. This limitation of diagnostic assessments is one reason why screeners like DIBELS Next and AIMSweb are so useful for determining how students respond to intervention and why diagnostic tests are often reserved for making other educational decisions, such as whether a student may have an educational disability.
Progress-monitoring assessments have transformed how schools determine how a student is responding to intervention. For example, consider the hypothetical example of Jaime's progress-monitoring assessment results in second grade, shown in Figure 2. Jaime was given oral reading fluency passages from a universal literacy screener, and then his progress was monitored to determine his response to a small group literacy intervention started in mid-October. Data points show the number of words Jaime read correctly on each of the one-minute reading passages. Notice how at the beginning of the school year, his baseline scores were extremely low, and when compared to the beginning-of-year second grade benchmark (Dynamic Measurement Group, 2010) of 52 words per minute (Good & Kaminski, 2011), they signaled he was "at risk" of not reaching later benchmarks without receiving intensive intervention. Based on Jaime's baseline scores, intervention team members decided that he should receive a research-based literacy intervention to help him read words more easily so that his oral reading fluency would increase at least one word per week. This learning goal is represented by the "target slope" seen in Figure 2. During the intervention phase, progress-monitoring data points show that Jaime began making improvements toward this goal, and the line labeled "slope during intervention" shows that he was gaining at a rate slightly faster than his one word per week goal.
Figure 2. Progress-monitoring graph of response to a reading intervention.
When looking at Jaime's baseline data, notice how the data points form a plateau. If his progress continued at this same rate, by the end of the school year, he would be even farther behind his peers and at even greater risk for future reading problems. When interpreting the graph in Figure 2, it becomes clear that intensive reading intervention was needed. Notice how, after the intervention began, Jaime's growth began to climb steeply. Although he appeared to be responding positively to intervention, by the end of second grade, students whose reading ability is progressing adequately should be reading approximately 90 words correctly per minute (Good & Kaminski, 2011). Based on these data, Jaime is not likely to reach the level of reading 90 words correctly by the end of second grade and will probably only reach the benchmark expected for a student at the beginning of second grade. These assessment data suggest that Jaime's intervention should be intensified for the remainder of second grade to accelerate his progress further. It is also likely that Jaime will need to continue receiving intervention into third grade, and progress monitoring can determine, along with other assessment data, when his oral reading fluency improves to the point where intervention may be changed, reduced, or even discontinued. You may wonder how the intervention team would determine whether Jaime is progressing at an adequate pace when he is in third grade. Team members would continue to monitor Jaime's progress and check to make sure his growth line shows that he will meet benchmark at the end of third grade (i.e., correctly reading approximately 100 words per minute; Good & Kaminski, 2011). If his slope shows a lack of adequate progress, his teachers can revisit the need for intervention to ensure that Jaime does not fall behind again.
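To show how a target slope and an observed slope might be compared in practice, here is a brief Python sketch (an illustration only; the weekly scores and the number of weeks remaining are invented and are not the data plotted in Figure 2) that fits a line to progress-monitoring scores and projects an end-of-year score against a one-word-per-week goal.

```python
# Hypothetical sketch: fit a slope to weekly oral reading fluency (ORF) scores,
# compare it to a one-word-per-week target, and project an end-of-year score.
# The weekly scores and weeks remaining are invented; they are not the data
# plotted in Figure 2.
from statistics import linear_regression  # Python 3.10+

weeks = [1, 2, 3, 4, 5, 6, 7, 8]          # weeks since the intervention began
wcpm = [18, 19, 21, 23, 24, 26, 27, 29]   # words correct per minute each week

slope, intercept = linear_regression(weeks, wcpm)
print(f"Observed slope: {slope:.2f} words per week (target slope: 1.0)")

weeks_remaining = 24                      # hypothetical weeks left in the school year
projected = wcpm[-1] + slope * weeks_remaining
print(f"Projected end-of-year ORF: {projected:.0f} words correct per minute")
```

A team could compare the projected score to the end-of-year benchmark to decide whether the current intervention is intensive enough or needs to be adjusted.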
Some schools monitor their students' progress using computer-adaptive assessments, which involve students responding to test items delivered on a computer. Computer-adaptive assessments are designed to deliver specific test items to students and then adapt the number and difficulty of items administered according to how students respond (Mitchell, Truckenmiller, & Petscher, 2015). Computer-adaptive assessments are increasing in popularity in schools, in part, because they do not require a lot of time or effort to administer and score, but they do require schools to have an adequate technology infrastructure. The reasoning behind using these assessments is similar to that behind other literacy screeners and progress-monitoring assessments: to provide effective teaching and intervention to meet all students' needs (Mitchell et al., 2015).
Although many literacy screening and progress-monitoring assessment scores have been shown to be well correlated with a variety of measures of reading comprehension (see, for example, Goffreda & DiPerna, 2010) and serve as reasonably good indicators of which students are at risk for reading difficulties, a persistent problem with these assessments is that they provide little guidance to teachers about what kind of literacy instruction and/or intervention a student actually needs. A student who scores low at baseline and makes inadequate progress on oral reading fluency tasks may need an intervention designed to increase reading fluency, but there is also a chance that the student lacks the ability to decode words and really needs a decoding intervention (Murray, Munger, & Clonan, 2012). Or it could be that the student does not know the meaning of many vocabulary words and needs to build background knowledge to read fluently (Adams, 2010-2011), which would require the use of different assessment procedures specifically designed to assess and monitor progress related to these skills. Even more vexing is when low oral reading fluency scores are caused by multiple, intermingling factors that need to be identified before intervention begins. When the problem is more complex, more specialized assessments are needed to disentangle the factors contributing to it.
A final note related to progress-monitoring procedures is the emergence of studies suggesting that there may be better ways to measure students' progress on instruments such as DIBELS Next compared to using slope (Good, Powell-Smith, & Dewey, 2015), which was depicted in the example using Jaime's data. In a recent conference presentation, Good (2015) argued that the slope of a student's progress may be too inconsistent to use for monitoring and adjusting instruction, and he suggested a new (and somewhat mathematically complex) alternative using an index called a student growth percentile. A student growth percentile compares the rate at which a student's achievement is improving in reference to how other students with the same baseline score are improving. For example, a student reading 10 correct words per minute on an oral reading fluency measure whose growth is at the 5th percentile is improving much more slowly compared to the other children who also started out reading only 10 words correctly per minute. In this case, a growth percentile of 5 means that the student is progressing only as well as or better than 5 percent of peers who started at the same score, and it also means that the current instruction is not meeting the student's needs. Preliminary research shows some promise in using growth percentiles to measure progress as an alternative to slope, and teachers should be on the watch for more research related to improving ways to monitor student progress.
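The following sketch is a simplified illustration of the basic idea of a growth percentile: ranking one student's growth rate against peers who began at the same baseline score. It is not the statistical procedure used by Good and colleagues, and all of the growth rates are invented.

```python
# Simplified, hypothetical illustration of a student growth percentile: rank one
# student's growth rate among peers who started at the same baseline score.
# This is not the actual statistical procedure used by Good and colleagues.

peer_growth = [1.4, 0.9, 1.1, 1.7, 0.6, 1.3, 2.0, 0.8, 1.5, 1.2]  # words per week
student_growth = 0.7                                              # the student's rate

at_or_below = sum(1 for g in peer_growth if g <= student_growth)
growth_percentile = 100 * at_or_below / len(peer_growth)
print(f"Growth percentile: {growth_percentile:.0f}")  # 10 -> slower than 90% of peers
```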
Linking Assessment to Intervention
How can teachers figure out the details of what a student needs in terms of intervention? They would likely use a variety of informal and formal assessment techniques to determine the student's strengths and needs. The situation might require the use of diagnostic assessments, a reading or writing inventory, observations to determine whether the student is engaged during instruction, and/or assessments to better understand the student's problem-solving and other thinking skills. It may be a combination of assessment techniques that is needed to match research-based interventions to the student's needs.
You may be starting to recognize some overlap among different types of assessments across categories. For example, state tests are usually both formal and summative. Literacy screeners and progress-monitoring assessments are often formal and formative. And some assessments, such as portfolio assessments, have many overlapping qualities across the various assessment categories (e.g., portfolios can be used formatively to guide teaching and summatively to determine whether students met an academic outcome).
Bringing up portfolio assessments takes us back to points raised at the beginning of this chapter related to the authenticity of literacy assessments. So why do multiple-choice tests exist if options such as portfolio assessment, which are so much more authentic, are available? High-quality multiple-choice tests tend to have stronger psychometric properties (discussed in the next section) than performance assessments like portfolios, which makes multiple-choice tests desirable when assessment time is limited and scores need to have strong measurement properties. Multiple-choice test items are often easy to score and do not require a great deal of inference to interpret (i.e., they are "objective"), which are some of the reasons why they are so widely used. Portfolio assessments often take longer to do but also reflect the use of many important literacy skills that multiple-choice items simply cannot assess. Based on this discussion, you may wonder if portfolio assessments are superior to multiple-choice tests, or if the opposite is true. As always, the answer about a preferred format depends on the purpose of the assessment and what kinds of decisions will be made based on findings.
Psychometric Principles of Literacy Assessment
A chapter about literacy assessment would not be complete without some discussion of the psychometric properties of assessment scores, such as reliability and validity (Trochim, 2006). Reliable assessment means that the data gathered are consistent and dependable; that is, the same or similar results would be obtained if the student were assessed on a different day, by a different person, or using a similar version of the same assessment (Trochim, 2006). To think about reliability in practice, imagine you were observing a student's reading behaviors and determined that the student was struggling with paying attention to punctuation marks used in a storybook. You rate the student's proficiency as a "1" on a 1 to 4 scale, meaning he or she reads as though no punctuation marks were noticed. A colleague observed the student reading the same book at the same time you were observing, and he rated the student's proficiency as a "3," meaning that the student was paying attention to most of the punctuation in the story, but not all. The difference between your rating and your colleague's rating signals a lack of reliability among raters using that scale. If these same inconsistencies in ratings arose across other items on the reading behavior scale or with other students, you would conclude that the scale has problems. These problems could include that the scale is poorly constructed, or there may simply be inter-rater reliability issues related to a lack of training or experience among the people doing the ratings.
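As a rough illustration of how inter-rater consistency can be quantified, the sketch below (hypothetical ratings; a simple percent-agreement index rather than a formal reliability coefficient) compares two raters' scores across several students.

```python
# Hypothetical inter-rater check: percent agreement between two raters using a
# 1-4 reading behavior scale. The ratings are invented, and formal reliability
# studies typically use stronger indexes (e.g., Cohen's kappa or correlations).

rater_a = [1, 3, 2, 4, 2, 3, 1, 2]
rater_b = [3, 3, 2, 4, 1, 3, 1, 2]

agreements = sum(a == b for a, b in zip(rater_a, rater_b))
percent_agreement = 100 * agreements / len(rater_a)
print(f"Percent agreement: {percent_agreement:.0f}%")  # 75% -> raters disagree on some students
```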
The reliability of formal assessment instruments, such as tests, inventories, or surveys, is usually investigated through research that is published in academic journal articles or test manuals. This kind of research involves administering the instrument to a sample of individuals, and findings are reported based on how those individuals scored. These findings provide "estimates" of the test's reliability, since indexes of reliability will vary to a certain degree, depending on the sample used in the research. The more stable reliability estimates are across multiple diverse samples, the more teachers can count on scores or ratings being reliable for their students. When reliability is unknown, then decisions made based on assessment data may not be trustworthy. The need for strong reliability versus the need for authenticity (i.e., how well the assessment matches real-life literacy situations) is a tension that underlies many testing debates.
In addition to assessments needing to be reliable, information gathered from assessments must also be valid for making decisions. A test has evidence of validity when research shows that it measures what it is supposed to measure (Trochim, 2006). For example, when a test that is supposed to identify students at risk for writing problems identifies students with actual writing problems, this is evidence of the test's validity. A weekly spelling test score may lack evidence of validity for practical spelling ability because some students may just be good memorizers and not be able to spell the same words accurately or use the words in their writing. When assessment information is not reliable, then it cannot be valid, so reliability is a keystone for the evaluation of assessments.
Sometimes, a test that seems to test what it is supposed to test will have issues with validity that are not apparent. For example, if students are tested on math application problems to see who may need math intervention, a problem could arise if the children are not able to read the words in the problems. In this case, the students may get many items incorrect, making the math test more like a reading test for these students. It is research on validity and observations by astute educators that help uncover these sorts of problems and prevent the delivery of a math intervention when what may actually be needed is a reading intervention.
The validity issue described above is one reason why some students may receive accommodations (e.g., having a test read aloud to them): accommodations can actually increase the validity of a test score for certain students. If students with reading disabilities had the above math test read to them, then their resulting scores would likely be a truer indicator of math ability because the accommodation ruled out their reading difficulties. This same logic applies to English language learners (ELLs) who can understand spoken English much better than they can read it. If a high school exam assessing knowledge of biology is administered and ELL students are unable to pass it, is it because they do not know biology or is it because they do not know how to read English? If the goal is to assess their knowledge of biology, then the test scores may not be valid.
Another example of a validity issue occurs if a student with a visual impairment were assessed using a reading task featuring print in 12-point font. If the student scored poorly, would you refer him or her for reading intervention? Hopefully not. The student might actually need reading intervention, but there is a validity problem with the assessment results, so in reality, you would need more information before making any decisions. Consider that when you reassess the student's reading using large print, the student's score increases dramatically. You then know that it was a print size problem and not a reading problem that impacted the student's initial score. On the other hand, if the student still scored low even with appropriately enlarged print, you would conclude that the student may have a visual impairment and a reading problem, in which case providing reading intervention, along with the accommodation of large print material, would be needed.
Some Controversies in Literacy Assessment
While there is little controversy surrounding literacy assessments that are informal and part of normal classroom practices, formal assessments generate huge controversy in schools, in research communities, on Internet discussion boards, and in textbooks like this one. When considering the scope of educational assessment, one thing is clear: many school districts give far too many tests to far too many students and waste far too many hours of instruction gathering data that may or may not prove to have any value (Nelson, 2013). The over-testing problem is especially problematic when so much time and effort go into gathering data that do not even end up being used. Whether a school is overwhelmed with testing is not universal. School districts have a great deal of influence over the use of assessments, but all too often when new assessments are adopted, they are added to a collection of previously adopted assessments, and the district becomes unsure about which assessments are still needed and which should be eliminated. Assessments also are added based on policy changes at the federal and state levels. For instance, the passing of the No Child Left Behind Act of 2001 (NCLB, 2002) expanded state testing to occur in all grades three through eight, compared to previous mandates, which were much less stringent.
Some tests, such as state tests, are mandated for schools to receive funding; however, the use of other assessments is largely up to school districts. It is important for educators and school leaders to periodically inventory the procedures being used, discuss the extent to which they are needed, and make decisions that will provide answers without over-testing students. In other words, the validity of assessments is not simply limited to how they are used with individual students but must be evaluated at a larger system level in which benefits to the whole student body are also considered. When assessments provide data that are helpful in making instructional decisions but also take away weeks of instructional time, educators and school leaders must work toward solutions that maximize the value of assessments while minimizing potential negative effects. Not liking test findings is a different issue than test findings not being valid. For example, if a test designed to identify students behind in reading is used to modify instruction, then it may be quite valuable, even if it is unpleasant to find out that many students are having difficulty.
As a society, we tend to want indicators of student accountability, such as requiring that a minimum standard be met for students to earn a high school diploma. Often, earning a diploma requires students to pass high-stakes exit exams; however, this seemingly straightforward use of test scores can easily lead to social injustice, particularly for students from culturally and linguistically diverse backgrounds. Because high-stakes tests may be inadequate at providing complete information about what many students know and can do, the International Reading Association (IRA, 2014) released a position statement that included the following recommendation:
High school graduation decisions must be based on a more complete picture of a student's literacy performance, obtained from a variety of systematic assessments, including informal observations, formative assessments of schoolwork, and consideration of out-of-school literacies, as well as results on standardized formal measures. (p. 2)
The IRA recommends that "teacher professional judgment, results from formative assessments, and student and family input, as well as results from standardized literacy assessments" (p. 5) serve as adequate additions in making graduation decisions. There is no easy answer for how to use assessments to precisely communicate how well students are prepared for college, careers, and life, and we are likely many reform movements away from designing a suitable plan. Nevertheless, the more educators, families, and policy makers know about assessments, including the inherent benefits and problems that accompany their use, the more progress can be made in refining techniques to make informed decisions designed to enhance students' futures. Literacy assessments can only be used to improve outcomes for students if educators have deep knowledge of research-based instruction, assessment, and intervention and can apply that knowledge in their classrooms. For this reason, information from this chapter should be combined with other chapters from this book and other texts outlining the use of effective literacy strategies, including strategies for students who are at risk for developing reading problems or who are English language learners.
Summary
Although literacy assessment is often associated with high-stakes standardized tests, in reality, literacy assessments encompass an array of procedures that help teachers make instructional decisions. This chapter highlighted how teachers can use literacy assessments to improve instruction, but assessment results are also used to communicate about literacy with a variety of individuals, including teams of educators, specialists, and family and/or community members. Knowing about the different kinds of assessments and their purposes will allow you to be a valuable addition to these important conversations.
Literacy assessments can be informal or formal, formative or summative, screenings or diagnostic tests. They can provide data at single points in time or be used to monitor progress over time. Regardless of their intended purpose, it is important that assessment data be trustworthy. It is also important that teachers who use assessments understand the associated benefits and difficulties of different procedures. An assessment that is ideal for use in one circumstance may be inappropriate in another. For this reason, teachers who have a background in assessment will be better equipped to select appropriate assessments that have the potential to benefit their students, and they also will be able to critique the use of assessments in ways that can improve assessment practices that are more system-wide. Literacy assessments are an important part of educational decision making, and therefore, it is essential that teachers gain a thorough understanding of their uses and misuses, gain experience interpreting data obtained through assessment, and actively participate in reform movements designed not simply to eliminate testing but to use assessments in thoughtful and meaningful ways.
Questions and Activities
- Using some of the terms learned from this chapter, discuss some commonly used high-stakes literacy assessments, such as state-mandated tests or other tests used in schools.
- Explain ways in which some forms of literacy assessment are more controversial than others and how the more controversial assessments are impacting teachers, students, and the education system.
- What are the differences between formative and summative assessments? List some examples of each and how you currently use, or plan to use, these assessments in your teaching.
- A colleague of yours decides that she would like to use a diagnostic literacy test to assess all students in her middle school to see who has reading, spelling, and/or writing problems. The test must be administered individually and will take approximately 45 minutes per student. Although there is only one form of the assessment, your colleague would like to administer the test three times per year. After listening carefully to your colleague's ideas, what other ideas do you have that might help meet your colleague's goal besides the use of a diagnostic literacy test?
References
Adams, M. J. (2010-2011, Winter). Advancing our students' language and literacy: The challenge of complex texts. American Educator, 34, 3-11, 53. Retrieved from http://www.aft.org/sites/default/files/periodicals/Adams.pdf
Afflerbach, P., & Cho, B. Y. (2011). The classroom assessment of reading. In M. L. Kamil, P. D. Pearson, E. B. Moje, & P. P. Afflerbach (Eds.), Handbook of reading research (Vol. 4, pp. 487-514). New York, NY: Routledge.
Dynamic Measurement Group. (2010, December 1). DIBELS Next benchmark goals and composite scores. Retrieved from https://dibels.uoregon.edu/docs/DIBELSNextFormerBenchmarkGoals.pdf
Edglossary. (2013, August 29). Norm-referenced test [online]. Retrieved from http://edglossary.org/norm-referenced-test/
Edglossary. (2014, April 30). Criterion-referenced test [online]. Retrieved from http://edglossary.org/criterion-referenced-test/
Goffreda, C. T., & DiPerna, J. C. (2010). An empirical review of psychometric evidence for the Dynamic Indicators of Basic Early Literacy Skills. School Psychology Review, 39, 463-483. Available at http://www.nasponline.org/publications/periodicals/spr/volume-39/volume-39-issue-3/an-empirical-review-of-psychometric-evidence-for-the-dynamic-indicators-of-basic-early-literacy-skills
Good, R. H. (2015, May 19). Improving the efficiency and effectiveness of instruction with progress monitoring and formative evaluation in the outcomes driven model. Invited presentation at the International Conference on Cognitive and Neurocognitive Aspects of Learning: Abilities and Disabilities, Haifa, Israel. Retrieved from https://dibels.org/papers/Roland_Good_Haifa_Israel_2015_Handout.pdf
Good, R. H., & Kaminski, R. A. (Eds.). (2011). DIBELS Next assessment manual. Eugene, OR: Dynamic Measurement Group, Inc. Retrieved from http://www.d11.org/edss/assessment/DIBELS%20NextAmplify%20Resources/DIBELSNext_AssessmentManual.pdf
Good, R. H., Powell-Smith, K. A., & Dewey, E. (2015, February). Making reliable and stable progress decisions: Slope or pathways of progress? Poster presented at the Annual Pacific Coast Research Conference, Coronado, CA.
Hoover, H. D., Dunbar, S. B., & Frisbie, D. A. (2003). The Iowa Tests: Guide to research and development. Chicago, IL: Riverside Publishing.
International Reading Association. (2014). Using high-stakes assessments for grade retention and graduation decisions: A position statement of the International Reading Association. Retrieved from http://www.literacyworldwide.org/docs/default-source/where-we-stand/high-stakes-assessments-position-statement.pdf
Leslie, L., & Caldwell, J. S. (2010). Qualitative Reading Inventory-5. Boston, MA: Pearson.
Mitchell, A. M., Truckenmiller, A., & Petscher, Y. (2015, June). Computer-adaptive assessments: Fundamentals and considerations. Communiqué, 43(8), 1, 22-24.
Murray, M. S., Munger, K. A., & Clonan, S. M. (2012). Assessment as a strategy to increase oral reading fluency. Intervention in School and Clinic, 47, 144-151. doi:10.1177/1053451211423812
National Governors Association Center for Best Practices & Council of Chief State School Officers. (2010). Common Core State Standards for English Language Arts & Literacy in History/Social Studies, Science, and Technical Subjects. Washington, DC: Author. Retrieved from http://www.corestandards.org/assets/CCSSI_ELA%20Standards.pdf
Nelson, H. (2013). Testing more, teaching less: What America's obsession with student testing costs in money and lost instructional time. Retrieved from http://www.aft.org/sites/default/files/news/testingmore2013.pdf
No Child Left Behind Act of 2001, Pub. L. No. 107-110, 115 Stat. 1425 (2002).
Pearson. (2012). AIMSweb technical manual (R-CBM and TEL). NCS Pearson, Inc. Retrieved from http://www.aimsweb.com/wp-content/uploads/aimsweb-Technical-Manual.pdf
Snow, C. (Chair). (2002). RAND Reading Study Group: Reading for understanding, toward an R&D program in reading comprehension. Santa Monica, CA: RAND. Retrieved from http://www.rand.org/content/dam/rand/pubs/monograph_reports/2005/MR1465.pdf
Trochim, W. K. (2006). Research methods knowledge base: Construct validity. Retrieved from http://www.socialresearchmethods.net/kb/relandval.php
Wechsler, D. (2009). Wechsler Individual Achievement Test (3rd ed.). San Antonio, TX: Pearson.
Photo Credit
- Image in Figure 1 by Wikimedia, CC BY-SA 3.0 https://upload.wikimedia.org/wikipedia/commons/3/39/IQ_distribution.svg
Endnotes
1: The benchmark of 52 words per minute is considered a "criterion-referenced" score because a student's performance is judged against a criterion, in this case, the benchmark. Recall that scores obtained on diagnostic literacy assessments are norm-referenced because they are judged against how others in a norm group scored. Some progress-monitoring assessments provide both criterion-referenced and norm-referenced scores to aid in decision making when more than one type of score is needed.
Source: https://courses.lumenlearning.com/literacypractice/chapter/5-types-of-literacy-assessment-principles-procedures-and-applications/