The Challenges of Pre-K Assessment
In the United States, nearly 1.3 million children attend publicly funded preschool. As enrollment continues to grow, states are under pressure to show that these programs improve school readiness. As a result, how best to measure preschoolers’ learning outcomes has become a major policy focus.
First, it should be noted that researchers are almost unanimous in their caution on this subject. Accurately assessing very young children’s learning, whether in language, cognition, socio-emotional development, or even physical development, presents inherent difficulties: young children’s attention spans tend to be short, and performance in any given domain varies widely from child to child and from day to day. Great care is therefore advised in both the design and the implementation of such assessments (see here, here, and here for examples). The question of whether, and how, to use these student assessments to determine program or staff effectiveness is even more difficult and controversial (for instance, here and here). Nevertheless, many states are already using various forms of assessment to oversee their preschool investments.
It is difficult to react to this (unsurprising) paradox. Sadly, in education there is often a disconnect between what we know (i.e., research) and what we do (i.e., policy). But, since the demand for accountability seems to be here to stay, a case can be made that states should, at a minimum, expand what they measure to reflect learning as accurately and broadly as possible.
So, what types of assessments are better at capturing what a four- or five-year-old knows? And how might these assessments be improved?
According to a recent Educational Testing Service (ETS) survey of state-funded Pre-K providers, most programs (50 of 54 surveyed) are already using one or more forms of student assessment. Traditional "direct assessments" are currently mandated in 4 states (Alabama, Alaska, Nevada, and Virginia); 19 programs rely solely on observational checklists and scales; 8 use a combination of approaches; and 19 providers did not specify the type of assessment they use. Only one program uses samples of children’s work, i.e., portfolios. Maryland, Tennessee, Texas, and Wisconsin fund Pre-K programs with no requirement to collect child outcome data.
Ackerman and Coley, the authors of the ETS study, argue that direct assessments can be useful for establishing trends and screening for disabilities, but that they do not capture the full picture of a child’s skill set. Nor does this type of measure produce information that is useful for improving instruction or informing professional development. In contrast, observations can document learning in a more natural fashion, i.e., as the child is engaged in everyday classroom activities; the data collected can inform teacher practice and capture the entire developmental range. Finally, portfolios, or samples of children’s work, can offer real-world, multi-source evidence of preschoolers’ skills and knowledge.
Even though observations and portfolios appear to be superior methodologies, they are also the most resource-intensive. They require extensive training to ensure interrater reliability, and they must be administered and, in the case of portfolios, continuously updated and reviewed by regular classroom staff.
Given these issues of efficiency and of data collection and management, one obvious question is: can technology help and, if so, how? Perhaps. I can think of at least two possible ways.
First, certain tools can assist teachers in collecting and organizing information about their students. Both the observation and portfolio methods may involve taking photos or videos of students’ work as they engage in classroom activities. This blogger describes the use of Mental Note, an iPhone/iPad application that can be used for these purposes. The app allows users to combine voice recordings, sketches, text, and pictures, all in the same note; notes can be stored on the device and/or shared via e-mail or cloud services.
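To make the idea concrete, here is a minimal sketch, in Python, of how such multimedia observation notes might be structured and grouped into a per-child portfolio. This is purely illustrative: the record fields and the ObservationNote/portfolio_for names are hypothetical, not Mental Note’s actual data format or API.

```python
from dataclasses import dataclass, field
from datetime import date
from pathlib import Path

# Hypothetical structure for one multimedia observation note, loosely
# mirroring the media types mentioned above (voice, sketch, text, picture).
@dataclass
class ObservationNote:
    child_id: str      # pseudonymous ID, never the child's name
    note_date: date    # when the observation was made
    text: str = ""     # teacher's written comment
    media_paths: list[Path] = field(default_factory=list)  # photos, audio, sketches
    domains: list[str] = field(default_factory=list)       # e.g., ["language"]

def portfolio_for(notes: list[ObservationNote], child_id: str) -> list[ObservationNote]:
    """Gather one child's notes in chronological order -- the raw
    material a teacher would review when assembling a portfolio."""
    return sorted((n for n in notes if n.child_id == child_id),
                  key=lambda n: n.note_date)

# Example usage with two notes for the same child.
notes = [
    ObservationNote("c-017", date(2012, 10, 3), "Retold a story in sequence",
                    domains=["language"]),
    ObservationNote("c-017", date(2012, 9, 12), "Counted 8 blocks unprompted",
                    domains=["numeracy"]),
]
for note in portfolio_for(notes, "c-017"):
    print(note.note_date, note.domains, note.text)
```

The key design point is simply that each piece of evidence is tagged by child, date, and developmental domain, so the portfolio can be reviewed and updated continuously rather than assembled once a year.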
Second, adaptive literacy and numeracy applications could be used to collect data on how children perform in age-appropriate activities and games; teachers would later analyze this information to construct assessments. Students would not know whether or when they are being assessed, and the instrument could be administered frequently, thus reducing concerns about data reliability.
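Again, as a purely hypothetical sketch: the snippet below logs one record per in-game task and aggregates many such low-stakes observations into a per-skill summary, the kind of smoothing that helps with the day-to-day variability discussed above. The ActivityEvent and skill_summary names are illustrative; no real app’s API is implied.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical event record: one row per task a child attempts inside
# an adaptive literacy/numeracy game. All names here are illustrative.
@dataclass
class ActivityEvent:
    child_id: str      # pseudonymous ID
    skill: str         # e.g., "letter_recognition", "counting_to_10"
    difficulty: int    # level the adaptive app chose (1 = easiest)
    correct: bool      # did the child complete the task?

def skill_summary(events: list[ActivityEvent], child_id: str, skill: str):
    """Aggregate many low-stakes observations into one estimate,
    smoothing over the day-to-day variation discussed earlier."""
    relevant = [e for e in events if e.child_id == child_id and e.skill == skill]
    if not relevant:
        return None
    return {
        "attempts": len(relevant),
        "success_rate": round(mean(e.correct for e in relevant), 2),
        "highest_level_passed": max((e.difficulty for e in relevant if e.correct),
                                    default=0),
    }

# Example: events accumulated across several play sessions.
log = [
    ActivityEvent("c-017", "letter_recognition", 1, True),
    ActivityEvent("c-017", "letter_recognition", 2, True),
    ActivityEvent("c-017", "letter_recognition", 3, False),
]
print(skill_summary(log, "c-017", "letter_recognition"))
# -> {'attempts': 3, 'success_rate': 0.67, 'highest_level_passed': 2}
```

Because each estimate pools many routine observations rather than a single testing session, a bad day or a lapse in attention carries far less weight.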
Of course, collecting data on children, with or without technology’s involvement, raises ethical issues that would need to be addressed. Universities have ethics committees, or Institutional Review Boards (IRBs), that oversee all research involving human subjects conducted at the institution. I have argued elsewhere that using this or a similar framework in educational settings would not only safeguard participants, but also result in improved study designs. Ethical standards are established by weighing the benefits and costs to research participants and to society more generally. The process involves asking the kinds of questions that are conducive to rigorous, theory-driven research, as opposed to the “collect data first, ask questions later” model. In fact, Ackerman and Coley recommend that, when choosing an early childhood assessment, researchers ask a set of questions*, most of which coincide with what is asked and answered when IRB standards apply.
According to Steven Barnett, director of the National Institute for Early Education Research, “even a high-quality measure should not be the sole yardstick used to assess children and programs.” While a combination of methods and frequent administration may be the best approach, issues of cost and labor need to be considered. Before going down the cost-effective but reductionist path of direct assessment, it is worth examining how technology might support the use of more appropriate methodologies, such as observations and portfolios.
- Esther Quintero
* The questions they propose are: 1) Will the measure be used for the purpose for which it was designed? 2) Will the measure provide valid and reliable data on students’ learning over time? 3) What kind of training will be needed by those administering the measure and interpreting the results? 4) What are the costs and benefits of administering, scoring, reporting, and interpreting the results of a single measure vs. multiple measures on a large scale?