Having a child undergo assessment for learning disabilities is a complex and confusing process for most parents. Many parents want clear information about psychoeducational testing — what the tests involve and how to understand and interpret the resulting test scores. Because these tests will be used to determine the nature and severity of any underlying disorders, it’s important to understand what the test results mean.
In her book, Learning Disabilities from a Parent’s Perspective, author and parent Kim Glenchur does an excellent job of “demystifying” the process of psychoeducational testing. Following is an excerpt from her book which clearly explains how to understand and interpret the scores that result from psychoeducational testing.
Most psychological tests are formal statistical measures of behavioral responses to test items that, over time and professional experience, have been accepted as appropriate measures of abilities and achievements. The basic issue of psychological testing is whether the test truly fulfills its claims. In order to understand psychological testing, some underlying statistical concepts must be reviewed.
Representative norm group
Like a control group in any scientific experiment, a representative norm group establishes the range of normal performance on a test. The individuals must be chosen at random from a larger population, and be truly representative of individuals with certain characteristics of age, intelligence, and so on. For example, if the representative norm only consisted of male students, then the test results comparing a female student’s performance against this norm may be inaccurate. The accuracy of the performance range is also dependent on the size of the representative sample: The larger the norm group, the more accurately defined is the normal range.
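The effect of norm-group size can be sketched numerically. The snippet below is only an illustration: it assumes a population standard deviation of 15 points (a hypothetical value, chosen to match common IQ-style scales) and uses the textbook standard error of the mean, the standard deviation divided by the square root of the sample size. The function name is made up for this example.

```python
import math


def standard_error_of_mean(sd: float, n: int) -> float:
    """Standard error of a sample mean: sd / sqrt(n)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return sd / math.sqrt(n)


# Assumed population SD of 15 points (hypothetical, IQ-style scale).
for norm_group_size in (25, 100, 2500):
    se = standard_error_of_mean(15, norm_group_size)
    print(f"norm group of {norm_group_size}: average known to within about {se:.2f} points")
```

Under these assumptions, quadrupling the norm group from 25 to 100 individuals halves the uncertainty about the group average.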
With respect to psychological tests, revised tests anticipate IQ gains of the general American population each generation. Thus, a child could score lower on a restandardized test than on the version just retired.
At the statistical scoring extremes of a population, a few points of change can greatly affect school placement decisions, whereas a few points of change around the population average can be dismissed as random error.
Test reliability is about scoring accuracy. A reliable test reproduces the same results upon a second test administration, assuming no prior learning or actions that would alter the trait being tested. A reliable test is also longer rather than shorter; a large number of test items can reduce test problems such as a child’s confusion or attentional lapse with a particular question.
Standard error is another indicator of reliability. Test measurements of ability, achievement, and so on, are not single numerical scores but are really a range of possible outcomes as indicated by that test’s standard error of measurement. For example, a test score of 100 with a standard error of 5 suggests that the real score most likely lies in the range of 95 to 105. A less reliable test could have a standard error of 10, meaning that the real score most likely lies between 90 and 110.
Test validity is about effectively measuring the trait. According to David Wodrich, Clinical Director of Child Psychology at The Phoenix Children’s Hospital in Arizona, a test title is frequently a “poor guide” to what that test or subtest measures.[1] Content validity indicates whether the test contains items that truly measure a certain trait. For example, an intelligence test limited only to math items would really be a test of quantitative ability. Special attention should be given to standardized national achievement tests, which rarely match local curricula exactly. Construct validity denotes how well a test captures characteristics of a trait, as predicted by a particular theory. Predictive validity is a measurement of a test’s usefulness in predicting outcomes. For example, IQ tests began as an effort to identify people who would do well in college. Concurrent validity means that a test correlates well with other similar measures of the same trait.
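The range described above can be written as a small calculation. This is a minimal sketch, not a scoring procedure from any particular test: the function name and the `n_sems` parameter are hypothetical, and the band is simply the observed score plus or minus a chosen number of standard errors. If measurement error is assumed to be roughly normal, a band of one standard error on each side covers about two out of three cases.

```python
from typing import Tuple


def score_band(observed: float, sem: float, n_sems: float = 1.0) -> Tuple[float, float]:
    """Return (low, high) = observed minus/plus n_sems standard errors."""
    margin = n_sems * sem
    return observed - margin, observed + margin


# The two examples from the text: same score, different reliability.
print(score_band(100, 5))   # more reliable test, narrower band
print(score_band(100, 10))  # less reliable test, wider band
```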
Test uniformity and objectivity
Test uniformity and objectivity are the main differences between a formal standardized test and an informal test, such as asking a child the color of his shirt that day. Uniformity refers to one test being administered to a great number of people, so that the test results can be used for later statistical analysis. Objectivity refers to unbiased scoring of test answers, a quality desirable, for example, in a baseball umpire calling balls and strikes.
Quantifiable scores support interpretation of the test results. Most psychological tests provide numerical scores, which allow statistical comparisons. Examples of tests without numerical scores are the Rorschach inkblot, projective drawings, and incomplete sentence tests.
Age and grade equivalent scores indicate the level of performance of the child. Thus, “10-3” represents the typical performance of a child of age 10 years and 3 months. Similarly, a fourth grade, third-month performance level would be represented by “4-3.” These are rough guides, however, because actual skills depend on the actual material presented in the classroom. A more reliable interpretation of a test result is the statistical difference of the individual’s score from the norm population’s average performance.
Percentile ranks indicate the percentage of students in the representative norm or sample group who scored below your child’s score. A 60th percentile indicates that the child scored better than 60% of the population norm. Percentile ranks are a numerical ordering of test scores, from 1 to 99. This ranking method, however, does not provide information on the intervals between percentile values. For example, a difference of a few points in the score around the middle could dramatically alter a child’s percentile ranking. Conversely, a difference of a few points at the very low or very high end may not result in any change in the child’s percentile ranking at all.
Mean (often symbolized by μ, the Greek letter mu) is the average of the test scores of the norm, calculated by adding the values of the test scores and then dividing this sum by the number of tests taken. The best psychological tests should have been developed using a large number of individuals that would be truly representative of a certain target population. For example, in a general ability or achievement test, an average result really would be the average score of the general population having the same characteristics as the test-taker.
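The uneven spacing of percentile ranks can be illustrated with a short sketch. It assumes that scores in the norm group follow a normal distribution with a mean of 100 and a standard deviation of 15; both values and the function name are assumptions made for this example, and real tests publish their own norm tables.

```python
from statistics import NormalDist


def percentile_rank(score: float, mean: float = 100.0, sd: float = 15.0) -> float:
    """Percent of an (assumed normal) norm group scoring below `score`."""
    return 100.0 * NormalDist(mu=mean, sigma=sd).cdf(score)


# The same 5-point gain, once near the middle and once near the top.
for low, high in ((100, 105), (140, 145)):
    change = percentile_rank(high) - percentile_rank(low)
    print(f"{low} -> {high}: percentile rank changes by {change:.1f} points")
```

Under these assumptions, the five-point gain near the average moves the percentile rank by roughly 13 points, while the same gain near the top moves it by well under one point.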
Significant or meaningful differences are two standard deviations away from the mean. For example, if one standard deviation is a span of 15 points away from the mean on a particular test, then two standard deviations equals 30 points away from the mean. On IQ tests with SD = 15 and mean = 100, an IQ at or below 70 is “mentally retarded”, and an IQ at or above 130 is “gifted.”
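The two-standard-deviation rule can be expressed as a short calculation. This is a sketch under the assumptions used in the text (mean of 100, standard deviation of 15); the function names and the `threshold` parameter are hypothetical.

```python
def sd_units(score: float, mean: float = 100.0, sd: float = 15.0) -> float:
    """Distance of a score from the mean, in standard deviations."""
    return (score - mean) / sd


def is_significant(score: float, mean: float = 100.0, sd: float = 15.0,
                   threshold: float = 2.0) -> bool:
    """True when the score lies at least `threshold` SDs from the mean."""
    return abs(sd_units(score, mean, sd)) >= threshold


for score in (70, 85, 100, 130):
    print(score, sd_units(score), is_significant(score))
```

Scores of 70 and 130 each sit exactly two standard deviations from the mean and are flagged, while 85 and 100 are not.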
Putting it all together: making the diagnosis
The Diagnostic and Statistical Manual of Mental Disorders (DSM-IV-TR) provides diagnostic guidelines for determining the presence of a learning disorder, but each state has its own set of rules for establishing the presence of a learning disability based on IDEA (Individuals with Disabilities Education Act). Many definitions use the concept of discrepancy, an underachievement difference between actual performance and expected performance in a child, given her or his intelligence, home environment, and school attendance, among other factors. A discrepancy of two or more standard deviations (of actual language or math performance below that expected for a given intelligence level) is considered significant, and is one way to qualify a student for special education. If the discrepancy is between one and two standard deviations, then other factors may be considered such as cognitive problems affecting intelligence testing, an emotional disorder like depression, a general medical problem, or cultural background. Conversely, if a sensory deficit exists, then the discrepancy must be greater than normally found in people with that same deficit.[3]
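The discrepancy idea can be sketched as follows. This is a deliberately simplified illustration, not any state's actual eligibility formula: it assumes that expected and actual performance are both reported as standard scores on the same scale (standard deviation of 15), and the function names and category labels are invented for the example.

```python
def discrepancy_in_sd(expected: float, actual: float, sd: float = 15.0) -> float:
    """SDs by which actual performance falls below expectation (positive = under)."""
    return (expected - actual) / sd


def classify(expected: float, actual: float, sd: float = 15.0) -> str:
    """Apply the two-SD and one-to-two-SD bands described in the text."""
    gap = discrepancy_in_sd(expected, actual, sd)
    if gap >= 2.0:
        return "significant discrepancy"
    if gap >= 1.0:
        return "between one and two SDs: other factors may be considered"
    return "less than one SD"


# Hypothetical children: expected score from ability, actual achievement score.
print(classify(110, 78))
print(classify(110, 90))
print(classify(100, 95))
```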
The significance of “statistically significant”
Laws have differentiated disability from disorder. IDEA neither states which evaluation measures are to be used in the identification process, nor clarifies what threshold levels determine the presence of a disorder or a disability.
Until the recent promise of medical imaging, medical diagnostic tests for LD have proven inadequate over the last century. Thus psychological tests, however flawed in inferring neurological problems, have been the best set of tools available to assess learning performance. Nevertheless, psychological tests are statistical instruments, and using statistical concepts has become accepted practice in fulfilling the law. According to Section 504, a learning disorder becomes a disability when learning performance is “substantially” limited. It is easy to understand, then, how substantially limited has become equivalent to statistically significant. Thus, disability, and therefore eligibility, has been defined as pertaining to those performing two standard deviations below the mean, or roughly the bottom 2½%. Some school district definitions set the disability threshold at the bottom 5%, roughly the same total share as those who lie two standard deviations either below or above the mean. Thus, numbers have come to define disability.
Be wary of population figures that say that about 2%-5% of the school population has a certain type of learning disability, because disability may have been defined as being the bottom 2%-5% in the first place. This circularity is one reason for dissatisfaction with the identification process. Indeed, Sally Shaywitz has found the incidence of reading disorders approaching 20% in her Connecticut longitudinal study,[4] and Ruth Shalev acknowledged that her 5% cutoff underestimates the incidence of dyscalculia, especially given that children initially scoring in the bottom 20% continued to perform poorly.[5] (You can wonder what would happen if cancer treatments were available to only the worst 2%-5% of the population, depending on the local rules governing eligibility, and if pessimistic professionals offered little help or hope because recovery seldom occurred.)
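The percentages quoted above follow from the normal curve, and a brief sketch can check them. It assumes test scores are normally distributed with a mean of 100 and a standard deviation of 15; those values and the function names are assumptions for illustration only.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def share_below(z: float) -> float:
    """Fraction of a normal population scoring below z standard deviations."""
    return STANDARD_NORMAL.cdf(z)


def cutoff_score(bottom_share: float, mean: float = 100.0, sd: float = 15.0) -> float:
    """Score below which `bottom_share` of a normal population falls."""
    return mean + sd * STANDARD_NORMAL.inv_cdf(bottom_share)


print(f"more than 2 SD below the mean: {share_below(-2.0):.1%}")
print(f"more than 2 SD from the mean, either side: {2 * share_below(-2.0):.1%}")
print(f"score marking the bottom 5%: {cutoff_score(0.05):.1f}")
```

Under these assumptions, a little over 2% of the population falls two standard deviations below the mean, a little under 5% falls that far from the mean in either direction, and a bottom-5% threshold corresponds to a score of about 75.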
Children whose everyday school performance is not failing enough to flag a referral for an evaluation, for whatever reason — nightly tutoring by well-educated or sacrificing parents, extraordinary effort on the part of the child, environmental deprivations precluding any possibility of an unexpected discrepancy, or extreme giftedness compensating for the disorder — will probably not be regarded as falling below a 5% threshold.
1. David L. Wodrich, Children’s Psychological Testing: A Guide for Nonpsychologists, 3rd Edition (Baltimore: Paul H. Brookes Publishing, 1997), p. 20.
2. Wodrich, Children’s Psychological Testing, p. 336.
3. Diagnostic and Statistical Manual of Mental Disorders, 4th Edition, Text Revision (Washington D.C.: American Psychiatric Association, 2000), pp. 49-50.
4. Sally E. Shaywitz, “Dyslexia,” The New England Journal of Medicine, 29 January 1998, p. 307.
5. Ruth S. Shalev, Orly Manor, Judith Auerbach, and Varda Gross-Tsur, “Persistence of Developmental Dyscalculia: What Counts?” The Journal of Pediatrics, September 1998, pp. 359-361.