The Summary Rating is a multi-measure school quality metric intended to reflect a school’s characteristics and quality across multiple dimensions, ultimately representing the school’s general quality in preparing students for postsecondary success. The Summary Rating is an aggregation of all of a school’s ‘sub-ratings,’ which include Test Score, Student Progress, Academic Progress, Equity, and College Readiness, as well as a flag for Discipline and Attendance issues.
GreatSchools currently produces four sub-ratings and one flag, all of which are included in the Summary Rating when available. The Test Score Rating and the Equity Rating are both based on student performance data gathered at the state level. The Student Progress Rating is calculated using state-level student growth data; it is replaced with the Academic Progress Rating (GreatSchools’ student growth proxy) when state-produced student growth data is not available. The College Readiness Rating is a multi-measure rating based on high school graduation rates, college entrance exams (SAT and ACT), participation in advanced courses (Advanced Placement (AP), International Baccalaureate (IB), or Dual Enrollment), and AP exam data. Finally, the Discipline and Attendance flag uses data from the Civil Rights Data Collection to identify schools that have high rates of suspension and absenteeism as well as statistically significant differences in these rates between race/ethnic subgroups. Each sub-rating reflects a different dimension of school quality and provides distinct value to a school’s Summary Rating.
The construction of the Summary Rating is a two-step weighted average approach. The first step is to establish base weights for each sub-rating based upon a survey of the research linking the indicators measured by each sub-rating to postsecondary success, and/or according to GreatSchools’ mission and philosophy. The second step is to estimate the relative value of each sub-rating for each school in terms of the amount of information about school quality carried in the underlying data. The resulting weights are rebalanced in cases of missing components to calculate the final weighting structure.
The initial base-weights for each sub-rating are based on a meta-analysis of research covering the association between measures included in each sub-rating and postsecondary outcomes. For each sub-rating, we reviewed research literature in which sub-rating inputs were examined for postsecondary and later life outcomes, including college and university enrollment, matriculation, remediation, and the like, as well as incarceration, and wage outcomes where possible.
For all of the sub-rating inputs in our meta-analysis, a 1 – 5 scale indicates the strength of the relationship to postsecondary and later life outcomes, according to the external research. The 1 – 5 scale is cardinal not ordinal, meaning that every input could potentially receive a ‘5’ if the research indicated strong relationships for all inputs. For sub-ratings that consist of multiple different data inputs, the strength measures of the individual inputs were combined and averaged using the same method of aggregation as in the construction of the sub-rating.
In our meta-analysis, we found evidence of relationships connecting postsecondary and later life outcomes with all data inputs, although some more conclusively than others. As expected, a wealth of research points to the correlation between traditional proficiency metrics and a variety of outcomes. Test scores proved to have strong relationships, leading to a strength score of 5 for the Test Score Rating. The corresponding relationships for the data inputs for the College Readiness Rating were mostly strong as well, leading to an aggregated score of 5 for the sub-rating. Research is slightly less strong in connecting student growth with the outcomes in question, so the Student Progress Rating is assigned a score of 4. For the data inputs used to calculate the Equity Rating, the relationships to postsecondary and later life outcomes are weaker than for the other inputs but still present. Therefore, the Equity Rating received a strength score of 3. To convert the strength scores into base weights, each score was divided by the sum of all sub-rating scores.
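The conversion from strength scores to base weights can be sketched as follows. This is a minimal illustration using the four strength scores stated above; the dictionary keys and the function name `base_weights` are hypothetical labels, not names from the original methodology.

```python
# Strength scores (1-5 scale) as reported in the text above.
# The keys are illustrative labels chosen for this sketch.
STRENGTH_SCORES = {
    "test_score": 5,
    "college_readiness": 5,
    "student_progress": 4,
    "equity": 3,
}


def base_weights(scores):
    """Divide each strength score by the sum of all strength scores."""
    total = sum(scores.values())
    return {name: score / total for name, score in scores.items()}


weights = base_weights(STRENGTH_SCORES)
for name, weight in weights.items():
    print(f"{name}: {weight:.3f}")
```

With these scores the sum is 17, so the Test Score and College Readiness Ratings each start at 5/17, Student Progress at 4/17, and Equity at 3/17.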
In some states, student growth data is not made publicly available. In these cases, the student growth portion (Student Progress) is replaced with GreatSchools’ Growth Proxy rating. This is produced using an unmatched cohort residual gain model for proficiency data. Having found average correlations of 0.6 between this growth proxy and state-produced growth data, the Growth Proxy is assigned a weight equal to 60% of the corresponding weight of the student growth component.
The Summary Rating contains up to four sub-ratings that are based on data gathered from state departments of education. Due to differences in assessments, policies, and other factors, the amount of data used in computing a given sub-rating will vary. In order to correct for this variance, the weight of a sub-rating increases relative to the number of data inputs it contains.
The definition of data inputs differs for each sub-rating. For test scores, each grade-subject test used to compute a school’s test score rating is an input. Similarly, for growth, each grade-subject with growth reported is an input. Most states produce growth data pre-aggregated to the school-level (meaning this would be considered a singular data input); although occasionally data at the grade-subject level is available. The Equity Rating is based on the same data as the Test Score Rating, and therefore each grade-subject tested is an input. College Readiness is based on an aggregation across multiple types of data inputs including graduation rates, College Entrance Exam data and advanced course data. Each of these data elements is considered an input.
To control for the differences in data input granularity across sub-ratings, the number of data inputs is converted into a percentile assessing the relative position of each school in the distribution of data inputs for each sub-rating. This approach means that the measure of the amount of data contained in a given sub-rating for a particular school is relative to the amount of data in that same sub-rating for all other schools in the state.
In addition to the amount of data being leveraged by each sub-rating, the amount of variability contained in each data-point impacts the sub-rating weights. Data points coming from distributions with higher degrees of variation provide a greater ability to differentiate schools from one another.
The information value weight for a sub-rating is based on both the amount of data inputs underlying the sub-rating and the amount of variation contained in that data. The amount of data inputs is calculated at the school-level and is the school’s percentile in the distribution of the count of data inputs for the particular sub-rating. A value of 0.5 is added to this percentile in order to center the distribution at 1 to create an identity multiplier. The amount of variation is calculated by taking the average standard deviation across distributions and scaling this measure of variation by the range of the distributions to arrive at a unit-less measure of variance as the proportion of the total range of a distribution within 1 standard deviation. The two values are multiplied to calculate the information value weight.
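A rough sketch of the information value weight is given below. It rests on several assumptions not fixed by the text: percentiles are computed as mid-ranks on a 0 to 1 scale (so that their mean is 0.5 and adding 0.5 centers them at 1), the population standard deviation is used, and "scaling by the range" is read as dividing the average standard deviation by the average range. All function and parameter names are hypothetical.

```python
from statistics import mean, pstdev


def midrank_percentile(value, population):
    """Percentile on a 0-1 scale; ties count as half (an assumption)."""
    below = sum(1 for v in population if v < value)
    ties = sum(1 for v in population if v == value)
    return (below + 0.5 * ties) / len(population)


def variation_measure(distributions):
    """Average standard deviation expressed as a share of the average range."""
    avg_sd = mean(pstdev(d) for d in distributions)
    avg_range = mean(max(d) - min(d) for d in distributions)
    return avg_sd / avg_range


def information_value_weight(n_inputs, state_input_counts, distributions):
    """Multiply the data-quantity multiplier by the variation measure."""
    quantity = midrank_percentile(n_inputs, state_input_counts) + 0.5
    return quantity * variation_measure(distributions)


# A school with a median number of inputs gets a quantity multiplier of 1.0.
example = information_value_weight(3, [1, 2, 3, 4, 5], [[0, 10]])
print(example)
```

A school at the median of the input-count distribution is neither rewarded nor penalized for data quantity, which is the "identity multiplier" behavior the text describes.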
Baseline weights adjusted for information value are considered relative weightings that are then rebalanced to sum to one, taking into account missing data. This is done by considering each individual weight in proportion to the sum of all existing weights. The final weight for each sub-rating is calculated at the school level by multiplying the base weight by the information value weight and dividing by the sum of these products across all sub-ratings available for the school.
The Summary Rating is then calculated by multiplying each sub-rating by its weight and combining. All values are rounded to the nearest integer, so that all possible Summary Ratings are 1 – 10.
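The rebalancing and final aggregation described above might look like the following sketch. The function names, the dictionary layout, and the choice to round halves upward are assumptions made for illustration; the text only says values are rounded to the nearest integer.

```python
import math


def final_weights(base_weights, info_weights):
    """Rebalance base x information-value products so they sum to one.

    Only sub-ratings present in `info_weights` (that is, available for
    the school) take part in the rebalancing.
    """
    products = {n: base_weights[n] * info_weights[n] for n in info_weights}
    total = sum(products.values())
    return {n: p / total for n, p in products.items()}


def summary_rating(sub_ratings, base_weights, info_weights):
    """Weighted average of the available sub-ratings, rounded to 1-10."""
    available = {n: info_weights[n] for n in sub_ratings}
    weights = final_weights(base_weights, available)
    score = sum(weights[n] * sub_ratings[n] for n in weights)
    # Round half up (an assumption) and keep the result inside 1-10.
    return max(1, min(10, math.floor(score + 0.5)))


BASE = {"test_score": 5 / 17, "college_readiness": 5 / 17,
        "student_progress": 4 / 17, "equity": 3 / 17}
INFO = {"test_score": 1.0, "college_readiness": 1.0,
        "student_progress": 1.0, "equity": 1.0}

# A school with only a Test Score Rating of 9 and an Equity Rating of 4.
rating = summary_rating({"test_score": 9, "equity": 4}, BASE, INFO)
print(rating)
```

In this example the two available weights rebalance to 5/8 and 3/8, giving a weighted score of 7.125 before rounding.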
In order to receive a Summary Rating, a school must have either: at least two sub-ratings, or a College Readiness Rating. (This is because the College Readiness Rating is a multi-measure rating based on data from multiple sources.) Any school that does not meet these criteria does not receive a Summary Rating.
The GreatSchools Test Score Rating is computed based upon the percent of students scoring proficient or above on their state’s standardized assessment in each grade and subject. This process includes the computation of both overall school-level test score ratings and test score ratings for subgroups of students within each school.
The school Test Score Rating is calculated by first computing a standardized proficiency rate for each school in a state. To do this, we compute the school’s position in each grade-subject distribution as a percentile. Each percentile is calculated by comparing a school’s score for each grade-subject (e.g., 4th grade Math, 10th grade Science) to the distribution of scores from all other schools in the state on that same grade-subject test. For example, an elementary school’s rating might be an average of six percentiles (student assessment scores from 3rd, 4th, and 5th grades, in both English and Math). The school’s 3rd grade English percentile would be determined by comparing the school’s score to all other 3rd grade English scores in the state and finding the percentile value.
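As an illustration of the percentile step, the sketch below ranks one school's proficiency rate against a statewide list for the same grade-subject. The mid-rank treatment of ties and the inclusion of the school's own score in the state list are assumptions; the proficiency values are made up for the example.

```python
def percentile_rank(score, state_scores):
    """Return the 0-100 percentile of `score` within `state_scores`.

    Ties are counted as half below and half above (a mid-rank
    convention assumed for this sketch).
    """
    below = sum(1 for s in state_scores if s < score)
    ties = sum(1 for s in state_scores if s == score)
    return 100 * (below + 0.5 * ties) / len(state_scores)


# Hypothetical 3rd grade English proficiency rates for five schools.
grade3_english = [20, 35, 50, 65, 80]
print(percentile_rank(65, grade3_english))  # 70.0
```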
The average of all data available for a school yields an average percentile ranking for a school’s proficiency rates across grades and subjects. This average is computed using weights to avoid over- or under-representation of any grades or subjects with small numbers of test takers. For most schools, the tests taken are relatively evenly distributed across grades and subjects. However, a small group of remedial or advanced students may take a test below or above grade level, or a specific grade may represent a very small proportion of a school’s student body. To account for these cases, we weight each subject-grade percentile by the number of students tested when computing the average. When data for number of students tested is not available, or the proportion of school-grade-subject records with missing values for number tested is greater than 0.5, the average percentile defaults to an equally weighted average.
Once we have this weighted average percentile, we check to make sure the school has at least one English and one Math test represented in their average. For those schools that do have at least one English and one Math test, we assign schools a 1-10 value based upon their average percentile (across grade-subjects), with averages between the 1st and 9th percentiles receiving a “1”, those between the 10th and 19th percentiles receiving a “2”, and so on until schools averaging between the 90th and 99th percentiles, which receive a “10”.
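The weighting, subject check, and binning can be sketched together. This is an illustrative reading rather than the production logic: records are hypothetical `(subject, percentile, n_tested)` tuples, a missing count is represented by `None`, and records with a missing count are dropped from the weighted average when at most half are missing (the text does not say how such records are handled).

```python
def percentile_to_rating(avg_percentile):
    """Map a 0-100 average percentile to a 1-10 rating in ten-point bins."""
    return min(10, max(1, int(avg_percentile // 10) + 1))


def rate_test_scores(records):
    """Return a 1-10 rating, or None without both English and Math."""
    subjects = {subject for subject, _, _ in records}
    if not {"English", "Math"} <= subjects:
        return None
    n_missing = sum(1 for _, _, n in records if n is None)
    if n_missing / len(records) > 0.5:
        # Too many missing counts: fall back to an equally weighted average.
        avg = sum(p for _, p, _ in records) / len(records)
    else:
        counted = [(p, n) for _, p, n in records if n is not None]
        avg = sum(p * n for p, n in counted) / sum(n for _, n in counted)
    return percentile_to_rating(avg)


records = [("English", 60, 100), ("Math", 80, 100), ("Math", 20, 10)]
print(rate_test_scores(records))  # 7
```

In the example, the small ten-student Math test pulls the weighted average down only slightly (to about 67.6), whereas an unweighted average would be about 53.3.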
To compute subgroup ratings GreatSchools compares the performance of each subgroup-grade-subject in a given school to the performance of all students in the state. For each grade-subject GreatSchools compares the performance (% proficient and above) of students in each subgroup at a school to the distribution of performance of all students in the state in that grade-subject, locating where the performance of these students falls in the overall distribution as a percentile. We then average across these grade-subject percentiles for each subgroup to arrive at an average percentile across grades and subjects for each subgroup in a school.
Once we have the weighted average subgroup percentile, we check to make sure the school has at least one English and one Math test represented in their average. For those schools that do have at least one English and one Math test, we assign each school a subgroup rating by binning average percentiles in the same method used with the overall test score rating.
The GreatSchools Student Progress Rating is a measure of school quality focused on the performance of schools on standardized tests after taking into account factors associated with student academic outcomes that are not directly influenced by school performance. Student growth models vary considerably by state, but attempt to answer the same basic question: how much academic progress are students making at a particular school? Specifically, how much academic progress are students making relative to similar students in the state? The student growth data used by GreatSchools to produce our Student Progress Rating come from State Departments of Education and are the results of each state’s own student growth model.
While student growth models vary, the same methodology is used to rate all types of continuous growth metrics (e.g., student growth percentiles, value-added scores, net growth, etc.). First, we find the position of each school in the distribution of student growth scores in each grade-subject as a percentile. Each percentile is calculated by comparing a school’s student growth score for each grade-subject (e.g., 4th grade Math, 10th grade Science) to the distribution of student growth scores from all other schools in the state on that same grade-subject test.
We then average across all available percentiles for each school yielding an average percentile ranking for a school’s growth across grades and subjects.
Once we have this average percentile, we assign schools a 1-10 value, with averages between the 1st and 9th percentiles receiving a “1”, those between the 10th and 19th percentiles receiving a “2”, and so on until schools averaging between the 90th and 99th percentiles, which receive a “10”.
In states that do not make school-level growth data available, we create a school value-added estimate as a proxy measure for growth using a value-added approach with school-level data. This approach creates an estimate of expected proficiency rates for each grade-subject in each school based upon the various qualities of the school, including the proficiency of students at that school in the same subject in the prior grade and previous year. We then create a measure of distance from this expected value for each grade-subject at a school, and find the position of each school in the distribution of these “distance from expected” scores in each grade-subject as a percentile. Each percentile is calculated by comparing a school’s “distance from expected” score for each grade-subject (e.g., 4th grade Math, 10th grade Science) to the distribution of “distance from expected” scores from all other schools in the state on that same grade-subject test. We then average across all available percentiles for each school, yielding an average percentile ranking for a school’s growth across grades and subjects.
1. For more information about your state’s specific Student Growth model, check your state Department of Education website.
2. Previous iterations of the Growth Rating methodology averaged growth metrics over two years, when available. While using a multi-year average for any metric reduces its volatility, our analysis did not find the Growth Rating to be particularly volatile. A simulation of one- and two-year Growth Ratings for Georgia found that 90% of schools’ one-year Growth Ratings were within ±2 rating points of their two-year Growth Rating.
Once we have this average percentile, we assign schools a 1-10 value, with averages between the 1st and 9th percentiles receiving a “1”, those between the 10th and 19th percentiles receiving a “2”, and so on until schools averaging between the 90th and 99th percentiles, which receive a “10”.
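The "distance from expected" idea behind the growth proxy can be illustrated with a deliberately simplified model. The sketch below assumes a single predictor (the same cohort's prior-grade, prior-year proficiency rate) and an ordinary least squares line fitted across schools; the actual model draws on more school characteristics, so this is only a stand-in. All names and numbers are hypothetical.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def distance_from_expected(prior, current):
    """Residual of each school's current proficiency from its expected value."""
    slope, intercept = fit_line(prior, current)
    return [c - (intercept + slope * p) for p, c in zip(prior, current)]


# Hypothetical proficiency rates for four schools in one grade-subject.
prior_rates = [40, 50, 60, 70]    # prior grade, previous year
current_rates = [45, 50, 70, 75]  # current grade, current year
residuals = distance_from_expected(prior_rates, current_rates)
print([round(r, 2) for r in residuals])
```

Schools with positive residuals performed above what their prior proficiency predicted. These residuals would then be converted to percentiles and binned in the same way as the other growth metrics.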
The GreatSchools College Readiness Rating is a measure of school quality focused on assessing the degree to which a school prepares students for entrance into postsecondary education. This rating process includes the computation of both overall school-level college readiness ratings and college readiness ratings for subgroups of students within each school. The rating comprises three components: high school graduation, college entrance exams, and advanced courses.
The school College Readiness rating is calculated in four steps, outlined visually in Figure 1 below. First, each of the inputs available for a particular school is standardized. To do this, we compute the school’s position in the state-wide distribution of each metric as a percentile. Percentiles are calculated by comparing the values for each school’s performance on a particular metric to the state-wide distribution including all other schools in the state.
Next, within each component, an average of the available metric percentiles is calculated, resulting in a score for each of the three college readiness components (high school graduation, college entrance exam, and advanced course participation).
3. Percent of Students Who Meet UC/CSU Entrance Requirements is a metric unique to California. Similar high school graduation-related metrics, where available, will be included here for other states.
Third, an average across the three components is taken. This approach ensures that equal weight is given to graduation rates, college entrance exams, and advanced courses when calculating a school’s college readiness rating. For schools with no available data in one of the components, the average across the other two components is taken. No combined average percentile is calculated for schools without data in at least two of the components, and these schools do not receive a college readiness rating.
Finally, the combined average percentiles are sorted low to high and converted into deciles. The bottom decile (1st – 9th percentiles) of schools receive an overall rating of “1”, the second decile (10 – 19th percentile) receive a rating of “2”, and so on, with the top decile (90 – 99th percentile) receiving a rating of “10”.
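The component averaging and decile assignment can be sketched as below. The component keys, the function names, and the rank-based decile cut (which ignores ties) are assumptions made for illustration.

```python
from statistics import mean


def combined_percentile(components):
    """Average the component averages; require data in at least two.

    `components` maps a component name to a list of metric percentiles
    (an empty list means no data for that component).
    """
    averages = [mean(v) for v in components.values() if v]
    if len(averages) < 2:
        return None
    return mean(averages)


def decile_ratings(combined):
    """Sort schools low to high and assign ratings 1-10 by decile."""
    ranked = sorted(combined, key=combined.get)
    n = len(ranked)
    return {school: (i * 10) // n + 1 for i, school in enumerate(ranked)}


school = {"graduation": [80, 60], "entrance_exam": [50], "advanced_courses": []}
print(combined_percentile(school))  # 60

state = {f"school_{i}": i for i in range(20)}  # hypothetical combined values
ratings = decile_ratings(state)
print(ratings["school_0"], ratings["school_19"])  # 1 10
```

Because each component is averaged before the components are combined, a school reporting many advanced-course metrics does not receive more weight on that component than a school reporting one.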
College Readiness Subgroup ratings are calculated using a similar process as overall ratings. Each metric is converted to a percentile, the percentiles are averaged within each component, and an average is taken across the components. Unlike the percentiles calculated for the overall rating however, subgroup metrics are converted to percentiles by comparing scores for a particular subgroup to the scores for the overall population throughout the state. For example, a school’s AP Course Participation rate for Hispanic students is converted into a percentile by comparing that rate to the AP Course Participation for all students in schools across the state. This approach allows the subgroup ratings to reflect the differences between subgroups within a school and those across the state.
After values are converted to percentiles, the percentiles are averaged within each component, and then an average across the components is taken. Unlike the process for the overall rating, the combined average percentiles are not sorted into deciles but instead are converted into ratings based on their value. Schools with a subgroup combined average percentile between 0 and 0.09 receive a rating of “1” for that subgroup. Combined average percentiles between 0.1 and 0.19 result in a subgroup rating of “2,” and so on, with a combined average percentile between 0.9 and 1 resulting in a subgroup rating of “10.” Because the subgroup ratings reflect a comparison to all students (and not a comparison to the subgroup), there will not be a uniform number of each rating.
Ratings are calculated for each school and for each subgroup (Hispanic, Asian, etc.) within a school, where sufficient data is available. To calculate a rating for a school or subgroup, data must be available for two of the three components, or no rating will be assigned. Additionally, schools designated as ‘Alternative’ by the Civil Rights Data Collection (CRDC) do not receive a rating and do not factor into other schools’ ratings.
If data from one of the metrics is only available for all students, and not disaggregated at the subgroup level, the data will still be used for calculating the overall rating. If non-disaggregated data results in an entire component not having data at the subgroup level, the combined average percentile for subgroups will be recalculated by comparing to a restricted combined average percentile for all students, which only consists of those components where subgroup data is available. For example, if college entrance exam data were not available for subgroups, the combined average percentile for subgroups would be an average of only two components: the high school graduation component and the advanced courses component. Because the College Entrance Exam component would factor into the overall rating but not the subgroup rating, the ratings at the school-level could be very different depending on college entrance exam performance. If a school’s college entrance exam metrics were well above average, that would increase the school’s overall rating but not their subgroup ratings, which would not be an accurate reflection of subgroup performance compared to overall performance. To correct for this issue, a restricted combined average percentile for all students is calculated, which only includes the high school graduation component and advanced courses component. Each combined average percentile for subgroups is recalculated by comparing to the restricted version for all students. This process produces subgroup ratings that more accurately reflect the differences between overall and subgroup performance.
The GreatSchools Equity Rating is a measure of a school’s success in serving disadvantaged students and in ensuring all students achieve the same level of academic performance. While there are multiple dimensions to consider in calculating a metric that reflects a school’s equity, the Equity Rating is computed based upon the performance of disadvantaged groups and relative size of in-school gaps. These two components allow us to evaluate a school’s success in educating disadvantaged groups compared to students throughout the state, as well as compared specifically to other students at the school. The two components are computed and combined in a series of four steps: 1. Determining statewide disadvantaged groups, 2. Aggregating school performance across disadvantaged groups, 3. Calculating in-school gap-weights, 4. Computing the equity rating.
When looking at equity in a broad sense, the Equity Rating specifically considers the gaps between more advantaged subgroups and disadvantaged subgroups as well as the overall performance of disadvantaged subgroups. A subgroup is classified as ‘disadvantaged’ or not by calculating average performance gaps between all subgroups and identifying those who face persistent gaps across schools, subgroup pairs, grades, and subjects.
The first step to compute gaps between subgroups is to compare the performance of each subgroup-grade-subject in a given school to the performance of all students in the state. For each grade-subject, the performance (% proficient and above) of students in each subgroup at a school is compared to the distribution of performance of all students in the state in that grade-subject, locating where the performance of these students falls in the overall distribution as a percentile. Next, at the school, grade, subject level, the differences between percentiles are calculated for each pair of subgroups. Across schools, percentile differences are averaged at the grade, subject level to find the state average percentile differences between subgroups at the grade, subject level. Then, each grade, subject is averaged across each subgroup row to calculate the average percentile difference across subgroup pairings for each subgroup at each grade, subject level. Finally, the average for each subgroup is calculated across grade, subject levels to find the average gap each subgroup faces across schools, subgroup pairs, grades, and subjects. The subgroups for whom this number is negative are the disadvantaged groups for the purposes of the Equity Rating.
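The sequence of averages above can be followed in a short sketch. The record layout and the placeholder subgroup labels "A" and "B" are hypothetical; the percentiles are assumed to have been computed already against the all-student state distribution.

```python
from collections import defaultdict
from statistics import mean


def average_subgroup_gaps(records):
    """Return the average gap each subgroup faces.

    `records` is a list of (school, grade_subject, {subgroup: percentile}).
    """
    # Pairwise differences within each school-grade-subject, pooled so
    # they can be averaged across schools for each grade-subject and pair.
    pair_diffs = defaultdict(list)
    for _school, grade_subject, pct in records:
        for a in pct:
            for b in pct:
                if a != b:
                    pair_diffs[(grade_subject, a, b)].append(pct[a] - pct[b])
    state_pair_avg = {key: mean(v) for key, v in pair_diffs.items()}

    # Average across each subgroup's row within a grade-subject.
    rows = defaultdict(list)
    for (grade_subject, a, _b), diff in state_pair_avg.items():
        rows[(grade_subject, a)].append(diff)

    # Average across grade-subjects for each subgroup.
    by_group = defaultdict(list)
    for (_grade_subject, a), diffs in rows.items():
        by_group[a].append(mean(diffs))
    return {group: mean(v) for group, v in by_group.items()}


records = [
    ("s1", "g3_math", {"A": 70, "B": 40}),
    ("s2", "g3_math", {"A": 60, "B": 50}),
    ("s1", "g3_english", {"A": 80, "B": 60}),
]
gaps = average_subgroup_gaps(records)
disadvantaged = {g for g, gap in gaps.items() if gap < 0}
print(gaps, disadvantaged)
```

Here subgroup "B" averages 20 percentile points below subgroup "A", so it would be classified as disadvantaged.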
The core of the Equity Rating is a school’s performance with disadvantaged groups. To assess a school’s level of performance with a particular subgroup, all scores are converted to percentiles by comparing to the state distribution of scores for all students at the grade, subject level. For each disadvantaged subgroup at each school making up at least 5% of student enrollment, an average is calculated of all these percentiles. These disadvantaged subgroup percentile scores are then averaged across subgroups to the school level to create an average disadvantaged subgroup performance measure. For all subgroups beyond the 5% threshold, all percentiles are equally weighted in a school’s average.
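A minimal sketch of the disadvantaged-subgroup performance measure follows. It assumes percentiles on a 0 to 1 scale and hypothetical subgroup labels; the 5% enrollment threshold comes from the text.

```python
from statistics import mean


def disadvantaged_performance(subgroup_percentiles, enrollment_share,
                              disadvantaged, threshold=0.05):
    """Average of per-subgroup average percentiles for qualifying subgroups.

    A subgroup qualifies if it is classified as disadvantaged and makes
    up at least `threshold` of the school's enrollment.
    """
    group_means = [
        mean(pcts)
        for group, pcts in subgroup_percentiles.items()
        if group in disadvantaged
        and enrollment_share.get(group, 0) >= threshold
        and pcts
    ]
    return mean(group_means) if group_means else None


percentiles = {"A": [0.90], "B": [0.30, 0.50], "C": [0.60]}
shares = {"A": 0.77, "B": 0.20, "C": 0.03}
# Subgroup C is disadvantaged but below the 5% threshold, so only B counts.
print(disadvantaged_performance(percentiles, shares, {"B", "C"}))
```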
A school’s average percentile score for disadvantaged groups does not reveal whether or not in-school gaps in performance between subgroups exist at the school. Adding a measure of in-school performance gaps to the Equity Rating, and then weighting that measure, creates a more robust metric of a school’s success in providing equitable education.
These gap measures are calculated for both ethnicity and income subgroups separately. Similar to the average percentile for disadvantaged groups, all subgroups which comprise 5% or more of student enrollment are included for a school.
The first step in this process is the creation, at the school-grade-subject level, of subgroup percentile differences, just as we do in the first step of identifying disadvantaged subgroups. At the grade, subject level, any school data without at least one disadvantaged group and one non-disadvantaged group is not used. Each school’s data is then subsetted to include only gaps between disadvantaged subgroups and non-disadvantaged subgroups. Then, the state average gaps are subtracted from the gaps observed at the school to get differences between gaps in the school and gaps in the state. Finally, weights are calculated to give schools more credit for closing performance gaps which are larger at the state level. The weights are calculated by dividing the absolute value of state level gaps between each disadvantaged and non-disadvantaged group by the sum of those gaps. To apply the weights to the school, grade, subject level data, the weights are pared down to the corresponding subgroups and normalized so the weights sum to 1. The weights are multiplied by the difference from the state average, and then the average is calculated. This value is the gap score for the school, grade, subject level. The process is identical for income matrices, but no weights matrix is used because only one disadvantaged and one non-disadvantaged subgroup exist.
The gap score at the school, grade, subject level is the average difference in percentile gaps across pairs of disadvantaged and non-disadvantaged subgroups, weighted by the size of the state level percentile gap relative to other state level percentile gaps. Keeping income and ethnicity separate, the average across grade, subject levels is calculated for each school.
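One way to read the gap score calculation for a single school, grade, and subject is sketched below. Two interpretive assumptions are made: a gap is defined as the disadvantaged percentile minus the non-disadvantaged percentile (so a positive difference from the state means a smaller gap), and "the average" is taken to be the weighted average using the normalized weights. The names and numbers are hypothetical, and at least one state gap among the applicable pairs is assumed to be nonzero.

```python
def gap_score(school_pct, state_gaps):
    """Weighted average difference between school gaps and state gaps.

    `school_pct` maps subgroup -> percentile for one school-grade-subject.
    `state_gaps` maps (disadvantaged, non_disadvantaged) -> state average gap.
    """
    # Pare the pairs down to subgroups actually present at the school.
    pairs = [(d, n) for (d, n) in state_gaps
             if d in school_pct and n in school_pct]
    if not pairs:
        return None
    # Normalize weights so they sum to 1 over the applicable pairs.
    total = sum(abs(state_gaps[pair]) for pair in pairs)
    score = 0.0
    for d, n in pairs:
        weight = abs(state_gaps[(d, n)]) / total
        school_gap = school_pct[d] - school_pct[n]
        score += weight * (school_gap - state_gaps[(d, n)])
    return score


state_gaps = {("B", "A"): -20, ("C", "A"): -10}
school = {"A": 70, "B": 60, "C": 55}
print(gap_score(school, state_gaps))
```

In this example the school halves the larger statewide gap (B versus A) but has a wider-than-average gap for C versus A; because the B gap carries twice the weight, the resulting score is positive.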
Given that several ethnicity subgroups exist compared to only two income subgroups, the distributions of ethnicity gap scores and income gap scores can be quite different. To account for this difference, gap scores are standardized by being converted to z-scores, which have a mean of 0 and a standard deviation of 1. The average of the ethnicity z-score and income z-score is the school’s combined gap score. To be used as a weight, the gap score cannot have negative values and should be centered at a value of 1. Therefore, to calculate the gap-weight, the combined gap score is converted to a percentile and a value of 0.5 is added, which creates a midpoint of 1 while preserving the rank order of schools. Any school without a gap score (for example, a school serving only disadvantaged subgroups and therefore not having any applicable gaps) is assigned a gap-weight of 1. This process results in a measure giving schools which have performance gaps smaller than the state average a gap-weight greater than 1, schools with larger gaps a gap-weight less than 1, and schools with gaps equal to the state average, or those serving only disadvantaged subgroups, a gap-weight of exactly 1.
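The standardization and conversion to a gap-weight can be sketched as follows, assuming every school in the lists has both an ethnicity and an income gap score, that the population standard deviation is used, and that percentiles are mid-ranks on a 0 to 1 scale. Schools without any gap score would simply be assigned 1.0 outside this function. Names and values are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values by subtracting the mean and dividing by the SD."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def midrank_percentile(value, population):
    """Percentile on a 0-1 scale with ties counted as half."""
    below = sum(1 for v in population if v < value)
    ties = sum(1 for v in population if v == value)
    return (below + 0.5 * ties) / len(population)


def gap_weights(ethnicity_scores, income_scores):
    """Combine standardized gap scores and center the result at 1."""
    combined = [
        (e + i) / 2
        for e, i in zip(z_scores(ethnicity_scores), z_scores(income_scores))
    ]
    return [midrank_percentile(c, combined) + 0.5 for c in combined]


# Three hypothetical schools; higher gap scores mean smaller gaps.
weights = gap_weights([1, 2, 3], [10, 20, 30])
print([round(w, 3) for w in weights])
```

The middle school receives a gap-weight of 1, while the schools with smaller and larger gaps land above and below 1 respectively.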
Taking the product of the disadvantaged subgroup performance component computed in step 2 above and the gap-weight computed in step 3 gives each school’s equity score. To compute the Equity Rating, the equity score is then converted into a 1 – 10 rating, where a ‘1’ is assigned to any number between 0 and 0.09, a ‘2’ is assigned to any number between 0.10 and 0.19, and so on. It is possible for a high-performing school with smaller than average gaps to receive a value greater than 1, and in this case, the school receives an Equity Rating of 10.
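The final step reduces to a product and a binning rule, sketched here with a hypothetical function name. The performance component is assumed to be on a 0 to 1 scale, and a missing gap-weight defaults to 1.0 as described for schools without applicable gaps.

```python
import math


def equity_rating(disadvantaged_performance, gap_weight=1.0):
    """Convert performance x gap-weight into a 1-10 rating.

    Scores above 1 are capped at a rating of 10.
    """
    score = disadvantaged_performance * gap_weight
    return max(1, min(10, math.floor(score * 10) + 1))


print(equity_rating(0.55, 1.2))  # 7
print(equity_rating(0.90, 1.3))  # 10
```

In the first call the equity score is 0.66, which falls in the 0.60 to 0.69 bin; in the second it is 1.17, which exceeds 1 and is capped.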
Equity Ratings are calculated at the school-level, for every school with sufficient available data.
If a school has test score data but does not have test score data for any disadvantaged subgroups, the school is assigned the average Equity Rating for schools which have the same Test Score Rating. This lack of data could be due to the school not serving significant proportions of disadvantaged students or to missing data, and the estimated rating, called the Equity Adjustment Factor, allows for the Summary Ratings of schools with and without Equity Ratings to be more easily comparable.
The GreatSchools Discipline and Attendance Flags are indicators GreatSchools uses to identify schools with worrisome patterns of suspension and chronic absenteeism in their student body. Creating these flags involves two primary steps: 1) identifying schools with high rates of suspension or absenteeism, and 2) identifying schools with significant differences in suspension or chronic absenteeism rates between race/ethnicity subgroups.
For both discipline (measured as out-of-school suspension rates) and chronic absenteeism (measured as the proportion of students absent 15 days or more), we create separate flags for schools in four steps. First, due to severe right-skewing in this data, we remove schools with zero outcomes. Second, in order to deal with the fact that absenteeism and suspension are both significantly higher in secondary grades than primary or middle grades, we separate schools into “levels” based on their grade composition. Next, we divide these distributions into quartiles, assigning schools a 1-4 value based upon which quartile they fall into, with the first quartile (1-24th percentiles) receiving a “1”, the second quartile (25-49th percentiles) receiving a “2”, and so on until the top quartile, which receives a “4”. Finally, schools with zero outcomes are re-assigned into the first quartile with a rating of “1”.
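The quartile assignment for one school level can be sketched as below. The function would be called once per level (for example, separately for elementary and high schools). The rank-based cut ignores ties, which is a simplification, and the school names and rates are made up.

```python
def quartile_ratings(rates):
    """Assign 1-4 quartile values to schools within one school level.

    `rates` maps school -> suspension or chronic absenteeism rate.
    Schools with a zero rate are set aside before the quartiles are
    formed and then placed in the first quartile.
    """
    nonzero = sorted((s for s, r in rates.items() if r > 0), key=rates.get)
    n = len(nonzero)
    ratings = {school: (i * 4) // n + 1 for i, school in enumerate(nonzero)}
    ratings.update({s: 1 for s, r in rates.items() if r == 0})
    return ratings


high_schools = {"a": 0.0, "b": 0.01, "c": 0.02, "d": 0.05, "e": 0.10}
print(quartile_ratings(high_schools))
```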
For both discipline and chronic absenteeism, we also examine the extent to which rates in each school are different between subgroups of students at that school. To do this, we conduct statistical tests of independence between each of suspension and absenteeism rates and race/ethnic subgroups at each school for which there are at least 50 enrolled students. These tests are done using Chi-Squared tests on contingency tables formed by cross-tabulating counts of students suspended and not suspended (or chronically absent and not chronically absent) by subgroup. For instances in which contingency tables resulted in cells with expected counts less than 5, we substituted Fisher’s Exact test for the Chi-Squared test.
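The test-selection rule and the chi-squared statistic can be written out with only the standard library, as in this sketch. Rows are subgroups and columns are (suspended, not suspended) counts; the table contents and function names are hypothetical, and the table total is used as a stand-in for enrollment. Obtaining a p-value would require a statistics library (for example, SciPy's `chi2_contingency`), which is not shown here.

```python
def expected_counts(table):
    """Expected cell counts under independence of rows and columns."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]


def chi_squared_statistic(table):
    """Pearson chi-squared statistic for a contingency table."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


def choose_test(table, min_students=50, min_expected=5):
    """Pick the test described in the text, or None for small schools."""
    if sum(sum(row) for row in table) < min_students:
        return None
    smallest = min(min(row) for row in expected_counts(table))
    return "fisher_exact" if smallest < min_expected else "chi_squared"


table = [[10, 90], [30, 70]]  # two subgroups: suspended vs. not suspended
print(choose_test(table), chi_squared_statistic(table))
```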
We create flags for worrisome patterns of suspension and chronic absenteeism, using the combined results from the quartile rating and the subgroup differences tests. Specifically, schools receive a flag if their rates put them in the top quartile for suspension or absenteeism and they also show statistically significant differences in these high rates across race/ethnic subgroup categories.
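The flag rule combines the two results, as in this small sketch. The significance threshold `alpha=0.05` is an assumption, since the text does not state the level used, and the function name is hypothetical.

```python
def has_flag(quartile, p_value, alpha=0.05):
    """Flag a school in the top quartile with a significant subgroup difference.

    `alpha` is an assumed significance level, not one given in the text.
    """
    return quartile == 4 and p_value < alpha


print(has_flag(4, 0.01))  # True
print(has_flag(4, 0.20))  # False
print(has_flag(3, 0.01))  # False
```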