Help for package QuantileGradeR

Title:

Quantile-Adjusted Restaurant Grading

Version:

0.1.1

Date:

2017-02-06

Author:

Zoe Ashwood <zashwood@law.stanford.edu>, Becky Elias <Becky.Elias@kingcounty.gov>, Daniel E. Ho <dho@law.stanford.edu>

Maintainer:

Zoe Ashwood <zashwood@law.stanford.edu>

Description:

Implementation of the food safety restaurant grading system adopted by Public Health - Seattle & King County (see Ashwood, Z.C., Elias, B., and Ho, D.E. "Improving the Reliability of Food Safety Disclosure: A Quantile Adjusted Restaurant Grading System for Seattle-King County" (working paper)). As reported in the accompanying paper, this package allows jurisdictions to easily implement refinements that address common challenges with unadjusted grading systems. First, in contrast to unadjusted grading, where the most recent single routine inspection is the primary determinant of a grade, grading inputs are allowed to be flexible. For instance, it is straightforward to base the grade on average inspection scores across multiple inspection cycles. Second, the package can identify quantile cutoffs by inputting substantively meaningful regulatory thresholds (e.g., the proportion of establishments receiving sufficient violation points to warrant a return visit). Third, the quantile adjustment equalizes the proportion of establishments in a flexible number of grading categories (e.g., A/B/C) across areas (e.g., ZIP codes, inspector areas) to account for inspector differences. Fourth, the package implements a refined quantile adjustment that addresses two limitations with the stats::quantile() function when applied to inspection score datasets with large numbers of score ties. The quantile adjustment algorithm iterates over quantiles until, over all restaurants in all areas, grading proportions are within a tolerance of desired global proportions. In addition, the package allows a modified definition of "quantile" from "Nearest Rank".
Instead of requiring that at least p[1]% of restaurants receive the top grade and at least (p[1]+p[2])% of restaurants receive the top or second best grade for quantiles p, the algorithm searches for cutoffs so that as close as possible to p[1]% of restaurants receive the top grade, and as close as possible to p[2]% of restaurants receive the second top grade.

URL:

http://www.kingcounty.gov/depts/health/environmental-health/food-safety/inspection-system/food-safety-rating.aspx

Depends:

R (≥ 3.2.3)

License:

GPL-2 | GPL-3 [expanded from: GPL (≥ 2)]

LazyData:

TRUE

RoxygenNote:

5.0.1

Imports:

stats

NeedsCompilation:

Packaged:

2017-02-06 18:47:12 UTC; zoeashwood

Repository:

CRAN

Date/Publication:

2017-02-06 21:22:48

Example Inspection Scores Matrix.

Description

A small dataset of inspection scores.

Usage

X.kc

Format

A matrix with 4 columns and ~1500 rows, where each row represents one business and each column is one inspection cycle. X.kc[i,j] represents the inspection score for the ith restaurant in the jth most recent inspection.

Details

X.kc contains restaurant inspection information from 11 randomly chosen ZIP codes in the King County (WA) jurisdiction. Establishments and ZIP codes are masked. Inspection information is limited to the 01-01-2012 to 03-25-2016 time period.

Create Cutoffs Dataframe

Description

createCutoffsDF is an internal function, which creates a dataframe with identical cutoff values for all ZIP codes (if type = "unadj"), or quantile cutoffs in a ZIP code (if type = "perc" or type = "perc.resolve.ties"). This function is called extensively by the findCutoffs function.

Usage

createCutoffsDF(X, z, gamma, type)

Arguments

X

Numeric matrix of size n x p, where n is the number of restaurants to be graded and p is the number of inspections to be used in grade assignment. Entry X[i,j] represents the inspection score for the ith restaurant in the jth most recent inspection.

z

Character vector of length n representing ZIP codes (or other subunits within a jurisdiction). z[i] is the ZIP code corresponding to the restaurant with inspection scores in row i of X.

gamma

Numeric vector representing absolute grade cutoffs or quantiles, depending on type variable value. Entries in gamma should be increasing, with gamma[1] <= gamma[2] etc (this is related to the "Warning" section and larger scores being associated with higher risk). If type = "perc" or type = "perc.resolve.ties", gamma values represent quantiles and should take on values between 0 and 1.

type

Character string that is one of "unadj", "perc", or "perc.resolve.ties", and that indicates the grading algorithm to be implemented.

Details

createCutoffsDF takes in a matrix of restaurants' scores and a vector corresponding to restaurants' ZIP codes, and outputs a data frame of cutoff scores to be used in grade classification. The returned ZIP code cutoff data frame has one row for each unique ZIP code and has (length(gamma)+1) columns, corresponding to one column for the ZIP code name, and (length(gamma)) cutoff scores separating the (length(gamma)+1) grading categories. Across each ZIP code's row, cutoff scores increase and we assume, as in the King County (WA) case, that greater risk is associated with larger inspection scores. (If scores are decreasing in risk, users should transform inspection scores before utilizing functions in the QuantileGradeR package with a simple function such as f(score) = - score.)

The way in which cutoff scores are calculated for each ZIP code depends on the value of the type variable. The type variable can take one of three values (described under Modes below).

Modes

type = "unadj" creates a ZIP code cutoff data frame with the same cutoff scores (meaningful values in a jurisdiction's inspection system that are contained in the vector gamma) for all ZIP codes. This ZIP code data frame can then be used to carry out "unadjusted" grading, in which a restaurant's most recent routine inspection score is compared to these cutoffs.

type = "perc" takes in a vector of quantiles, gamma, and returns a data frame of the scores in each ZIP code corresponding to these quantiles (using the "Nearest Rank" definition of quantile).

type = "perc.resolve.ties" takes in a vector of quantiles, gamma, and instead of returning (for B/C cutoffs, for example) the scores in each ZIP code that result in at least (gamma[2] x 100)% of restaurants in the ZIP code scoring less than or equal to these cutoffs, type = "perc.resolve.ties" takes into account the fact that ties exist in ZIP codes. Returned scores for A/B cutoffs are those that result in the closest percentage of restaurants in the ZIP code scoring less than or equal to the A/B cutoff to the desired percentage, (gamma[1] x 100)%. Similarly, B/C cutoffs are the scores in the ZIP code that result in the closest percentage of restaurants in the ZIP code scoring less than or equal to the B/C cutoff and more than the A/B cutoff to the desired percentage, ((gamma[2] - gamma[1]) x 100)%.
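The difference between the two quantile definitions is easiest to see on a single ZIP code with tied scores. The following is a minimal base R sketch, not the package's source code: the score vector and the variable names (zip.scores, target, closest) are made up for illustration, and only the A/B cutoff is considered.

```r
# Hypothetical inspection scores for one ZIP code with many ties (illustrative data).
zip.scores <- c(0, 0, 0, 0, 5, 5, 5, 5, 5, 10)
target <- 0.5  # desired share of restaurants at or below the A/B cutoff

# "Nearest Rank" cutoff: the smallest score with at least 50% of restaurants <= it.
nearest.rank <- stats::quantile(zip.scores, probs = target, type = 1, names = FALSE)

# Ties-aware alternative (an assumption about the idea, not the package code):
# among observed scores, pick the one whose share of restaurants <= it is
# closest to the target share.
candidates <- sort(unique(zip.scores))
share.below <- vapply(candidates, function(s) mean(zip.scores <= s), numeric(1))
closest <- candidates[which.min(abs(share.below - target))]

# Share of restaurants that would receive the top grade under each cutoff.
share.nearest.rank <- mean(zip.scores <= nearest.rank)
share.closest <- mean(zip.scores <= closest)
```

In this toy ZIP code the Nearest Rank cutoff hands the top grade to 90% of restaurants, while the closest-proportion cutoff hands it to 40%, which is nearer the 50% target.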

Find Cutoff Values.

Description

findCutoffs applies a quantile adjustment to inspection scores within a jurisdiction's subunits (e.g. ZIP codes) and creates a data frame of cutoff values to be used for grading restaurants or other inspected entities.

Usage

findCutoffs(X, z, gamma, resolve.ties = TRUE, restaurant.tol = 10,
  max.iterations = 20)

Arguments

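X

Numeric matrix of size n x p, where n is the number of restaurants to be graded and p is the number of inspections to be used in grade assignment. Entry X[i,j] represents the inspection score for the ith restaurant in the jth most recent inspection.
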
z

Character vector of length n representing ZIP codes (or other subunits within a jurisdiction). z[i] is the ZIP code corresponding to the restaurant with inspection scores in row i of X.

gamma

Numeric vector representing absolute grade cutoffs. Entries in gamma should be increasing, with gamma[1] <= gamma[2] etc (this is related to the "Warning" section and larger scores being associated with higher risk).

resolve.ties

Boolean value that determines the definition of quantile to be used after optimal quantiles have been found with the percentileSeek function. See Modes below, as well as Appendix J of Ho, D.E., Ashwood, Z.C., and Elias, B. "Improving the Reliability of Food Safety Disclosure: A Quantile Adjusted Restaurant Grading System for Seattle-King County".

restaurant.tol

An integer indicating the maximum difference in the number of restaurants in a grading category between the unadjusted and adjusted grading algorithms (for the top length(gamma) grading categories).

max.iterations

The maximum number of iterations that the iterative algorithm (carried out by the internal percentileSeek function) should run in order to find optimal quantiles for ZIP cutoffs. The iterative algorithm is described in more detail below.

Details

In our documentation, we use the language "ZIP code" and "restaurant"; however, our grading algorithm and our code can be applied to grade other inspected entities, and quantile cutoffs can be sought in subunits of a jurisdiction that are not ZIP codes. For example, it may make sense to search for quantile cutoffs in an inspector's allocated inspection area or within a census tract. We chose to work with ZIP codes in our work because area assignments for inspectors in King County (WA) tend to be single or multiple ZIP codes, and we desired to assign grades based on how a restaurant's scores compare to other restaurants assessed by the same inspector. We could have calculated quantile cutoffs in an inspector's allocated area, but inspector areas are not always contiguous. Because food choices are generally local, ZIP codes offer a transparent and meaningful basis for consumers to distinguish establishments. Where "ZIP code" is referenced, please read "ZIP code or other subunit of a jurisdiction", and "restaurant" should read "restaurant or other entity to be graded".

findCutoffs takes in a vector of cutoff scores, gamma, a matrix of restaurants' scores, X, and a vector corresponding to restaurants' ZIP codes, z, and outputs a data frame of cutoff scores to be used in the gradeAllBus function to assign grades to restaurants. findCutoffs first carries out "unadjusted grading" and compares restaurants' most recent routine inspection scores to the raw cutoff scores contained in gamma and assigns initial grades to restaurants. Grade proportions in this scheme are then used as initial quantiles to find quantile cutoffs in each ZIP code (or quantile cutoffs accommodating for the presence of score ties in the ZIP code, depending on the value of resolve.ties; see the Modes section). Restaurants are then graded with the ZIP code quantile cutoffs, and grading proportions are compared with grading proportions from the unadjusted system. Quantiles are iterated over one at a time (by the internal percentileSeek function, which uses a binary search root finding method) until grading proportions with ZIP code quantile cutoffs are within a certain tolerance (as determined by restaurant.tol) of the unadjusted grading proportions. This iterative step is important because of the discrete nature of the inspection score distribution, and the existence of large numbers of restaurants with the same inspection scores.
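The first step described above, unadjusted grading followed by reading off the grade proportions, can be sketched in base R. The scores below are invented, and treating the cumulative proportions as the starting quantiles is an assumption made for illustration rather than a statement about the package internals.

```r
# Hypothetical most-recent inspection scores (illustrative data only).
recent.scores <- c(0, 0, 5, 10, 0, 35, 20, 0, 45, 30)
gamma <- c(0, 30)  # absolute A/B and B/C cutoffs, as in the King County example

# Unadjusted grades: A if score <= 0, B if score <= 30, C otherwise.
unadj.grades <- cut(recent.scores, breaks = c(-Inf, gamma, Inf),
                    labels = LETTERS[seq_len(length(gamma) + 1)])

# Share of restaurants in each grade under unadjusted grading.
unadj.props <- as.numeric(prop.table(table(unadj.grades)))

# Assumption: cumulative shares serve as the starting quantiles for the search.
initial.quantiles <- cumsum(unadj.props)[seq_along(gamma)]
```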

The returned ZIP code cutoff data frame has one row for each unique ZIP code and has (length(gamma)+1) columns, corresponding to one column for the ZIP code name, and (length(gamma)) cutoff scores separating the (length(gamma)+1) grading categories. Across each ZIP code's row, cutoff scores increase and we assume, as in the King County (WA) case, that greater risk is associated with larger inspection scores. (If scores are decreasing in risk, users should transform inspection scores with a simple function such as f(score) = - score before using any of the functions in QuantileGradeR.)

Modes

When resolve.ties = TRUE, in order to calculate quantile cutoffs in a ZIP code, we alter the definition of quantile from the usual "Nearest Rank" definition and use the "Quantile Adjustment (with Ties Resolution)" definition that is discussed in Appendix J of Ho, D.E., Ashwood, Z.C., and Elias, B. "Improving the Reliability of Food Safety Disclosure: A Quantile Adjusted Restaurant Grading System for Seattle-King County" (working paper). In particular, once we have found the optimal set of quantiles to be applied across ZIP codes, p, with the percentileSeek function, instead of returning (for B/C cutoffs, for example) the scores in each ZIP code that result in at least (p[2] x 100)% of restaurants in the ZIP code scoring less than or equal to these cutoffs, the mode resolve.ties = TRUE takes into account the ties that exist in ZIP codes. Returned scores for A/B cutoffs are those that result in the closest percentage of restaurants in the ZIP code scoring less than or equal to the A/B cutoff to the desired percentage, (p[1] x 100)%. Similarly, B/C cutoffs are the scores in the ZIP code that result in the closest percentage of restaurants in the ZIP code scoring less than or equal to the B/C cutoff and more than the A/B cutoff to the desired percentage, ((p[2] - p[1]) x 100)%.

When resolve.ties = FALSE, we use the usual "Nearest Rank" definition of quantile when applying the optimal quantiles, p, across ZIP codes.

Warning

findCutoffs will produce cutoff scores even for ZIP codes with only one restaurant: situations in which a quantile adjustment shouldn't be used. It is the job of the user to ensure that, if using the findCutoffs function, it makes sense to do so. This may involve only performing the quantile adjustment on larger ZIP codes and providing absolute cutoff points for smaller ZIP codes, or may involve aggregating smaller ZIP codes into a larger geographical unit and then performing the quantile adjustment on the larger area (the latter approach is the one we adopted).
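One way to act on this warning is to count restaurants per ZIP code before calling findCutoffs and to pool the small ZIP codes into a larger unit. The sketch below uses base R only; the ZIP labels, the threshold min.size, and the label "pooled" are hypothetical choices, and the appropriate threshold is for the user to decide.

```r
# Hypothetical ZIP code vector (illustrative labels, not taken from zips.kc).
z <- c("zip.1", "zip.1", "zip.1", "zip.2", "zip.3", "zip.3")
min.size <- 3  # assumed minimum number of restaurants per subunit

# Count restaurants per ZIP code and flag those below the threshold.
zip.sizes <- table(z)
small.zips <- names(zip.sizes)[zip.sizes < min.size]

# Aggregate the flagged ZIP codes into one larger unit before any adjustment.
z.pooled <- ifelse(z %in% small.zips, "pooled", z)
```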

As mentioned previously, findCutoffs was created for an inspection system that associates greater risk with larger inspection scores. If the inspection system of interest associates greater risk with reduced scores, it will be necessary to perform a transformation of the scores matrix before utilizing the findCutoffs function. However, a simple function such as f(score) = - score would perform the necessary transformation.

Examples


## ==== Quantile-Adjusted Grading =====
## ZIP Code Cutoffs
# In King County, meaningful scores in the inspection system are 0 and 30:
# more than 50% of restaurants score 0 points in a single inspection round,
# and 30 is the highest score that a restaurant can be assigned before it is
# subject to a return inspection, hence these values form our gamma vector.
# The output dataframe, zipcode.cutoffs.df, has ten rows and three columns: one
# row for every unique ZIP code in zips.kc, one column for the ZIP name, the
# second column for the A/B cutoff (Gamma.A) and the third column for the B/C
# cutoff (Gamma.B).

 zipcode.cutoffs.df <- findCutoffs(X.kc, zips.kc, gamma = c(0, 30))

## ==== Traditional Grading Systems ====
## ZIP Code Cutoffs
# Traditional (unadjusted) restaurant grading systems use the same cutoff scores
# for all ZIP codes. To allow comparison, an unadjusted ZIP code cutoff frame
# for King County is generated by the internal createCutoffsDF function:

 unadj.cutoffs.df <- createCutoffsDF(X.kc, zips.kc, gamma = c(0, 30), type = "unadj")

Grade Businesses.

Description

gradeAllBus takes in a vector of business inspection scores, business ZIP codes and a data frame of ZIP code cutoff scores (generated by the findCutoffs function) and returns a vector of business grades.

Usage

gradeAllBus(scores, z, zip.cutoffs)

Arguments

scores

Numeric vector of length n, where n is the number of restaurants to be graded. Each entry is the inspection score for one business.

z

Character vector of length n, where each entry is the ZIP code (or other geographic area) of a business. The order of businesses in z is the same as the order of businesses in scores.

zip.cutoffs

A dataframe with the first column containing all of the ZIP codes in z and later columns containing cutoff scores for each ZIP code for grade classification. Cutoff scores for each ZIP code should be ordered from lowest score in column 2 (representing the cutoff for the best grade) to the largest cutoff score in the final column (representing the cutoff inspection score for the second worst grade). This dataframe will most likely have been generated by the findCutoffs function.

Details

As explained in the findCutoffs documentation, we use the language "ZIP code" and "restaurant"; however, our grading algorithm can be applied to grade other inspected entities. As with findCutoffs, where "ZIP code" is referenced, please read "ZIP code or other subunit of a jurisdiction", and "restaurant" should read "restaurant or other entity to be graded".

gradeAllBus takes a vector of inspection scores (one score for each restaurant: the score can be a mean across multiple inspections or the result of a single inspection), a vector of ZIP codes and a dataframe of ZIP code cutoffs (most likely generated by the findCutoffs function). It compares each restaurant's inspection score to cutoff scores in the restaurant's ZIP code. It finds the smallest cutoff score in the restaurant's ZIP code that the restaurant's inspection score is less than or equal to - let's say this is the (letter.index)th cutoff score - and returns the (letter.index)th letter of the alphabet as the grade for the restaurant. The returned vector of grades maintains the order of businesses in the vector inputs scores and z.
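The grading rule in the paragraph above can be sketched in a few lines of base R. This is an illustration of the described logic, not the package's source: the helper grade.one, the column name ZIP, and the cutoff values are hypothetical, and the sketch assumes that a score above every cutoff falls into the final (worst) category.

```r
# Hypothetical helper implementing the rule described in the text.
grade.one <- function(score, zip, zip.cutoffs) {
  # Cutoffs for this ZIP code: every column after the first, in increasing order.
  cutoffs <- unlist(zip.cutoffs[zip.cutoffs[[1]] == zip, -1], use.names = FALSE)
  # Position of the smallest cutoff that the score does not exceed; a score
  # above all cutoffs lands one past the last cutoff (the worst grade).
  letter.index <- sum(score > cutoffs) + 1
  LETTERS[letter.index]
}

# Illustrative cutoff data frame: first column ZIP code, then A/B and B/C cutoffs.
cutoffs.df <- data.frame(ZIP = c("zip.1", "zip.2"),
                         Gamma.A = c(0, 5),
                         Gamma.B = c(30, 20),
                         stringsAsFactors = FALSE)
scores <- c(0, 12, 45, 5, 21)
z <- c("zip.1", "zip.1", "zip.1", "zip.2", "zip.2")

# Grade every business, preserving the input order.
grades <- unname(mapply(grade.one, scores, z,
                        MoreArgs = list(zip.cutoffs = cutoffs.df)))
```

Note that the same score can earn different grades in different ZIP codes: a score of 5 is an "A" in zip.2 here but would be a "B" in zip.1.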

Value

A character vector of length n, with each entry corresponding to the grade that the restaurant received.

Examples



## ===== Quantile-Adjusted Grading =====
## ZIP Code Cutoffs (see findCutoffs documentation for an explanation of how
## these are calculated)

 zipcode.cutoffs.df <- findCutoffs(X.kc, zips.kc, gamma = c(0, 30))

## In King County, we use a restaurant's mean inspection score over the last
## four inspections for grading (see Ho, D.E.,
## Ashwood, Z.C., and Elias, B. "Improving the Reliability of Food Safety
## Disclosure: A Quantile Adjusted Restaurant Grading System for Seattle-King
## County" (working paper)). Calculate these mean scores:

 mean.scores <- rowMeans(X.kc, na.rm = TRUE)

## We then use the mean scores and the zipcode.cutoffs.df dataframe to perform
## grading:

 adj.grades <- gradeAllBus(mean.scores, zips.kc, zipcode.cutoffs.df)


## ===== Traditional Grading Systems =====
## For comparison, calculate grades as if we had used a traditional grading
## system in King County, with 0 and 30 as the A/B and B/C cutoffs for all ZIP
## codes.

## Cutoffs:

 unadj.cutoffs.df <- createCutoffsDF(X.kc, zips.kc, gamma = c(0, 30), type = "unadj")

## Grades (traditional grading systems only use the most recent inspection score
## for grading):

 unadj.grades <- gradeAllBus(scores = X.kc[,c(1)], zips.kc, zip.cutoffs = unadj.cutoffs.df)


## ===== Comparison: Quantile-Adjusted Grading and Traditional Grading ===
## Proportion of restaurants in each grading category varies dramatically
## between ZIPs in traditional compared to quantile-adjusted grading; these
## differences do not reflect sanitation differences, but rather differences in
## stringency across inspectors (see: Ho, D.E., Ashwood, Z.C., and Elias, B.
## "Improving the Reliability of Food Safety Disclosure: A Quantile Adjusted
## Restaurant Grading System for Seattle-King County" (working paper)).
## Tabulate restaurants in each ZIP code in each grading category and then
## divide by total number of restaurants in each ZIP to obtain proportions.
## Proportions are rounded to 2 decimal places.

## Traditional Grading

 foo1 <- round(table(zips.kc, unadj.grades)/apply(table(unadj.grades, zips.kc), 2, sum), 2)

## Quantile-Adjusted Grading

 foo2 <- round(table(zips.kc, adj.grades)/apply(table(adj.grades, zips.kc), 2, sum), 2)

Grade a Business.

Description

gradeBus takes in the inspection score for one restaurant, the ZIP code for the restaurant, a data frame of ZIP code cutoff information and returns the grade for the business in question.

Usage

gradeBus(x.bar.i, z.i, zip.cutoffs)

Arguments

x.bar.i

Numeric inspection score (or mean score) for restaurant in question.

z.i

Character representing ZIP code (or other geographic area) of business in question.

zip.cutoffs

A dataframe with the first column containing ZIP codes and later columns containing grade cutoff scores for each ZIP code. Cutoff scores for each ZIP code should be ordered from lowest score in column 2 (representing the cutoff for the best grade) to largest cutoff score in the final column (representing the cutoff inspection score for the second worst grade).

Details

gradeBus takes one inspection score for a restaurant (this may be a mean or the result of a single inspection), the restaurant's ZIP code and a dataframe of ZIP code cutoffs. It compares the restaurant's inspection score to cutoff scores in the restaurant's ZIP code. It finds the smallest cutoff score in the restaurant's ZIP code that the restaurant's inspection score is less than or equal to - let's say this is the (letter.index)th cutoff score - and returns the (letter.index)th letter of the alphabet as the grade for the restaurant. gradeBus is the function called by gradeAllBus in order to grade all businesses.

Value

A character representing the grade assigned to the restaurant in question ('A', 'B', 'C' etc).

Find percentile values (to match a set of global proportions).

Description

percentileSeek returns a set of percentiles to be applied across subunits (e.g. ZIP codes) of a larger area (e.g. a jurisdiction), so as to rank items within each subunit (e.g. restaurants) and group these items into grade categories. percentileSeek allows the user to set the desired global proportion of items in each grade category.

Usage

percentileSeek(scores, z, desired.props, restaurant.tol = 10,
  max.iterations = 20, resolve.ties = FALSE)

Arguments

scores

Numeric vector of size n, where n is the number of restaurants to be graded. scores[i] represents the mean or raw inspection score for restaurant i.

z

Character vector representing ZIP codes. z[i] is the ZIP code for restaurant i.

desired.props

Numeric vector representing desired global grade proportions across the entire jurisdiction. desired.props[j] is the desired proportion of total (gradeable) restaurants in the jth highest grading category.

restaurant.tol

Integer value representing the maximum difference in number of restaurants suggested by desired.props and the actual number of restaurants in each of the top (length(desired.props) - 1) grade categories.

max.iterations

Integer value specifying the maximum number of calls of the updateGamma percentile update function for each of the sought after percentiles.

resolve.ties

Boolean value specifying interpretation of how the function's returned percentiles will be applied across subunits. Should as close as possible to (desired.props[1])% of restaurants in a ZIP code receive an "A" grade, and as close as possible to (desired.props[2])% of restaurants in a ZIP code receive "B" grades (resolve.ties = TRUE case)? Or should the returned percentiles be interpreted as R quantile Type = 1 percentiles, and at least (desired.props[1])% of restaurants in a ZIP code receive "A" grades?

Details

In our documentation, we use the language “ZIP code” and “restaurant”; however, our algorithms and code can be applied much more broadly to other inspected or scored entities, and percentile cutoffs can be sought in subunits (of a larger area) that are not ZIP codes. Where “ZIP code” is referenced, please read “ZIP code or other subunit of a larger area”, and “restaurant” should read “restaurant or other entity to be graded”.

percentileSeek was designed for situations in which a significant number of ties in the scores of items within subunits (e.g. ties in restaurant inspection scores in ZIP codes) result in the obvious choice of percentiles (namely those obtained from the desired proportions) not yielding the desired proportions globally. percentileSeek will iterate over different values for the first percentile (using the update process described in the updateGamma documentation) until the proportion of (gradeable) restaurants scoring “A” grades (when ZIP cutoffs are percentile values) is within (restaurant.tol / no.gradeable.rests) of the desired proportion of As, where no.gradeable.rests is the number of gradeable restaurants, and gradeable restaurants are those that have both ZIP code and inspection score information. The algorithm will then seek to find a larger percentile to match the proportion of gradeable restaurants scoring “B” grades with the desired proportion of Bs and so on, until the proportions of restaurants gaining the top (length(desired.props) - 1) grades are within the required tolerance of their desired proportions. Note: there is thus no requirement that the proportion of restaurants gaining the worst grade matches the desired proportion for worst grade - these can be quite different (depending on the number of restaurants being graded and the number of grade categories) and no error will be reported.
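The problem that motivates percentileSeek can be reproduced on a small made-up example. The sketch below (base R only, illustrative data, not package code) applies the "obvious" percentile of 0.4 within each ZIP code using the Nearest Rank definition and then measures the global share of "A" grades that results.

```r
# Hypothetical scores for three ZIP codes of five restaurants each.
scores <- c(0, 0, 0, 5, 10,   0, 0, 0, 0, 20,   0, 5, 10, 15, 25)
z <- rep(c("zip.1", "zip.2", "zip.3"), each = 5)
desired.A <- 0.4  # desired global proportion of "A" grades

# A/B cutoff in each ZIP code under the Nearest Rank (type = 1) definition.
ab.cutoff <- tapply(scores, z, stats::quantile,
                    probs = desired.A, type = 1, names = FALSE)

# Global share of restaurants at or below their own ZIP code's A/B cutoff.
actual.A <- mean(scores <= ab.cutoff[z])
```

Because of the ties at 0, the global share of "A" grades here is 0.6 rather than the desired 0.4, which is the kind of gap the iterative search tries to close by adjusting the percentile.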

Of course, percentileSeek can only find a solution if one exists. It could be the case that it is simply not possible with a particular set of scores to match the desired proportions. We have included some failsafes to catch some of the simplest instances in which no solution will exist. For instance, one possible reason for failure is selecting a desired proportion of “A” grades that is below the global minimum proportion of “A”s. Totaling the number of restaurants with the best inspection scores in their ZIP codes and dividing by the number of gradeable restaurants provides the global minimum proportion of “A”s. Running percentileSeek can be a useful way to test whether a solution is likely to exist. If reported results of the percentileSeek function are outwith the standard [0, 1] interval for percentiles, or if the number of iterations exceeds the maximum number of iterations, this could be indicative that no solution exists.
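The global minimum proportion of “A”s described above can be computed directly before running the search. The following base R sketch uses invented data in which every restaurant is gradeable (no missing ZIP codes or scores), and the variable names are illustrative.

```r
# Hypothetical scores for three ZIP codes; lower scores are better.
scores <- c(0, 0, 2, 5, 10,   1, 1, 1, 1, 20,   3, 5, 10, 15, 25)
z <- rep(c("zip.1", "zip.2", "zip.3"), each = 5)

# Restaurants tied for the best (lowest) score within their own ZIP code.
best.in.zip <- scores == ave(scores, z, FUN = min)

# Global minimum proportion of "A" grades: these restaurants cannot be
# separated from one another by any within-ZIP cutoff.
min.prop.A <- sum(best.in.zip) / length(scores)

# A desired proportion of "A"s below this value cannot be matched.
feasible <- 0.3 >= min.prop.A
```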

An example of when the percentileSeek function could be used outside of the restaurant context is if you were tasked with finding the top 3 percent of students in a state. We know that each school has its own GPA system and so comparing students by raw GPA does not make sense. We could thus desire to perform a percentile adjustment at each school and select the top 3 percent of students at each school. Unfortunately, some schools do not utilize the full spectrum of GPA scores available and so it may be the case that the top 5 percent of students at school 1 have the same GPA and cannot be distinguished from one another. Using percentileSeek with each restaurant replaced by a student, each restaurant's inspection score replaced by the student's GPA and each ZIP code replaced by a school, we could investigate whether it is possible to satisfy the 3 percent globally desired proportion. percentileSeek would reduce the percentile applied across schools (from the initial 3 percent), which would still select the 5 percent of students at school 1 for nomination, but would try to take advantage of the fact that some schools do use more of their GPA scale. Of course, issues of fairness do arise and one wonders why school 2, which distinguishes its students better than school 1, should have fewer students represented in the globally selected 3 percent. We only advocate the use of percentileSeek for situations in which there is good reason to demand certain global proportions. In the school selection case, this may be that there are only finite resources available to be given to the top 3 percent of students and it is simply not possible to extend these resources to the top 3 percent of students at each school. 
In the restaurant case, we desire to select the top restaurants in each ZIP code to be assigned an 'A' grade; however we also do not want to design a grading system that is seen to inflate grades compared to an unadjusted grading system (one based on absolute uniform grade cutoffs across the whole jurisdiction).

Value

A numeric vector with the percentiles to be applied to each ZIP code so as to achieve the desired proportion of grades.

Update Gamma (percentile value).

Description

updateGamma is the percentile update function called by percentileSeek as percentileSeek attempts to match a set of grade proportions to a set of desired.props.

Usage

updateGamma(scores, z, desired.props, gamma.perc, index.to.update,
  restaurant.tol = 10, iter = 1, max.iterations = 20, gamma_upper = NA,
  gamma_lower = NA, resolve.ties = FALSE)

Arguments

scores

Numeric vector of size n, where n is the number of restaurants to be graded. scores[i] represents the mean or raw inspection score for restaurant i.

z

Character vector representing ZIP codes. z[i] is the ZIP code for restaurant i.

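desired.props

Numeric vector representing desired global grade proportions across the entire jurisdiction. desired.props[j] is the desired proportion of total (gradeable) restaurants in the jth highest grading category.
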
gamma.perc

Numeric vector representing an initial set of percentiles.

index.to.update

Integer value in the set 1:(length(desired.props)-1) that represents the particular percentile to be updated in the current run of updateGamma. (Percentiles are not updated simultaneously, but rather are updated sequentially with the smallest percentiles being the first to be updated.)

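restaurant.tol

Integer value representing the maximum difference in number of restaurants suggested by desired.props and the actual number of restaurants in each of the top (length(desired.props) - 1) grade categories.
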
iter

Integer value representing the current iteration of updateGamma.

max.iterations

Integer value specifying the maximum number of calls of the updateGamma percentile update function for each of the sought after percentiles.

gamma_upper

Numeric or NA value representing a value of gamma.perc[index.to.update] that results in too many restaurants receiving the grade of interest, relative to the desired proportion.

gamma_lower

Numeric or NA value representing a value of gamma.perc[index.to.update] that results in too few restaurants receiving the grade of interest, relative to the desired proportion.

resolve.ties

Boolean value specifying interpretation of how the function's returned percentile will be applied across the subunits (see percentileSeek). Should as close as possible to (desired.props[1])% of restaurants in a ZIP code receive an "A" grade, and as close as possible to (desired.props[2])% of restaurants in a ZIP code receive "B" grades (resolve.ties = TRUE case)? Or should the returned percentiles be interpreted as R quantile Type = 1 percentiles, and at least (desired.props[1])% of restaurants in a ZIP code receive an "A" grade?

Details

updateGamma performs the update of gamma.perc[index.to.update]. In particular, gamma.perc[index.to.update] will be updated until either the number of updates has reached max.iterations, or the proportion of (gradeable) restaurants scoring the (index.to.update)th highest grade is within (restaurant.tol / no.gradeable.rests) of the desired proportion, where no.gradeable.rests is the number of gradeable restaurants (restaurants that have both ZIP code and inspection score information). Initially, gamma.perc[index.to.update] is updated according to the rule gamma.perc[index.to.update] <- (gamma.perc[index.to.update] - diff.aj.desired), where diff.aj.desired is the difference between the actual proportion of restaurants assigned the grade of interest and the desired proportion. However, if the algorithm locates values of gamma.perc[index.to.update] that produce grade proportions that are both higher and lower than the desired proportion, gamma_upper and gamma_lower respectively, the update rule becomes gamma.perc[index.to.update] <- 0.5 * (gamma_upper + gamma_lower), as in the bisection root finding method.
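The two update rules can be written as a single small function. This is a sketch of the rules as stated in the paragraph above, not the body of updateGamma: the function name next.percentile is hypothetical, and the stopping checks (tolerance and max.iterations) are left out.

```r
# Hypothetical helper returning the next value of one percentile.
next.percentile <- function(p, diff.aj.desired, gamma_lower = NA, gamma_upper = NA) {
  if (!is.na(gamma_lower) && !is.na(gamma_upper)) {
    # Both a too-low and a too-high percentile are known: bisect between them.
    0.5 * (gamma_upper + gamma_lower)
  } else {
    # No bracket yet: shift the percentile by the gap between the actual and
    # desired proportions (actual minus desired).
    p - diff.aj.desired
  }
}

# Actual proportion 0.6 against a desired 0.5, with no bracket found so far.
first.step <- next.percentile(p = 0.5, diff.aj.desired = 0.6 - 0.5)

# Once values on both sides of the target are known, the step is a bisection.
bisect.step <- next.percentile(p = 0.4, diff.aj.desired = -0.05,
                               gamma_lower = 0.4, gamma_upper = 0.5)
```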

Value

A numeric value representing a percentile to be applied to each ZIP code so as to achieve a particular desired proportion of grades.

Example ZIP Code Vector.

Description

A vector of ZIP codes.

Usage

zips.kc

Format

A character vector with a length that matches the number of rows of X.kc (i.e. zips.kc has ~1500 elements). Each entry represents the ZIP code of one business.

Details

zips.kc[i] represents the ZIP code for the restaurant represented in the ith row of the X.kc inspection scores matrix. ZIP codes in zips.kc have the format "zip.j" where j is an integer between 1 and 11, i.e., ZIP codes are masked. In this masking step, we also demonstrate that our functions can be applied not solely over character vectors of real ZIP codes, but any vector of character strings representing the same facet for all restaurants can be used in the grading process.
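Labels of the form "zip.j" can be produced from any character vector of subunit identifiers. The sketch below shows one possible way to do this in base R; it is an assumption for illustration and not necessarily the procedure used to build zips.kc, and the input vector is invented.

```r
# Illustrative identifiers; any character labels for subunits would work.
real.zips <- c("98101", "98105", "98101", "98199", "98105")

# Replace each distinct identifier with a masked label "zip.j".
masked.zips <- paste0("zip.", as.integer(factor(real.zips)))
```

Restaurants that shared an identifier before masking still share one afterwards, which is all the grading functions need.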