In Proceedings of the 2019 International Conference on Management of Data (SIGMOD), ACM, 2019.
We often need to rank items based on more than one criterion. A common way to accomplish this is to assign each item a score, computed as a weighted sum of its attribute values, where each attribute represents a criterion of interest. Items are then easily ranked by score. Weighted linear combinations of attribute values are straightforward to compute and easy to understand. However, the specific weights chosen have a huge impact on an item's score, and hence on its rank: the way the attributes are combined into the score determines the ranking, and may strongly affect decisions that take these rankings into account. Those decisions may in turn impact the lives of individuals and even influence societal policies.
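The weight sensitivity described above can be sketched in a few lines. The items, attribute values, and weight vectors below are made-up illustrative data, not part of the system:

```python
# Hypothetical illustration: score each item as a weighted sum of its
# attribute values, then rank by decreasing score. Data are made up.
items = {
    "a": [0.9, 0.2, 0.5],
    "b": [0.4, 0.8, 0.7],
    "c": [0.6, 0.6, 0.1],
}

def score(attrs, weights):
    """Weighted linear combination: sum_j w_j * x_j."""
    return sum(w * x for w, x in zip(weights, attrs))

def rank(items, weights):
    """Return item ids ordered by decreasing score."""
    return sorted(items, key=lambda i: score(items[i], weights), reverse=True)

print(rank(items, [0.5, 0.3, 0.2]))  # → ['a', 'b', 'c']
print(rank(items, [0.2, 0.5, 0.3]))  # a modest change in weights → ['b', 'c', 'a']
```

Two reasonable-looking weight vectors already produce entirely different orderings, which is exactly why the choice of weights deserves scrutiny.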
Surprisingly, despite the enormous impact of score-based rankers, attribute weights are usually assigned in an ad hoc manner, based only on the intuitive reasoning and common sense of the human designers. This demonstration presents MithraRanking, a system for responsible ranking design. MithraRanking provides a user interface in which the user can (i) identify a dataset of items to be ranked, (ii) set up the goodness criteria, (iii) provide a weight vector as the initial ranking function, and (iv) specify an acceptable range of functions, in the form of a region of interest in weight space. The system then evaluates the generated ranking against the specified goodness criteria and, if needed, suggests alternative functions (within the region of interest) that better satisfy those criteria.
The MithraRanking framework is designed to be extensible to accommodate a wide variety of goodness metrics. In the current system, we have focused on two specific classes of properties: fairness and stability.
Fairness is a complex concept, with a number of different possible definitions. We consider group fairness with respect to membership in a protected group, based, for example, on minority race or underrepresented gender, where group membership is readily ascertained by looking at an attribute value. For a given rank cut-off point $k$, we wish to ensure that the number of protected group members ranked among the top-$k$ is proportional to their representation in the entire population, or to their desired proportion in the output (as is the case in affirmative action interventions).
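The proportionality condition can be expressed as a simple check. This is a minimal sketch, assuming an illustrative tolerance parameter; the item ids, protected set, and function names are hypothetical:

```python
# Hypothetical sketch of the top-k group-fairness condition: the share of
# protected-group members among the top-k should be close to their share
# in the whole ranked population. The tolerance is an assumed parameter.
def is_fair_topk(ranking, protected, k, tolerance=0.1):
    """ranking: item ids, best first; protected: set of protected item ids."""
    target = sum(1 for i in ranking if i in protected) / len(ranking)
    topk_share = sum(1 for i in ranking[:k] if i in protected) / k
    return abs(topk_share - target) <= tolerance

ranking = ["a", "b", "c", "d", "e", "f"]
protected = {"b", "e", "f"}               # 50% of the population
print(is_fair_topk(ranking, protected, k=4))  # top-4 is only 25% protected → False
```

The target share can equally be set to a desired output proportion rather than the population proportion, matching the affirmative-action variant mentioned above.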
Stability of a ranking requires that slight changes to the attribute weights in the scoring formula do not significantly perturb the ranked order. Unstable rankings are a concern because they are not robust, and so may be prone to tuning and manipulation by a vendor.
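One simple way to probe this property is Monte Carlo sampling of nearby weight vectors. This sketch is an illustrative assumption, not MithraRanking's actual algorithm; the perturbation scheme, data, and names are all hypothetical:

```python
import random

# Hypothetical stability probe: sample small random perturbations of the
# weight vector and report the fraction that leave the ranking unchanged.
def rank(items, weights):
    score = lambda attrs: sum(w * x for w, x in zip(weights, attrs))
    return sorted(items, key=lambda i: score(items[i]), reverse=True)

def stability(items, weights, epsilon=0.05, trials=200, seed=0):
    """Fraction of perturbed weight vectors reproducing the original ranking."""
    rng = random.Random(seed)
    base = rank(items, weights)
    same = sum(
        rank(items, [w + rng.uniform(-epsilon, epsilon) for w in weights]) == base
        for _ in range(trials)
    )
    return same / trials

items = {"a": [0.9, 0.1], "b": [0.1, 0.9], "c": [0.4, 0.2]}
print(stability(items, [0.6, 0.4]))  # → 1.0: score gaps exceed any perturbation
```

A value near 1.0 indicates a stable ranking; values well below 1.0 flag rankings whose order hinges on the exact weights chosen.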