Keller Jordan blog

In this post I’ll show how to construct subsets of data which resist being valued (in the sense of the data valuation research program). Setup We will work with the cat/dog subset of CIFAR-10. This forms a binary classification task with 10,000 training examples and 2,000 test examples. We will construct two subsets of the training data, $A$ and $B$, which yield the following test-set accuracies: Train on just $A$: 86% accuracy....