Statistics derives its power from classifying data and comparing the resulting distributions. In this paper, I use two historical examples to highlight the importance of such data practices for statistical reasoning: Franz Boas’s anthropometric studies of Native American populations in the early 1890s, which laid the foundation for his later critique of the race concept, and Wilhelm Johannsen’s experiments in barley breeding, which he carried out for the Carlsberg Laboratory around the same time and which prepared the ground for his later distinction between genotype and phenotype. Both examples show that the manipulation of data depended on complex classificatory practices: the distinction and articulation of “tribes,” “races,” and “family lines” in the case of Boas, and the selection and construction of “populations” and “pure lines” in the case of Johannsen. They also reveal a fundamental difference between data practices in the human and the life sciences: whereas the latter are relatively free to construct populations in the laboratory, in the field, or on paper, the former have to rely on social categories shaped by historical accident and by the self-perception of the subjects under study. This essay is part of a special issue entitled Histories of Data and the Database, edited by Soraya de Chadarevian and Theodore M. Porter.
