Methodology

The core of the Census Tree comes from information provided by users of FamilySearch.org, an online genealogy platform. Users can attach digitized historical records, including the decennial censuses from 1850 to 1940, to the profiles of their ancestors. Whenever a user attaches two different census records to a single profile, the pair forms a census-to-census link. There are over 317 million such user-provided links, which together constitute a dataset we call the Family Tree.
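To make the idea of a census-to-census link concrete, the following is a minimal sketch of how links could be derived from profile attachments. The input layout (tuples of profile ID, census year, and record ID), the function name, and the restriction to pairs from different census years are illustrative assumptions, not a description of FamilySearch's data model or of the code used to build the Family Tree.

```python
from collections import defaultdict
from itertools import combinations


def links_from_attachments(attachments):
    """Derive census-to-census links from profile attachments.

    attachments: iterable of (profile_id, census_year, record_id) tuples.
    This layout is hypothetical and chosen only for illustration.
    Returns a set of ((year_a, record_a), (year_b, record_b)) pairs
    with year_a < year_b.
    """
    # Group the census records attached to each profile.
    by_profile = defaultdict(set)
    for profile_id, year, record_id in attachments:
        by_profile[profile_id].add((year, record_id))

    links = set()
    for records in by_profile.values():
        # Every pair of records on the same profile is a candidate link.
        for (y1, r1), (y2, r2) in combinations(sorted(records), 2):
            # Assumption: a link connects records from different census years.
            if y1 != y2:
                links.add(((y1, r1), (y2, r2)))
    return links


example_attachments = [
    ("P1", 1900, "a"),
    ("P1", 1910, "b"),
    ("P1", 1920, "c"),
    ("P2", 1900, "d"),  # only one census record attached, so no link
]
example_links = links_from_attachments(example_attachments)
```

In this sketch, a profile with three attached census records contributes three links (1900 to 1910, 1900 to 1920, and 1910 to 1920), while a profile with a single attached record contributes none.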

We build on the Family Tree in two ways. First, we use the Family Tree as training data for a machine learning algorithm to create additional census-to-census links. Second, we add links from the Census Linking Project and the IPUMS Multigenerational Longitudinal Panel, and hints from FamilySearch. After filtering the links for quality and adjudicating conflicts, we have the Census Tree.
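The step of combining links from several sources and adjudicating conflicts can be sketched as follows. This is an illustration only: the text above does not state how conflicts are resolved, so the source labels, the priority order, and the rule of dropping unresolved ties are all hypothetical. See Buckles, Haws, Price, and Wilbert (2024) for the actual procedure.

```python
# Hypothetical priority order (lower number wins). The real adjudication
# rule is not given in this section; this ordering is an assumption.
PRIORITY = {"family_tree": 0, "ml_model": 1, "external": 2}


def adjudicate(candidates):
    """Keep at most one target record per origin record.

    candidates: iterable of (source, origin_record, target_record) tuples,
    where origin and target are records in two different censuses.
    Returns a dict mapping origin_record -> target_record.
    """
    best = {}  # origin -> (best rank seen so far, set of targets at that rank)
    for source, origin, target in candidates:
        rank = PRIORITY[source]
        if origin not in best or rank < best[origin][0]:
            # A higher-priority source replaces whatever was stored before.
            best[origin] = (rank, {target})
        elif rank == best[origin][0]:
            # Same priority: remember every proposed target.
            best[origin][1].add(target)

    # Keep an origin only if its top-priority sources agree on one target;
    # unresolved ties are dropped (again, an illustrative choice).
    return {
        origin: next(iter(targets))
        for origin, (_, targets) in best.items()
        if len(targets) == 1
    }


example_candidates = [
    ("ml_model", "A1", "B1"),
    ("family_tree", "A1", "B2"),  # higher priority, overrides B1
    ("external", "A2", "B3"),
    ("external", "A2", "B4"),     # tie at the same priority, A2 is dropped
    ("ml_model", "A3", "B5"),
    ("external", "A3", "B5"),     # lower priority, agrees with ml_model
]
example_result = adjudicate(example_candidates)
```

The sketch enforces uniqueness in one direction only (one target per origin record). A fuller treatment would also check that no target record is claimed by more than one origin.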

For a more detailed description of the methodology behind the Census Tree, please see Buckles, Haws, Price, and Wilbert (2024).

Training Data and Code