Guidelines for new data sets

I welcome contributions from volunteers who want to improve the city data on the site. Please review these guidelines before you start compiling a new data set; following them gives it the best chance of being accepted.

General principles

  • The data must be licensed so that I can use it legally, for instance under a Creative Commons license.
  • For the purposes of this site, a 'city' is any independent populated place, regardless of size. It therefore includes what would normally be called towns and villages, but not neighborhoods or districts of larger cities, because they are not independent. Nor does it include administrative constructs, like (most) townships in the United States, that don't cleanly correspond to an actual populated place. The definition is somewhat open to interpretation, and decisions for individual data sets will be made on a case-by-case basis.
  • I prefer to update an entire country or quiz at once.
  • I prefer data sets that are comprehensive, not just incremental improvements over the existing data.
  • I prefer to focus on quizzes and countries where the data is notably bad.
  • I prefer data from official sources (e.g., a national census) over crowd-sourced information (e.g., GeoNames or Wikidata), although the latter may still be used in the absence of the former.
  • I prefer all the data to come from the same source, although sometimes it is necessary to use different sources for the population and coordinates, for instance.
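When population and coordinates come from different sources, each city has to be matched across them. The Python sketch below is one way to do that, assuming both sources are CSV files that share a common identifier column; the column names ("id", "latitude", and so on) and the sample rows are invented for illustration. Cities with no match in the second source are set aside for manual review rather than silently dropped.

```python
import csv
import io

# Invented sample data: a census table with populations and a separate
# gazetteer with coordinates. Both are ASSUMED to share an "id" column.
CENSUS_CSV = """id,name,population
0101,Town A,52000
0102,Town B,31000
0103,Town C,4800
"""

GAZETTEER_CSV = """id,latitude,longitude
0101,44.05,-123.02
0103,44.21,-122.90
"""


def merge_sources(population_rows, coordinate_rows):
    """Join population and coordinate records on their shared ID.

    Returns (merged, missing): records that have both population and
    coordinates, plus the IDs that had no coordinates in the second source.
    """
    coords_by_id = {row["id"]: row for row in coordinate_rows}
    merged, missing = [], []
    for row in population_rows:
        coords = coords_by_id.get(row["id"])
        if coords is None:
            missing.append(row["id"])  # flag for manual follow-up
            continue
        merged.append({
            "id": row["id"],  # kept as a string so leading zeros survive
            "name": row["name"],
            "population": int(row["population"]),
            "latitude": float(coords["latitude"]),
            "longitude": float(coords["longitude"]),
        })
    return merged, missing


census = list(csv.DictReader(io.StringIO(CENSUS_CSV)))
gazetteer = list(csv.DictReader(io.StringIO(GAZETTEER_CSV)))
merged, missing = merge_sources(census, gazetteer)
print(len(merged), missing)  # 2 ['0102']
```

Identifiers are deliberately left as strings: census codes often have leading zeros, which converting to integers would lose.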

Step-by-step instructions

  1. Send me an email about the data set you plan to compile. We need to make sure we are on the same page before you start working.
  2. Once we've agreed, you can go ahead and compile the data. Make sure to keep the original files the data came from (e.g., the census downloads).
  3. You'll need to reorganize the data into a format that I can use. Acceptable formats are CSV and JSON. The following data must be included for each city:
    • Stable identifier. Typically this is assigned by the country's census bureau. The idea is that when another census is undertaken, we can use the stable ID to match existing cities to their updated populations.
    • Name
    • Population
    • Country
    • Coordinates
    I also prefer to include the name of the city's province or state, but that is negotiable depending on the data set.
  4. Send me the original data files as well as the compiled data, and I will try to apply it to the database. There may be some back-and-forth to resolve any issues that arise.
  5. Once I've accepted the data set, you are entitled to a place of honor on the credits page, and you may also receive special recognition on the subreddit.
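To make step 3 more concrete, here is a minimal Python sketch that checks each compiled record for the required fields and serializes the data set as CSV or JSON. The field names (id, name, population, country, latitude, longitude, province), the split of coordinates into separate latitude and longitude columns, and the sample records are all assumptions made for this example, not a prescribed schema.

```python
import csv
import io
import json

# ASSUMED column names for the required fields. The province/state column
# is optional, since the guidelines say it is negotiable.
REQUIRED = ("id", "name", "population", "country", "latitude", "longitude")
OPTIONAL = ("province",)


def validate(record):
    """Return a list of problems found in one city record (empty if OK)."""
    problems = [f"missing {field}" for field in REQUIRED
                if record.get(field) in (None, "")]
    if problems:
        return problems
    pop = record["population"]
    if not isinstance(pop, int) or pop < 0:
        problems.append("population must be a non-negative integer")
    lat, lon = record["latitude"], record["longitude"]
    if not isinstance(lat, (int, float)) or not -90 <= lat <= 90:
        problems.append("latitude must be a number in [-90, 90]")
    if not isinstance(lon, (int, float)) or not -180 <= lon <= 180:
        problems.append("longitude must be a number in [-180, 180]")
    return problems


def to_csv(records):
    """Serialize records to CSV text with a fixed column order."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=REQUIRED + OPTIONAL)
    writer.writeheader()
    writer.writerows(records)  # a missing optional field becomes ""
    return out.getvalue()


def to_json(records):
    """Serialize records to a JSON array, keeping non-ASCII names readable."""
    return json.dumps(records, ensure_ascii=False, indent=2)


# Invented sample records for illustration only.
cities = [
    {"id": "0101", "name": "Town A", "population": 52000, "country": "XX",
     "latitude": 44.05, "longitude": -123.02, "province": "Region 1"},
    {"id": "0102", "name": "Town B", "population": 31000, "country": "XX",
     "latitude": 44.21, "longitude": -122.90},
]

for city in cities:
    issues = validate(city)
    if issues:
        print(city["id"], issues)

print(to_csv(cities))
```

Running a check like this before sending the files catches missing fields and swapped or out-of-range coordinates early, which should shorten the back-and-forth in step 4.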