Gazetteer

Wikipedia defines a Gazetteer as:

A gazetteer is a geographical dictionary or directory used in conjunction with a map or atlas.

This is exactly what this package provides: the gazetteer data that the U.S. Census Bureau releases into the public domain, packaged as R data structures.

Some Core Concepts

Before diving into the data it will be helpful if you familiarize yourself with some core concepts related to the U.S. census data. Instead of repeating the information here, I’m simply linking to pages from the census bureau site. If you are new to this data I highly encourage you to read the links below.

Every geographic dataset uses some kind of ID to uniquely identify a geographic entity. The links below will help you understand the various types of geo IDs in use. Understanding these IDs is essential to joining the gazetteer data with geospatial data from other sources.
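As a small illustration of why these IDs matter when joining: a county GEOID, such as the 01001 listed for Autauga County in the county sample further down, is the two-digit state FIPS code followed by a three-digit county code. The sketch below uses only base R. The helper name split_county_geoid is hypothetical and is not part of this package.

```r
# Hypothetical helper (not part of the package): split a 5-character county
# GEOID into its state FIPS part and its county code part.
split_county_geoid <- function(geoid) {
  geoid <- as.character(geoid)
  stopifnot(all(nchar(geoid) == 5))
  data.frame(
    GEOID = geoid,
    STATE_FIPS = substr(geoid, 1, 2),
    COUNTY_CODE = substr(geoid, 3, 5),
    stringsAsFactors = FALSE
  )
}

parts <- split_county_geoid(c("01001", "01003"))
print(parts)

# IDs that were read as numbers lose their leading zeros;
# pad them back to five characters before joining.
padded <- sprintf("%05d", 1001L)
print(padded)
```

Keeping IDs as character strings from the start avoids the padding step altogether.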

The Data

The package contains one data.frame called state.areas.2010 and 6 lists, gazetteer.2010, gazetteer.2012, gazetteer.2013, gazetteer.2014, gazetteer.2015, and gazetteer.2016.

state.areas.2010 contains state-level gazetteer data compiled from the U.S. Census Bureau and Wikipedia. In all, the data.frame has 77 rows: the 50 states, DC, and the territories.

Here’s a small sample of it.

State       Type   ISO3166  ANSI  FIPS  USPS  USCG  GNISID    Total.Area.sqm  IntPoint.Lat  IntPoint.Long
Alabama     State  US-AL    AL    01    AL    AL    01779775  52420           32.73963      -86.84346
Alaska      State  US-AK    AK    02    AK    AK    01785533  665384          63.34619      -152.83707
Arizona     State  US-AZ    AZ    04    AZ    AZ    01779777  113990          34.20996      -111.60240
Arkansas    State  US-AR    AR    05    AR    AR    00068085  53179           34.89553      -92.44463
California  State  US-CA    CA    06    CA    CF    01779778  163695          37.14857      -119.54065
Colorado    State  US-CO    CO    08    CO    CL    01779779  104094          38.99358      -105.50777

You get various geo IDs for each state (including FIPS, GNISID, USPS, and USCG), along with land and water areas and the coordinates of an internal point in the state. The Total.Area.sqm values shown above are in square miles.
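A typical use of the state-level data is translating one kind of ID into another. The sketch below is a minimal base R example. It builds a toy data.frame from three of the sample rows above rather than loading the package, so the object name states is an assumption for illustration only.

```r
# Toy stand-in for the state-level data.frame, built from three of the
# sample rows above (only a few columns are reproduced).
states <- data.frame(
  State = c("Alabama", "Alaska", "Arizona"),
  FIPS = c("01", "02", "04"),
  USPS = c("AL", "AK", "AZ"),
  Total.Area.sqm = c(52420, 665384, 113990),
  stringsAsFactors = FALSE
)

# Translate USPS abbreviations into FIPS codes with match().
usps_codes <- c("AZ", "AL")
fips_codes <- states$FIPS[match(usps_codes, states$USPS)]
print(fips_codes)

# Largest state in the toy table by total area.
largest <- states$State[which.max(states$Total.Area.sqm)]
print(largest)
```

match() returns NA for abbreviations that are not in the table, which makes unmatched codes easy to spot.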

In addition to the state-level data, each of the gazetteer.20* lists contains one data.frame per type of geographic entity. Each element of a list has a name that indicates the data it holds. The suffix of the list name indicates the year for which the data was released, e.g. gazetteer.2015 contains data valid from 2015.

For example, the names of the data stored in the gazetteer.2016 list are shown below.

Data.Name
115th Congressional Districts
American Indian Reservations, Statistical Areas, and Alaska Native Village Statistical Areas
Census Tracts
Core Based Statistical Areas
Counties
County Subdivisions
Current American Indian, Alaska Native, and Hawaiian Home Lands Legal and Statistical Areas
Current American Indian Off-Reservation Trust Lands
Current Hawaiian Home Lands
Places
School Districts - Elementary
School Districts - Secondary
School Districts - Unified
State Legislative Districts - Lower Chamber
State Legislative Districts - Upper Chamber
Urban Areas
ZIP Code Tabulation Areas

You can retrieve the names of any list using the standard names(<list>) call, e.g. names(gazetteer.2014).

To retrieve information from any particular gazetteer simply reference it from the list by its name.

counties.2016 <- gazetteer.2016$Counties
knitr::kable(head(counties.2016[, c(1:4, 9:12)]), format = 'html')

USPS  GEOID  ANSICODE  NAME            INTPTLAT  INTPTLONG  ATOTAL      ATOTAL_SQMI
AL    01001  00161526  Autauga County  32.53224  -86.64644  1565358957  604.388
AL    01003  00161527  Baldwin County  30.65922  -87.74607  5250714521  2027.312
AL    01005  00161528  Barbour County  31.87025  -85.40510  2342683357  904.515
AL    01007  00161529  Bibb County     33.01589  -87.12715  1621769533  626.169
AL    01009  00161530  Blount County   33.97736  -86.56644  1685119333  650.628
AL    01011  00161531  Bullock County  32.10176  -85.71726  1619114159  625.144
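Several of the element names listed earlier contain spaces (for example Census Tracts), so the plain $ form used for Counties needs a small adjustment for them. The sketch below shows the usual base R options on a toy list. The object gaz is a hypothetical stand-in for one of the gazetteer.20* lists and holds placeholder data.frames, not real gazetteer data.

```r
# Toy stand-in for one of the gazetteer.20* lists; the real elements are
# full data.frames, these hold a single illustrative column each.
gaz <- list(
  "Counties" = data.frame(NAME = "Autauga County", stringsAsFactors = FALSE),
  "Census Tracts" = data.frame(GEOID = character(0), stringsAsFactors = FALSE)
)

# Names without spaces work with `$`.
counties <- gaz$Counties

# Names with spaces need [[ ]] or backticks after `$`.
tracts <- gaz[["Census Tracts"]]
tracts_too <- gaz$`Census Tracts`

# Check that an entity type exists in a list before using it.
has_places <- "Places" %in% names(gaz)
print(has_places)
```

The [[ ]] form is also convenient when the element name is held in a variable.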

Using the Gazetteer Data

The data is of limited use on its own, but it becomes very useful when combined with geospatial data obtained from other sources. It lets you map the names of geographic entities (states, counties, congressional districts, etc.) to their corresponding GEOIDs. It also provides the land, water, and total area of each entity, as well as the latitude and longitude of an internal point in that entity. You can use the areas to normalize geo data by area, and the internal points to place labels or markers on geographic entities when mapping them.
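The join-and-normalize workflow described above can be sketched in base R as follows. The counties data.frame is a toy copy of three of the county sample rows shown earlier, and the observations data.frame holds made-up counts purely for illustration; they are not census figures.

```r
# Toy stand-in for gazetteer.2016$Counties, copied from the sample rows above.
counties <- data.frame(
  GEOID = c("01001", "01003", "01005"),
  NAME = c("Autauga County", "Baldwin County", "Barbour County"),
  INTPTLAT = c(32.53224, 30.65922, 31.87025),
  INTPTLONG = c(-86.64644, -87.74607, -85.40510),
  ATOTAL_SQMI = c(604.388, 2027.312, 904.515),
  stringsAsFactors = FALSE
)

# Made-up measurements keyed by GEOID (NOT real census figures).
observations <- data.frame(
  GEOID = c("01003", "01001"),
  count = c(2000, 600),
  stringsAsFactors = FALSE
)

# Join on GEOID; all.x = TRUE keeps counties with no observation (NA count).
joined <- merge(counties, observations, by = "GEOID", all.x = TRUE)

# Normalize by total area to get a count per square mile.
joined$count_per_sqmi <- joined$count / joined$ATOTAL_SQMI

# INTPTLAT / INTPTLONG can serve as label positions on a map.
labels <- joined[, c("NAME", "INTPTLONG", "INTPTLAT")]
print(joined[, c("GEOID", "NAME", "count_per_sqmi")])
```

Joining on GEOID rather than on the name avoids mismatches between counties that share a name across states.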

Lastly, this is perhaps the most authoritative source of data of this kind. While other packages offer similar data, none offers as much of it in one place. Together with the acs package, which lets you query dynamic census data, and the tigris package, which lets you pull census shapefiles, this package gives you a comprehensive set of resources for obtaining, analyzing, and plotting data from the U.S. Census Bureau.