Wikipedia defines a gazetteer as:
> A gazetteer is a geographical dictionary or directory used in conjunction with a map or atlas.
That is exactly what this package provides: the gazetteer data released into the public domain by the U.S. Census Bureau, packaged as R data structures.
Before diving into the data, it helps to familiarize yourself with some core concepts related to U.S. census data. Instead of repeating the information here, I'm simply linking to pages from the Census Bureau site. If you are new to this data, I highly encourage you to read the links below.
Every geographic dataset uses some kind of ID to uniquely identify a geographic entity. The links below will help you understand the various types of geo IDs in use. Understanding these IDs is essential to joining the gazetteer data with geo-spatial data from other sources.
The package contains one data.frame called state.areas.2010 and six lists: gazetteer.2010, gazetteer.2012, gazetteer.2013, gazetteer.2014, gazetteer.2015, and gazetteer.2016.
state.areas.2010 contains state-level gazetteer data compiled from the U.S. Census Bureau and Wikipedia. In all, the data.frame has 77 rows: 50 states + DC + territories.
Here’s a small sample of it.
State | Type | ISO3166 | ANSI | FIPS | USPS | USCG | GNISID | Total.Area.sqm | IntPoint.Lat | IntPoint.Long |
---|---|---|---|---|---|---|---|---|---|---|
Alabama | State | US-AL | AL | 01 | AL | AL | 01779775 | 52420 | 32.73963 | -86.84346 |
Alaska | State | US-AK | AK | 02 | AK | AK | 01785533 | 665384 | 63.34619 | -152.83707 |
Arizona | State | US-AZ | AZ | 04 | AZ | AZ | 01779777 | 113990 | 34.20996 | -111.60240 |
Arkansas | State | US-AR | AR | 05 | AR | AR | 00068085 | 53179 | 34.89553 | -92.44463 |
California | State | US-CA | CA | 06 | CA | CF | 01779778 | 163695 | 37.14857 | -119.54065 |
Colorado | State | US-CO | CO | 08 | CO | CL | 01779779 | 104094 | 38.99358 | -105.50777 |
You get various geo IDs for each state, including FIPS, GNISID, USPS, and USCG codes, as well as land and water areas and the coordinates of an internal point in the state.
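As a rough sketch of how these IDs can be used, the snippet below builds a lookup from USPS abbreviation to FIPS code. To stay self-contained it uses a toy data.frame (`states`, a name made up for this example) holding three rows copied from the sample above; with the package loaded you would presumably use state.areas.2010 instead. It assumes the ID columns are stored as character, which the leading zeros in the sample suggest but which is not confirmed here.

```r
# Toy stand-in for a few rows of state.areas.2010 (values copied from the sample table).
# Assumption: ID columns are character, so leading zeros such as "01" survive.
states <- data.frame(
  State = c("Alabama", "Alaska", "Arizona"),
  FIPS = c("01", "02", "04"),
  USPS = c("AL", "AK", "AZ"),
  stringsAsFactors = FALSE
)

# Named character vector: names are USPS abbreviations, values are FIPS codes.
usps_to_fips <- setNames(states$FIPS, states$USPS)

# Look up a single state by its USPS abbreviation.
fips_ak <- usps_to_fips[["AK"]]
```

Keeping the codes as character strings matters when joining with other sources, because a numeric FIPS of 1 would no longer match "01".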
In addition to the state-level data, each of the gazetteer.20* lists contains one data.frame per type of geographic entity. Each element of the list has a name that indicates the data it holds. The suffix of the list name indicates the year for which the data was released; for example, gazetteer.2015 contains data valid from 2015.
For example, the names of the data stored in the gazetteer.2016 list are shown below.
Data.Name |
---|
115th Congressional Districts |
American Indian Reservations, Statistical Areas, and Alaska Native Village Statistical Areas |
Census Tracts |
Core Based Statistical Areas |
Counties |
County Subdivisions |
Current American Indian, Alaska Native, and Hawaiian Home Lands Legal and Statistical Areas |
Current American Indian Off-Reservation Trust Lands |
Current Hawaiian Home Lands |
Places |
School Districts - Elementary |
School Districts - Secondary |
School Districts - Unified |
State Legislative Districts - Lower Chamber |
State Legislative Districts - Upper Chamber |
Urban Areas |
ZIP Code Tabulation Areas |
You can retrieve the names of any list using the standard names(<list>) call, e.g. names(gazetteer.2014).
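A minimal sketch of inspecting such a list is shown here. `toy_gazetteer` is a hypothetical miniature that only mimics the shape of the real lists (a named list of data.frames); its contents are placeholders, not package data.

```r
# Hypothetical miniature of a gazetteer.20* list: a named list of data.frames.
toy_gazetteer <- list(
  "Counties" = data.frame(GEOID = c("01001", "01003"), stringsAsFactors = FALSE),
  "Census Tracts" = data.frame(GEOID = character(0), stringsAsFactors = FALSE)
)

# Names of the elements, in list order.
element_names <- names(toy_gazetteer)

# Number of rows in each element, as a named integer vector.
row_counts <- vapply(toy_gazetteer, nrow, integer(1))
```

Note that some element names contain spaces, so they are easier to access with double brackets, for example `toy_gazetteer[["Census Tracts"]]`, than with `$`.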
To retrieve information from any particular gazetteer simply reference it from the list by its name.
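```r
# Pull the county-level data.frame out of the 2016 gazetteer list.
counties.2016 <- gazetteer.2016$Counties
# Show the first rows of columns 1-4 and 9-12 as an HTML table.
knitr::kable(head(counties.2016[, c(1:4, 9:12)]), format = 'html')
```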
USPS | GEOID | ANSICODE | NAME | INTPTLAT | INTPTLONG | ATOTAL | ATOTAL_SQMI |
---|---|---|---|---|---|---|---|
AL | 01001 | 00161526 | Autauga County | 32.53224 | -86.64644 | 1565358957 | 604.388 |
AL | 01003 | 00161527 | Baldwin County | 30.65922 | -87.74607 | 5250714521 | 2027.312 |
AL | 01005 | 00161528 | Barbour County | 31.87025 | -85.40510 | 2342683357 | 904.515 |
AL | 01007 | 00161529 | Bibb County | 33.01589 | -87.12715 | 1621769533 | 626.169 |
AL | 01009 | 00161530 | Blount County | 33.97736 | -86.56644 | 1685119333 | 650.628 |
AL | 01011 | 00161531 | Bullock County | 32.10176 | -85.71726 | 1619114159 | 625.144 |
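Selecting columns by position works, but it can be fragile if the column layout differs between years. The sketch below selects by column name and filters rows instead. It runs on a toy data.frame (`counties`, a stand-in built from three rows of the table above) and assumes the real data.frame uses the same column names as shown in that table.

```r
# Toy stand-in for gazetteer.2016$Counties (rows copied from the table above).
counties <- data.frame(
  USPS = c("AL", "AL", "AL"),
  GEOID = c("01001", "01003", "01005"),
  NAME = c("Autauga County", "Baldwin County", "Barbour County"),
  ATOTAL_SQMI = c(604.388, 2027.312, 904.515),
  stringsAsFactors = FALSE
)

# Columns to keep, referenced by name rather than by position.
wanted <- c("GEOID", "NAME", "ATOTAL_SQMI")

# Keep only counties with a total area above 900 square miles.
large_counties <- counties[counties$ATOTAL_SQMI > 900, wanted]
```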
The data is of limited use on its own, but it becomes valuable when combined with geo-spatial data from other sources. It lets you map the names of geographic entities (states, counties, congressional districts, etc.) to their corresponding geo IDs. It also gives you the land, water, and total area of each entity, as well as the latitude and longitude of an internal point in that entity. These can be used to normalize geo-data by area or to place labels and markers on geographic entities when mapping them.
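The following sketch illustrates both uses under stated assumptions: it joins a hypothetical external data set to county gazetteer rows by GEOID, normalizes a count by total area, and extracts internal-point coordinates for labels. The `counties` data.frame is a toy stand-in built from the table above, and `observations` with its `count` values is entirely made up for illustration.

```r
# Toy stand-in for gazetteer.2016$Counties (rows copied from the table above).
counties <- data.frame(
  GEOID = c("01001", "01003", "01005"),
  NAME = c("Autauga County", "Baldwin County", "Barbour County"),
  INTPTLAT = c(32.53224, 30.65922, 31.87025),
  INTPTLONG = c(-86.64644, -87.74607, -85.40510),
  ATOTAL_SQMI = c(604.388, 2027.312, 904.515),
  stringsAsFactors = FALSE
)

# Hypothetical external data keyed by the same GEOID; the counts are invented.
observations <- data.frame(
  GEOID = c("01003", "01001"),
  count = c(2000, 600),
  stringsAsFactors = FALSE
)

# Inner join on GEOID: only counties present in both data sets are kept.
joined <- merge(counties, observations, by = "GEOID")

# Normalize the count by total area in square miles.
joined$count_per_sqmi <- joined$count / joined$ATOTAL_SQMI

# Internal-point coordinates that could serve as label positions on a map.
label_points <- joined[, c("NAME", "INTPTLONG", "INTPTLAT")]
```

`merge` performs an inner join by default, so Barbour County drops out here because it has no matching observation; pass `all.x = TRUE` to keep unmatched gazetteer rows.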
Lastly, this data is perhaps the authoritative source of its kind. While other packages offer similar data, none offers as much of it in one place. Together with the acs package, which lets you query dynamic census data, and the tigris package, which lets you pull census shapefiles, this package provides comprehensive resources for obtaining, analyzing, and plotting data from the U.S. Census Bureau.