Where does this dataset come from?
Is this the original revised data or the revised revised data?
Keeping track of the provenance of data can be a challenge, especially when drawing on published sources. A record of the origin, the date accessed, the transformations applied (e.g., converting from .xls to .csv, converting character strings such as “$1,250,321.21” to floats, or converting date strings to date objects), subsequent changes, who handled the data object, and where it can be found in a repository all enhance the analyst’s own ability to reproduce results.
Unfortunately, notes go missing, files get misfiled, and research is subject to plenty of other hazards. Often, one wishes for R objects with built-in metadata to guard against that.
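The transformations mentioned above are worth recording precisely because they are easy to forget. As a minimal sketch, assuming US-style currency formatting and ISO 8601 date strings (the sample date and the variable names are illustrative, not taken from any particular dataset), the two conversions might look like this in base R:

```r
# Illustrative inputs; neither value comes from a real dataset
raw_amount <- "$1,250,321.21"
raw_date <- "2015-07-31"

# Strip the dollar sign and thousands separators, then convert to numeric
amount <- as.numeric(gsub("[$,]", "", raw_amount))

# Parse an ISO 8601 date string into a Date object
accessed <- as.Date(raw_date, format = "%Y-%m-%d")

class(amount)
class(accessed)
```

Each such step is a candidate entry for a "Preprocessing" note kept alongside the data.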
Using mostattributes() to attach metadata
Scott Chamberlain at ropensci.org brought attr to my attention; it is the built-in mechanism I was originally looking for. He also pointed me to EML, a much more elaborate approach suited to publication projects.
A minimal example
Create a data frame and a separate metadata list
fips <- read.csv("https://tuva.s3-us-west-2.amazonaws.com/state_fips_postal.csv", header = FALSE)
colnames(fips) <- c("state", "fip", "id")
library(jsonlite) # JSON is a convenient format for writing metadata
meta <- fromJSON("https://tuva.s3-us-west-2.amazonaws.com/2015-07-31-meta.json")
The JSON source file looks like this:
[
{
"Accessed": "2015-07-31",
"GitBlame": "Richard Careaga",
"Contact": "technocrat@twitter",
"Preprocessing": "FIPS Codes for the States and District of Columbia table captured manually and converted to cvs file",
"Source": "https://www.census.gov/geo/reference/ansi_statetables.html",
"Repository": "unassigned",
"Version": "1.0"
}
]
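For readers who cannot reach the hosted file, the same structure can be reproduced from an inline string. This is a sketch, assuming jsonlite is installed; the names json_text and meta_local are hypothetical, and the fields are a subset of those in the file shown above. Because the JSON is an array containing a single object, fromJSON() simplifies it to a one-row data frame by default.

```r
library(jsonlite)

# A subset of the metadata fields, written inline instead of fetched by URL
json_text <- '[
  {
    "Accessed": "2015-07-31",
    "Source": "https://www.census.gov/geo/reference/ansi_statetables.html",
    "Repository": "unassigned",
    "Version": "1.0"
  }
]'

# An array of objects is simplified to a data frame, one row per object
meta_local <- fromJSON(json_text)

is.data.frame(meta_local)
names(meta_local)
```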
Associate the metadata with the data frame using mostattributes
x <- fips
mostattributes(x) <- list(meta = meta)
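Printing x now displays the metadata after the data. Note that x prints as a plain, unnamed list rather than as a data frame: assigning with mostattributes replaces the object's existing attributes, including names, row.names and class.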
x
## [[1]]
## [1] "Alabama" "Alaska" "Arizona"
## [4] "Arkansas" "California" "Colorado"
## [7] "Connecticut" "Delaware" "District of Columbia"
## [10] "Florida" "Georgia" "Hawaii"
## [13] "Idaho" "Illinois" "Indiana"
## [16] "Iowa" "Kansas" "Kentucky"
## [19] "Louisiana" "Maine" "Maryland"
## [22] "Massachusetts" "Michigan" "Minnesota"
## [25] "Mississippi" "Missouri" "Montana"
## [28] "Nebraska" "Nevada" "New Hampshire"
## [31] "New Jersey" "New Mexico" "New York"
## [34] "North Carolina" "North Dakota" "Ohio"
## [37] "Oklahoma" "Oregon" "Pennsylvania"
## [40] "Rhode Island" "South Carolina" "South Dakota"
## [43] "Tennessee" "Texas" "Utah"
## [46] "Vermont" "Virginia" "Washington"
## [49] "West Virginia" "Wisconsin" "Wyoming"
##
## [[2]]
## [1] 1 2 4 5 6 8 9 10 11 12 13 15 16 17 18 19 20 21 22 23 24 25 26 27 28
## [26] 29 30 31 32 33 34 35 36 37 38 39 40 41 42 44 45 46 47 48 49 50 51 53 54 55
## [51] 56
##
## [[3]]
## [1] "AL" "AK" "AZ" "AR" "CA" "CO" "CT" "DE" "DC" "FL" "GA" "HI" "ID" "IL" "IN"
## [16] "IA" "KS" "KY" "LA" "ME" "MD" "MA" "MI" "MN" "MS" "MO" "MT" "NE" "NV" "NH"
## [31] "NJ" "NM" "NY" "NC" "ND" "OH" "OK" "OR" "PA" "RI" "SC" "SD" "TN" "TX" "UT"
## [46] "VT" "VA" "WA" "WV" "WI" "WY"
##
## attr(,"meta")
## Accessed GitBlame Contact
## 1 2015-07-31 Richard Careaga technocrat@twitter
## Preprocessing
## 1 FIPS Codes for the States and District of Columbia table captured manually and converted to cvs file
## Source Repository Version
## 1 https://www.census.gov/geo/reference/ansi_statetables.html unassigned 1.0
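The printout above shows x as an unnamed list, which indicates that the mostattributes() assignment replaced the data frame's existing attributes (names, row.names and class) rather than adding to them. A minimal sketch of the alternative, using attr() on a small stand-in data frame, keeps the data frame intact. The names fips_demo and meta_demo are hypothetical and are not objects from the example above. The trade-off is that printing a data frame does not show custom attributes, so the metadata has to be requested explicitly.

```r
# Small stand-ins for the fips data frame and its metadata
fips_demo <- data.frame(
  state = c("Alabama", "Alaska"),
  fip = c(1L, 2L),
  id = c("AL", "AK"),
  stringsAsFactors = FALSE
)
meta_demo <- list(Accessed = "2015-07-31", Version = "1.0")

# mostattributes<- replaces the existing attributes, so the class is lost
y <- fips_demo
mostattributes(y) <- list(meta = meta_demo)
is.data.frame(y)

# attr<- adds one attribute and leaves names, row.names and class alone
z <- fips_demo
attr(z, "meta") <- meta_demo
is.data.frame(z)

# Retrieve the metadata explicitly
attr(z, "meta")$Accessed
```

Custom attributes may not survive every operation (subsetting, for example), so it is worth checking that the metadata is still attached after transforming the object.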