---
title: "Introduction to the isdparser package"
author: "Scott Chamberlain"
date: "2020-01-31"
output:
  html_document:
    toc: true
    toc_float: true
vignette: >
  %\VignetteIndexEntry{Introduction to the isdparser package}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

`isdparser` is a parser for NOAA ISD/ISH files.

The code was liberated from `rnoaa` to focus on ISD parsing, since it's sorta complicated. It has minimal dependencies, so you can parse your ISD/ISH files without needing the deps that `rnoaa` needs. It will be used by `rnoaa` once on CRAN.

ISD/ISH format documentation: ftp://ftp.ncdc.noaa.gov/pub/data/noaa/ish-format-document.pdf

Package API:

* `isd_parse()` - parse all lines in a file, with parallel option
* `isd_parse_line()` - parse a single line - you choose which lines to parse and how to apply the function to your lines
* `isd_transform()` - transform ISD data variables
* `isd_parse_csv()` - parse csv format files

`isd_parse_csv()` parses NOAA ISD csv files, whereas `isd_parse()` and `isd_parse_line()` both handle compressed files where each row of data is a string that needs to be parsed. `isd_parse_csv()` is faster than `isd_parse()` because parsing each line takes some time, although `isd_parse(parallel = TRUE)` gets closer to the speed of `isd_parse_csv()`.
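To give a feel for what "a string that needs to be parsed" means, here is a minimal base R sketch that cuts the start of one ISD record into named fields. It is not how `isdparser` is implemented. The start/end positions and the divide-by-1000 scaling of the coordinates are assumptions taken from the control data section of the ISD format document, and the names `fields` and `parsed` are purely illustrative. The record prefix is rebuilt from the field values that `isd_parse_line()` prints for the bundled `024130-99999-2016` file.

```r
# Start of an ISD record, rebuilt from the values isd_parse_line() reports
# for the first line of the bundled 024130-99999-2016 file.
line <- paste0("0054", "024130", "99999", "20160101", "0000", "4",
               "+60750", "+012767")

# Assumed fixed-width layout of the leading fields (1-based, inclusive).
fields <- data.frame(
  name  = c("total_chars", "usaf_station", "wban_station", "date",
            "time", "date_flag", "latitude", "longitude"),
  start = c(1, 5, 11, 16, 24, 28, 29, 35),
  end   = c(4, 10, 15, 23, 27, 28, 34, 41),
  stringsAsFactors = FALSE
)

# substring() is vectorised over start/end, so one call cuts every field.
parsed <- as.list(setNames(
  substring(line, fields$start, fields$end),
  fields$name
))

# Assumed scaling: coordinates are stored as signed integers x 1000.
parsed$latitude_deg  <- as.numeric(parsed$latitude) / 1000
parsed$longitude_deg <- as.numeric(parsed$longitude) / 1000

str(parsed)
```

A real record carries many more fields, plus the variable-length "Additional" and "Remarks" sections, which is where most of the parsing time goes.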
## Install

Stable from CRAN

```r
install.packages("isdparser")
```

Dev version

```r
remotes::install_github("ropensci/isdparser")
```

```r
library("isdparser")
```

## isd_parse_csv: parse a CSV file

Using a csv file included in the package:

```r
path <- system.file('extdata/00702699999.csv', package = "isdparser")
isd_parse_csv(path)
#> # A tibble: 6,843 x 68
#>    station date                source latitude longitude elevation name
#>  1 7.03e8  2017-02-10 14:04:00 4      0        0         7026      WXPO…
#>  2 7.03e8  2017-02-10 14:14:00 4      0        0         7026      WXPO…
#>  3 7.03e8  2017-02-10 14:19:00 4      0        0         7026      WXPO…
#>  4 7.03e8  2017-02-10 14:24:00 4      0        0         7026      WXPO…
#>  5 7.03e8  2017-02-10 14:29:00 4      0        0         7026      WXPO…
#>  6 7.03e8  2017-02-10 14:34:00 4      0        0         7026      WXPO…
#>  7 7.03e8  2017-02-10 14:39:00 4      0        0         7026      WXPO…
#>  8 7.03e8  2017-02-10 14:44:00 4      0        0         7026      WXPO…
#>  9 7.03e8  2017-02-10 14:49:00 4      0        0         7026      WXPO…
#> 10 7.03e8  2017-02-10 14:54:00 4      0        0         7026      WXPO…
#> # … with 6,833 more rows, and 61 more variables: report_type,
#> #   call_sign, quality_control, wnd, cig, vis,
#> #   tmp, dew, slp, wind_direction,
#> #   wind_direction_quality, wind_code, wind_speed,
#> #   wind_speed_quality, ceiling_height,
#> #   ceiling_height_quality, ceiling_height_determination,
#> #   ceiling_height_cavok, visibility_distance,
#> #   visibility_distance_quality, visibility_code,
#> #   visibility_code_quality, temperature,
#> #   temperature_quality, temperature_dewpoint,
#> #   temperature_dewpoint_quality, air_pressure,
#> #   air_pressure_quality, automated_atmospheric_condition_code,
#> #   quality_automated_atmospheric_condition_code, coverage_code,
#> #   coverage_quality_code, base_height_dimension,
#> #   base_height_quality_code, cloud_type_code,
#> #   cloud_type_quality_code, connective_cloud_attribute,
#> #   vertical_datum_attribute, base_height_upper_range_attribute,
#> #   base_height_lower_range_attribute, coverage,
#> #   opaque_coverage, coverage_quality, lowest_cover,
#> #   lowest_cover_quality, low_cloud_genus,
#> #   low_cloud_genus_quality, lowest_cloud_base_height,
#> #   lowest_cloud_base_height_quality, mid_cloud_genus,
#> #   mid_cloud_genus_quality, high_cloud_genus,
#> #   high_cloud_genus_quality, altimeter_setting_rate,
#> #   altimeter_quality_code, station_pressure_rate,
#> #   station_pressure_quality_code, speed_rate, quality_code,
#> #   rem, eqd
```

Download a file first:

```r
path <- file.path(tempdir(), "00702699999.csv")
x <- "https://www.ncei.noaa.gov/data/global-hourly/access/2017/00702699999.csv"
download.file(x, path)
isd_parse_csv(path)
#> # A tibble: 6,843 x 68
#>    station date                source latitude longitude elevation name
#>  1 7.03e8  2017-02-10 14:04:00 4      0        0         7026      WXPO…
#>  2 7.03e8  2017-02-10 14:14:00 4      0        0         7026      WXPO…
#>  3 7.03e8  2017-02-10 14:19:00 4      0        0         7026      WXPO…
#>  4 7.03e8  2017-02-10 14:24:00 4      0        0         7026      WXPO…
#>  5 7.03e8  2017-02-10 14:29:00 4      0        0         7026      WXPO…
#>  6 7.03e8  2017-02-10 14:34:00 4      0        0         7026      WXPO…
#>  7 7.03e8  2017-02-10 14:39:00 4      0        0         7026      WXPO…
#>  8 7.03e8  2017-02-10 14:44:00 4      0        0         7026      WXPO…
#>  9 7.03e8  2017-02-10 14:49:00 4      0        0         7026      WXPO…
#> 10 7.03e8  2017-02-10 14:54:00 4      0        0         7026      WXPO…
#> # … with 6,833 more rows, and 61 more variables: report_type,
#> #   call_sign, quality_control, wnd, cig, vis,
#> #   tmp, dew, slp, wind_direction,
#> #   wind_direction_quality, wind_code, wind_speed,
#> #   wind_speed_quality, ceiling_height,
#> #   ceiling_height_quality, ceiling_height_determination,
#> #   ceiling_height_cavok, visibility_distance,
#> #   visibility_distance_quality, visibility_code,
#> #   visibility_code_quality, temperature,
#> #   temperature_quality, temperature_dewpoint,
#> #   temperature_dewpoint_quality, air_pressure,
#> #   air_pressure_quality, automated_atmospheric_condition_code,
#> #   quality_automated_atmospheric_condition_code, coverage_code,
#> #   coverage_quality_code, base_height_dimension,
#> #   base_height_quality_code, cloud_type_code,
#> #   cloud_type_quality_code, connective_cloud_attribute,
#> #   vertical_datum_attribute, base_height_upper_range_attribute,
#> #   base_height_lower_range_attribute, coverage,
#> #   opaque_coverage, coverage_quality, lowest_cover,
#> #   lowest_cover_quality, low_cloud_genus,
#> #   low_cloud_genus_quality, lowest_cloud_base_height,
#> #   lowest_cloud_base_height_quality, mid_cloud_genus,
#> #   mid_cloud_genus_quality, high_cloud_genus,
#> #   high_cloud_genus_quality, altimeter_setting_rate,
#> #   altimeter_quality_code, station_pressure_rate,
#> #   station_pressure_quality_code, speed_rate, quality_code,
#> #   rem, eqd
```

## isd_parse_line: parse lines from an ASCII strings file

```r
path <- system.file('extdata/024130-99999-2016.gz', package = "isdparser")
lns <- readLines(path, encoding = "latin1")
isd_parse_line(lns[1])
#> # A tibble: 1 x 38
#>   total_chars usaf_station wban_station date  time  date_flag latitude longitude
#> 1 0054        024130       99999        2016… 0000  4         +60750   +012767
#> # … with 30 more variables: type_code, elevation,
#> #   call_letter, quality, wind_direction,
#> #   wind_direction_quality, wind_code, wind_speed,
#> #   wind_speed_quality, ceiling_height,
#> #   ceiling_height_quality, ceiling_height_determination,
#> #   ceiling_height_cavok, visibility_distance,
#> #   visibility_distance_quality, visibility_code,
#> #   visibility_code_quality, temperature,
#> #   temperature_quality, temperature_dewpoint,
#> #   temperature_dewpoint_quality, air_pressure,
#> #   air_pressure_quality,
#> #   AW1_present_weather_observation_identifier,
#> #   AW1_automated_atmospheric_condition_code,
#> #   AW1_quality_automated_atmospheric_condition_code, REM_remarks,
#> #   REM_identifier, REM_length_quantity, REM_comment
```

Or, give back a list

```r
head(
  isd_parse_line(lns[1], as_data_frame = FALSE)
)
#> $total_chars
#> [1] "0054"
#>
#> $usaf_station
#> [1] "024130"
#>
#> $wban_station
#> [1] "99999"
#>
#> $date
#> [1] "20160101"
#>
#> $time
#> [1] "0000"
#>
#> $date_flag
#> [1] "4"
```

Optionally don't include "Additional" and "Remarks" sections in parsed output.

```r
isd_parse_line(lns[1], additional = FALSE)
#> # A tibble: 1 x 31
#>   total_chars usaf_station wban_station date  time  date_flag latitude longitude
#> 1 0054        024130       99999        2016… 0000  4         +60750   +012767
#> # … with 23 more variables: type_code, elevation,
#> #   call_letter, quality, wind_direction,
#> #   wind_direction_quality, wind_code, wind_speed,
#> #   wind_speed_quality, ceiling_height,
#> #   ceiling_height_quality, ceiling_height_determination,
#> #   ceiling_height_cavok, visibility_distance,
#> #   visibility_distance_quality, visibility_code,
#> #   visibility_code_quality, temperature,
#> #   temperature_quality, temperature_dewpoint,
#> #   temperature_dewpoint_quality, air_pressure,
#> #   air_pressure_quality
```

## isd_parse: parse an ASCII strings file

Downloading a new file

```r
path <- file.path(tempdir(), "007026-99999-2017.gz")
y <- "ftp://ftp.ncdc.noaa.gov/pub/data/noaa/2017/007026-99999-2017.gz"
download.file(y, path)
isd_parse(path)
#> # A tibble: 6,843 x 72
#>    total_chars usaf_station wban_station date  time  date_flag latitude
#>  1 0157        007026       99999        2017… 1404  4         +00000
#>  2 0157        007026       99999        2017… 1414  4         +00000
#>  3 0157        007026       99999        2017… 1419  4         +00000
#>  4 0157        007026       99999        2017… 1424  4         +00000
#>  5 0157        007026       99999        2017… 1429  4         +00000
#>  6 0144        007026       99999        2017… 1434  4         +00000
#>  7 0157        007026       99999        2017… 1439  4         +00000
#>  8 0157        007026       99999        2017… 1444  4         +00000
#>  9 0172        007026       99999        2017… 1449  4         +00000
#> 10 0157        007026       99999        2017… 1454  4         +00000
#> # … with 6,833 more rows, and 65 more variables: longitude,
#> #   type_code, elevation, call_letter, quality,
#> #   wind_direction, wind_direction_quality, wind_code,
#> #   wind_speed, wind_speed_quality, ceiling_height,
#> #   ceiling_height_quality, ceiling_height_determination,
#> #   ceiling_height_cavok, visibility_distance,
#> #   visibility_distance_quality, visibility_code,
#> #   visibility_code_quality, temperature,
#> #   temperature_quality, temperature_dewpoint,
#> #   temperature_dewpoint_quality, air_pressure,
#> #   air_pressure_quality, GF1_sky_condition, GF1_coverage,
#> #   GF1_opaque_coverage, GF1_coverage_quality,
#> #   GF1_lowest_cover, GF1_lowest_cover_quality,
#> #   GF1_low_cloud_genus, GF1_low_cloud_genus_quality,
#> #   GF1_lowest_cloud_base_height,
#> #   GF1_lowest_cloud_base_height_quality, GF1_mid_cloud_genus,
#> #   GF1_mid_cloud_genus_quality, GF1_high_cloud_genus,
#> #   GF1_high_cloud_genus_quality, MA1_atmospheric_pressure,
#> #   MA1_altimeter_setting_rate, MA1_altimeter_quality_code,
#> #   MA1_station_pressure_rate, MA1_station_pressure_quality_code,
#> #   REM_remarks, REM_identifier, REM_length_quantity,
#> #   REM_comment, OC1_wind_gust_observation_identifier,
#> #   OC1_speed_rate, OC1_quality_code,
#> #   GA1_sky_cover_layer_identifier, GA1_coverage_code,
#> #   GA1_coverage_quality_code, GA1_base_height_dimension,
#> #   GA1_base_height_quality_code, GA1_cloud_type_code,
#> #   GA1_cloud_type_quality_code, GE1_sky_condition,
#> #   GE1_connective_cloud_attribute, GE1_vertical_datum_attribute,
#> #   GE1_base_height_upper_range_attribute,
#> #   GE1_base_height_lower_range_attribute,
#> #   AW1_present_weather_observation_identifier,
#> #   AW1_automated_atmospheric_condition_code,
#> #   AW1_quality_automated_atmospheric_condition_code
```

### Parallel

```r
isd_parse(path, parallel = TRUE)
```

### Progress

> note: Progress not printed if `parallel = TRUE`

```r
isd_parse(path, progress = TRUE)
#>
#>   |========================================================================================| 100%
#> # A tibble: 2,601 × 42
#>    total_chars usaf_station wban_station date       time  date_flag latitude longitude type_code
#>  1 54          024130       99999        2016-01-01 0000  4         60.75    12.767    FM-12
#>  2 54          024130       99999        2016-01-01 0100  4         60.75    12.767    FM-12
#>  3 54          024130       99999        2016-01-01 0200  4         60.75    12.767    FM-12
#>  4 54          024130       99999        2016-01-01 0300  4         60.75    12.767    FM-12
#>  5 54          024130       99999        2016-01-01 0400  4         60.75    12.767    FM-12
#>  6 39          024130       99999        2016-01-01 0500  4         60.75    12.767    FM-12
#>  7 54          024130       99999        2016-01-01 0600  4         60.75    12.767    FM-12
#>  8 39          024130       99999        2016-01-01 0700  4         60.75    12.767    FM-12
#>  9 54          024130       99999        2016-01-01 0800  4         60.75    12.767    FM-12
#> 10 54          024130       99999        2016-01-01 0900  4         60.75    12.767    FM-12
#> # ... with 2,591 more rows, and 33 more variables: elevation, call_letter, quality,
#> #   wind_direction, wind_direction_quality, wind_code, wind_speed,
#> #   wind_speed_quality, ceiling_height, ceiling_height_quality,
#> #   ceiling_height_determination, ceiling_height_cavok, visibility_distance,
#> #   visibility_distance_quality, visibility_code, visibility_code_quality,
#> #   temperature, temperature_quality, temperature_dewpoint,
#> #   temperature_dewpoint_quality, air_pressure, air_pressure_quality,
#> #   AW1_present_weather_observation_identifier, AW1_automated_atmospheric_condition_code,
#> #   AW1_quality_automated_atmospheric_condition_code, N03_original_observation,
#> #   N03_original_value_text, N03_units_code, N03_parameter_code, REM_remarks,
#> #   REM_identifier, REM_length_quantity, REM_comment
```

### Additional data

Optionally don't include "Additional" and "Remarks" sections in parsed output.

```r
isd_parse(path, additional = FALSE)
#> # A tibble: 6,843 x 31
#>    total_chars usaf_station wban_station date  time  date_flag latitude
#>  1 0157        007026       99999        2017… 1404  4         +00000
#>  2 0157        007026       99999        2017… 1414  4         +00000
#>  3 0157        007026       99999        2017… 1419  4         +00000
#>  4 0157        007026       99999        2017… 1424  4         +00000
#>  5 0157        007026       99999        2017… 1429  4         +00000
#>  6 0144        007026       99999        2017… 1434  4         +00000
#>  7 0157        007026       99999        2017… 1439  4         +00000
#>  8 0157        007026       99999        2017… 1444  4         +00000
#>  9 0172        007026       99999        2017… 1449  4         +00000
#> 10 0157        007026       99999        2017… 1454  4         +00000
#> # … with 6,833 more rows, and 24 more variables: longitude,
#> #   type_code, elevation, call_letter, quality,
#> #   wind_direction, wind_direction_quality, wind_code,
#> #   wind_speed, wind_speed_quality, ceiling_height,
#> #   ceiling_height_quality, ceiling_height_determination,
#> #   ceiling_height_cavok, visibility_distance,
#> #   visibility_distance_quality, visibility_code,
#> #   visibility_code_quality, temperature,
#> #   temperature_quality, temperature_dewpoint,
#> #   temperature_dewpoint_quality, air_pressure,
#> #   air_pressure_quality
```
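
To see which columns `additional = FALSE` leaves out, one option is to compare the column names with and without it. The sketch below uses the `.gz` file bundled with the package (the one read in the `isd_parse_line()` section), so nothing needs to be downloaded. The variable names `full` and `core` are only illustrative.

```r
library("isdparser")

# Bundled example file, as used in the isd_parse_line() section
path <- system.file('extdata/024130-99999-2016.gz', package = "isdparser")

# Parse once with everything, once without the "Additional" and "Remarks" sections
full <- isd_parse(path)
core <- isd_parse(path, additional = FALSE)

# Columns that are present only when the extra sections are parsed
setdiff(names(full), names(core))
```

Which extra columns show up depends on the file, since the "Additional" section varies from station to station.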