---
title: "How the Peru checklist changed from 2025 to 2026"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{How the Peru checklist changed from 2025 to 2026}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 8,
  fig.height = 4.8,
  fig.align = "center"
)
```

## Introduction

This vignette compares the two most recent checklist objects shipped with `avesperu`: `aves_peru_2025_v5` and `aves_peru_2026_v1`.

The goal is not only to show that the dataset changed, but also to answer four practical questions:

1. How large was the update?
2. Which status categories changed the most?
3. Which species entered or left the checklist?
4. Where are the changes concentrated taxonomically?

```{r setup, echo=FALSE}
library(avesperu)
library(ggplot2)

old <- aves_peru_2025_v5
new <- aves_peru_2026_v1

old_date <- attr(old, "version_date", exact = TRUE)
new_date <- attr(new, "version_date", exact = TRUE)

added <- new[!(new$scientific_name %in% old$scientific_name), ]
removed <- old[!(old$scientific_name %in% new$scientific_name), ]
shared_species <- intersect(old$scientific_name, new$scientific_name)

status_order <- c(
  "Residente", "Endémico", "Migratorio", "Divagante",
  "Introducido", "No confirmado", "Extirpado"
)

count_status <- function(x, levels) {
  out <- table(factor(x, levels = levels))
  as.integer(out)
}

status_tbl <- data.frame(
  status = status_order,
  n_2025 = count_status(old$status, status_order),
  n_2026 = count_status(new$status, status_order),
  stringsAsFactors = FALSE
)
status_tbl$change <- status_tbl$n_2026 - status_tbl$n_2025

summary_tbl <- data.frame(
  dataset = c("aves_peru_2025_v5", "aves_peru_2026_v1"),
  version_date = c(old_date, new_date),
  species = c(nrow(old), nrow(new)),
  orders = c(length(unique(old$order_name)), length(unique(new$order_name))),
  families = c(length(unique(old$family_name)), length(unique(new$family_name))),
  stringsAsFactors = FALSE
)

order_levels <- sort(unique(c(added$order_name, removed$order_name)))

turnover_by_order <- data.frame(
  order_name = order_levels,
  added = as.integer(table(factor(added$order_name, levels = order_levels))),
  removed = as.integer(table(factor(removed$order_name, levels = order_levels))),
  stringsAsFactors = FALSE
)
turnover_by_order$net_change <- turnover_by_order$added - turnover_by_order$removed
turnover_by_order <- turnover_by_order[
  turnover_by_order$added > 0 | turnover_by_order$removed > 0,
]

fam_old <- table(old$family_name)
fam_new <- table(new$family_name)
family_levels <- sort(unique(c(names(fam_old), names(fam_new))))

family_delta <- data.frame(
  family_name = family_levels,
  n_2025 = as.integer(fam_old[family_levels]),
  n_2026 = as.integer(fam_new[family_levels]),
  stringsAsFactors = FALSE
)
family_delta[is.na(family_delta)] <- 0L
family_delta$change <- family_delta$n_2026 - family_delta$n_2025
family_delta <- family_delta[family_delta$change != 0, ]
family_delta <- family_delta[order(family_delta$change, family_delta$family_name), ]

plot_theme <- theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    plot.subtitle = element_text(color = "#51606F"),
    panel.grid.minor = element_blank(),
    panel.grid.major.y = element_blank(),
    legend.title = element_blank(),
    legend.position = "top"
  )
```

Between `r old_date` and `r new_date`, the checklist grew from `r nrow(old)` to `r nrow(new)` species. That is a net gain of `r nrow(new) - nrow(old)` species.

At the same time, the overall structure of the database remained highly stable: `r length(shared_species)` species are shared by both versions, which means that `r round(length(shared_species) / nrow(old) * 100, 2)`% of the 2025 checklist was retained in the 2026 release.

## 1. High-level snapshot

The first table summarizes the scale of the update. It shows that the number of orders remained constant, while the number of family labels increased slightly.
```{r summary-table}
knitr::kable(summary_tbl, caption = "High-level comparison of the two checklist versions")
```

```{r total-species-plot}
summary_plot_tbl <- summary_tbl
summary_plot_tbl$release <- c("2025 v5", "2026 v1")

ggplot(summary_plot_tbl, aes(x = release, y = species, fill = release)) +
  geom_col(width = 0.62, color = NA) +
  geom_text(aes(label = species), vjust = -0.5, fontface = "bold", size = 4.2) +
  scale_fill_manual(values = c("2025 v5" = "#4C67B0", "2026 v1" = "#69B3E7")) +
  scale_y_continuous(
    expand = expansion(mult = c(0, 0.08)),
    labels = scales::comma
  ) +
  labs(
    title = "Net checklist growth between releases",
    # Computed from the data instead of hard-coded, so the subtitle cannot drift
    subtitle = sprintf(
      "The 2026 update adds a net %d species relative to the 2025 release",
      nrow(new) - nrow(old)
    ),
    x = NULL,
    y = "Number of species"
  ) +
  plot_theme +
  theme(legend.position = "none")
```

This graphic is a useful first sanity check: the update is not a complete restructuring of the package data, but a focused revision with a small and traceable net increase.

## 2. Changes in status composition

The net increase is not distributed evenly across status categories. Most of the change is concentrated in `Divagante`, `Residente`, and `Endémico`, while `No confirmado` decreases.
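The status counts depend on one detail of the setup code: each `status` column is converted to a factor with a fixed set of levels before `table()` is called, so a category that is empty in one release still appears as an explicit zero. The toy chunk below sketches that behaviour with hypothetical status vectors (not taken from the package data). It also guards against values outside the level set, because such values would become `NA` and silently drop out of the counts.

```{r status-levels-demo}
# Hypothetical status vectors for two releases (toy data, not from avesperu)
levels_demo <- c("Residente", "Migratorio", "Divagante", "No confirmado")
status_a <- c("Residente", "Residente", "No confirmado")
status_b <- c("Residente", "Residente", "Divagante", "Divagante")

# Values outside the level set would become NA and vanish from the counts
stopifnot(all(c(status_a, status_b) %in% levels_demo))

# A plain table() only reports categories that actually occur
table(status_a)

# A factor with fixed levels keeps empty categories as explicit zeros
count_levels <- function(x, levels) {
  as.integer(table(factor(x, levels = levels)))
}

demo_tbl <- data.frame(
  status = levels_demo,
  n_a = count_levels(status_a, levels_demo),
  n_b = count_levels(status_b, levels_demo),
  stringsAsFactors = FALSE
)
demo_tbl$change <- demo_tbl$n_b - demo_tbl$n_a
demo_tbl
```

On the real data, the equivalent guard would be `stopifnot(all(c(old$status, new$status) %in% status_order))`, which fails loudly if a release introduces a status label not listed in `status_order`.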
```{r status-table}
knitr::kable(status_tbl, caption = "Species counts by status in each dataset version")
```

```{r status-delta-plot}
status_plot_tbl <- status_tbl
status_plot_tbl$direction <- ifelse(status_plot_tbl$change >= 0, "Increase", "Decrease")
status_plot_tbl$label <- ifelse(
  status_plot_tbl$change > 0,
  paste0("+", status_plot_tbl$change),
  as.character(status_plot_tbl$change)
)
status_plot_tbl$status <- factor(status_plot_tbl$status, levels = rev(status_plot_tbl$status))

ggplot(status_plot_tbl, aes(x = status, y = change, fill = direction)) +
  geom_col(width = 0.72) +
  geom_hline(yintercept = 0, linetype = 2, color = "#7A8793") +
  geom_text(
    aes(
      label = label,
      hjust = ifelse(change >= 0, -0.15, 1.15)
    ),
    size = 4
  ) +
  coord_flip() +
  scale_fill_manual(values = c("Increase" = "#4B8A5F", "Decrease" = "#B34A3C")) +
  scale_y_continuous(expand = expansion(mult = c(0.08, 0.12))) +
  labs(
    title = "Net change by status category",
    subtitle = "Vagrants and residents explain most of the checklist growth",
    x = NULL,
    y = "Change in number of species"
  ) +
  plot_theme
```

Three patterns stand out:

- `Divagante` increases by `r status_tbl$change[status_tbl$status == "Divagante"]` species, the largest category-level change in the update.
- `Residente` increases by `r status_tbl$change[status_tbl$status == "Residente"]` species, showing that the revision is not restricted to occasional records.
- `No confirmado` decreases by `r abs(status_tbl$change[status_tbl$status == "No confirmado"])` species, suggesting that some previously uncertain records were either excluded or reclassified in the updated source.

## 3. Species turnover

The 2026 release adds `r nrow(added)` species and removes `r nrow(removed)`. Because the shared core remains so large, the update is best understood as a targeted revision rather than a replacement of the whole checklist.
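The added and removed sets come from two `%in%` filters on `scientific_name` in the setup chunk. Provided `scientific_name` is unique within each release, two bookkeeping identities follow: the net change in row count equals additions minus removals, and shared plus removed species equals the size of the old release. The chunk below sketches both checks on hypothetical toy frames with placeholder names such as `sp_a`; the same `stopifnot()` calls could be pointed at `old`, `new`, `added`, `removed`, and `shared_species`.

```{r turnover-identity-demo}
# Hypothetical old and new releases (placeholder names, not real checklist entries)
old_demo <- data.frame(scientific_name = c("sp_a", "sp_b", "sp_c"), stringsAsFactors = FALSE)
new_demo <- data.frame(scientific_name = c("sp_a", "sp_c", "sp_d", "sp_e"), stringsAsFactors = FALSE)

# Same %in% filters as the setup chunk; drop = FALSE keeps one-column results as data frames
added_demo <- new_demo[!(new_demo$scientific_name %in% old_demo$scientific_name), , drop = FALSE]
removed_demo <- old_demo[!(old_demo$scientific_name %in% new_demo$scientific_name), , drop = FALSE]
shared_demo <- intersect(old_demo$scientific_name, new_demo$scientific_name)

# The identities below only hold if scientific_name is a unique key in each release
stopifnot(
  anyDuplicated(old_demo$scientific_name) == 0,
  anyDuplicated(new_demo$scientific_name) == 0,
  nrow(new_demo) - nrow(old_demo) == nrow(added_demo) - nrow(removed_demo),
  length(shared_demo) + nrow(removed_demo) == nrow(old_demo)
)
```

If either identity fails on the real objects, duplicated or inconsistently formatted names are the first thing worth checking.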
### Added species

```{r added-table}
knitr::kable(
  added[, c("scientific_name", "english_name", "status", "family_name", "order_name")],
  caption = "Species added in aves_peru_2026_v1"
)
```

### Removed species

```{r removed-table}
knitr::kable(
  removed[, c("scientific_name", "english_name", "status", "family_name", "order_name")],
  caption = "Species removed from the previous checklist version"
)
```

Some of these additions and removals are especially informative. For example, the replacement of `Camptostoma obsoletum` by `Camptostoma sclateri` and `Camptostoma napaeum` is consistent with a taxonomic split in the source checklist. Likewise, the replacement of `Tunchiornis ochraceiceps` and `Turdus albicollis` by more specific taxa suggests an update in species limits or taxonomic circumscription.

That interpretation is an inference from the before/after pattern in the data, not an explicit annotation embedded in the dataset itself.

```{r turnover-order-plot}
turnover_plot_tbl <- rbind(
  data.frame(order_name = turnover_by_order$order_name, movement = "Added", n = turnover_by_order$added),
  data.frame(order_name = turnover_by_order$order_name, movement = "Removed", n = turnover_by_order$removed)
)
turnover_plot_tbl <- turnover_plot_tbl[turnover_plot_tbl$n > 0, ]
turnover_plot_tbl$order_name <- factor(
  turnover_plot_tbl$order_name,
  levels = turnover_by_order$order_name[order(turnover_by_order$net_change, decreasing = TRUE)]
)

ggplot(turnover_plot_tbl, aes(x = order_name, y = n, fill = movement)) +
  geom_col(position = position_dodge(width = 0.72), width = 0.62) +
  geom_text(
    aes(label = n),
    position = position_dodge(width = 0.72),
    vjust = -0.45,
    size = 3.8
  ) +
  scale_fill_manual(values = c("Added" = "#69B3E7", "Removed" = "#D98C6A")) +
  scale_y_continuous(expand = expansion(mult = c(0, 0.1))) +
  labs(
    title = "Species turnover by order",
    subtitle = "Most additions and all removals occur in Passeriformes",
    x = NULL,
    y = "Number of species"
  ) +
  plot_theme +
  theme(axis.text.x = element_text(angle = 20, hjust = 1))
```

This plot shows that turnover is concentrated in `Passeriformes`, which accounts for `r turnover_by_order$added[turnover_by_order$order_name == "Passeriformes"]` of the `r nrow(added)` additions and all `r nrow(removed)` removals.

## 4. Where the taxonomic changes are concentrated

At a broad level, the checklist still contains `r summary_tbl$orders[summary_tbl$dataset == "aves_peru_2025_v5"]` orders in both versions. The family layer changes more subtly, from `r summary_tbl$families[summary_tbl$dataset == "aves_peru_2025_v5"]` to `r summary_tbl$families[summary_tbl$dataset == "aves_peru_2026_v1"]` distinct family labels.

The next table isolates only the family labels whose counts changed.

```{r family-table}
knitr::kable(
  family_delta,
  caption = "Families with non-zero net change between versions"
)
```

```{r family-delta-plot}
family_plot_tbl <- family_delta
family_plot_tbl$direction <- ifelse(family_plot_tbl$change > 0, "Increase", "Decrease")
family_plot_tbl$label <- ifelse(
  family_plot_tbl$change > 0,
  paste0("+", family_plot_tbl$change),
  as.character(family_plot_tbl$change)
)
family_plot_tbl$family_name <- factor(
  family_plot_tbl$family_name,
  levels = family_plot_tbl$family_name
)

ggplot(family_plot_tbl, aes(x = family_name, y = change, fill = direction)) +
  geom_col(width = 0.7) +
  geom_hline(yintercept = 0, linetype = 2, color = "#7A8793") +
  geom_text(
    aes(
      label = label,
      hjust = ifelse(change > 0, -0.12, 1.12)
    ),
    size = 3.8
  ) +
  coord_flip() +
  scale_fill_manual(values = c("Increase" = "#F3C94D", "Decrease" = "#C96B5C")) +
  scale_y_continuous(expand = expansion(mult = c(0.08, 0.12))) +
  labs(
    title = "Family-level concentration of checklist updates",
    subtitle = "Only a small subset of family labels changes between releases",
    x = NULL,
    y = "Net change in species count"
  ) +
  plot_theme
```

Two practical takeaways emerge from this comparison:

- Most families do not change at all, which helps preserve compatibility with analyses
  built on the previous release.
- The clearest localized revision involves the genus `Camptostoma`, while several other families gain a single species.

## 5. What this means for users

For most workflows, the 2026 update is a refinement rather than a disruptive schema change. The implications are straightforward:

- If you are reproducing an analysis built with `aves_peru_2025_v5`, most names remain unchanged and directly comparable.
- If your data include any of the removed species, re-run the name matching with `search_avesperu()` to align them with the current checklist.
- If you work with vagrants, endemics, or recent country records, the 2026 release is especially relevant because these are the categories with the most visible shifts.

In short, `aves_peru_2026_v1` preserves continuity with the previous checklist while incorporating a small but meaningful set of taxonomic and occurrence updates.
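Before re-running the matching, it can help to see which of your own names are affected at all. The chunk below is a minimal base-R sketch: `flag_removed()` is a hypothetical helper written for this example (it is not exported by `avesperu`), and the input names are placeholders. In a real session, `removed_names` would be `removed$scientific_name` from the setup chunk. It performs exact matching only (after trimming whitespace), so misspellings and synonyms still need a dedicated name-matching step such as `search_avesperu()`.

```{r flag-removed-demo}
# Hypothetical helper (not part of avesperu): flags names dropped in the new release
flag_removed <- function(species, removed_names) {
  # Exact matching only; trimws() strips stray leading/trailing spaces first
  data.frame(
    input = species,
    removed_in_new_release = trimws(species) %in% removed_names,
    stringsAsFactors = FALSE
  )
}

# Placeholder inputs; in practice use your own names and removed$scientific_name
my_species <- c("sp_a", " sp_b", "sp_x")
removed_names <- c("sp_b", "sp_y")

flags <- flag_removed(my_species, removed_names)
flags[flags$removed_in_new_release, "input"]
```

Returning the original input strings (rather than the trimmed ones) makes it easier to locate the affected rows in your own data.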