---
title: "Getting Started with panelbuild"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting Started with panelbuild}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

# Introduction

`panelbuild` provides tools for auditing, validating, and preparing panel datasets before statistical analysis.

Panel datasets often contain duplicate unit-time observations, missing time periods, irregular gaps, and imbalance. These issues can affect fixed-effects models, difference-in-differences designs, event studies, and other panel-data methods. For example, a duplicated unit-time row is counted twice in estimation, and a gap means that a lag taken from the previous row may refer to a period several years earlier.

The goal of `panelbuild` is to help users identify these issues before estimation.

# Load the package

```{r}
library(panelbuild)
```

# Example panel dataset

`panelbuild` includes a small example dataset called `example_panel`.

```{r}
data(example_panel)
example_panel
```

The dataset intentionally includes:

- a duplicate unit-time observation
- missing unit-time cells
- an unbalanced panel structure

This makes it useful for demonstrating panel-data diagnostics.

# Audit the panel

The main function is `audit_panel()`.

```{r}
audit_panel(example_panel, id = id, time = year)
```

This gives a quick overview of the panel structure, including whether the panel is balanced and whether there are missing or duplicate unit-time cells.

# Find duplicate observations

Duplicate unit-time observations are a common problem in panel datasets. `duplicate_summary()` summarizes them.

```{r}
duplicate_summary(example_panel, id = id, time = year)
```

# Summarize gaps

`gap_summary()` identifies missing time periods by panel unit.

```{r}
gap_summary(example_panel, id = id, time = year)
```

# Flag row-level issues

`flag_panel_issues()` adds diagnostic flags to the data.

```{r}
flag_panel_issues(example_panel, id = id, time = year)
```

# Complete a panel grid

`complete_panel()` creates a complete unit-time grid. It does not impute missing outcome values.
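
To make the idea of a complete grid concrete, the chunk below is a minimal base-R sketch of the underlying operation on a small hypothetical data frame, `toy`, which is not shipped with `panelbuild`. It is not the implementation of `complete_panel()`; it only illustrates building every combination of the observed units and periods and left-joining the data onto that grid, so the missing cell receives an `NA` outcome rather than an imputed value.

```{r}
# Hypothetical toy panel, not part of panelbuild: unit "b" has no row for 2002
toy <- data.frame(
  id = c("a", "a", "a", "b", "b"),
  year = c(2001, 2002, 2003, 2001, 2003),
  y = c(1.2, 1.5, 1.9, 0.7, 0.8)
)

# Each id-year cell must appear at most once; otherwise the join below
# would return more than one row for that cell
stopifnot(anyDuplicated(toy[c("id", "year")]) == 0)

# Every combination of the observed units and the observed periods
grid <- expand.grid(
  id = unique(toy$id),
  year = sort(unique(toy$year)),
  stringsAsFactors = FALSE
)

# Left-join the data onto the grid: the missing cell gets y = NA, not an
# imputed value (merge() sorts the result by the key columns by default)
toy_complete <- merge(grid, toy, by = c("id", "year"), all.x = TRUE)
toy_complete
```

This sketch takes the set of periods from the years observed anywhere in the data, so a year missing for every unit would not appear in its grid.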
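
Because `complete_panel()` requires unique unit-time cells, we first remove duplicate id-time observations from the example dataset. With `.keep_all = TRUE`, `dplyr::distinct()` keeps the first row for each id-year pair, so in real data you should check which duplicate is correct before dropping the others.

```{r}
# Keep one row per id-year pair (the first occurrence)
example_panel_unique <- example_panel |>
  dplyr::distinct(id, year, .keep_all = TRUE)

complete_panel(example_panel_unique, id = id, time = year)
```

# Typical workflow

A typical `panelbuild` workflow is shown below, where `my_data` stands for your own dataset with a unit identifier `unit_id` and a time variable `year`:

```{r, eval = FALSE}
library(panelbuild)

# Inspect the panel structure and locate problems
audit_panel(my_data, id = unit_id, time = year)
duplicate_summary(my_data, id = unit_id, time = year)
gap_summary(my_data, id = unit_id, time = year)

# Drop duplicate unit-time rows (keeps the first occurrence of each pair)
clean_data <- my_data |>
  dplyr::distinct(unit_id, year, .keep_all = TRUE)

# Build the complete unit-time grid
complete_panel(clean_data, id = unit_id, time = year)
```

# Summary

`panelbuild` is designed to provide a transparent and reproducible workflow for panel-data quality assurance. Use it before fitting panel models or running difference-in-differences, event-study, or other longitudinal-data analyses.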