---
title: "Batch Processing & Quota Management"
author: "tuber maintainers"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Batch Processing & Quota Management}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = FALSE
)
```

## Understanding YouTube API Quotas

The YouTube Data API v3 uses a quota system to manage usage. Every project is allocated 10,000 quota units per day by default.

### Quota Costs by Operation

Different operations have different quota costs:

| Operation | Quota Cost | Example Function |
|-----------|-----------|------------------|
| Read (list) | 1 unit | `get_video_details()`, `get_playlist_items()` |
| Search | 100 units | `yt_search()` |
| Write (insert) | 50 units | `create_playlist()` |
| Update | 50 units | `update_video_metadata()` |
| Delete | 50 units | `delete_videos()`, `delete_comments()` |
| Upload video | 1,600 units | `upload_video()` |

**Important:** Search operations (`yt_search()`, `yt_topic_search()`) cost 100 units each, so you can only perform 100 searches per day with the default quota!

### Tracking Quota Usage

tuber provides built-in quota tracking:

```{r quota-tracking, eval=FALSE}
library(tuber)

# Check current quota usage
quota_status <- yt_get_quota_usage()
print(quota_status)

# View details
quota_status$quota_used       # Units used today
quota_status$quota_limit      # Your daily limit (default: 10,000)
quota_status$quota_remaining  # Units remaining
quota_status$reset_time       # When quota resets (midnight Pacific Time)
```

### Setting Custom Quota Limits

If you've requested a quota increase from Google, update your limit:

```{r set-quota, eval=FALSE}
# Set a custom quota limit
yt_set_quota_limit(50000)

# Reset quota tracking (e.g., after midnight)
yt_reset_quota()
```

## Authentication Strategies

tuber supports two authentication methods, each with different use cases.
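Whichever method you choose, every call draws from the same daily quota, so it pays to budget a batch job before starting it. A minimal sketch follows; the cost table and `plan` vector are illustrative, not part of tuber (only `yt_get_quota_usage()` is):

```{r quota-budget, eval=FALSE}
# Illustrative budget check: estimate the cost of a planned batch using the
# per-operation costs from the table above, then compare against what's left
quota_cost <- c(read = 1, search = 100, write = 50, upload = 1600)

plan <- c(search = 5, read = 200, write = 2)  # planned operation counts
planned_units <- sum(quota_cost[names(plan)] * plan)
planned_units  # 5 * 100 + 200 * 1 + 2 * 50 = 800 units

remaining <- yt_get_quota_usage()$quota_remaining
if (planned_units > remaining) {
  stop("Planned batch (", planned_units,
       " units) exceeds remaining quota (", remaining, ")")
}
```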
### API Key Authentication

Best for **read-only public data**:

- Searching videos
- Getting public video statistics
- Fetching channel information
- Reading public comments

```{r api-key, eval=FALSE}
# Set up API key (get one from Google Cloud Console)
yt_set_key("YOUR_API_KEY")

# Use API key for read operations
video_stats <- get_video_details(
  video_ids = "dQw4w9WgXcQ",
  auth = "key"
)

channel_stats <- get_channel_stats(
  channel_ids = "UCuAXFkgsw1L7xaCfnd5JJOw",
  auth = "key"
)
```

### OAuth2 Authentication

Required for:

- Accessing private/unlisted content
- Writing data (creating playlists, uploading videos)
- Deleting resources
- Accessing your own channel's data

```{r oauth, eval=FALSE}
# Set up OAuth2 (opens browser for authentication)
yt_oauth("YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET")

# Write operations require OAuth
create_playlist(
  title = "My New Playlist",
  description = "A collection of favorites",
  status = "private"
)

# Access your own channel's unlisted videos
list_my_videos()
```

### Which Authentication to Use?

| Task | API Key | OAuth2 |
|------|---------|--------|
| Search public videos | Yes | Yes |
| Get public video stats | Yes | Yes |
| Get public channel info | Yes | Yes |
| Read public comments | Yes | Yes |
| Access unlisted videos (own) | No | Yes |
| Access private videos (own) | No | Yes |
| Create/update playlists | No | Yes |
| Upload videos | No | Yes |
| Delete content | No | Yes |
| Manage comments | No | Yes |

## Batch Processing

When working with many videos or channels, batch processing is essential for efficiency.
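The batching functions below accept vectors of IDs, but the underlying `videos.list` endpoint caps each request at 50 IDs, so larger jobs need to be split into chunks. A sketch, with placeholder IDs (the chunking pattern is illustrative, not a tuber helper):

```{r batch-chunks, eval=FALSE}
# Split a long vector of video IDs into chunks of at most 50,
# fetch each chunk in a single call, then combine the results
all_ids <- sprintf("video_id_%03d", 1:120)  # placeholder IDs
chunks <- split(all_ids, ceiling(seq_along(all_ids) / 50))  # 50 / 50 / 20

results <- lapply(chunks, function(ids) {
  get_video_details(video_ids = ids, part = "statistics", auth = "key")
})
videos <- do.call(rbind, results)
```

At 1 unit per list call, the 120 IDs above cost 3 units total, versus 120 units if fetched one at a time.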
### Processing Multiple Videos

```{r batch-videos, eval=FALSE}
# Get details for multiple videos at once
video_ids <- c("dQw4w9WgXcQ", "M7FIvfx5J10", "kJQP7kiw5Fk")

# Single API call for multiple videos (more efficient)
videos <- get_video_details(
  video_ids = video_ids,
  part = c("snippet", "statistics"),
  auth = "key"
)

# Analyze results
head(videos)
```

### Using Bulk Analysis Functions

tuber provides high-level functions for common analysis tasks:

```{r bulk-analysis, eval=FALSE}
# Comprehensive video analysis
analysis <- bulk_video_analysis(
  video_ids = video_ids,
  include_comments = FALSE,
  auth = "key"
)

# Access results
analysis$video_data  # Detailed video information
analysis$benchmarks  # Performance percentiles
analysis$summary     # Overall statistics

# Channel analysis
channel_analysis <- analyze_channel(
  channel_id = "UCuAXFkgsw1L7xaCfnd5JJOw",
  max_videos = 50,
  auth = "key"
)

# Compare multiple channels
comparison <- compare_channels(
  channel_ids = c("UC1", "UC2", "UC3"),
  metrics = c("subscriber_count", "video_count", "view_count"),
  auth = "key"
)
```

### Pagination Handling

tuber automatically handles pagination for large result sets:

```{r pagination, eval=FALSE}
# Request more items than the API allows per page (50);
# tuber automatically makes multiple API calls
playlist_items <- get_playlist_items(
  playlist_id = "PLrAXtmErZgOeiKm4sgNOknGvNjby9efdf",
  max_results = 200,  # Will make 4 API calls
  auth = "key"
)

# Get many comments with automatic pagination
comments <- get_comment_threads(
  filter = c(video_id = "dQw4w9WgXcQ"),
  max_results = 500,  # Will paginate automatically
  auth = "key"
)
```

## Extracting Data from Results

YouTube API responses contain nested JSON. Here's how to work with them.
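"Nested" means a raw response is a list of `items`, each with `snippet`, `statistics`, and `contentDetails` sub-lists. When you work with the raw structure, a small helper can pull one field across all items without failing on missing entries. This is an illustrative sketch (`get_field()` is not a tuber function; `videos_raw` is assumed to be an unsimplified response):

```{r nested-access, eval=FALSE}
# Walk a path of names into each item; return NA where a field is absent
# (in R, NULL[["x"]] is NULL, so the walk is safe even for short paths)
get_field <- function(items, ...) {
  path <- c(...)
  vapply(items, function(x) {
    for (p in path) x <- x[[p]]
    if (is.null(x)) NA_character_ else as.character(x)
  }, character(1))
}

titles <- get_field(videos_raw$items, "snippet", "title")
views  <- get_field(videos_raw$items, "statistics", "viewCount")
```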
### Using simplify = TRUE (Default)

Most functions flatten nested data into data frames:

```{r extract-simple, eval=FALSE}
# Get simplified output
videos <- get_video_details(
  video_ids = "dQw4w9WgXcQ",
  simplify = TRUE,
  auth = "key"
)

# Access fields directly
videos$title
videos$viewCount
videos$likeCount
videos$channelTitle
```

### Using simplify = FALSE for Complex Data

When you need the full nested structure:

```{r extract-raw, eval=FALSE}
# Get raw API response
videos_raw <- get_video_details(
  video_ids = "dQw4w9WgXcQ",
  simplify = FALSE,
  auth = "key"
)

# Navigate nested structure
video <- videos_raw$items[[1]]
video$snippet$title
video$snippet$thumbnails$high$url
video$statistics$viewCount
video$contentDetails$duration
```

### Common Field Access Patterns

```{r field-patterns, eval=FALSE}
# Video details
videos$snippet.title            # Title
videos$snippet.description      # Description
videos$statistics.viewCount     # View count
videos$statistics.likeCount     # Like count
videos$contentDetails.duration  # Duration (ISO 8601)

# Channel details
channels$snippet.title          # Channel name
channels$statistics.subscriberCount
channels$statistics.videoCount
channels$brandingSettings.channel.description

# Comment extraction
comments$snippet.topLevelComment.snippet.textDisplay
comments$snippet.topLevelComment.snippet.authorDisplayName
comments$snippet.topLevelComment.snippet.likeCount
```

## Error Handling & Retries

### Using with_retry for Transient Errors

```{r retry, eval=FALSE}
# Automatic retry with exponential backoff
result <- with_retry(
  get_video_details(video_ids = "dQw4w9WgXcQ", auth = "key"),
  max_retries = 3,
  base_delay = 1
)
```

### Handling Quota Exhaustion

```{r quota-handling, eval=FALSE}
# Check before making requests
quota <- yt_get_quota_usage()
if (quota$quota_remaining < 100) {
  warning("Low quota! Consider waiting until reset at: ", quota$reset_time)
}

# Wrap expensive operations
tryCatch({
  results <- yt_search(term = "R programming", max_results = 50)
}, error = function(e) {
  if (grepl("quota", e$message, ignore.case = TRUE)) {
    message("Quota exceeded. Try again after: ",
            yt_get_quota_usage()$reset_time)
  }
})
```

### Rate Limiting Best Practices

```{r rate-limiting, eval=FALSE}
# Add delays between requests
video_ids <- c("id1", "id2", "id3", "id4", "id5")

results <- lapply(video_ids, function(vid) {
  Sys.sleep(0.5)  # 500ms delay between requests
  get_video_details(video_ids = vid, auth = "key")
})
```

## Caching for Performance

tuber includes built-in caching for frequently accessed data:

```{r caching, eval=FALSE}
# Configure cache
tuber_cache_config(
  enabled = TRUE,
  max_size = 100,
  ttl = 3600  # 1 hour TTL
)

# Cached functions (no API call if recently fetched)
cats <- list_videocats_cached(auth = "key")
langs <- list_langs_cached(auth = "key")
regions <- list_regions_cached(auth = "key")
channel <- get_channel_info_cached(channel_id = "UCxyz", auth = "key")

# Check cache status
tuber_cache_info()

# Clear cache when needed
tuber_cache_clear()
```

## Practical Examples

### Example 1: Analyze a Channel's Performance

```{r example-channel, eval=FALSE}
# Full channel analysis
analysis <- analyze_channel(
  channel_id = "UCuAXFkgsw1L7xaCfnd5JJOw",
  max_videos = 100,
  auth = "key"
)

# Summary statistics
cat("Channel:", analysis$channel_info$title, "\n")
cat("Subscribers:", analysis$channel_info$subscriberCount, "\n")
cat("Average views:", analysis$performance_metrics$avg_views_per_video, "\n")
cat("Engagement rate:", analysis$performance_metrics$engagement_rate, "\n")
```

### Example 2: Trending Analysis

```{r example-trending, eval=FALSE}
# Analyze trending topics
trends <- analyze_trends(
  search_terms = c("machine learning", "AI", "data science"),
  time_period = "month",
  max_results = 25,
  region_code = "US",
  auth = "key"
)

# View trend summary
print(trends$trend_summary)

# Most trending term
best_trend <- trends$trend_summary[1, ]
cat("Top trending:", best_trend$search_term, "\n")
cat("Total views:", best_trend$total_views, "\n")
```

### Example 3: Efficient Video Processing

```{r example-batch, eval=FALSE}
# Get all videos from a playlist
playlist_videos <- get_playlist_items(
  playlist_id = "PLrAXtmErZgOeiKm4sgNOknGvNjby9efdf",
  max_results = 100,
  auth = "key"
)

# Extract video IDs
video_ids <- sapply(playlist_videos$items, function(x) {
  x$contentDetails$videoId
})

# Get detailed stats for all videos in one call
video_stats <- get_video_details(
  video_ids = video_ids,
  part = c("statistics", "contentDetails"),
  auth = "key"
)

# Analyze performance
total_views <- sum(as.numeric(video_stats$viewCount), na.rm = TRUE)

# Durations are ISO 8601 strings (e.g., "PT3M33S"), so parse them before
# averaging, e.g. with lubridate
avg_duration <- mean(
  as.numeric(lubridate::duration(video_stats$duration)),
  na.rm = TRUE
)
```

## Troubleshooting

### Common Issues

**"quotaExceeded" error:**

- Check quota with `yt_get_quota_usage()`
- Wait until `reset_time` or request a quota increase from Google

**"forbidden" error:**

- Ensure the YouTube Data API is enabled in Google Cloud Console
- Check that your API key/OAuth credentials are correct
- Verify the resource isn't private

**"videoNotFound" or empty results:**

- Video may be private, deleted, or region-restricted
- Double-check the video/channel ID format

**Rate limiting (429 errors):**

- Add delays with `Sys.sleep()` between requests
- Use `with_retry()` for automatic backoff

### Getting Help

```{r help, eval=FALSE}
# Check function documentation
?get_video_details
?yt_search
?with_retry

# View package vignettes
browseVignettes("tuber")
```