YouTube content frequently contains emojis, special Unicode
characters, and text in various languages. The tuber
package provides built-in functions for detecting, extracting, and
manipulating emojis without external dependencies.

library(tuber)
# Get comments from a video
comments <- get_all_comments(video_id = "your_video_id")
# Check which comments contain emojis
comments$has_emoji <- has_emoji(comments$textDisplay)
# Count emojis per comment
comments$emoji_count <- count_emojis(comments$textDisplay)
# Filter to emoji-rich comments
emoji_comments <- comments[comments$emoji_count > 0, ]

The package provides five main functions for working with emojis, including:

has_emoji() - Check for emoji presence
count_emojis() - Count emojis in text
extract_emojis() - Get emojis from text
remove_emojis() - Strip emojis from text

Beyond emojis, tuber handles Unicode text consistently:
safe_utf8() - Ensure UTF-8 encoding

If emojis do not display correctly, your R environment may not support UTF-8 display. The data is still correct; only the display is affected. Try:
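A minimal sketch of one thing to check, assuming the problem is the session locale and that an `en_US.UTF-8` locale is installed on your system (both are assumptions, not taken from the tuber documentation). It uses only base R:

```r
# Inspect the current character-type locale; UTF-8 locales usually end in "UTF-8"
current_locale <- Sys.getlocale("LC_CTYPE")
print(current_locale)

# Ask R whether it considers the session to be running in a UTF-8 locale
is_utf8_session <- l10n_info()[["UTF-8"]]
print(is_utf8_session)

# Assumption: "en_US.UTF-8" exists on this machine. If it does not,
# Sys.setlocale() returns "" with a warning and the locale is left unchanged.
if (!isTRUE(is_utf8_session)) {
  Sys.setlocale("LC_CTYPE", "en_US.UTF-8")
}
```

If the locale cannot be changed, the underlying strings are unaffected; writing the data to a UTF-8 file and opening it in a UTF-8 aware viewer is another way to confirm the content is intact.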
Compound emojis (such as family emojis or emojis with skin tone modifiers) may be counted as multiple characters, because Unicode encodes them as sequences of several code points.
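The code point sequences can be inspected with base R alone. The sketch below does not call any tuber function; the variable names are illustrative:

```r
# Family emoji: man + zero-width joiner + woman + zero-width joiner + girl.
# It renders as one glyph but is stored as five code points.
family <- "\U0001F468\u200D\U0001F469\u200D\U0001F467"

# Waving hand followed by a medium skin tone modifier.
# It renders as one glyph but is stored as two code points.
wave <- "\U0001F44B\U0001F3FD"

family_points <- utf8ToInt(family)
wave_points <- utf8ToInt(wave)

length(family_points)  # 5
length(wave_points)    # 2

# Show the individual code points in the usual U+XXXX notation
sprintf("U+%04X", wave_points)  # "U+1F44B" "U+1F3FD"
```

Any counter that works per code point rather than per rendered glyph will report more than one character for strings like these.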
The emoji pattern covers most common Unicode emoji blocks. Very new emojis added in recent Unicode versions may not be detected until the pattern is updated.
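To illustrate why block-based detection can miss newer emojis, here is a simplified sketch in base R. The block list `emoji_blocks` and the helper `in_known_block()` are hypothetical and deliberately incomplete; they are not tuber's actual pattern:

```r
# Hypothetical, simplified list of Unicode ranges (NOT tuber's real pattern)
emoji_blocks <- list(
  emoticons = c(0x1F600, 0x1F64F),
  misc_symbols_pictographs = c(0x1F300, 0x1F5FF)
)

# TRUE if any code point of `ch` falls inside one of the listed ranges
in_known_block <- function(ch) {
  cp <- utf8ToInt(ch)
  any(vapply(
    emoji_blocks,
    function(r) any(cp >= r[1] & cp <= r[2]),
    logical(1)
  ))
}

in_known_block("\U0001F600")  # TRUE: grinning face, inside the Emoticons block
in_known_block("\U0001FAE0")  # FALSE: melting face (Unicode 14), outside the listed ranges
```

A detector built this way only recognizes the ranges it was written with, so an emoji assigned to a block that is not listed goes undetected until the ranges are extended.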