Title: Data for the Book "R by Example"
Version: 0.0.100
Description: Data for the examples and exercises in the book "R by Example". Jim Albert and Maria Rizzo (2012, ISBN 978-1-4614-1365-3).
License: GPL-2 | GPL-3 [expanded from: GPL (≥ 2)]
Maintainer: Maria Rizzo <mrizzo@bgsu.edu>
Encoding: UTF-8
RoxygenNote: 7.3.1
Depends: R (≥ 2.10)
LazyData: true
URL: https://github.com/mariarizzo/RbyExample
NeedsCompilation: no
Packaged: 2024-04-18 19:37:43 UTC; maria
Author: Maria Rizzo [aut, cre], Jim Albert [aut]
Repository: CRAN
Date/Publication: 2024-04-19 10:53:02 UTC
CPU Speed Data
Description
Maximum Intel CPU speed vs time from 1994 through 2004.
Usage
CPUspeed
Format
27 obs. of 6 variables:
- year
calendar year
- month
month
- day
day
- time
time in years
- speed
Max IA-32 Speed (GHz)
- log10speed
logarithm base 10 of speed
Etruscan-Italian Data
Description
This data provides measurements of ancient Etruscan skulls and modern Italian skulls.
Usage
EtruscanItalian
Format
154 obs. of 2 variables:
- x
skull measurement
- group
character: Etruscan or Italian
Cancer Survival Times Data
Description
Survival times of cancer patients with advanced cancer of the stomach, bronchus, colon, ovary or breast, whose treatment included supplemental ascorbate.
Usage
PATIENT
Format
17 obs. of 5 variables:
- stomach
survival times for stomach cancer patients
- bronchus
survival times for bronchus cancer patients
- colon
survival times for colon cancer patients
- ovary
survival times for ovary cancer patients
- breast
survival times for breast cancer patients
Details
See the text for details on how to input this data directly from the file PATIENT.DAT.
Note
This is the data from "PATIENT.DAT" with column headings added. As input, the data is in wide format and should be stacked (long format) for a one-way ANOVA. See the text for details.
Source
Hand et al. (1994).
References
Cameron and Pauling (1978).
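The Note above says the wide data should be stacked before a one-way ANOVA. A minimal sketch of that reshaping follows. It assumes a small made-up data frame named `wide` with the same five column names as PATIENT, and it assumes the shorter columns are padded with NA; the values are invented for illustration and are not the real survival times.

```r
# Hypothetical stand-in with the same column layout as PATIENT.
# The values are made up for illustration; they are NOT the real data.
wide <- data.frame(
  stomach  = c(100, 200, NA),
  bronchus = c(80, 150, 300),
  colon    = c(250, 400, NA),
  ovary    = c(1200, 90, NA),
  breast   = c(1500, 700, 40)
)

# stack() converts the wide columns into two columns: values and ind
long <- stack(wide)
names(long) <- c("time", "organ")

# Drop the NA padding (assumed) that fills out the shorter columns
long <- na.omit(long)

# One-way ANOVA of survival time on cancer type
fit <- aov(time ~ organ, data = long)
summary(fit)
```

With the package installed, the same steps would presumably apply to `PATIENT` in place of `wide`.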
NIST SiRstv Data
Description
Measurements of bulk resistivity of silicon wafers made at NIST with 5 probing instruments on each of 5 days.
Usage
SiRstv
Format
25 obs. of 2 variables:
- Instrument
replicate
- Resistance
resistance
Details
https://www.itl.nist.gov/div898/strd/anova/SiRstv_info.html
Source
https://www.itl.nist.gov/div898/strd/anova/SiRstv.html
References
NIST Standard Reference Datasets: https://www.itl.nist.gov/div898/strd/index.html
Batting Averages 2021
Description
Batting data for all Major League players with at least 300 at-bats for the 2021 season. Data is from the Lahman database available through the Lahman package.
Usage
batting_avg_2021
Format
231 obs. of 5 variables:
- Player
Name of player
- lgID
League
- H
Hits
- AB
At bats
- AVG
Batting average
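As a quick illustration of how the columns relate, a batting average is hits divided by at-bats. The sketch below uses a hypothetical data frame `bat` with invented players and counts; whether the packaged AVG column is rounded to three decimals is an assumption.

```r
# Hypothetical rows with the same columns as batting_avg_2021.
# Names and counts are invented for illustration.
bat <- data.frame(
  Player = c("Player A", "Player B"),
  lgID   = c("AL", "NL"),
  H      = c(150, 120),
  AB     = c(500, 480)
)

# Batting average = hits / at-bats (rounding to 3 places is an assumption)
bat$AVG <- round(bat$H / bat$AB, 3)

# Keep players with at least 300 at-bats, as in the description above
bat[bat$AB >= 300, ]
```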
Baseball Batting History Data
Description
Major League Baseball batting data by season, including the number of hits, doubles, and home runs. The data was extracted from the baseball-reference.com website.
Usage
battinghistory
Format
140 obs. of 27 variables:
- Year
season
- Tms
number of teams
- N.Bat
number of players
- BatAge
batter's average age
- R
runs scored
- G
games played
- PA
plate appearances
- AB
at-bats
- H
hits
- X2B
doubles
- X3B
triples
- HR
home runs
- RBI
runs batted in
- SB
stolen bases
- CS
number caught stealing
- BB
walks
- SO
strikeouts
- BA
batting average
- OBP
on-base percentage
- SLG
slugging percentage
- OPS
OBP plus SLG
- TB
total bases
- GDP
ground into double plays
- HBP
hit by pitches
- SH
sacrifice hits
- SF
sacrifice flies
- IBB
intentional walks
Note
This version of the data is sorted in ascending order of Year. There are missing values, especially in early years.
Source
baseball-reference.com.
Men's and Women's NCAA Basketball Data
Description
Game averages for NCAA men's and women's basketball.
Usage
bball
Format
43 obs. of 20 variables:
- Season
season
- Teams
number of teams
- G
average number of games played
- FG
average number of field goals
- FGA
average number of field goal attempts
- FG%
field goal percentage
- 3P
average number of three pointers
- 3PA
average number of three point attempts
- 3P%
three-point percentage
- FT
average number of free throws
- FTA
average number of free throw attempts
- FT%
free-throw percentage
- TRB
average number of total rebounds
- AST
average number of assists
- STL
average number of steals
- BLK
average number of blocks
- TOV
average number of turnovers
- PF
average number of personal fouls
- PTS
average number of points scored
- Year
Year season started
- Gender
factor: "M" or "W" (men or women)
Details
The data is from Sports Reference https://www.sports-reference.com/cbb/seasons/game-averages.html
Source
Sports Reference
Men's NCAA Basketball Data
Description
Game averages for NCAA men's basketball.
Usage
bball.men
Format
77 obs. of 20 variables:
- Season
season
- Teams
number of teams
- G
average number of games played
- FG
average number of field goals
- FGA
average number of field goal attempts
- FG%
field goal percentage
- 3P
average number of three pointers
- 3PA
average number of three point attempts
- 3P%
three-point percentage
- FT
average number of free throws
- FTA
average number of free throw attempts
- FT%
free-throw percentage
- TRB
average number of total rebounds
- AST
average number of assists
- STL
average number of steals
- BLK
average number of blocks
- TOV
average number of turnovers
- PF
average number of personal fouls
- PTS
average number of points scored
- Year
Year season started
Details
The data is from Sports Reference https://www.sports-reference.com/cbb/seasons/game-averages.html
Source
Sports Reference
Women's NCAA Basketball Data
Description
Game averages for NCAA women's basketball.
Usage
bball.women
Format
43 obs. of 20 variables:
- Season
season
- Teams
number of teams
- G
average number of games played
- FG
average number of field goals
- FGA
average number of field goal attempts
- FG%
field goal percentage
- 3P
average number of three pointers
- 3PA
average number of three point attempts
- 3P%
three-point percentage
- FT
average number of free throws
- FTA
average number of free throw attempts
- FT%
free-throw percentage
- TRB
average number of total rebounds
- AST
average number of assists
- STL
average number of steals
- BLK
average number of blocks
- TOV
average number of turnovers
- PF
average number of personal fouls
- PTS
average number of points scored
- Year
Year season started
Details
The data is from Sports Reference https://www.sports-reference.com/cbb/seasons/game-averages.html
Source
Sports Reference
BGSU Enrollment
Description
BGSU Enrollment
Usage
bgsu
Format
Data frame of selected BGSU enrollment data: 16 obs. of 2 variables
- Year
Year.
- Enrollment
Enrollment.
Source
J. Albert
Brain Size and Intelligence Data
Description
Data from a study comparing brain size and intelligence.
Usage
brainsize
Format
40 obs. of 7 variables:
- Gender
Male or Female.
- FSIQ
Full Scale IQ scores based on four Wechsler (1981) subtests.
- VIQ
Verbal IQ scores based on four Wechsler (1981) subtests.
- PIQ
Performance IQ scores based on four Wechsler (1981) subtests.
- Weight
Body weight in pounds.
- Height
Height in inches.
- MRI_Count
total pixel count from the 18 MRI scans.
Note
There are missing values in Weight (2) and Height (1).
Source
Willerman et al. (1991).
College Rating Data
Description
College Rating Data
Usage
college
Format
260 obs. of 11 variables:
- School
Name of Institution.
- Enrollment
Enrollment of Institution.
- Tier
Ranking in tiers 1, 2, 3, 4.
- Retention
Pct. of freshmen who return the following year
- Grad.rate
Pct. of freshmen who graduate in six years
- Pct.20
Pct. of classes with 20 or fewer students
- Pct.50
Pct. of classes with 50 or fewer students
- Full.time
Pct. of faculty hired full-time
- Top.10
Pct. of incoming students who were in top 10% of high school class
- Accept.rate
Acceptance rate of students who apply
- Alumni.giving
Pct. of alumni who contribute financially
Note
There are missing values.
Source
US News and World Report "America's Best Colleges" 2009 report, National Universities.
Draft Lottery Data
Description
Data from the 1970 military draft lottery. The lottery assigned numbers to potential draftees by their birth date. Those with lower draft numbers were drafted first.
Usage
draftlottery
Format
31 obs. of 13 variables
- Day
Day of month.
- Jan
Draft numbers for January birthdays by day of month.
- Feb
Draft numbers for February birthdays by day of month.
- Mar
Draft numbers for March birthdays by day of month.
- Apr
Draft numbers for April birthdays by day of month.
- May
Draft numbers for May birthdays by day of month.
- Jun
Draft numbers for June birthdays by day of month.
- Jul
Draft numbers for July birthdays by day of month.
- Aug
Draft numbers for August birthdays by day of month.
- Sep
Draft numbers for September birthdays by day of month.
- Oct
Draft numbers for October birthdays by day of month.
- Nov
Draft numbers for November birthdays by day of month.
- Dec
Draft numbers for December birthdays by day of month.
Note
This is the data in "draft-lottery.txt".
References
Moore, David S. and George P. McCabe (1989). Introduction to the Practice of Statistics.
See Fienberg, S. E. (1971), Starr, N. (1997), and "Draft Lottery (1969)", Wikipedia.org for further discussion.
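The data frame is laid out with one column per month, so summaries by month are easier after stacking it into long format. The sketch below builds a simulated table `lot` with the same 31-by-13 layout (a random permutation of 1 to 366, not the actual lottery results) and assumes that nonexistent dates are stored as NA.

```r
set.seed(2)

# Simulated stand-in with the same layout as draftlottery.
# The numbers are a random permutation, NOT the real 1970 lottery results.
days_in_month <- c(31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)
numbers <- sample(366)

lot <- data.frame(Day = 1:31)
start <- 0
for (m in 1:12) {
  n <- days_in_month[m]
  col <- rep(NA_integer_, 31)          # NA for dates that do not exist (assumed)
  col[seq_len(n)] <- numbers[start + seq_len(n)]
  lot[[month.abb[m]]] <- col
  start <- start + n
}

# Stack the 12 month columns into long format and drop the NA padding
long <- na.omit(stack(lot[, -1]))
names(long) <- c("Number", "Month")

# Mean draft number by month
tapply(long$Number, long$Month, mean)
```

For a fair lottery the monthly means should all be near 183.5; the references above discuss how the actual 1970 results compare.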
Flicker Data
Description
Critical flicker frequency and iris color of the eye for 19 individuals.
Usage
flicker
Format
19 obs. of 2 variables:
- Colour
Eye colour: Brown, Green, or Blue
- Flicker
Critical flicker frequency in cycles/sec.
Details
Critical flicker frequency is the highest frequency at which the flicker in a flickering light source can be detected by the individual.
Source
http://www.statsci.org/data/general/flicker.txt
https://gksmyth.github.io/ozdasl/general/flicker.html
References
Smyth, Gordon K (2011). Australasian Data and Story Library (OzDASL). https://gksmyth.github.io/ozdasl.
Four Players Home Plate Statistics
Description
Hit and home run data, grouped by region of the zone, for four players over the 2018-2023 baseball seasons. The data is from Baseball Savant https://baseballsavant.mlb.com/
Usage
four_players
Format
64 obs. of 12 variables:
- PX
interval of values of plate_x
- PZ
interval of values of plate_z
- BIP
count of balls in play
- H
count of hits
- HR
count of home runs
- H_Rate
hit rate
- HR_Rate
home run rate
- Z_H
z-score of hit rate
- Z_HR
z-score of home run rate
- Player
chr: Player name
- px
midpoint of PX interval
- pz
midpoint of PZ interval
Hubble Space Telescope Data
Description
Distances and velocities measured for 24 galaxies containing Cepheid stars to measure the Hubble constant.
Usage
hubble
Format
24 obs. of 3 variables:
- Galaxy
A label to identify the galaxy (a factor)
- Velocity
Relative velocity in kilometers per second
- Distance
Distance in megaparsecs
Source
Freedman et al. 2001. The Astrophysical Journal 553:47-72: Tables 4 and 5.
References
Freedman et al. (2001). Final results from the Hubble Space Telescope Key Project to measure the Hubble constant. The Astrophysical Journal 553, 47-72.
Wood, S.N. (2017). Generalized Additive Models: An Introduction with R. CRC
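Hubble's law relates velocity to distance by Velocity = H0 × Distance, so one simple way to estimate the Hubble constant is a regression through the origin. The sketch below is only an illustration of that idea, not necessarily the analysis used in the book; the data frame `hub` holds made-up values in the same units as the hubble data.

```r
# Made-up values for illustration; NOT the Freedman et al. measurements.
hub <- data.frame(
  Distance = c(5, 10, 15, 20),          # megaparsecs
  Velocity = c(350, 720, 1040, 1430)    # km/s
)

# Regression through the origin: the slope estimates H0 in km/s per Mpc
fit <- lm(Velocity ~ Distance - 1, data = hub)
coef(fit)
```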
Massachusetts Lunatics Data
Description
Data from an 1854 survey by the Massachusetts Commission on Lunacy.
Usage
lunatics
Format
14 obs. of 6 variables:
- COUNTY
Name of county.
- NBR
Number of lunatics by county.
- DIST
Distance to nearest mental health center.
- POP
County population 1950 (thousands).
- PDEN
County population density per square mile.
- PHOME
Percent of lunatics cared for at home.
References
J.M. Hunter, "Need and Demand for Mental Health Care: Massachusetts 1854," The Geographic Review, 77:2 (April 1987), pp 139-156.
New York City Marathon Data
Description
Gender, age, and completion time (in minutes) for 276 people who completed the 2010 New York City Marathon.
Usage
nyc.marathon
Format
276 obs. of 3 variables:
- Gender
female or male
- Minutes
Time of runner in minutes
- Age
Age of runner
Peanuts Aflatoxin Data
Description
The peanuts data records levels of a toxin (aflatoxin) in batches of peanuts.
Usage
peanuts
Format
34 obs. of 2 variables:
- Percent
percentage of non-contaminated peanuts in the batch
- Aflatoxin
average level of aflatoxin in parts per billion
Source
Hand et al. (1994)
Poison Survival Data
Description
Survival times in units of 10 hours for animals exposed to different poisons.
Usage
poison
Format
48 obs. of 3 variables:
- Time
survival time in units of 10 hours
- Poison
poison: I, II, III
- Treatment
treatment: A, B, C, D
Source
Box, G. E. P., Hunter, W. G. and Hunter, J. S. (1978), Statistics for Experimenters: An Introduction to Design, Data Analysis, and Model Building, Wiley, New York.
Rounding First Base Data
Description
Times required to round first base for 22 baseball players using three styles: rounding out, a narrow angle and a wide angle. The goal is to determine if the method of rounding first base has a significant effect on times to round first base.
Usage
rounding
Format
66 obs. of 3 variables:
- times
time
- method
factor with 3 levels: NarrowAngle, RoundOut, WideAngle
- block
player ID (integer)
Source
Hollander and Wolfe (1999) Table 7.1, page 274.
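Because each player is timed under all three methods, the player acts as a block. A minimal sketch of two common analyses for such a design follows: a randomized block ANOVA and the Friedman rank test. The data frame `sim` is simulated with the same shape as the rounding data (22 blocks by 3 methods), and the effect sizes are arbitrary assumptions.

```r
set.seed(1)

# Simulated stand-in: 22 players (blocks) x 3 methods = 66 rows.
# expand.grid() stores the method labels as a factor by default.
sim <- expand.grid(
  block  = 1:22,
  method = c("NarrowAngle", "RoundOut", "WideAngle")
)

# Player-specific baseline plus noise; no true method effect is simulated
player_effect <- rnorm(22, sd = 0.2)
sim$times <- 5.5 + player_effect[sim$block] + rnorm(66, sd = 0.05)

# Randomized block ANOVA: method effect adjusted for player
fit <- aov(times ~ method + factor(block), data = sim)
summary(fit)

# Friedman rank sum test: a nonparametric alternative for the same design
friedman.test(times ~ method | block, data = sim)
```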
Buffalo and Cleveland Snowfall Data
Description
Total snowfall in inches for the cities Buffalo and Cleveland for the seasons 1968-69 through 2008-09.
Usage
snowfall
Format
41 obs. of 3 variables:
- SEASON
character: winter season identified by years
- Cleveland
Cleveland snowfall
- Buffalo
Buffalo snowfall
Statistics Grades
Description
Grades from an undergraduate statistics class at BGSU.
Usage
statgrades
Format
23 obs. of 7 variables:
- ID
Student ID; integer 1:23
- Exam1
Percent grade on Exam 1
- Exam2
Percent grade on Exam 2
- HW
Percent grade on homework
- Final
Percent grade on Final Exam
- Major
Major coded 1, 2, 3
- Group
Group coded 1, 2
Twins IQ Data
Description
Twins IQ Data
Usage
twinIQ
Format
Data frame of Burt's IQ data for twins: 27 obs. of 3 variables
- Foster
IQ of twin raised with foster parents.
- Biological
IQ of twin raised with biological parents.
- Social
Social class of biological parents (high, low, middle)
Source
Burt, C. (1966). The genetic estimation of differences in intelligence: A study of monozygotic twins reared together and apart. Br. J. Psych., 57, 147-153. Data is provided in R packages faraway and UsingR.
Twins Income and Education Levels Data
Description
The data were collected at the 16th Annual Twins Day Festival in Twinsburg, Ohio, in August 1991. 495 adult twins were interviewed. The original study aimed to investigate 'By how much will another year of schooling most likely raise one's income?' Pairs of twins provide a control on confounding factors such as intelligence, family background, etc.
Usage
twins
Format
183 obs. of 16 variables:
- DLHRWAGE
the difference (twin 1 minus twin 2) in the logarithm of hourly wage, given in dollars.
- DEDUC1
the difference (twin 1 minus twin 2) in self-reported education, given in years.
- AGE
Age in years of twin 1.
- AGESQ
AGE squared.
- HRWAGEH
Hourly wage of twin 2.
- WHITEH
1 if twin 2 is white, 0 otherwise.
- MALEH
1 if twin 2 is male, 0 otherwise.
- EDUCH
Self-reported education (in years) of twin 2.
- HRWAGEL
Hourly wage of twin 1.
- WHITEL
1 if twin 1 is white, 0 otherwise.
- MALEL
1 if twin 1 is male, 0 otherwise.
- EDUCL
Self-reported education (in years) of twin 1.
- DEDUC2
the difference (twin 1 minus twin 2) in cross-reported education.
- DTEN
the difference (twin 1 minus twin 2) in tenure, or number of years at current job.
- DMARRIED
the difference (twin 1 minus twin 2) in marital status, where 1 signifies "married" and 0 signifies "unmarried".
- DUNCOV
the difference (twin 1 minus twin 2) in union coverage, where 1 signifies "covered" and 0 "uncovered".
Note
There are 183 cases; 147 complete cases. Twin 1's cross-reported education is the number of years of schooling completed by twin 1 as reported by twin 2. For data analysis, the logarithm of the hourly wage is typically used instead of hourly wage.
Source
Guido Imbens, PhD. UCLA, Department of Economics.
References
Ashenfelter, Orley and Krueger, Alan. "Estimates of the Economic Return to Schooling from a New Sample of Twins." The American Economic Review 84.5 (Dec. 1994) 1157-1173.
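The difference variables can be related back to the per-twin columns. The sketch below uses a hypothetical data frame `tw` with invented values to show how DLHRWAGE and DEDUC1 follow from the definitions above (twin 1 minus twin 2) and how complete cases might be selected. The use of the natural logarithm is an assumption.

```r
# Invented values for illustration; column names follow the twins data.
# The L suffix is twin 1 and the H suffix is twin 2, per the Format list.
tw <- data.frame(
  HRWAGEL = c(12.0, 8.5, NA),
  HRWAGEH = c(10.0, 9.0, 15.0),
  EDUCL   = c(16, 12, 14),
  EDUCH   = c(14, 12, 16)
)

# Twin 1 minus twin 2, on the log wage scale (natural log assumed)
tw$DLHRWAGE <- log(tw$HRWAGEL) - log(tw$HRWAGEH)
tw$DEDUC1   <- tw$EDUCL - tw$EDUCH

# Keep only rows with no missing values
cc <- tw[complete.cases(tw), ]
nrow(cc)
```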
Chase Utley's Hitting Data for 2006
Description
Chase Utley's Hitting Data for 2006
Usage
utley2006
Format
160 obs. of 6 variables:
- Game
game
- Date
date
- PA
plate appearances
- AB
at-bats
- R
runs
- H
hits
Details
During the 2006 baseball season, Chase Utley of the Philadelphia Phillies had a hitting streak of 35 games, which is one of the best hitting streaks in baseball history.
Source
J. Albert
Waste Run-up Data
Description
The 'Waste Run-up' data (Koopmans 1987, p. 86) reports weekly percentage waste of cloth by five different supplier plants of Levi-Strauss, relative to cutting from a computer pattern.
Usage
wasterunup
Format
22 obs. of 5 variables:
- PT1
weekly percentage waste of cloth for Plant 1
- PT2
weekly percentage waste of cloth for Plant 2
- PT3
weekly percentage waste of cloth for Plant 3
- PT4
weekly percentage waste of cloth for Plant 4
- PT5
weekly percentage waste of cloth for Plant 5
Note
There are missing values.
Webpage Hits Data
Description
The number of daily visits to the author's website was obtained using Google Analytics. The data is summarized by week.
Usage
webhits
Format
35 obs. of 2 variables:
- Week
Week number
- Hits
Number of web hits
Source
J. Albert
World Record Mile Data
Description
Mile run world record progression as recorded by the International Amateur Athletics Federation (IAAF). The dataset includes 32 world records for men ratified by the IAAF, and 29 world records for women both in the pre-IAAF and IAAF eras.
Usage
world.record.mile
Format
276 obs. of 9 variables:
- Gender
chr: female or male
- Time
chr: time as "mm:ss"
- mm
num: The whole minutes "mm" part of Time
- ss
num: The seconds "ss" part of Time
- seconds
num: time expressed in seconds
- Athlete
chr: Name
- Nationality
chr: nationality
- Date
chr: date
- Year
num: year
Source
Wikipedia page https://en.wikipedia.org/wiki/Mile_run_world_record_progression
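The mm, ss, and seconds columns are all derived from Time. A minimal sketch of that conversion follows, using three illustrative time strings; the exact string format stored in the data set is an assumption.

```r
# Illustrative "mm:ss" strings; the stored format is assumed
Time <- c("4:14.4", "3:59.4", "3:43.13")

# Split each string at the colon
parts <- strsplit(Time, ":", fixed = TRUE)

mm <- as.numeric(sapply(parts, `[`, 1))   # whole minutes
ss <- as.numeric(sapply(parts, `[`, 2))   # seconds part

# Total time in seconds
seconds <- 60 * mm + ss
seconds
```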