Type: | Package |
Date: | 2021-06-04 |
Title: | Data, Functions and Support Materials from the Book "industRial Data Science" |
Version: | 0.1.0 |
Description: | Companion package to the book "industRial data science", J. Ramalho (2021) https://j-ramalho.github.io/industRial/. Provides data sets and functions to complete the case studies and contains the book's original Rmd files and tutorials. |
URL: | https://github.com/J-Ramalho/industRial |
BugReports: | https://github.com/J-Ramalho/industRial/issues |
License: | GPL (≥ 3) |
Encoding: | UTF-8 |
LazyData: | true |
Imports: | ggplot2, stats, dplyr, tidyr, magrittr, rlang, lattice, SixSigma |
Depends: | R (≥ 3.5.0) |
RoxygenNote: | 7.1.1 |
Suggests: | glue, tibble, stringr, scales, purrr, janitor, patchwork, forcats, broom, viridis, learnr, DoE.base, qcc, car, qicharts2, rsm, ggforce, ggraph, tidygraph, igraph, bookdown, rmarkdown, knitr, agricolae, RcmdrMisc, gt, skimr, ggtext |
NeedsCompilation: | no |
Packaged: | 2021-06-10 15:12:26 UTC; joao |
Author: | Joao Ramalho [aut, cre] |
Maintainer: | Joao Ramalho <ramalho.joao@protonmail.com> |
Repository: | CRAN |
Date/Publication: | 2021-06-11 09:40:02 UTC |
industRial: companion package to the book "industRial data science"
Description
This package contains datasets and toy functions to run the examples from the book "industRial data science". It also contains all of the book's original Rmd files and the original learnr Rmd tutorial files.
Author(s)
João Ramalho
References
For complete case studies refer to https://j-ramalho.github.io/industRial/
Charging time of a lithium-ion battery.
Description
A data set with the charging time, in hours, required to recharge a lithium-ion battery, based on a full factorial design of experiment with four variables (A, B, C, D) coded as +/- 1. Design effects are coded as numerical variables so that models can be built without coding the contrasts and predictions can be made on a continuous range from -1 to +1.
- A
Variable A (numerical)
- B
Variable B (numerical)
- C
Variable C (numerical)
- D
Variable D (numerical)
- Replicate
The independent repeat of each unique factor combination.
- charging_time
Battery charging time [h]
Usage
battery_charging
Format
A tibble with 32 observations on 6 variables.
Source
Original data set.
References
For a complete case study application refer to https://j-ramalho.github.io/industRial/.
Examples
data(battery_charging)
head(battery_charging)
# Building a linear model:
battery_lm <- lm(
  formula = charging_time ~ A * B * C,
  data = battery_charging
)
summary(battery_lm)
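Because the factors are stored as numeric values, a fitted model can be evaluated at settings between -1 and +1. The following is a minimal sketch on synthetic stand-in data, so that it runs without the package; the names `design`, `toy_lm` and `new_point`, and the simulated response coefficients, are assumptions made for illustration and are not taken from `battery_charging`.

```r
# Synthetic 2^4 full factorial with 2 replicates, coded -1/+1 (stand-in data)
set.seed(1)
design <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1), D = c(-1, 1),
                      Replicate = 1:2)

# Hypothetical response: these coefficients are invented for illustration
design$charging_time <- 5 + 0.8 * design$A - 0.5 * design$B +
  0.3 * design$A * design$B + rnorm(nrow(design), sd = 0.1)

# Same model structure as in the example above
toy_lm <- lm(charging_time ~ A * B * C, data = design)

# Numeric coding allows prediction at intermediate factor settings
new_point <- data.frame(A = 0.5, B = -0.25, C = 0)
predicted <- predict(toy_lm, newdata = new_point)
print(predicted)
```

Had the factors been coded as R factors, `predict()` would only accept the levels present in the design, which is why the numeric coding is convenient here.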
Create a capability chart for statistical process control
Description
Generate a histogram type chart from a set of consecutive measurements.
Usage
chart_Cpk(data)
Arguments
data |
A dataset generated by the function |
Details
This type of chart is typically applied in product manufacturing to monitor
deviations from the target value over time. It is usually accompanied by
the statistical process control time series charts chart_I and chart_IMR.
Value
This function returns an object of class ggplot
References
For a complete case study application refer to https://j-ramalho.github.io/industRial/
Create I chart for statistical process control
Description
Generate a single point time series chart from a set of consecutive measurements.
Usage
chart_I(data)
Arguments
data |
A dataset generated by the function |
Details
This type of chart is typically applied in product manufacturing to monitor
deviations from the target value over time. It is usually accompanied by
the chart_IMR
Value
This function returns an object of class ggplot
References
For a complete case study application refer to https://j-ramalho.github.io/industRial/
Create a moving range (MR) chart for statistical process control
Description
Generate a moving range chart from a set of consecutive measurements.
Usage
chart_IMR(data)
Arguments
data |
A dataset generated by the function |
Details
This type of chart is typically applied in product manufacturing to monitor
deviations from the target value over time. It is usually accompanied by
the chart_I.
Value
This function returns an object of class ggplot
References
For a complete case study application refer to https://j-ramalho.github.io/industRial/
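The individuals and moving range charts rely on a small set of control limit calculations. The sketch below shows the conventional textbook formulas on synthetic measurements; it is not the package's implementation, and the names `x`, `mr`, `i_limits` and `mr_limits` are assumptions made for illustration.

```r
# Synthetic consecutive measurements (stand-in data)
x <- c(10.2, 9.8, 10.1, 10.4, 9.9, 10.0, 10.3, 9.7, 10.1, 10.2)

# Moving range: absolute difference between consecutive measurements
mr <- abs(diff(x))
mr_bar <- mean(mr)

# Individuals chart limits: mean +/- 2.66 * average moving range
i_limits <- c(LCL = mean(x) - 2.66 * mr_bar,
              CL  = mean(x),
              UCL = mean(x) + 2.66 * mr_bar)

# Moving range chart limits: 0 and 3.267 * average moving range
mr_limits <- c(LCL = 0, CL = mr_bar, UCL = 3.267 * mr_bar)

print(i_limits)
print(mr_limits)
```

The constants 2.66 and 3.267 are the standard factors for moving ranges of two consecutive points.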
Collection of visual defects on watch dial production.
Description
This data set contains observations of visual defects present
in watch dials such as indentations and scratches taken during production.
It provides a practical case for building Pareto charts, typically with a
function like paretochart.
- Operator
The shop floor operator collecting the data
- Date
Data collection date
- Defect
Defect type ("Indent", "Scratch")
- Location
Position on the watch dial referred to as the hour (1h, 2h)
- id
Part unique id number
Usage
dial_control
Format
An object of class tibble with 58 observations on 4 variables.
Source
Original data set.
References
For a complete case study application refer to https://j-ramalho.github.io/industRial/.
Examples
head(dial_control)
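A Pareto chart is built from defect counts sorted in decreasing order together with their cumulative percentage. The sketch below computes that table in base R on a synthetic vector standing in for the Defect column; the names `defects` and `pareto` are assumptions made for illustration.

```r
# Synthetic defect records (stand-in for the Defect column)
defects <- c("Indent", "Scratch", "Scratch", "Indent", "Scratch",
             "Scratch", "Indent", "Scratch")

# Count each defect type and sort from most to least frequent
counts <- sort(table(defects), decreasing = TRUE)

# Cumulative percentage, which becomes the line on a Pareto chart
cum_pct <- cumsum(counts) / sum(counts) * 100

pareto <- data.frame(defect  = names(counts),
                     count   = as.integer(counts),
                     cum_pct = as.numeric(cum_pct))
print(pareto)
```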
Cycles to failure of ebike frames after temperature treatment.
Description
A data set with the results of aging tests on several groups of ebike frames (g1, g2, ...). Each entry corresponds to the number of cycles to failure for each level of treatment temperature.
- temperature
The treatment temperature
- g1
group 1, remaining groups have names g2 to g5
Usage
ebike_hardening
Format
A tibble with 4 observations on 6 variables.
Details
The ebike_hardening2 dataset contains alternative data that gives non-significant results in the analysis of variance study.
Source
Original data set.
References
For a complete case study application refer to https://j-ramalho.github.io/industRial/.
Examples
data(ebike_hardening)
Formula expansion
Description
Takes a linear model formula and returns its expanded version.
Usage
expand_formula(formulae)
Arguments
formulae |
An object of class formula, e.g.: Y ~ A * B, see ?formula for syntax details |
Details
Supports verifying and understanding linear model formula syntax, such as the * and + operators and other conventions.
Value
Returns a character vector such as A + B + A:B
References
For an example application refer to https://j-ramalho.github.io/industRial/
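To see what the expansion means in practice, base R can list the terms of a formula. This is a sketch of the idea using `terms()` from the stats package, not necessarily how `expand_formula` is implemented; the names `f`, `term_labels` and `expanded` are assumptions made for illustration.

```r
# A formula using the crossing operator
f <- Y ~ A * B

# terms() expands the operators into individual model terms
term_labels <- attr(terms(f), "term.labels")

# Join the terms into a single string
expanded <- paste(term_labels, collapse = " + ")
print(expanded)
# [1] "A + B + A:B"
```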
Dry matter content of different juices obtained with two different measurement devices.
Description
This data set contains laboratory measurements of the dry matter content of different fruit juices obtained with two different measurement devices. One of the devices is considered the reference (REF) and the other one is a new device (DRX) on which a linearity and bias study has to be performed.
- product
The juice base fruit ("Apple", "Beetroot")
- drymatter_TGT
Target drymatter content in [g]
- speed
Production line speed
- particle_size
Dry matter powder particle size [micrometers]
- part
Part number
- drymatter_DRX
Drymatter content measured with device DRX
- drymatter_REF
Drymatter content measured with reference device
Usage
juice_drymatter
Format
An object of class tibble with 108 observations on 7 variables.
Source
Adapted from a real gage bias and linearity study performed in 2021 on industrial beverages dry matter content measurement. The structure of the data corresponds to a full factorial design of 5 factors (3 with 3 levels and 2 with 2 levels).
References
For a complete case study application refer to https://j-ramalho.github.io/industRial/.
Examples
library(dplyr)
# Calculate the bias between the new device and the reference:
juice_drymatter <- juice_drymatter %>% dplyr::mutate(bias = drymatter_DRX - drymatter_REF)
# Establish the analysis of variance:
juice_drymatter_aov <- aov(
  bias ~ drymatter_TGT * speed * particle_size,
  data = juice_drymatter
)
summary(juice_drymatter_aov)
Calculate percentage of out of specification for Statistical Process Control
Description
This function takes process variables and calculates the probability that parts are produced out of specification in the long run.
Usage
off_spec(UCL, LCL, mean, sd)
Arguments
UCL |
the process upper control limit |
LCL |
the process lower control limit |
mean |
the process mean |
sd |
the process standard deviation |
Value
This function returns an object of class numeric
References
For a complete case study application refer to https://j-ramalho.github.io/industRial/
Examples
off_spec(100, 0, 10, 3)
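The underlying idea can be sketched with the normal distribution: the out of specification probability is the area in the two tails beyond the limits. The code below assumes a normally distributed process and uses only `pnorm()`; it is not the package's implementation, and the source does not state whether `off_spec` returns a fraction or a percentage. The names `upper`, `lower`, `mu` and `sigma` are assumptions made for illustration.

```r
# Same inputs as the example above
upper <- 100
lower <- 0
mu    <- 10
sigma <- 3

# Probability of falling above the upper limit or below the lower limit,
# assuming a normal distribution
p_above <- pnorm(upper, mean = mu, sd = sigma, lower.tail = FALSE)
p_below <- pnorm(lower, mean = mu, sd = sigma)
p_out   <- p_above + p_below

# Expressed as a percentage
pct_out <- 100 * p_out
print(pct_out)
```

With these inputs almost all of the out of specification probability comes from the lower tail, because the mean sits much closer to the lower limit.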
Correlation matrix of the input variables of an experiment design in perfume formulation.
Description
The data set contains the expected correlation (expressed on a scale from 1 to 10) between the anonymized input variables of an experiment. The dataset consists of a double entry table with the same variables in rows and columns. It is coded as a tibble, but subsequent use in network plots requires it to be converted to a matrix format.
Usage
perfume_experiment
Format
A tibble with 22 observations on 23 variables.
Source
Original data set.
References
For a complete case study application refer to https://j-ramalho.github.io/industRial/.
Examples
data(perfume_experiment)
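The conversion to a matrix can be sketched as follows on a small synthetic table. The sketch assumes the first column holds the row labels, which would be consistent with 22 observations on 23 variables but is not confirmed by the documentation; the names `toy`, `input` and `toy_matrix` are assumptions made for illustration.

```r
# Synthetic double entry table (stand-in for perfume_experiment)
toy <- data.frame(
  input = c("v1", "v2", "v3"),
  v1 = c(10, 3, 7),
  v2 = c(3, 10, 5),
  v3 = c(7, 5, 10)
)

# Drop the label column, convert to a numeric matrix, restore row names
toy_matrix <- as.matrix(toy[, -1])
rownames(toy_matrix) <- toy$input
print(toy_matrix)
```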
Tensile strength values on PET raw material for the clothing industry.
Description
Measurements of tensile strength of two different deliveries of PET raw material used in the clothing industry. The two data sets follow approximately a normal distribution.
- A
Tensile strength measurements for product A [MPa] (numeric)
- B
Tensile strength measurements for product B [MPa] (numeric)
Usage
pet_delivery
Format
An object of class tibble with 28 observations on 2 variables.
Source
Original data set.
References
For a complete case study application refer to https://j-ramalho.github.io/industRial/.
Examples
data(pet_delivery)
A factorial design for the improvement of PET film tensile strength.
Description
The data corresponds to a full factorial design with two factors coded as +/- and 3 replicates for each combination.
- A
PET formulation A (factor)
- B
PET formulation B (factor)
- replicate
the measurement replicate I to III (factor)
- yield
the output variable measured on the PET (numerical)
Usage
pet_doe
Format
An object of classes design and data.frame with 12 observations of 4 variables.
Source
Original data set generated with the function fac.design from the package DoE.base.
References
For a complete case study application refer to https://j-ramalho.github.io/industRial/
Examples
data(pet_doe)
contrasts(pet_doe$A)
Calculate process capability index for Statistical Process Control
Description
This function takes process variables and calculates the Cpk index, which is a measure of the process centering and variability against specification.
Usage
process_Cpk(UCL, LCL, mean, sd)
Arguments
UCL |
the process upper control limit |
LCL |
the process lower control limit |
mean |
the process mean |
sd |
the process standard deviation |
Value
This function returns an object of class numeric
References
For a complete case study application refer to https://j-ramalho.github.io/industRial/
Examples
process_Cpk(100, 0, 10, 3)
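The conventional definition of Cpk is the distance from the mean to the nearest limit, divided by three standard deviations. The sketch below applies that textbook formula; it is not necessarily the package's implementation, and the helper name `cpk_sketch` and its argument names are hypothetical.

```r
# Textbook Cpk: distance to the nearest limit in units of 3 standard deviations
cpk_sketch <- function(upper, lower, mu, sigma) {
  min((upper - mu) / (3 * sigma), (mu - lower) / (3 * sigma))
}

# Same inputs as the example above
cpk_value <- cpk_sketch(100, 0, 10, 3)
print(cpk_value)
```

With these inputs the lower limit is the binding one, since (10 - 0) / 9 is smaller than (100 - 10) / 9.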
Calculate summary statistics for Statistical Process Control
Description
This function takes process variables, calculates summary statistics and presents them in an easily readable table format.
Usage
process_stats(data, part_spec_percent)
Arguments
data |
This function takes the dataset tablet_thickness cleaned with the clean_names function from the janitor package |
part_spec_percent |
the process tolerance in percentage. |
Value
This function returns an object with class tibble (tbl_df)
References
For a complete case study application refer to https://j-ramalho.github.io/industRial/
Summary statistics table outputs for Statistical Process Control
Description
This function takes summary statistics and presents them in an easily readable table format.
Usage
process_stats_table(data)
Arguments
data |
A data set generated by the function |
Value
This function returns an object with classes gt_tbl and list
References
For a complete case study application refer to https://j-ramalho.github.io/industRial/
Yearly outputs and fill factors of solar cells of different types.
Description
A dataset with the energy output resulting from tests on solar cells made of three different configurations. The fill factor provides an indication of the cell quality and is an uncontrolled variable that can be taken into consideration in an analysis of covariance to better assess the output variation from material to material.
- material
The solar cell material (character)
- output
The yearly energy output (numeric)
- fillfactor
The fill factor measured for each cell (numeric)
Usage
solarcell_fill
Format
A tibble with 15 observations of 3 variables.
Source
Original data set.
References
For a complete case study application refer to https://j-ramalho.github.io/industRial/.
Examples
hist(solarcell_fill$output)
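The analysis of covariance mentioned in the description can be sketched as follows on synthetic stand-in data, so that it runs without the package. The data frame `toy`, the material labels and the simulated effects are assumptions made for illustration and do not reflect the values in `solarcell_fill`.

```r
# Synthetic stand-in: 3 materials, 5 cells each, with a measured fill factor
set.seed(42)
toy <- data.frame(
  material   = rep(c("m1", "m2", "m3"), each = 5),
  fillfactor = runif(15, min = 20, max = 60),
  stringsAsFactors = FALSE
)

# Hypothetical response: output depends on the covariate and on the material
material_effect <- c(m1 = 0, m2 = 5, m3 = -3)
toy$output <- 100 + 0.9 * toy$fillfactor +
  unname(material_effect[toy$material]) + rnorm(15, sd = 2)

# Analysis of covariance: the covariate enters before the factor of interest
ancova <- aov(output ~ fillfactor + material, data = toy)
summary(ancova)
```

Placing the covariate first means the material effect is assessed after the variation explained by the fill factor has been accounted for.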
Yearly outputs of solar cells of different types.
Description
A dataset with the energy output resulting from tests on solar cells made of three different raw materials / configurations.
- material
The solar cell type (character)
- run
The test run (numeric)
- T-10
The yearly output for the test result at temperature of 10°C
- T20
The yearly output for the test result at temperature of 20°C
- T50
The yearly output for the test result at temperature of 50°C
Usage
solarcell_output
Format
A tibble with 12 observations of 5 variables.
Source
Original data set.
References
For a complete case study application refer to https://j-ramalho.github.io/industRial/.
Examples
data(solarcell_output)
Gage R & R plots
Description
Extracts stand alone plots from the ss.rr function of the SixSigma package.
Usage
ss.rr.plots(
  var,
  part,
  appr,
  lsl = NA,
  usl = NA,
  sigma = 6,
  data,
  main = "Six Sigma Gage R&R Study",
  sub = "",
  alphaLim = 0.05,
  errorTerm = "interaction",
  digits = 4
)
Arguments
var |
Measured variable |
part |
Factor for parts |
appr |
Factor for appraisers (operators, machines, ...) |
lsl |
Numeric value of lower specification limit used with USL to calculate Study Variation as %Tolerance |
usl |
Numeric value of upper specification limit used with LSL to calculate Study Variation as %Tolerance |
sigma |
Numeric value for number of std deviations to use in calculating Study Variation |
data |
Data frame containing the variables |
main |
Main title for the graphic output |
sub |
Subtitle for the graphic output (recommended the name of the project) |
alphaLim |
Limit to take into account interaction |
errorTerm |
Which term of the model should be used as error term (for the model with interaction) |
digits |
Number of decimal digits for output |
Details
This is a modified version of the function ss.rr
from the SixSigma package that makes it possible to extract the individual
plots from the output report. The input arguments are the same
as those of the original function. See the original function help with ?ss.rr for
full documentation.
Value
Returns a list that can be assigned to a user-created variable. The plots can then be accessed as variable$plot1 to variable$plot6.
References
For an example application refer to https://j-ramalho.github.io/industRial/
Production measurements of the inner diameter of syringe barrels.
Description
This dataset contains process control measurements of the barrel diameters of pharmaceutical syringes. The sampling rate is hourly and the sample size is 6 syringes.
- Hour
The sampling hour expressed as Hour1, Hour2 (character)
- Sample1
Syringe diameter of sample 1 (numerical)
- Sample2
Syringe diameter of sample 2 (numerical)
Usage
syringe_diameter
Format
A tibble with 25 observations on 7 variables.
Source
Original data set.
References
For a complete case study application refer to https://j-ramalho.github.io/industRial/.
Examples
data(syringe_diameter)
Thickness measurements of pharmaceutical tablets
Description
This data set contains physical measurements of pharmaceutical tablets (pills) including measurement room conditions. The data and the insights it provides are typical of an industrial context with high production throughput and stringent dimensional requirements.
Usage
tablet_thickness
Format
An object of class tibble with 675 observations on 11 variables
Details
The data set contains other variables, related to the measurement room conditions, that are not used in the textbook (not listed).
- Position
Position of the part on the measurement device
- Size
Size class (L, M, S)
- Tablet
Part number (L001, L002, ...)
- Replicate
Measurement replicate, a sequential number
- Day
Measurement day, a sequential number
- Date [DD.MM.YYYY]
Measurement date (POSIXct)
- Operator
Operator name (fictitious)
- Thickness [micron]
Tablet thickness (micrometers)
- Temperature [°C]
Room temperature
Source
Based on a gage R&R (gage reproducibility and repeatability) study performed in 2020 on a physical measurement of parts coming out of high throughput industrial equipment.
References
For a complete case study application refer to https://j-ramalho.github.io/industRial/
Examples
data(tablet_thickness)
Weight measurements of pharmaceutical tablets
Description
This data set contains weight measurements of pharmaceutical tablets (pills). The data and the insights it provides are typical of an industrial context with high production throughput and stringent dimensional requirements.
Usage
tablet_weight
Format
An object of class tibble with 137 observations on 3 variables
Details
The data set contains other variables, related to the measurement room conditions, that are not used in the textbook (not listed).
- part_id
Unique sequential identifier given during production (numeric)
- Weight Target Value
Tablet weight target specification value in [mg] (numeric)
- Weight value
Tablet weight measured value [mg] (numeric)
Source
Anonymized data based on statistical process control data obtained in a high volume production setup.
References
For a complete case study application refer to https://j-ramalho.github.io/industRial/
Examples
hist(tablet_weight$`Weight value`)
Custom theme "industRial" for the book industRial Data Science plots
Description
This theme aims at an optimal balance between readability and precision. It has been adapted from the package cowplot by Claus O. Wilke and reflects the principles of his book Fundamentals of Data Visualization.
Usage
theme_industRial(
  font_size = 14,
  font_family = "",
  line_size = 0.5,
  rel_small = 12/14,
  rel_tiny = 11/14,
  rel_large = 16/14,
  base_size = font_size,
  base_family = font_family
)
Arguments
font_size |
defaults to 14 |
font_family |
defaults to "" |
line_size |
defaults to 0.5 |
rel_small |
defaults to 12/14 |
rel_tiny |
defaults to 11/14 |
rel_large |
defaults to 16/14 |
base_size |
internal argument, defaults to font_size |
base_family |
internal argument, defaults to font_family |
Details
Apply this theme by adding it at the end of the code of any ggplot chart.
It combines the half open theme with a grid background from cowplot.
Value
This function returns an object of classes theme and gg from the ggplot2 package
References
For a complete case study application refer to https://j-ramalho.github.io/industRial/
Examples
library(dplyr)
library(ggplot2)
pet_delivery %>%
  ggplot(aes(x = A)) +
  geom_histogram(color = "grey", fill = "grey90") +
  labs(title = "PET clothing case study",
       subtitle = "Raw data plot",
       x = "Tensile strength [MPa]",
       y = "Count") +
  theme_industRial()
Custom theme "qcc" for the book industRial Data Science plots
Description
This theme provides a look and feel similar to the statistical process control (SPC)
charts of the package qcc, which themselves resemble Minitab charts.
It aims at providing a layout that is familiar to readers of Minitab charts,
to ease the transition to reports and charts built with R.
Usage
theme_qcc(base_size = 12, base_family = "")
Arguments
base_size |
font size, defaults to 12 |
base_family |
font family, defaults to "" |
Details
Apply this theme by adding it at the end of the code of any ggplot chart.
It provides a grey background and some highlights to help reading key
process statistics such as the population mean.
Value
This function returns an object of classes theme and gg from the ggplot2 package
References
For a complete case study application refer to https://j-ramalho.github.io/industRial/
Examples
library(dplyr)
library(ggplot2)
pet_delivery %>%
  ggplot(aes(x = A)) +
  geom_histogram(color = "grey", fill = "grey90") +
  labs(title = "PET clothing case study",
       subtitle = "Raw data plot",
       x = "Tensile strength [MPa]",
       y = "Count") +
  theme_qcc()