%\documentclass[10pt,a4paper]{article}
\documentclass[nojss]{jss}

\usepackage[utf8]{inputenc}

%\usepackage{a4wide}
\usepackage{amsfonts}
%\setlength{\parskip}{0.5ex plus0.1ex minus0.1ex}
%\setlength{\parindent}{0em}

%\usepackage[round,longnamesfirst]{natbib}
%\usepackage{hyperref}
%\usepackage{verbatim}
%%% for tabulars
%\usepackage{rotating}
%\usepackage{multirow}


%\newcommand{\strong}[1]{{\normalfont\fontseries{b}\selectfont #1}}
\newcommand{\class}[1]{\mbox{\textsf{#1}}}
\newcommand{\func}[1]{\mbox{\texttt{#1()}}}
%\newcommand{\code}[1]{\mbox{\texttt{#1}}}
%\newcommand{\pkg}[1]{\strong{#1}}
\newcommand{\samp}[1]{`\mbox{\texttt{#1}}'}
%\newcommand{\proglang}[1]{\textsf{#1}}
\newcommand{\set}[1]{\mathcal{#1}}

\newcommand{\mat}[1]{\mathbf{#1}}

%\usepackage{Sweave}
%% \VignetteIndexEntry{Visualizing Association Rules: Introduction to arulesViz}

%\title{Visualizing Association Rules: Introduction to the R-extension Package \pkg{arulesViz}}

\author{Michael Hahsler\\Southern Methodist University \And
	Sudheer Chelluboina\\Southern Methodist University}
\title{Visualizing Association Rules: Introduction to the R-extension Package \pkg{arulesViz}}
%% for pretty printing and a nice hypersummary also set:
\Plainauthor{Michael Hahsler, Sudheer Chelluboina} %% comma-separated
\Plaintitle{Visualizing Association Rules: Introduction to the R-extension Package rulesViz}
\Shorttitle{Visualizing Association Rules} %% a short title (if necessary)

%% an abstract and keywords
\Abstract{
Association rule mining is a popular data mining method available in R as the
extension package~\pkg{arules}.  However, mining association rules often results
in a very large number of found rules, leaving the analyst with the task to go
through all the rules and discover interesting ones. Sifting manually through
large sets of rules is time consuming and strenuous. Visualization has a long
history of making large data sets better accessible using techniques like
selecting and zooming. In this paper we present the R-extension
package~\pkg{arulesViz} which implements several known and novel visualization
techniques to explore association rules.
With examples we show how these visualization techniques can be used to analyze a data set.
    }
\Keywords{data mining, association rules, visualization}
\Plainkeywords{data mining, association rules, visualization} %% without formatting


\Address{
    Michael Hahsler\\
    Engineering Management, Information, and Systems\\
    Lyle School of Engineering\\
    Southern Methodist University\\
    P.O. Box 750123 \\
    Dallas, TX 75275-0123\\
    E-mail: \email{mhahsler@lyle.smu.edu}\\
    URL: \url{http://lyle.smu.edu/~mhahsler}
}


\begin{document}
<<echo=FALSE>>=
options(width = 75)
@
%% ------------------------------------------------------------------
%% ------------------------------------------------------------------
%\title{Visualizing Association Rules: Introduction to the R-extension Package \pkg{arulesViz}}
%\author{Michael Hahsler and Sudheer Chelluboina}
\maketitle
\sloppy
%% ------------------------------------------------------------------
%% ------------------------------------------------------------------
%\begin{abstract}
%Association rule mining is a popular data mining method available in R as the
%extension package~\pkg{arules}.  However, mining association rule often results
%in a very large number of found rules, leaving the analyst with the task to go
%through all the rules and discover interesting ones. Sifting manually through
%large sets of rules is time consuming and strenuous. Visualization has a long
%history of making large data sets better accessible using techniques like
%selecting and zooming. In this paper we present the R-extension
%package~\pkg{arulesViz} which implements several known and novel visualization
%techniques to explore association rules.
%With examples we show how these visualization techniques can be used to analyze a data set.
%\end{abstract}

%% ------------------------------------------------------------------
%% ------------------------------------------------------------------
%\clearpage
%\tableofcontents
%\clearpage
%% ------------------------------------------------------------------
%% ------------------------------------------------------------------

\section{Introduction}

Many organizations generate a large amount of transaction data
on a daily basis.
For example, a
department store like ``Macy's'' stores customer shopping information at a
large scale using check-out data.  Association rule mining is one of the major
techniques to detect and extract useful information from large scale transaction
data. Mining association rules
was fist introduced by ~\cite{arules:Agrawal+Imielinski+Swami:1993}
and can formally be defined as:

Let $I=\{i_1, i_2,\ldots,i_n\}$ be a set of $n$ binary attributes called
\emph{items}.  Let $\set{D} = \{t_1, t_2, \ldots, t_m\}$ be a set of
transactions called the \emph{database}. Each transaction in~$\set{D}$ has an
unique transaction ID and contains a subset of the items in~$I$.  A \emph{rule}
is defined as an implication of the form $X \Rightarrow Y$ where $X, Y
\subseteq I$ and $X \cap Y = \emptyset$.  The sets of items (for short
\emph{itemsets}) $X$ and $Y$ are called \emph{antecedent} (left-hand-side or
LHS) and \emph{consequent} (right-hand-side or RHS) of the rule.
Often rules are restricted to only a single item in the consequent.

\emph{Association rules} are rules which surpass a user-specified
minimum support
and minimum confidence threshold. The \emph{support}~$\mathrm{supp}(X)$ of an itemset~$X$
is defined as the proportion of transactions in the data set which contain the
itemset and the \emph{confidence} of a rule is defined
$\mathrm{conf}(X\Rightarrow Y) = \mathrm{supp}(X \cup Y) / \mathrm{supp}(X)$.
Therefore, an association rule $X\Rightarrow Y$ will satisfy:
$$\mathrm{supp}(X\cup Y) \ge \sigma$$
and
$$\mathrm{conf}(X\Rightarrow Y) \ge \delta$$
where $\sigma$ and $\delta$ are the minimum support and minimum confidence,
respectively.

Another popular measure for association rules used throughout this paper
is \emph{lift}~\citep{arules:Brin+Motwani+Ullman+Tsur:1997}.
The lift of a rule is defined as
$$\mathrm{lift}(X \Rightarrow Y) = \mathrm{supp}(X\cup Y)/(\mathrm{supp}(X)\mathrm{supp}(Y))$$
and can be interpreted as the deviation of the support of the whole
rule from the support expected under independence given the supports
of both sides of the rule.
Greater lift values ($\gg 1$) indicate stronger associations.
Measures like support, confidence and lift are generally called
interest measures because they help with focusing on
potentially more interesting rules.


For a more detailed treatment of association rules we refer the reader to
the in introduction paper for package~\pkg{arules}
\citep{arulesViz:Hahsler:2010, arulesViz:Hahsler:2005}
and the literature referred to there.

Association rules are typically generated in a two-step process. First, minimum
support is used to generate the set of all {\em frequent itemsets} for the data
set. Frequent itemsets are itemsets which satisfy the minimum support
constraint. Then, in a second step, each frequent itemsets is used to generate
all possible rules from it and all rules which do not satisfy the minimum
confidence constraint are removed.
Analyzing this process, it is easy to see that in
the worst case we will generate
$2^n-n-1$ frequent itemsets
with more than two items
from a database with $n$ distinct items.
%Then, if we restrict oursleves to single-item consequents, we can generate a
%maximum of
%$$\sum_{k=2}^{n}k\binom{n}{k} = n(2^{n-1}-1)$$
%\marginpar{check}
Since each frequent itemset will in the worst case generate at least two rules,
we will end up with a set of rules in the order of $O(2^n)$.
Typically, increasing minimum support is used to keep the number of association
rules found at a manageable size. However, this also removes potentially
interesting rules with less support. Therefore, the need to deal with
large sets of association rules is
unavoidable when applying association rule mining in a real setting.

Visualization is successfully used to communicate both abstract and concrete
ideas in many areas like education, engineering and
science~\citep{arulesViz:Prangsmal:2009}.
According to \cite{arulesViz:Chen:2008}, the application of visualization
falls into two phases.
First, the exploration phase where the
analysts will use graphics that are mostly incompatible for presentation
purposes but make it easy to find interesting and important features
of the data. The amount of interaction needed during exploration is very high
and includes filtering, zooming and rearranging data.
After key findings are discovered in the data, these findings must be presented
in a way suitable for presentation for a larger audience.
In this second phase it is important that the analyst can manipulate the
presentation to clearly highlight the findings.

Many researchers introduced visualization techniques
like scatter plots, matrix visualizations, graphs, mosaic plots
and parallel coordinates plots to analyze association rules
(see \cite{arulesViz:Bruzzese:2008} and \cite{arulesViz:Jentner:2019} for a recent overview).
This paper discusses existing techniques and
demonstrates how their implementation in \pkg{arulesViz}
can be used via a simple unified interface.
We extend most plots using techniques of color shading and reordering to
improve their interpretability.
Finally, this paper also introduces a completely new method
called ``grouped matrix-based visualization'' which is based on a novel
way of clustering rules.
Clustering rules is usually based on distances defined on the items included
in the rules or on shared transactions covered by the rules. However,
here we cluster rules especially for visualization using similarities between
sets of values of a selected interest measure.

The rest of the paper is organized as follows.
In Section~\ref{sec:prep} we give a very short example
of how to prepare data using package~\pkg{arules} and then introduce
the unified interface provided by~\pkg{arulesViz} for association
rule visualization.
In Sections~\ref{sec:scatter} to \ref{sec:doubledecker} we describe
the different visualization techniques and give examples.
Most of the techniques are enhanced using color shading and reordering.
Grouped matrix-based visualization in Section~\ref{sec:grouped} is
a novel visualization technique.
In Section~\ref{sec:comp} compares the presented visualization techniques.
Section~\ref{sec:conclusion} concludes the paper.


\section{Data preparation and unified interface of arulesViz}
\label{sec:prep}

Before we start, we set the number of displayed significant digits to 
two to make the output easier to read, and we set the seed for the random
number generator for predictability.

<<>>=
options(digits = 2)
set.seed(1234)
@

To use \pkg{arulesViz} we fist have to load the package.
The package automatically loads other needed packages like
\pkg{arules}~\citep{arulesViz:Hahsler:2010}
for handling and mining association rules.
For the examples in this paper we load
the ``Groceries'' data set which is included in \pkg{arules}.

<<>>=
library("arulesViz")
data("Groceries")
@

Groceries contains sales data from a local grocery store with 9835 transactions
and 169 items (product groups). The summary shows some basic statistics of
the data set. For example, that the data set is rather sparse
with a density just above 2.6\%, that
``whole milk'' is the most popular item and that the average transaction
contains less than 5 items.

<<>>=
summary(Groceries)
@

Next we mine association rules using the Apriori algorithm implemented
in \pkg{arules}.

<<>>=
rules <- apriori(Groceries, parameter = list(support = 0.001, confidence = 0.5))
rules
@

The result is a set of \Sexpr{length(rules)} association rules.
The top three rules
with respect to the lift measure,
a popular measure of rule strength, are:

<<>>=
inspect(head(rules, n = 3, by = "lift"))
@

However, it is clear that going through all the \Sexpr{length(rules)} rules
manually is not a viable
option. We therefore will introduce different visualization techniques
implemented in~\pkg{arulesViz}.
All implemented visualization techniques share the
following interface:

<<>>=
args(getS3method("plot", "rules"))
@

where \code{x} is the set of rules to be visualized,
\code{method} is the visualization method,
and \code{measure} and \code{shading} contain the interest measures
used by the plot.
Further arguments are described in the manual page.

Using \code{engine},
different plotting engines can be specified to render the plot.
the default engine typically uses \pkg{grid}, many plots can also be 
rendered using the engine \code{"htmlwidget"} resulting in an interactive 
HTML widget.

In the following sections we will introduce the different visualization
methods implemented in \pkg{arulesViz} and demonstrate how easy it is to
use them.

\section{Scatter plot}
\label{sec:scatter}

A straight-forward visualization of association rules is to use a scatter plot
with two interest measures on the axes.
Such a presentation can be found already in an early paper by
\cite{arulesViz:Bayardo:1999} when they discuss \emph{sc-optimal rules.}

The default method for \func{plot} for association rules
in \pkg{arulesViz}
is a scatter plot using support and confidence on the axes.
In addition a third measure (default: lift) is used as the color (gray level)
of the points. A color key is provided to the right of the plot.

<<scatterplot1, fig=TRUE, height=5, include=FALSE>>=
plot(rules)
@

\begin{figure}
\centering
\includegraphics[width=10cm]{arulesViz-scatterplot1}
\caption{Default scatter plot.\label{fig:scatterplot1}}
\end{figure}

This plot for the rules mined in the previous section is shown in
Figure~\ref{fig:scatterplot1}. We can see
that rules with high lift
have typically a relatively low support.
\cite{arulesViz:Bayardo:1999} argue that
the most interesting rules (sc-optimal rules)
reside on the support/confidence border, which can be clearly seen in this
plot. We will show later how the interactive features of this
plot can be used to explore these rules.

Any measure stored in the quality slot of the set of rules can be used
for the axes (vector of length 2 for parameter \code{measure})
or for color shading~(\code{shading}).
The following measures are available for our set of
rules.
<<>>=
head(quality(rules))
@

These are the default measures generated by Apriori. To add other measures
we refer the reader to the function \func{interestMeasure}
included in \pkg{arules}.
For example we can customize the plot by switching lift and confidence:

<<scatterplot2, fig=TRUE, height=5, include=FALSE>>=
plot(rules, measure = c("support", "lift"), shading = "confidence")
@

Figure~\ref{fig:scatterplot2} shows this plot with lift on the y-axis. Here it
is easy to identify all rules with high lift.

\begin{figure}
\centering
\includegraphics[width=10cm]{arulesViz-scatterplot2}
\caption{Scatter plot with lift on the y-axis.\label{fig:scatterplot2}}
\end{figure}

\cite{arulesViz:Unwin:2001} introduced a special version of a scatter plot
called \emph{Two-key plot.} Here support and confidence are used for the x and
y-axes and the color of the points is used to indicate ``order,'' i.e.,
the number of items contained in the rule. Two-key plots can be produced
using the unified interface by:

<<scatterplot3, fig=TRUE, height=5, include=FALSE>>=
plot(rules, method = "two-key plot")
@

\begin{figure}
\centering
\includegraphics[width=10cm]{arulesViz-scatterplot3}
\caption{Two-key plot.\label{fig:scatterplot3}}
\end{figure}

The resulting Two-key plot is shown in Figure~\ref{fig:scatterplot3}.
From the plot it is clear that order and support have a very strong
inverse relationship, which is a known fact for association rules
\citep{arulesViz:Seno:2005}.


In addition to using order for shading, we also give the plot a different
title (\code{main}).
Other control options including \code{pch} (best with filled symbols: 20--25),
\code{cex}, \code{xlim} and \code{ylim} are available and work in the
usual way expected by R-users.

For exploration,
the scatter plot method offers interactive features for selecting
and zooming. Interaction
is activated using \code{interactive=TRUE}.

<<eval=FALSE>>=
sel <- plot(rules,
  measure = c("support", "lift"), shading = "confidence",
  interactive = TRUE
)
@

Interactive features include:
\begin{itemize}
\item Inspecting individual rules by selecting them and clicking the
inspect button.
\item Inspecting sets of rules
by selecting a rectangular region of the plot and clicking the
inspect button.
\item Zooming into a selected region (zoom in/zoom out buttons).
\item Filtering rules using the measure used for shading
by clicking the filter button and selecting a cut-off point
in the color key. All rules with a measure lower than the cut-off
point will be filtered.
\item Returning the last selection for further analysis (end button).
\end{itemize}

%% fixme - new screenshot
\begin{figure}
\centering
\includegraphics[width=11cm]{scatterplot_interactive.png}
\caption{Interactive mode for scatter plot (inspecting rules with
high lift).\label{fig:interaction}}
\end{figure}

The result of an
example interaction is shown in Figure~\ref{fig:interaction}.
Using a box selection the rules with the highest lift are selected. Using the
inspect button, the rules are displayed in the terminal below the plotting
device.

\section{Matrix-based visualizations}
\label{sec:matrix-based}
Matrix-based visualization techniques organize the
antecedent and consequent itemsets
on the x and y-axes, respectively. A selected interest measure
is displayed at the intersection of the antecedent and
consequent of a given rule. If no rule is available for a
antecedent/consequent combination the intersection area is left blank.

Formally, the visualized matrix is constructed as follows.
We start with the set of association rules
$$\set{R} = \{
    \langle a_1,c_1,m_1\rangle, \ldots
    \langle a_i,c_i,m_i\rangle, \ldots
    \langle a_n,c_n,m_n\rangle\}$$
where $a_i$ is the antecedent, $c_i$ is the consequent and $m_i$ is the
selected interest measure
for the $i$-th rule for $i = 1,\ldots,n$.
    In $\set{R}$ we identify the set of $K$ unique
antecedents and $L$ unique consequent. We create a $L \times K$ matrix
$\mat{M}$ with one column for each unique antecedent and one row for each
unique consequent.  Finally, we populate the matrix by setting
$\mat{M}_{lk} = m_i$ for $i=1,\ldots,n$ and $l$ and $k$ corresponding
to the position of $a_i$ and $c_i$ in the matrix.
Note that $\mat{M}$ will contain many empty cells since
many potential association rules
will not meet the required minimum thresholds on
support and confidence.

\cite{arulesViz:Ong:2002} presented a version of the
matrix-based visualization
technique where a 2-dimensional matrix is used and the interest measure
is represented by color shading of squares at the
intersection.
An alternative visualization option is to use 3D bars at the intersection
\citep{arulesViz:Wong:1999,arulesViz:Ong:2002}.

For this type of visualization the number of rows/columns depends on the
number of unique itemsets in the consequent/antecedent in the set of
rules. Since large sets of rules typically have a large number
of different itemsets as antecedents (often not much smaller than the number of rules themselves), the size of the colored squares
or the 3D bars gets very small and hard to see.
We reduce the number of
rules here by filtering out all rules with a low confidence score.

<<>>=
subrules <- rules[quality(rules)$confidence > 0.8]
subrules
@

<<matrix1, fig=TRUE, height=3, width=10, include=FALSE, results=hide>>=
plot(subrules, method = "matrix", measure = "lift")
@

The resulting plot is shown in Figure~\ref{fig:matrix1}.
Since there is not much space
for long labels in the plot, we only show numbers as
labels for rows and columns and the complete itemsets are printed to
the terminal for look-up. We omit the complete output here,
since this plot and the next few plots print several hundred labels
to the screen. The output looks like:

\begin{verbatim}
Itemsets in Antecedent (lhs)
  [1] "{liquor,red/blush wine}"
  [2] "{curd,cereals}"
  [3] "{yogurt,cereals}"
  [4] "{butter,jam}"
  [5] "{soups,bottled beer}"
	    (lines omitted)
  [343] "{tropical fruit,root vegetables,rolls/buns,bottled water}"
  [344] "{tropical fruit,root vegetables,yogurt,rolls/buns}"
Itemsets in Consequent (rhs)
  [1] "{bottled beer}"     "{whole milk}"       "{other vegetables}"
  [4] "{tropical fruit}"   "{yogurt}"           "{root vegetables}"
\end{verbatim}

%The visual impression can be improved by
%reordering rows and columns in the matrix
%such that rules with similar values of the interest measure are presented
%closer together.
%This removes some of the fragmentation in the
%matrix display and therefore makes it easier to see structure.%

%<<matrix2, fig=TRUE, height=3, width=10, results=hide, include=FALSE>>=
%plot(subrules, method="matrix", measure="lift", control=list(reorder=TRUE))
%@
%
%In the resulting plot in Figure~\ref{fig:matrix2} we
%see the emergence of two large

%blocks of rules with two different consequents and then smaller blocks
%for the rest. With the interactive mode (\code{interactive=TRUE})
%colored squares can be selected
%and the rule it represents will be displayed on the screen.
%This can
%be used to explore different antecedents which have a similar impact on
%the same consequent in terms of the measure used in the plot.
%
%Reordering is done using the \pkg{seriation}
%package~\citep{arulesViz:Hahsler:2008} which provides a wide array of
%ordering methods for multivariate data. Typically, finding the optimal
%order subject to some defined objective function
%is a difficult combinatorial optimization problem which involves
%for $n$ objects to check many of the $O(n!)$ possible permutations.
%However, for visualization purposes
%some suboptimal but fast heuristics are acceptable and it is
%often sufficient to try several methods to find the most useful representation.
%However, it is useful to understand the different methods
%and the objective function they try to optimize.
%Some available useful methods are:
% \begin{itemize}
% \item "PCA" -- First principal component. Uses the first principal component
% for the data matrix and for the transposed matrix as order for
%     rows and columns. This is a very fast approach which avoids the expensive
%     distance matrix computation.
% 
% \item "TSP" -- Traveling salesperson problem solver. Computes distance matrices
%     between rows and between columns and solves two separate TSPs. By default
%     the nearest insertion heuristic is used. This method is reasonably fast for
%     small rule sets, but the distance matrix computation and the associated
%     memory requirements make it impractical for larger sets.
% 
% \item "HC" -- Hierarchical clustering. Computes distance matrices
%     for rows and columns and
%     then clusters twice. Distance matrix computation is again limiting
%     the approach for smaller sets.
% 
% \item "max", "avg" and "median" -- Reorder rows/columns by their maximum,
%     average or median value. This extremely simple and fast methods
%     provides sometimes good visualizations.
% \end{itemize}
% 
% Other available methods include "BEA", "MDS", "OLO", "GW", "ARSA" and "BBURCG".
% We refer the interested reader to \cite{arulesViz:Hahsler:2008} for
% detailed description of these methods.
% Different seriation
% methods can be selected via \code{reorderMethod} in the control list
% (default: TSP). For the methods which first compute dissimilarity matrices,
% the control list argument \code{reorderDist} can be used to specify the
% used measure (default: Euclidean distance).
% 
% Most seriation methods cannot handle missing values.
% However, the visualized matrix typically contains many missing values since
% minimum support discards of many rules during rule mining.  We use a very crude
% imputation approach by replacing the missing values in the matrix (values for
% rules not contained in the rules set) by a fixed value (0 by default).

\begin{figure}
\centering
\includegraphics[width=\linewidth]{arulesViz-matrix1}
\caption{Matrix-based visualization with colored squares.
\label{fig:matrix1}}

%\vspace{5mm}
%
%\includegraphics[width=\linewidth]{arulesViz-matrix2}
%\caption{Matrix-based visualization with colored squares (reordered).
%\label{fig:matrix2}}
\end{figure}


An alternative representation is to use
3D bars (method ``matrix'' with engine ``3d'') instead of colored rectangles.

<<matrix3D1, fig=TRUE, results=hide, include=FALSE>>=
plot(subrules, method = "matrix", engine = "3d", measure = "lift")
@

\begin{figure}
%\begin{minipage}{.48\linewidth}
\centering
\includegraphics[width=.48\linewidth]{arulesViz-matrix3D1}
\caption{Matrix-based visualization with 3D bars.
\label{fig:matrix3D1}}
%\end{minipage}
%\begin{minipage}{.48\linewidth}
%\includegraphics[width=\linewidth]{arulesViz-matrix3D2}
%\caption{Matrix-based visualization with 3D bars (reordered).
%\label{fig:matrix3D2}}
%\end{minipage}
\end{figure}

The 3D visualization is shown in Figure~\ref{fig:matrix3D1}.

\section{Grouped matrix-based visualization}
\label{sec:grouped}

Matrix-based visualization is limited in the number of
rules it can visualize effectively
since large sets of rules typically also have large sets of
unique antecedents/consequents. Here we introduce a new visualization
techniques~\citep{arulesViz:Hahsler2011, arulesViz:Hahsler2016c}
that enhances matrix-based visualization using grouping of rules
via clustering to handle a larger number of rules. Grouped rules
are presented as an aggregate in the matrix and can be explored
interactively by zooming into and out of groups.

A direct approach to cluster itemsets is to define a distance metric
between two itemsets $X_i$ and $X_j$.
A good choice is the Jaccard distance defined as

$$d_\mathrm{Jaccard}(X_i, X_j) = 1 -
\frac{|X_i \cap X_j|}{|X_i \cup X_j|}.$$

The distance simply is the number of items that $X_i$ and $X_j$ have in common
divided by the number of unique items in both sets.  For a set of $m$ rules we
can calculate the $m(m-1)/2$ distances and use them as the input for
clustering.  However, using clustering on the itemsets directly has several
problems. First of all, data sets typically mined for association rules are
high-dimensional, i.e., contain many different items. This high dimensionality
is carried over to the mined rules and leads to a situation referred is as the
``course of dimensionality'' where, due to the exponentially increasing volume,
distance functions lose their usefulness. The situation is getting worse
since minimum support used in association rule mining leads in addition to
relatively short rules resulting in extremely sparse data.

Several approaches to cluster association rules and itemsets to address the
dimensionality and sparseness problem were proposed in the literature.
\cite{arulesViz:Toivonen:1995}, \cite{arulesViz:Gupta:1999} and
\cite{arulesViz:Berrado:2007} propose clustering association rules by looking at
the number of transactions which are covered by the rules.
Using common covered
transactions avoids the problems of clustering sparse, high-dimensional binary
vectors. However, it introduces a strong bias towards clustering rules which
are generated from the same frequent itemset.  By definition of frequent
itemsets, two subsets of a frequent itemset will cover many common
transactions.  This bias will lead to mostly just rediscovering the already
known frequent itemset structure from the set of association rules.
%Yan 2005 suggests KL-divergence as distance between patterns.

Here we pursue a completely different approach.
We start with the matrix
$\mat{M}$
defined in Section~\ref{sec:matrix-based} which
contains the values of a selected interest measure of the rules
in set $\set{R}$. The columns/rows are the unique antecedents/consequents
in $\set{R}$, respectively.
Now grouping antecedents becomes the problem of grouping columns
in $\mat{M}$. To group the column vectors
fast and efficient into $k$ groups we use $k$-means clustering.
The default interest measure used is lift. The idea is that
antecedents that are statistically dependent on the same consequents
are similar and thus can be grouped together.
Compared to other clustering approaches for itemsets,
this method enables
us to even group antecedents containing substitutes
(e.g., butter and margarine)
which are rarely purchased together
since they will have similar dependence to the same consequents.

The same grouping method can be used for consequents. However,
since the mined rules
are restricted to a single item in the
consequent
there is no problem with combinatorial explosion and
such a grouping is typically not necessary.

To visualize the grouped matrix we use a balloon plot with antecedent
groups as columns and consequents as rows (see Figure~\ref{fig:clusterplot1}).
The color of the balloons
represent the aggregated interest measure in the group
with a certain consequent and the size of the balloon shows
the aggregated support. The default aggregation function is the median
value in the group. The number of antecedents
and the most important (frequent) items in the group are displayed
as the labels for the columns.
Furthermore, the columns and rows in the plot are reordered such that
the aggregated interest measure is decreasing from top down
and from left to right, placing the most interesting group in the
top left corner.

The matrix visualization with grouped antecedents
for the set of $\Sexpr{length(rules)}$ rules mined earlier
can be easily created by
<<clusterplot1, fig=TRUE, include=FALSE, height=8>>=
plot(rules, method = "grouped")
@

The resulting visualization is shown in Figure~\ref{fig:clusterplot1}.
The group of most interesting rules according to lift (the default measure)
are shown in the top-left corner of the plot. There are 3 rules
which contain ``Instant food products'' and up to 2 other items in the
antecedent and the consequent is ``hamburger meat.''

\begin{figure}
\centering
\includegraphics[width=12cm]{arulesViz-clusterplot1}
\caption{Grouped matrix-based visualization.\label{fig:clusterplot1}}
\end{figure}

To increase the number of groups we can change $k$ which defaults to 20.

<<clusterplot2, fig=TRUE, width=10, include=FALSE, height=8>>=
plot(rules, method = "grouped", control = list(k = 50))
@

The resulting, more detailed plot is shown in Figure~\ref{fig:clusterplot2}.

\begin{figure}
\centering
\includegraphics[width=20cm, angle=90]{arulesViz-clusterplot2}
\caption{Grouped matrix with $k=50$.\label{fig:clusterplot2}}
\end{figure}


An interactive version of the grouped matrix visualization is also available.

<<eval=FALSE>>=
sel <- plot(rules, method = "grouped", interactive = TRUE)
@

Here it is possible to zoom into groups and to inspect
the rules contained in a selected group.

\section{Graph-based visualizations}

Graph-based techniques~\citep{arulesViz:Klemettinen:1994,arulesViz:Rainsford:2000,arulesViz:Buono:2005,arulesViz:Ertek:2006} visualize association rules
using vertices and edges where vertices annodated with item labels
represent items, and itemsets 
or rules are reptesented as a second set of vertices. Items are connected
with itemsets/rules using arrows. For rules arrows pointing from
items to rule vertices indicate LHS items and an arrow from 
a rule to an item indicates the RHS.
Interest measures are typically added to the plot by using 
color or size of the vertices representing the itemsets/rules.

Graph-based visualization offers a very clear representation of rules but they
tend to
easily become cluttered and thus are only viable for very small sets of rules.
For the following plots we select the 10 rules with the highest lift.

<<>>=
subrules2 <- head(rules, n = 10, by = "lift")
@

\pkg{arulesViz} contains several graph-based visualizations
rendered using
either the \emph{igraph} library via package
\pkg{igraph}~\citep{arulesViz:Csardi2006}
or
the interface to the \emph{GraphViz} software in
package \pkg{Rgraphviz}~\citep{arules:Gentry:2010}.
By default igraph is used.
The following 
plot represents items and rules as vertices connecting them with directed edges
(shown in Figure~\ref{fig:graph1}).
This representation focuses on how the rules are composed of individual items
and shows which rules share items.


<<graph1, fig=TRUE, width=10, height=10, include=FALSE>>=
plot(subrules2, method = "graph")
@

\begin{figure}
\centering
\includegraphics[width=\linewidth]{arulesViz-graph1}
\caption{Graph-based visualization with items and rules as vertices.
\label{fig:graph1}}
\end{figure}


An interactive visualization is available in \pkg{arulesViz},
however,
the built-in graph based visualizations are only
useful for small set of rules.  To explore large sets of rules with graphs,
advanced interactive features like zooming, filtering, grouping and coloring
nodes are needed. Such features are available in interactive visualization
and exploration platforms for
networks and graphs like \emph{Gephi}~\citep{arules:Bastian:2009}.
From \pkg{arulesViz} graphs for sets of association rules can be exported
in the GraphML format
or as a Graphviz dot-file
to be explored in tools like Gephi. For example the
1000 rules with the highest lift are exported by:

<<eval=FALSE>>=
saveAsGraph(head(rules, n = 1000, by = "lift"), file = "rules.graphml")
@

Figure~\ref{fig:gephi} shows a screenshot of exploring these rules
interactively. Rules can be explored by zooming, filtering and coloring
vertices and edges.

\begin{figure}
\centering
\includegraphics[width=\linewidth]{gephi_1}
\caption{Visualization of 1000 rules with Gephi (Fruchterman Reingold layout, vertex and label size is proportional to the in-degree, i.e., the number of
rules the consequent participates in).
\label{fig:gephi}}
\end{figure}


\section{Parallel coordinates plot}

Parallel coordinates plots are designed to visualize multidimensional data
where each dimension is displayed separately on the x-axis and the
y-axis is shared. Each data point is represented by a line
connecting the values for each
dimension.
%%% MFH: what does the following mean?
Parallel coordinates plots were used previously to visualize
discovered classification rules \citep{arulesViz:cviz}
and association rules \citep{arulesViz:Yang:2003}. \cite{arulesViz:Yang:2003}
displays the items on the y-axis as nominal values and the x-axis represents
the positions in a rule, i.e., first item, second item, etc.  Instead of a
simple line an arrow is used where the head points to the consequent item.
Arrows only span enough positions on the x-axis to represent all the items in
the rule, i.e., rules with less items are shorter arrows.

<<pc1, fig=TRUE, include=FALSE, width=10>>=
plot(subrules2, method = "paracoord")
@

Figure~\ref{fig:pc1} shows a parallel coordinates plot for 10 rules.
The width of the arrows represents support and the intensity of the color
represent confidence.
It is obvious that
for larger rule sets
visual analysis
becomes difficult since with an increasing number of rules also
the number of crossovers
between the lines increases \cite{arulesViz:Yang:2003}.

The number of crossovers can be significantly reduced by reordering the items
on the y-axis. Reordering the items to minimize the number of
crossovers is a combinatorial problem with $n!$ possible permutations.
However, for visualization purposes a suboptimal but fast solution is
typically acceptable. We applies a variation of the well known
2-opt heuristic
\cite{arulesViz:Bently:1990} for travelers salesman problem
to the reordering problem. The objective function is to minimize the
number of crossovers. The simple heuristic uses the following steps:
\begin{enumerate}
\item Choose randomly two items and exchange them if it improves the
objective function.
\item Repeat step 1 till no improvement is found for a predefined number
of tries.
\end{enumerate}

Reordering is achieved with \code{reorder=TRUE}.
<<pc2, fig=TRUE, include=FALSE, width=10>>=
plot(subrules2, method = "paracoord", control = list(reorder = TRUE))
@

Figure~\ref{fig:pc2}
shows the parallel coordinates plot with reordered items
to reduce crossovers.

\begin{figure}
\centering
\includegraphics[width=15cm]{arulesViz-pc1}
\caption{Parallel coordinate plot.\label{fig:pc1}}

\includegraphics[width=15cm]{arulesViz-pc2}
\caption{Parallel coordinate plot (reordered).\label{fig:pc2}}
\end{figure}

\section{Double Decker plots}
\label{sec:doubledecker}

A double decker plot is a variant of a mosaic plot.  A mosaic plot
displays a contingency table using tiles on a rectangle created by recursive
vertical and horizontal splits. The size of each tile is proportional to the
value in the contingency table. Double decker plots use
only a single horizontal split.

\cite{arulesViz:Hofmann:2000} introduced
double decker plots to visualize a single association rule.
Here the displayed contingency table is computed for a rule by counting
the occurrence frequency for each subset of items in the antecedent and
consequent from the original data set.
The items in the antecedent are used for the vertical splits
and the consequent item is used for horizontal highlighting.

We randomly choose a single rule and visualize it with a double decker plot.
<<>>=
oneRule <- sample(rules, 1)
inspect(oneRule)
@

<<doubledecker1, fig=TRUE, include=FALSE>>=
plot(oneRule, method = "doubledecker", data = Groceries)
@
\begin{figure}
\centering
\includegraphics[width=8cm]{arulesViz-doubledecker1}
\caption{Double decker plot for a the rule
\Sexpr{labels(oneRule)}.\label{fig:doubledecker1}}
\end{figure}

Figure~\ref{fig:doubledecker1} shows the resulting plot.
The area of blocks gives the support and the height of the ``yes'' blocks is
proportional to the confidence for the rules consisting of the antecedent items
marked as ``yes.''
Items that show a significant jump in confidence when changed from
``no'' to ``yes'' are interesting. This is captured
by the interest measure \emph{difference of confidence} defined by
\cite{arulesViz:Hofmann:2001}.

%%% MFH: difference of confidence
%%% MFH: \marginpar{order of antecedents}

\section{Comparison of techniques}
\label{sec:comp}

In this section, we compare the visualization techniques
available in \pkg{arulesViz}
based on the size of the rule set
which can be analyzed, the number of interest measures which are shown
simultaneously,
if the technique offers interaction and reordering
and how intuitive each visualization technique is. Note that most of these
categories are only evaluated qualitatively here, and the results
presented in Table~\ref{tab:comp}
are only meant to guide the user towards the most suitable techniques for
a given application.

Scatterplot (including two-key plots) and grouped matrix plot are capable to
analyze large rule sets. These techniques are interactive to allow
the analyst to zoom and select interesting rules. Matrix-based
can accommodate rule sets of medium size. Reordering can be used to improve
the presentation.
To analyze small rule sets the matrix-based method with 3D bars,
graph-based methods and parallel coordinates
plots are suitable. Graphs for large rule sets can be analyzed using
external tools like Gephi.
Finally, double decker plots only visualize a single rule.

The techniques discussed in this paper can also be
categorized based on the number of interest measures simultaneously
visualized. Most methods can represent two measures and scatter plots
are even able to visualize three measures for each rule in one plot.

Scatter plot and graph based techniques are the most intuitive while
matrix-based visualization with two interest measures, parallel coordinates
and double decker require time to learn how to interpret them correctly.

%%% MFH: add fpvat
%\cite{arulesViz:Carson:2010}, provides good interactive tool for association
%rules to zoom and select and it supports reordering also.

\begin{table}
\centering
{\footnotesize
\begin{tabular}{l@{  }l@{  }c@{  }c@{  }c@{  }c@{  }c}
\hline
{\bf Technique} & {\bf Method} & {\bf Rule set} & {\bf Measures} & {\bf Interactive} & {\bf Reordering} & {\bf Ease of use}\\
\hline
Scatterplot& \tt "scatterplot" & large & 3 & \checkmark &  & ++ \\
Two-Key plot & \tt "scatterplot" & large & 2 + order & \checkmark &  & ++ \\
Matrix-based  & \tt "matrix" & medium & 1 & & \checkmark & 0\\
Matrix-b. (2 measures) & \tt "matrix" & medium & 2 & &  \checkmark & -- --\\
Matrix-b. (3D bar) & \tt "matrix (engine 3d)" &  small & 1 & &  \checkmark & +\\
Grouped matrix & \tt "grouped" &  large & 2 & \checkmark &  \checkmark & 0\\
Graph-based &  \tt "graph" & small & 2 & & & ++\\
Graph-b. (external) &  \tt "graph" & large & 2 &  \checkmark &  \checkmark & +\\
Parallel coordinates & \tt "paracoord" & small & 1 & & \checkmark & --\\
Double decker & \tt "doubledecker" & single rule & (2) & & & --\\
\hline
\end{tabular}
\caption{Comparison of visualization methods for association rules available in
\pkg{arulesViz}.\label{tab:comp}}
}
\end{table}

%
%      \begin{figure}
%      \centering
%      \renewcommand{\arraystretch}{1.2}
%{\footnotesize
%    \begin{tabular}{|l|l|l|p{2.5cm}|}
%    \hline
%    {\bf Method Name} & {\bf Name } & {\bf Figuare 4 } & {\bf Limitation}\\
%    \hline
%        Graph & Graph & - & Limited to Small Number of Itemsets \\
%        Parallel Coordinates & Parallel Coordinates & (g) & Limited to Small Number of Itemsets \\
%        \hline
%        \end{tabular}}
%        \caption{Comparison of Visualization Methods for Itemsets.\label{table:itemcompvistech1}}
%        \end{figure}
%
%

%
%From Tables ~\ref{table:compvistech} and ~\ref{table:itemcompvistech1}, all the methods are limited to one rule or some number of rules this is due to complexity in representation. As mentioned earlier association rule mining algorithms typically generate a large number of rules, matrix shading and cluster plot visualization techniques can accommodate more number of rules when compared to other techniques with this feature matrix shading and cluster plot visualization techniques could be the most effective among all.
%

%\section{To Improve Visualization}
%Visual representations are needed to facilitate exploratory analysis, but there are some difficulties in understanding the visual information in overcrowded and cluttered displays. So, to incorporate more number of rules and to avoid crossover/cluttering problems we are using two types of improvement techniques, one is reodering the items and the other is clustering the items. With this reordering and clustering methods the analyst can only inspect or analyze rules which are interesting.
%
%\subsection{Reordering Items}
%The above mentioned problems are caused due to the unorderedness of data, mostly the data we are using for visualization is not ordered. Basically, data analysis had a problem in arranging the objects to reveal structural information. From \citep{arulesViz:seriate}, they defined this problem as seriation and also proposed methods to overcome the seriation or sequencing problem. As discussed earlier ordering or organising the data can greatly imporve the quality of visualization.
%We integrated these seriation methods with some of our visualization techniques to understand the data in efficient way. We can not apply these seriation methods to all of our techniques since we are preprocessing the generated data differently for each of the visualization techniques. So, to avoid the overcrowded and cluttered displays for other techiniques we ordered the data by sorting or rearranging the data items.
%We applied these seriation methods to matrix shading technique to analyze the rules in more understanding way. As we are using matrix data structure to represent the data, so we applied some seriation methods discussed in \citep{arulesViz:seriate} and after convincing results we extended our work in using and designing ordering methods, which improved the quality of data representation.
%
%\subsection{Clustering Items}
%Too many rules cause overlapping problem for visual interpretation, which is due to limited space. Clustering can help in visualizing and ordering the items, by reducing the large number of association rules \citep{arulesViz:clusrules}, which is achieved by  clustering the rules based on quantitative attributes. For example the rules \{$age=16 \Rightarrow salary=0$\} and \{$age=17 \Rightarrow salary=0$\} after clustering the antecedent items based on consequent, now the rule changed as \{$17<age>16 \Rightarrow salary=0$\}. \citep{arulesViz:disclus}, they are clustering the association rules based on the distance metric and the distance could be found from the quality measures like support, confidence, etc,.
%This clustering can be applied to 3D matrix and cluster plot. As we are using this clustering to accomodate more number of rules and also to find which of the consequent items have the similar antecedent items.
%

\section{Further Reading}

Details on the grouped matrix plot are reported in
\cite{arulesViz:Hahsler2016c}.
Interactive visualization using htmlWidgets with examples is 
describe in \cite{arulesViz:Hahsler2017b}.

\section{Conclusion}
\label{sec:conclusion}

Association rule mining algorithms typically generate a large number of
association rules which poses a major problem for understanding and analyzing
rules.  In this paper we presented several visualization techniques
implemented in \pkg{arulesViz} which can be used to explore and present
sets of association rules.
In addition we present a new interactive visualization method called
grouped matrix-based visualization which can used to effectively
explore large rule sets.
%\bibliographystyle{abbrvnat}
\bibliography{arulesViz,arules}


\end{document}