Two categories of Statistics Layers Functions:
geom
. e.g. Bin, Smooth, Quantile, SumAs we seen before in Histograms, we’ve already encountered stats functions when we used geom_histogram()
.
Recall that under the hood, this function called stat = 'bin'
to summarize the total count & proportion in each group.
And we can get those inner data by using ..count..
or ..density..
.
Similarly, as we used geom_bar()
, its default stats is set to 'bin'
.
Let’s take a look at this example, we got 3 same plots with different codes.
As usual, the default of bin equals range/30
. As binwidth becomes larger, the graph becomes smoother.
library(ggplot2)
plot <- ggplot(iris, aes(x = Sepal.Width))
plot + geom_histogram()
plot + geom_bar()
plot + stat_bin()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Here are some other examples:
stat_ | geom_ |
---|---|
stat_bin() | geom_bar() |
stat_bin() | geom_histogram() |
stat_bin() | geom_freqpoly() |
stat_smooth() | geom_smooth() |
stat_boxplot() | geom_boxplot() |
stat_bindplot() | geom_dotplot() |
stat_bin2d() | geom_bin2d() |
stat_binhex() | geom_hex() |
stat_contour() | geom_contour() |
stat_quantile() | geom_quantile() |
stat_sum() | geom_count() |
geom_smooth()
, we call stat_smmoth()
simultaneously.se
: The standard error shown as a gray ribbon behind our smooth, is by default the 95 percent confidence interval.
se = Fales
: To remove this gray ribbon.fullrange
: Whether to show the prediction according your dataset. Error increases the further away from our dataset.method = 'auto'
method = 'loess'
method = 'gam'
Method = 'loess'
span
: Controls alpha (the degree of smoothing or windows size), smaller spans are more noisy.# Customize color
library(RColorBrewer)
myColors <- c(brewer.pal(3, "Dark2"), "black")
ggplot(mtcars, aes(x = wt, y = mpg, col = factor(cyl))) +
geom_point() +
stat_smooth(method = "lm", se = FALSE) +
stat_smooth(method = "loess",
aes(group = 1, col="All"),
se = FALSE, span = 0.7) +
# Add correct arguments to scale_color_manual
scale_color_manual('Cylinders', values = myColors)
library(car)
ggplot(Vocab, aes(x = education, y = vocabulary, col = year)) +
geom_jitter(alpha = 0.1, size = 1) +
stat_smooth(method = 'lm', se = F)
When mapping onto color you can sometimes treat a continuous scale, like year
, as an ordinal variable, but only if it is a regular series.
The better alternative is to leave it as a continuous variable and use the group
aesthetic as a factor to make sure your plot is drawn correctly.
# Specify the invisible group aesthetic so that our linear models are still calculated appropriately.
ggplot(Vocab, aes(x = education, y = vocabulary, col = year, group = factor(year))) +
stat_smooth(method = "lm", se = FALSE, alpha = 0.6, size = 2) +
scale_color_gradientn(colors = brewer.pal(9, "YlOrRd"))
We used the Vocab
dataset and applied linear models describing vocabulary
by education
for different years.
stat_quantile()
calculates quantiles according ypur data and draw them in linear models.ggplot(Vocab, aes(x = education, y = vocabulary, col = year, group = factor(year))) +
# Set the quantiles argument to 0.5 so that only the median is shown.
stat_quantile(alpha = 0.6, size = 2, quantiles = 0.5) +
scale_color_gradientn(colors = brewer.pal(9,"YlOrRd"))
stat_sum()
calculates the total number of overlapping observations and is another good alternative to overplotting.count
of each dot onto size
.ggplot(Vocab, aes(x = education, y = vocabulary)) +
geom_jitter(size = 0.1, alpha = 0.1) +
stat_sum() + # sum statistic
scale_size(range = c(1, 10)) + # set size scale
stat_smooth(method = "loess", aes(col = "LOESS"), se = F) +
stat_smooth(method = "lm", aes(col = "lm"), se = F) +
scale_color_discrete("Model")
As mentioned before, there are two categorical function in Statistics.
It’s time to look at the second group of stats function: Independent Stats.
stat_ | Discription |
---|---|
stat_summary() | Summarise y values at distinct x values. |
stat_function() | Compute y values from a function of x values. |
stat_qq() | Perform calculations for a quantile-quantile plot. |
Here we’ll look at stat_summary()
in action.
geom = 'pointrange'
.fun.data = mean_sdl, fun.args = list(mult = 1)
: Calculate mean & sd.fun.data = mean_cl_normal
: Calculate mean and 95% confidence interval.Before we start, there are some preparation need to be done.
# Require 'Hsimc' package here.
library(Hmisc)
# Convert cyl and am to factors
mtcars$cyl <- as.factor(mtcars$cyl)
mtcars$am <- as.factor(mtcars$am)
# Define positions
posn.d <- position_dodge(width = 0.1)
posn.jd <- position_jitterdodge(jitter.width = 0.1, dodge.width = 0.2)
posn.j <- position_jitter(width = 0.2)
# Base layers
wt.cyl.am <- ggplot(mtcars, aes(x = cyl, y = wt, col = am, fill = am, group = am))
# Mean and 95% CI - the easy way
wt.cyl.am +
geom_point(position = posn.jd, alpha = 0.6) +
# 95% CI: 'fun.data = mean_cl_normal'
# Default geom = 'pointrange'
stat_summary(fun.data = mean_cl_normal, position = posn.d) +
ggtitle('Mean and 95% CI - the easy way')
# Mean and SD - with T-tipped error bars
wt.cyl.am +
# Mean: 'fun.y = mean' -> Uses the y aesthetic
stat_summary(geom = 'point', fun.y = mean,
position = posn.d) +
# sd: 'fun.data = mean_sdl, fun.args = list(mult = 1)'
# fun.args = list(mult = 1) to have a errorbar that spans over one standard deviation in both directions.
stat_summary(geom = 'errorbar', fun.data = mean_sdl,
position = posn.d, fun.args = list(mult = 1), width = 0.1) +
ggtitle('Mean and SD - with T-tipped error bars')
Notice here, quantile()
returns a vector of 5 units: 0%, 25%, 50%, 75%, 100%.
So 1st quartile equals quantile(x)[2]
, 3rd quartile equals quantile(x)[4]
.
# Function to save range for use in ggplot
gg_range <- function(x) {
data.frame(ymin = min(x), # Min
ymax = max(x)) # Max
}
# Function to get quantiles
med_IQR <- function(x) {
data.frame(y = median(x), # Median
ymin = quantile(x)[2], # 1st quartile
ymax = quantile(x)[4]) # 3rd quartile
}
# Five-number summary(the minimum, 1st quartile, median, 3rd quartile, and the maximum) plot
wt.cyl.am +
# 1st quartile, 3rd quartile
stat_summary(geom = 'linerange', fun.data = med_IQR,
position = posn.d, size = 3) +
# minimum, maximum
stat_summary(geom = 'linerange', fun.data = gg_range,
position = posn.d, size = 3,
alpha = 0.4) +
# median
stat_summary(geom = 'point', fun.y = median,
position = posn.d, size = 3,
col = 'black', shape = 'X') +
ggtitle('Five-number Summary Plot')
coord_
function e.g. coord_cartesian()
facet_grid(row ~column)
scale_x_continuous(limits = c(*,*))
geom_smooth()
might be different.xlim(c(*,*))
coord_cartesin(xlim = c(*,*))
plot2 <- ggplot(mtcars, aes(x = wt, y = hp, col = am)) + geom_point() + geom_smooth()
# Original plot
plot2
When using scale_x_continuous(limits = c(*,*))
, you’ll get lots of warnings(do not show here.) and a weird graph.
# Zoom in using scale function.
plot2 +
scale_x_continuous(limits = c(3, 6), expand = c(0, 0))
coord_cartesin(xlim = c(*,*))
is just what we want.
# Zoom in using coord function.
plot2 +
coord_cartesian(xlim = c(3, 6))
Set the aspect ratio of a plot with coord_fixed()
or coord_equal()
. Both use ratio = 1
as a default.
ratio = x/y
base.plot <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, col = Species)) +
geom_jitter() +
geom_smooth(method = "lm", se = FALSE)
# Plot base.plot: default aspect ratio
base.plot
# Fix aspect ratio (1:1) of base.plot
base.plot +
coord_equal()
Use coord_polar(theta = '*axis')
to transform a bar char onto a polar coordinate system.
# Create an ordinary bar chart and convert it into pie chart.
wide.bar <- ggplot(mtcars, aes(x = 1, fill = cyl)) +
geom_bar()
wide.bar +
coord_polar(theta = 'y')
# Set limits on x_scale and convert it into a 'ring' type pie chart.
thin.bar <- ggplot(mtcars, aes(x = 1, fill = cyl)) +
geom_bar(width = 0.1) +
scale_x_continuous(limits = c(0.5, 1.5))
thin.bar +
coord_polar(theta = 'y')
The most straightforward way of using facets is facet_grid()
.
Here we just need to specify the categorical variable to use on rows and columns using standard R formula notation (rows ~ columns)
. Remember, when faceting in only one direction us .
to specify nothing for the unused direction.
Here we’ll present a plot with 6 variables in mtcars
# Create cyl_am
mtcars$cyl_am <- paste(mtcars$cyl, mtcars$am, sep = "_")
# Customize color
myCol <- rbind(brewer.pal(9, "Blues")[c(3,6,8)],
brewer.pal(9, "Reds")[c(3,6,8)])
# Plot with multi variables.
ggplot(mtcars, aes(x = wt, y = mpg, col = cyl_am)) +
geom_point() +
scale_color_manual(values = myCol) +
facet_grid(gear ~ vs)
The variables of interest here are name
, which contains the full popular name of each animal, and vore
, the eating behavior. We are going to facet the plot with vore
. Each animal can only be classified under one eating habit, so if we facet according to vore
, we don’t need to repeat the full list in each sub-plot.
Use scale = "free_y"
and space = "free_y"
to leave out rows for which there is no data.
library(tidyr)
library(dplyr)
# Take a look at msleep
head(msleep, 3)
## # A tibble: 3 x 11
## name genus vore order conservation sleep_total sleep_rem sleep_cycle
## <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 Cheetah Acin~ carni Carn~ lc 12.1 NA NA
## 2 Owl mo~ Aotus omni Prim~ <NA> 17 1.8 NA
## 3 Mounta~ Aplo~ herbi Rode~ nt 14.4 2.4 NA
## # ... with 3 more variables: awake <dbl>, brainwt <dbl>, bodywt <dbl>
# Tidy data for plot
colnames(msleep)[6:7] <- c('total', 'rem')
mamsleep <- msleep %>%
gather(sleep, time, total:rem) %>%
select(vore, name, sleep, time) %>%
filter(!is.na(vore))
# After tidy up
head(mamsleep, 3)
## # A tibble: 3 x 4
## vore name sleep time
## <chr> <chr> <chr> <dbl>
## 1 carni Cheetah total 12.1
## 2 omni Owl monkey total 17
## 3 herbi Mountain beaver total 14.4
plot3 <- ggplot(mamsleep, aes(x = time, y = name, col = sleep)) + geom_point()
# Before dropping levels
plot3 + facet_grid(vore ~ .)
# After dropping levels(free up rows)
plot3 + facet_grid(vore ~ ., scale= 'free_y', space = 'free_y')
element_text()
element_line()
element_rect()
theme()
layer.element_blank()
for not displaying the stuff.And here is our base plot, the most familiar dataset - mtcars.
library(RColorBrewer)
plot4 <- ggplot(mtcars, aes(x = wt, y = mpg, col = factor(cyl))) +
stat_smooth(method = 'lm', se = F) +
scale_color_manual('Cylinders', values = c('#C6DBEF','#6BAED6','#2171B5')) +
geom_jitter(size = 3.5, alpha = 0.7) +
scale_x_continuous('Weight(lb/1000)') +
scale_y_continuous('Miles/(US) gallon') +
facet_grid(.~cyl)
plot4
plot5 <- plot4 +
# Theme to remove all rectangles
theme(rect = element_blank()) +
# Combine custom themes
theme(plot.background = element_rect(fill = '#F7FBFf', color = '#DEEBF7', size = 1))
plot5
plot6 <- plot5 +
# Remove grid
theme(panel.grid = element_blank(),
# Set color of axis.line
axis.line = element_line(color = '#08519C'),
#Set color of axis.ticks
axis.ticks = element_line(color = '#08519C'))
plot6
plot7 <- plot6 +
theme(strip.text = element_text(size = 13, color = '#08519C'),
# 'hjust = 0' to put the text in the bottom left corner
axis.title = element_text(color = '#08519C', hjust = 0, face = 'bold'),
axis.text = element_text(color = '#08519C'),
legend.text = element_text(color = '#08519C'),
legend.title = element_text(color = '#08519C'))
plot7
We can only change the direction & the position of the legend.
legend.position
for position.
legend.position = c(0.85,0.85)
legend.position = 'bottom'
legend.direction
for direction.As for the elements of the legend, we need to change it in geom
layer using scale
or aes
becasue they are data ink.
We need library(grid)
to pass a unit
object in order to adjust position.
plot.margin
for adjusting plot margin: unit(c(top,right,bottom,left), "cm")
panel.spacing
for space between facets.library(grid)
plot8 <- plot7 +
# Change legend location & direction
theme(legend.position = 'bottom', legend.direction= 'horizontal') +
# Adjust the plot margin
theme(plot.margin = unit(c(1,1.5,0.5,1), 'cm')) +
# Increase spacing between facets
theme(panel.spacing.x = unit(1, 'cm'))
plot8
theme()
layers to a variable, then you can reuse it without copying many lines of codes.theme_update()
: When you call it and assign it to an object, that object stores the current default theme, and the arguments update the default theme.theme_set()
: Apply theme with theme_set()
.theme_blue <-
theme(rect = element_blank(),
plot.background = element_rect(fill = '#F7FBFf', color = '#DEEBF7', size = 1),
panel.grid = element_blank(),
axis.line = element_line(color = '#08519C'),
axis.ticks = element_line(color = '#08519C'),
strip.text = element_text(size = 13, color = '#08519C'),
axis.title = element_text(color = '#08519C', hjust = 0, face = 'bold'),
axis.text = element_text(color = '#08519C'),
legend.text = element_text(color = '#08519C'),
legend.title = element_text(color = '#08519C'),
legend.position = 'bottom',
legend.direction= 'horizontal',
plot.margin = unit(c(1,1.5,0.5,1), 'cm'),
panel.spacing.x = unit(1, 'cm'))
Apply the theme by +
.
plot4 + theme_blue
Update the default theme using theme_update()
, and store the current theme to original
.
original <- theme_update(rect = element_blank(),
plot.background = element_rect(fill = '#F7FBFf', color = '#DEEBF7', size = 1),
panel.grid = element_blank(),
axis.line = element_line(color = '#08519C'),
axis.ticks = element_line(color = '#08519C'),
strip.text = element_text(size = 13, color = '#08519C'),
axis.title = element_text(color = '#08519C', hjust = 0, face = 'bold'),
axis.text = element_text(color = '#08519C'),
legend.text = element_text(color = '#08519C'),
legend.title = element_text(color = '#08519C'),
legend.position = 'bottom',
legend.direction= 'horizontal',
plot.margin = unit(c(1,1.5,0.5,1), 'cm'),
panel.spacing.x = unit(1, 'cm'))
plot4
Restore back original
using theme_set()
.
theme_set(original)
plot4
ggthemes
There is a package called ggtheme
which includes many theme set you can choose from, and here are some examples.
You can combine with theme_update()
and theme_set
for default setting.
library(ggthemes)
plot4 + theme_bw()
plot4 + theme_classic()
So far, you’ve learned how to use ggplot2
, it’s time to have a deeper look of figuring out what is the best way to graphically represent our data.
As mentioned before, there are two kinds of bar plots:
Absolute Counts(
stat = 'bin'
)
Distribution(stat = 'identity'
) which is same asgeom_col()
which takes two variables.
The second one is incredibly common and equally terrible.
Let’s turn to mamsleep
for example.
sleep <- spread(mamsleep, sleep, time)
plot9 <- ggplot(sleep, aes(x = vore, y = total)) +
scale_y_continuous("Total Sleep Time (h)",
limits = c(0, 24),
breaks = seq(0, 24, 3),
expand = c(0, 0)) +
scale_x_discrete('Eating Habits', labels = c('Carnivore', 'Herbivore', 'Insectivore', 'Omnivore'))
plot9 +
stat_summary(fun.y = mean, geom = 'bar', fill = 'grey50') +
stat_summary(fun.data = mean_sdl, fun.args = list(mult = 1), geom = 'errorbar', width = 0.2) +
ggtitle('Dynamite Plots of Mammalian Sleep')
mamsleep
data for plotting and assign it to a variable called sleep
.ggplot(sleep, aes(x = vore, y = total))
make the base of this plot.
vore
is a discrete variable on x-axis which means the eating habbit.total
is a continuous variable on y-axis which means how much sleep the animal have.scale_y_continuous()
: Rename the y-axis title and set its limits, breaks ,expand.scale_x_discrete()
: Rename the x-axis title and variables’ names.stat_summary()
use fun.y = mean
to calculate their means.stat_summary()
use fun.data = mean_sdl, fun.args = list(mult = 1)
to calculates their standard deviations.However, there are some subtle problems in this graph:
Bar plots should be avoided in this plot. Let’s change it geom_point()
for more meaningful plot.
plot9 +
geom_point(alpha = 0.6, position = position_jitter(width = 0.2)) +
stat_summary(fun.y = mean, geom = 'point', col = "#E41A1C") +
stat_summary(fun.data = mean_sdl, fun.args = list(mult = 1), geom = 'errorbar', width = 0.2, col = "#E41A1C")
geom_point()
we can now see the strange patterns in this dataset.
sleep
this way ,and bars are simply not necessary!Just now we saw why “dynamite plots” (bar plots with error bars) are not well suited for their intended purpose of depicting distributions.
If you really want that, you can still get dynamite plots. However, you’ll need to set the positions manually.
A point geom will typically serve you much better.
It would be nice to give an impression of the number of values in each group here.
# Tidy data first.
mtcars.cyl <- mtcars %>%
group_by(factor(cyl)) %>%
summarize(wt.avg = mean(wt), sd = sd(wt), n())
colnames(mtcars.cyl)[c(1, 4)] <- c('Cylinders', 'n')
mtcars.cyl <- mutate(mtcars.cyl, prop = n/sum(n))
mtcars.cyl
## # A tibble: 3 x 5
## Cylinders wt.avg sd n prop
## <fct> <dbl> <dbl> <int> <dbl>
## 1 4 2.29 0.570 11 0.344
## 2 6 3.12 0.356 7 0.219
## 3 8 4.00 0.759 14 0.438
ggplot(mtcars.cyl, aes(x = Cylinders, y = wt.avg)) +
geom_col(width = mtcars.cyl$prop, fill = '#7F95B5') +
geom_errorbar(aes(ymin = wt.avg - sd, ymax = wt.avg + sd), width = 0.1, col = '#08306B')
To be brief, here are cons of using pie charts:
Here is the original bar blot for hair color.
HairCol <- as.data.frame(HairEyeColor)
ggplot(HairCol, aes(x = Hair, y = Freq, fill = Hair)) +
scale_fill_manual('Hair', values = c('slategray', 'rosybrown', 'darkred', 'gold')) +
geom_col(position = 'dodge') +
facet_grid(.~Sex)
This is the pie chart version of this data.
There are only two charts so it might not seem that much annoying.
However, imagine that you have lots of pie charts, how could you compare the statistics?
plot10 <- ggplot(HairCol, aes(x = 1, y = Freq, fill = Hair)) +
scale_fill_manual('Hair', values = c('slategray', 'rosybrown', 'darkred', 'gold')) +
scale_y_continuous('Proportion') +
theme_classic() +
theme(rect = element_blank(),
strip.text = element_text(size = 13, face = 'bold'))
plot10 +
geom_col(position = 'fill', width = 0.2) +
scale_x_continuous(limits = c(0.5, 1.1)) +
coord_polar(theta = 'y') +
facet_grid(.~Sex) +
theme(axis.title.y = element_blank(),
line = element_blank(),
axis.text = element_blank())
The preferred way is ‘Fill’ bar plot here.
No matter how many plots you have, it’s much easier to make a comparison.
plot10 +
geom_col(position = 'fill', width = 0.3) +
scale_x_continuous(limits = c(0.5, 1.5)) +
coord_flip() +
facet_grid(Sex~.) +
theme(axis.title.y = element_blank(),
axis.line = element_blank(),
axis.ticks.y = element_blank(),
axis.text.y = element_blank(),
legend.position = 'bottom',
legend.direction= 'horizontal')
Heat maps are a surprisingly popular choice for data visualization, but they are probably one of the least effective types of visualization.
color
for variables which is hard to recognize.Let’s see some alternatives.
The dataset we’re using here is about the yield of some varieties of barley in different site in 1931 & 1932.
We should call it out by library(lattice)
.
library(lattice)
head(barley)
## yield variety year site
## 1 27.00000 Manchuria 1931 University Farm
## 2 48.86667 Manchuria 1931 Waseca
## 3 27.43334 Manchuria 1931 Morris
## 4 39.93333 Manchuria 1931 Crookston
## 5 32.96667 Manchuria 1931 Grand Rapids
## 6 28.96667 Manchuria 1931 Duluth
Use geom_tile()
to make a heat map.
See? It doesn’t provide any meaningful information at first glance.
We use scale_fill_gradientn()
to fill in continuous variables.
scale_fill_brewer(palette = "*")
, scale_fill_manual('col_name', values = *)
for discrete variables.ggplot(barley, aes(x = year, y = variety, fill = yield)) +
geom_tile() + # Geom layer
facet_wrap( ~ site, ncol = 1) + # Facet layer
scale_fill_gradientn(colors = brewer.pal(9, "Reds")) # Adjust colors
First alternative:
ggplot(barley, aes(x = year, y = yield, col = variety, group = variety)) +
geom_line() +
facet_wrap(~site, nrow = 1) +
scale_x_discrete('Year') +
scale_y_continuous('Yield (bushels/acre)') +
scale_color_discrete('Variety')
Second Alternative:
geom = 'errorbar'
need to add some dodging.geom = 'ribbon'
ggplot(barley, aes(x = year, y = yield, col = site, fill = site, group = site)) +
stat_summary(fun.y = mean, geom = 'line', position = position_dodge(width = 0.2)) +
stat_summary(fun.y = mean, geom = 'point',position = position_dodge(width = 0.2)) +
# fun.args = list(mult = 1) to have a errorbar that spans over one standard deviation in both directions.
stat_summary(fun.data = mean_sdl, fun.args = list(mult = 1), geom = 'errorbar', width = 0.1, position = position_dodge(width = 0.2)) +
scale_x_discrete('Year', expand = c(0.05, 0.05)) +
scale_y_continuous('Yield (bushels/acre)') +
scale_color_discrete('Site') +
scale_fill_discrete('Site') +
ggtitle('With Errorbar')
ggplot(barley, aes(x = year, y = yield, col = site, fill = site, group = site)) +
stat_summary(fun.y = mean, geom = 'line') +
stat_summary(fun.data = mean_sdl, fun.args = list(mult = 1), geom = 'ribbon', alpha = 0.1, col = NA) +
scale_x_discrete('Year', expand = c(0.05, 0.05)) +
scale_y_continuous('Yield (bushels/acre)') +
scale_color_discrete('Site') +
scale_fill_discrete('Site') +
ggtitle('With Ribbon')
We’ll use 2009 CHIS adult-response dataset
for data visualization here.
Let’s get familiar with the dataset now.
Column Name | Description |
---|---|
RBMI | BMI Category description |
BMI_P | BMI value |
RACEHPR2 | race |
SRSEX | sex |
SRAGE_P | age |
MARIT2 | Marital status |
AB1 | General Health Condition |
ASTCUR | Current Asthma Status |
AB51 | Type I or Type II Diabetes |
POVLL | Poverty level |
Observation
: The Contingency Table we applied Chi-Squared Test
Expected
: The expected values under the null hypothesis of equal proportions.
Residuals
: The difference between the observed data and the expected values
Here’s a table comparing gender and political affiliation, and let’s apply Chi-Squared Test on it.
# Make a table manually
party <- matrix(c(762, 327, 468, 484, 239, 477), ncol = 3, byrow = T)
colnames(party) <- c('Democrat', 'Independent', 'Republican')
rownames(party) <- c('F', 'M')
party <- as.table(party)
# Obsrvation
party
## Democrat Independent Republican
## F 762 327 468
## M 484 239 477
# Expected
chisq.test(party)$expected
## Democrat Independent Republican
## F 703.6714 319.6453 533.6834
## M 542.3286 246.3547 411.3166
# Residuals
chisq.test(party)$residuals
## Democrat Independent Republican
## F 2.1988558 0.4113702 -2.8432397
## M -2.5046695 -0.4685829 3.2386734
When we perform a Chi-squared test, the results are provided to us with the expected & residuals.
And Mosaic Plot is just a method to visualize Chi-Squared Test.
library(vcd)
mosaic(party, shade = T, color = T)
A mosaic plot shows us not only the proportional size of each group in a contingency table, but we can also use it to report on the underlying statistics.