Title: | Software for Evaluating Counterfactuals |
---|---|
Description: | Inferences about counterfactuals are essential for prediction, answering what if questions, and estimating causal effects. However, when the counterfactuals posed are too far from the data at hand, conclusions drawn from well-specified statistical analyses become based largely on speculation hidden in convenient modeling assumptions that few would be willing to defend. Unfortunately, standard statistical approaches assume the veracity of the model rather than revealing the degree of model-dependence, which makes this problem hard to detect. WhatIf offers easy-to-apply methods to evaluate counterfactuals that do not require sensitivity testing over specified classes of models. If an analysis fails the tests offered here, then we know that substantive inferences will be sensitive to at least some modeling choices that are not based on empirical evidence, no matter what method of inference one chooses to use. WhatIf implements the methods for evaluating counterfactuals discussed in Gary King and Langche Zeng, 2006, "The Dangers of Extreme Counterfactuals," Political Analysis 14 (2) <DOI:10.1093/pan/mpj004>; and Gary King and Langche Zeng, 2007, "When Can History Be Our Guide? The Pitfalls of Counterfactual Inference," International Studies Quarterly 51 (March) <DOI:10.1111/j.1468-2478.2007.00445.x>. |
Authors: | Heather Stoll, Gary King, Langche Zeng, Christopher Gandrud, Ben Sabath |
Maintainer: | Soubhik Barari |
License: | GPL (>=3) |
Version: | 1.5-10 |
Built: | 2024-11-07 02:55:44 UTC |
Source: | https://github.com/iqss/whatif |
This data set is one of two that together allow the
replication of the analysis in section 2.4 of King and Zeng 2006b.
It contains data on 122 counterfactuals derived by King and Zeng 2006b
from the factual Doyle and Sambanis 2000 data set, peacef. It should
be used in conjunction with the latter.
data(peacecf)
A data frame with dimensions 122-by-11. Columns are covariates and
rows are data points (or units). The covariates are as in peacef,
with the exception of the key causal variable, untype4, which is
transformed to 1 - untype4.
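The transformation of the key causal variable can be illustrated with a minimal sketch. The data frame below is made up for illustration and is not taken from peacef; only the column name untype4 and the 1 - untype4 rule come from the description above.

```r
# Hypothetical factual data: two covariates, values invented for illustration
factual <- data.frame(
  wardur  = c(12, 48, 30),
  untype4 = c(0, 1, 0)   # binary key causal variable
)

# Build counterfactuals: keep every covariate, flip the causal variable
cfact <- factual
cfact$untype4 <- 1 - factual$untype4

cfact
```

Each counterfactual row asks what would have happened to the same unit had the value of untype4 been the opposite of the one observed.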
King and Zeng 2006b
King, Gary and Langche Zeng. 2007. "When Can History Be Our Guide? The Pitfalls of Counterfactual Inference." International Studies Quarterly 51 (March). Cited above as King and Zeng 2006b.
Doyle, Michael W. and Nicholas Sambanis. 2000. "International Peacebuilding: A Theoretical and Quantitative Analysis." American Political Science Review 94, no.4: 779–801.
This data set is one of two that together allow the
replication of the analysis in section 2.4 of King and Zeng 2006b.
It contains factual data from Doyle and Sambanis
2000 on 124 post-WWII civil wars. It should be used in conjunction
with the data set of counterfactuals derived from it, peacecf.
data(peacef)
A data frame with dimensions 124-by-11. Columns are covariates and
rows are data points (or units). The covariates are decade, wartype,
logcost, wardur, factnum, factnumsq, trnsfcap, untype4, treaty,
develop, and exp, in that order.
King and Zeng 2006b
King, Gary and Langche Zeng. 2007. "When Can History Be Our Guide? The Pitfalls of Counterfactual Inference." International Studies Quarterly 51 (March). Cited above as King and Zeng 2006b.
Doyle, Michael W. and Nicholas Sambanis. 2000. "International Peacebuilding: A Theoretical and Quantitative Analysis." American Political Science Review 94, no.4: 779–801.
Generates a cumulative frequency plot of distances from an object of class "whatif". The horizontal axis shows distance values. The vertical axis shows the cumulative frequency: the fraction of rows in the observed data set whose Gower or (squared) Euclidean distance to the counterfactual is less than the given value.
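As a rough illustration of what is being plotted, the sketch below computes cumulative frequencies for a single counterfactual in base R. The distances and thresholds are invented, and the comparison uses "less than or equal to", following the description of the freq argument of whatif; this is not the package's internal code.

```r
# Made-up distances from one counterfactual to five observed data points
d <- c(0.10, 0.22, 0.35, 0.40, 0.80)

# Distance values at which to evaluate the cumulative frequency
thresholds <- c(0.25, 0.50, 0.75, 1.00)

# Fraction of data points whose distance is at or below each threshold
cum_freq <- sapply(thresholds, function(t) mean(d <= t))

data.frame(distance = thresholds, cum.freq = cum_freq)
```

A curve that rises quickly at small distances indicates a counterfactual with many observed data points close to it.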
## S3 method for class 'whatif'
plot(x, type = "f", numcf = NULL, eps = FALSE, ...)
x |
An object of class "whatif", the output of the
function whatif. |
type |
A character string; the type of plot of the cumulative frequencies
of the distances to be produced. Possible types are: |
numcf |
A numeric vector; the specific counterfactuals to be plotted.
Each element represents a counterfactual, specifically its row number
from the matrix or data frame of counterfactuals. By default, all
counterfactuals are plotted. Default is NULL. |
eps |
A Boolean; should an encapsulated postscript file be
generated? Setting the argument equal to |
... |
Further arguments passed to and from other methods. |
LOWESS scatterplot smoothing using the function lowess is plotted
in blue. Counterfactuals in the convex hull are plotted with a solid line
and counterfactuals outside of the convex hull with a dashed line.
A graph printed to the screen or an encapsulated postscript file saved
to your working directory. In the latter case, the file name has form
'graph_'type'_'numcf'.eps', where 'type' and 'numcf' are the values of
the respective arguments.
Heather Stoll, Gary King, and Langche Zeng
King, Gary and Langche Zeng. 2006. "The Dangers of Extreme Counterfactuals." Political Analysis 14 (2). Available from https://gking.harvard.edu.
King, Gary and Langche Zeng. 2007. "When Can History Be Our Guide? The Pitfalls of Counterfactual Inference." International Studies Quarterly 51 (March). Available from https://gking.harvard.edu.
whatif, summary.whatif, print.whatif, print.summary.whatif
## Create example data sets and counterfactuals
my.cfact <- matrix(rnorm(3*5), ncol = 5)
my.data <- matrix(rnorm(100*5), ncol = 5)

## Evaluate counterfactuals
my.result <- whatif(data = my.data, cfact = my.cfact, mc.cores = 1)

## Plot cumulative frequencies for the first two counterfactuals (rows
## 1 and 2) in my.cfact
plot(my.result, type = "b", numcf = c(1, 2), mc.cores = 1)
Prints the information generated from the whatif output object
by a call to summary, which is stored in an object of class
"summary.whatif".
## S3 method for class 'summary.whatif'
print(x, ...)
x |
An object of class "summary.whatif", the output of
the function summary.whatif. |
... |
Further arguments passed to and from other methods. |
A printout to the screen of the whatif information summarized
in the summary.whatif output object.
Heather Stoll, Gary King, and Langche Zeng
King, Gary and Langche Zeng. 2006. "The Dangers of Extreme Counterfactuals." Political Analysis 14 (2). Available from https://gking.harvard.edu.
King, Gary and Langche Zeng. 2007. "When Can History Be Our Guide? The Pitfalls of Counterfactual Inference." International Studies Quarterly 51 (March). Available from https://gking.harvard.edu.
whatif, plot.whatif, summary.whatif, print.whatif
## Create example data sets and counterfactuals
my.cfact <- matrix(rnorm(3*5), ncol = 5)
my.data <- matrix(rnorm(100*5), ncol = 5)

## Evaluate counterfactuals
my.result <- whatif(data = my.data, cfact = my.cfact, mc.cores = 1)

## Print summary output object
my.result.sum <- summary(my.result)
print(my.result.sum)
Prints the information produced by the function whatif,
an object of class "whatif", to the screen.
## S3 method for class 'whatif'
print(x, print.dist = FALSE, print.freq = FALSE, ...)
x |
An object of class "whatif", the output of
the function whatif. |
print.dist |
A Boolean; should the matrix of pairwise
distances between each counterfactual and data point be printed to
the screen, if it was returned? Default is FALSE. |
print.freq |
A Boolean; should the matrix of cumulative
frequencies of distances for each counterfactual be printed
to the screen? Default is FALSE. |
... |
Further arguments passed to and from other methods. |
A printout to the screen of the information contained in the
whatif output object.
Heather Stoll, Gary King, and Langche Zeng
King, Gary and Langche Zeng. 2006. "The Dangers of Extreme Counterfactuals." Political Analysis 14 (2). Available from https://gking.harvard.edu.
King, Gary and Langche Zeng. 2007. "When Can History Be Our Guide? The Pitfalls of Counterfactual Inference." International Studies Quarterly 51 (March). Available from https://gking.harvard.edu.
whatif, plot.whatif, summary.whatif, print.summary.whatif
## Create example data sets and counterfactuals
my.cfact <- matrix(rnorm(3*5), ncol = 5)
my.data <- matrix(rnorm(100*5), ncol = 5)

## Evaluate counterfactuals
my.result <- whatif(data = my.data, cfact = my.cfact, mc.cores = 1)

## Print output object
print(my.result)
Summarizes the information produced by the function whatif.
The summary generated is returned as an output object and also printed
to the screen.
## S3 method for class 'whatif'
summary(object, ...)
object |
An object of class "whatif", the output of
the function whatif. |
... |
Further arguments passed to and from other methods. |
An object of class "summary.whatif", a list containing the following five elements:
call |
The original call to |
m |
A scalar. The total number of counterfactuals evaluated. |
m.inhull |
A scalar. The number of counterfactuals evaluated that are in the convex hull of the observed covariate data. |
mean.near |
A scalar. The average percentage of data nearby each counterfactual, where the average is taken over all counterfactuals. |
sum.df |
A data frame with three columns and |
This object is printed to the screen.
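To make the scalar elements concrete, here is a minimal sketch of how such summary values could be derived from per-counterfactual results. The vectors in_hull and sum_stat are hypothetical stand-ins for the in.hull and sum.stat elements of a whatif object, and the exact computation used by the package is an assumption here.

```r
# Hypothetical per-counterfactual results (three counterfactuals)
in_hull  <- c(TRUE, FALSE, TRUE)   # is each counterfactual in the convex hull?
sum_stat <- c(0.5, 0.1, 0.3)       # share of data nearby each counterfactual

m         <- length(in_hull)       # total number of counterfactuals
m_inhull  <- sum(in_hull)          # number inside the convex hull
mean_near <- mean(sum_stat)        # average share of nearby data

c(m = m, m.inhull = m_inhull, mean.near = mean_near)
```

In this toy case two of the three counterfactuals fall in the hull, and on average 30 percent of the data lie nearby.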
Heather Stoll, Gary King, and Langche Zeng
King, Gary and Langche Zeng. 2006. "The Dangers of Extreme Counterfactuals." Political Analysis 14 (2). Available from https://gking.harvard.edu.
King, Gary and Langche Zeng. 2007. "When Can History Be Our Guide? The Pitfalls of Counterfactual Inference." International Studies Quarterly 51 (March). Available from https://gking.harvard.edu.
whatif, plot.whatif, print.whatif, print.summary.whatif
## Create example data sets and counterfactuals
my.cfact <- matrix(rnorm(3*5), ncol = 5)
my.data <- matrix(rnorm(100*5), ncol = 5)

## Evaluate counterfactuals
my.result <- whatif(data = my.data, cfact = my.cfact, mc.cores = 1)

## Print summary
summary(my.result)
Implements the methods described in King and Zeng (2006a, 2006b) for evaluating counterfactuals.
whatif(formula = NULL, data, cfact, range = NULL, freq = NULL,
       nearby = 1, distance = "gower", miss = "list", choice = "both",
       return.inputs = FALSE, return.distance = FALSE,
       mc.cores = detectCores(), ...)
formula |
An optional formula without a dependent variable that
is of class "formula" and that follows standard |
data |
May take one of the following forms:
Missing data is allowed and will be dealt with
via the argument |
cfact |
A |
range |
An optional numeric vector of length |
freq |
An optional numeric vector of any positive length, the elements
of which comprise a set of distances. Used in calculating
cumulative frequency distributions for the distances of the data
points from each counterfactual. For each such distance and
counterfactual, the cumulative frequency is the fraction of observed
covariate data points with distance to the counterfactual less
than or equal to the supplied distance value. The default varies
with the distance measure used. When the Gower distance measure is employed,
frequencies are calculated for the sequence of Gower distances from
0 to 1 in increments of 0.05. When the Euclidean distance measure
is employed, frequencies are calculated for the sequence of Euclidean
distances from the minimum to the maximum observed distances in twenty
equal increments, all rounded to two decimal places. Default is |
nearby |
An optional scalar indicating
which observed data points are considered to be nearby (i.e., within ‘nearby’
geometric variances of) the counterfactuals. Used to calculate the summary statistic
returned by the function: the fraction of the observed data nearby
each counterfactual. By default, the geometric variance of the
covariate data is used. For example, setting |
distance |
An optional string indicating which of two distance measures
to employ. The choices are either |
miss |
An optional string indicating the strategy for dealing
with missing data in the observed covariate data set.
|
choice |
An optional string indicating which analyses to
undertake. The options are either |
return.inputs |
A Boolean; should the processed observed
covariate and counterfactual data matrices on which all
|
return.distance |
A Boolean; should the matrix of distances
between each counterfactual and data point be returned? If
|
mc.cores |
The number of cores to use for the convex hull test, i.e. at
most how many child processes will be run simultaneously. Must be at least
one, and parallelization requires at least two cores. The default is set by
detectCores(). |
... |
Further arguments passed to and from other methods. |
This function is the primary tool for evaluating your counterfactuals. Specifically, it:
1. Determines whether or not your counterfactuals are in the convex hull of the observed covariate data.
2. Computes the distance of your counterfactuals from each of the
observed covariate data points. The default distance function used is Gower's
non-parametric measure.
3. Computes a summary statistic for each counterfactual based on the distances in (2): the fraction of observed covariate data points with distances to your counterfactual less than a value you supply. By default, this value is taken to be the geometric variability of the observed data.
4. Computes the cumulative frequency distribution of each counterfactual for the distances in (2) using values that you supply. By default, Gower distances from 0 to 1 in increments of 0.05 are used.
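The distance and summary-statistic steps can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: the helper gower_to_cfact is hypothetical, it uses the common form of Gower's measure for continuous covariates (mean absolute difference scaled by each covariate's range), it takes ranges from the observed data alone, and the cutoff of 0.5 is an arbitrary value rather than the geometric variability.

```r
# Hypothetical helper: Gower-type distance from one counterfactual to each
# row of the observed data (continuous covariates only; a covariate with
# zero range would divide by zero and is not handled here)
gower_to_cfact <- function(data, cf) {
  ranges <- apply(data, 2, function(col) max(col) - min(col))
  abs_diff <- abs(sweep(data, 2, cf))        # |x_ik - cf_k| for every row i
  scaled <- sweep(abs_diff, 2, ranges, "/")  # scale by covariate range
  rowMeans(scaled)                           # average over covariates
}

# Toy observed data (3 units, 2 covariates) and a single counterfactual
obs <- matrix(c(0,  0,
                2, 10,
                4, 20), ncol = 2, byrow = TRUE)
cf <- c(1, 5)

d <- gower_to_cfact(obs, cf)

# Summary statistic: share of observed points within the chosen cutoff
cutoff <- 0.5
frac_near <- mean(d <= cutoff)
```

Here the counterfactual is at distance 0.25 from the first two units and 0.75 from the third, so two thirds of the data count as nearby at this cutoff.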
An object of class "whatif", a list consisting of the following six or seven elements:
call |
The original call to whatif. |
inputs |
A list with two elements, |
in.hull |
A logical vector of length |
dist |
A |
geom.var |
A scalar. The geometric variability of the observed covariate data. |
sum.stat |
A numeric vector of length |
cum.freq |
A numeric matrix. By default, the matrix has
dimension |
This function requires the lpSolve package.
Heather Stoll, Gary King, and Langche Zeng
King, Gary and Langche Zeng. 2006. "The Dangers of Extreme Counterfactuals." Political Analysis 14 (2). Available from https://gking.harvard.edu.
King, Gary and Langche Zeng. 2007. "When Can History Be Our Guide? The Pitfalls of Counterfactual Inference." International Studies Quarterly 51 (March). Available from https://gking.harvard.edu.
plot.whatif, summary.whatif, print.whatif, print.summary.whatif
## Create example data sets and counterfactuals
my.cfact <- matrix(rnorm(3*5), ncol = 5)
my.data <- matrix(rnorm(100*5), ncol = 5)

## Evaluate counterfactuals
my.result <- whatif(data = my.data, cfact = my.cfact, mc.cores = 1)

## Evaluate counterfactuals and supply own gower distances for
## cumulative frequency distributions
my.result <- whatif(cfact = my.cfact, data = my.data,
                    freq = c(0, .25, .5, 1, 1.25, 1.5), mc.cores = 1)