| Title: | Assessing Proximal, Distal, and Mediated Causal Excursion Effects for Micro-Randomized Trials |
|---|---|
| Description: | Provides methods to analyze micro-randomized trials (MRTs) with binary treatment options. Supports four types of analyses: (1) proximal causal excursion effects, including weighted and centered least squares (WCLS) for continuous proximal outcomes by Boruvka et al. (2018) <doi:10.1080/01621459.2017.1305274> and the estimator for marginal excursion effect (EMEE) for binary proximal outcomes by Qian et al. (2021) <doi:10.1093/biomet/asaa070>; (2) distal causal excursion effects (DCEE) for continuous distal outcomes using a two-stage estimator by Qian (2025) <doi:10.1093/biomtc/ujaf134>; (3) mediated causal excursion effects (MCEE) for continuous distal outcomes, estimating natural direct and indirect excursion effects in the presence of time-varying mediators by Qian (2025) <doi:10.48550/arXiv.2506.20027>; and (4) standardized proximal effect size estimation for continuous proximal outcomes, generalizing the approach in Luers et al. (2019) <doi:10.1007/s11121-017-0862-5> to allow adjustment for baseline and time-varying covariates for improved efficiency. |
| Authors: | Tianchen Qian [aut, cre] (ORCID: <https://orcid.org/0000-0003-4282-7826>), Shaolin Xiang [aut], Zhaoxi Cheng [aut], Xinyi Song [aut], John Dziak [aut], Audrey Boruvka [ctb] |
| Maintainer: | Tianchen Qian <[email protected]> |
| License: | GPL-3 |
| Version: | 0.4.1 |
| Built: | 2026-05-25 10:26:28 UTC |
| Source: | https://github.com/cran/MRTAnalysis |
Assert that input is a data frame
.mcee_assert_df(data)
data |
Object to check |
Invisibly TRUE if valid, otherwise stops with error
Build basis matrix f(t) from time-varying effect formula
.mcee_build_f_matrix(time_varying_effect_form, data)
time_varying_effect_form |
RHS-only formula |
data |
Data frame to evaluate formula on |
Model matrix with basis functions evaluated at each row
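As a rough illustration of what "basis functions evaluated at each row" means, the sketch below evaluates a RHS-only formula with stats::model.matrix(). The helper name build_f_matrix_sketch and the toy data are hypothetical; this is not the package's internal code, only one plausible way to obtain such a matrix.

```r
# Hypothetical helper (not the package's internal function): evaluate a
# RHS-only formula on every row of `data` and return the design matrix.
build_f_matrix_sketch <- function(time_varying_effect_form, data) {
  stopifnot(
    inherits(time_varying_effect_form, "formula"),
    length(time_varying_effect_form) == 2L # one-sided formula: no response
  )
  stats::model.matrix(time_varying_effect_form, data = data)
}

# Toy long-format data: 2 subjects, 3 decision points each
dat <- data.frame(id = rep(1:2, each = 3), dp = rep(1:3, times = 2))

f_const <- build_f_matrix_sketch(~1, dat)   # intercept only
f_linear <- build_f_matrix_sketch(~dp, dat) # intercept and linear trend in dp
```

With ~1 the matrix has a single intercept column (a constant effect over time); with ~dp it gains a second column holding the decision point index.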
Build per-row weights omega(i,t) for MCEE estimation
.mcee_build_weights(data, id, dp, weight_per_row = NULL, specific_dp_only = NULL, verbose = TRUE)
data |
Data frame |
id |
Subject ID column name |
dp |
Decision point column name |
weight_per_row |
Optional user-specified per-row weights |
specific_dp_only |
Optional vector of decision points to target (others get weight 0) |
verbose |
Logical; whether to print informative messages |
Numeric vector of per-row weights
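The weighting options can be pictured with a minimal sketch. The function build_weights_sketch is hypothetical, and two of its behaviours are assumptions rather than documented facts: that user-supplied weights are returned unchanged, and that the default is an unnormalized weight of 1 on every row. The package may normalize or combine weights differently.

```r
# Hypothetical sketch of per-row weights; not the package's implementation.
build_weights_sketch <- function(data, dp, weight_per_row = NULL,
                                 specific_dp_only = NULL) {
  n_rows <- nrow(data)
  if (!is.null(weight_per_row)) {
    # User-specified weights: must be nonnegative, one per row
    stopifnot(
      is.numeric(weight_per_row),
      length(weight_per_row) == n_rows,
      all(weight_per_row >= 0)
    )
    return(weight_per_row)
  }
  if (!is.null(specific_dp_only)) {
    # Targeted decision points get weight 1, all others get weight 0
    return(as.numeric(data[[dp]] %in% specific_dp_only))
  }
  rep(1, n_rows) # assumption: equal weight on every row by default
}

dat <- data.frame(id = rep(1:2, each = 3), dp = rep(1:3, times = 2))
w_target <- build_weights_sketch(dat, dp = "dp", specific_dp_only = c(1, 3))
w_default <- build_weights_sketch(dat, dp = "dp")
```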
Validate binary column coding
.mcee_check_binary_col(data, col, allow_all1 = TRUE, label = NULL)
data |
Data frame containing the column |
col |
Column name (can be NULL, in which case validation is skipped) |
allow_all1 |
Logical; if FALSE, column cannot be all 1s |
label |
Optional descriptive label for error messages |
Invisibly TRUE if valid, otherwise stops with error
Validate control formula excludes treatment and outcome
.mcee_check_control_formula(control_formula, treatment, outcome, dp, label)
control_formula |
RHS-only formula for control variables |
treatment |
Treatment column name |
outcome |
Outcome column name |
dp |
Decision point column name (used in error messages) |
label |
Descriptive label for error messages |
Invisibly TRUE if valid, otherwise stops with error
Check that decision points are strictly increasing within each subject
.mcee_check_dp_strictly_increasing(data, id, dp)
data |
Data frame |
id |
Column name for subject ID |
dp |
Column name for decision point |
Invisibly TRUE if valid, otherwise stops with error
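A check of this kind can be sketched in base R by splitting the decision point column by subject and inspecting successive differences. The function name and error message below are hypothetical and only illustrate the idea.

```r
# Hypothetical sketch: stop if any subject's decision points fail to
# increase strictly from one row to the next.
check_dp_increasing_sketch <- function(data, id, dp) {
  bad <- vapply(
    split(data[[dp]], data[[id]]),
    function(x) any(diff(x) <= 0), # ties or decreases violate the rule
    logical(1)
  )
  if (any(bad)) {
    stop(
      "Decision points are not strictly increasing for id(s): ",
      paste(names(bad)[bad], collapse = ", "),
      call. = FALSE
    )
  }
  invisible(TRUE)
}

good <- data.frame(id = c(1, 1, 2, 2), dp = c(1, 2, 1, 3))
bad <- data.frame(id = c(1, 1, 2, 2), dp = c(1, 1, 2, 1))
res_good <- check_dp_increasing_sketch(good, "id", "dp")
```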
Check config formula for inclusion/exclusion of mediator
.mcee_check_formula_mediator(config, target, mediator)
config |
A nuisance config list (may or may not contain 'formula'). |
target |
Character scalar, one of "p", "q", "eta", "mu", "nu". |
mediator |
Character scalar: mediator variable name. |
Invisibly TRUE. Warnings are produced if formulas look suspicious.
Check that rows for each subject appear in contiguous blocks
.mcee_check_id_rows_grouped(data, id, max_show = 5)
data |
Data frame |
id |
Column name for subject ID |
max_show |
Maximum number of offending IDs to show in error message |
Invisibly TRUE if valid, otherwise stops with error
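One way to detect non-contiguous subject blocks is run-length encoding: an ID that starts more than one run is split across the data. The sketch below assumes this approach; the helper name and message are hypothetical.

```r
# Hypothetical sketch: each id should form exactly one run of rows.
check_id_rows_grouped_sketch <- function(data, id, max_show = 5) {
  runs <- rle(as.character(data[[id]]))
  offending <- unique(runs$values[duplicated(runs$values)])
  if (length(offending) > 0) {
    stop(
      "Rows are not grouped by id. Offending id(s): ",
      paste(utils::head(offending, max_show), collapse = ", "),
      call. = FALSE
    )
  }
  invisible(TRUE)
}

grouped <- data.frame(id = c("a", "a", "b", "b"))
interleaved <- data.frame(id = c("a", "b", "a", "b"))
res_grouped <- check_id_rows_grouped_sketch(grouped, "id")
```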
Check data frame columns for missing/infinite values
.mcee_check_no_missing_vars(data, vars, where = NULL, max_show = 5)
data |
Data frame to check |
vars |
Character vector of column names to check |
where |
Optional context description for error messages |
max_show |
Maximum number of row indices to show per variable |
Invisibly TRUE if no missing data found, otherwise stops with error
Check numeric vector for missing/infinite values
.mcee_check_no_missing_vec(vec, name, max_show = 5)
vec |
Numeric vector to check |
name |
Variable name for error messages |
max_show |
Maximum number of row indices to show |
Invisibly TRUE if no missing data found, otherwise stops with error
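For numeric input, missing and infinite values can be caught together with is.finite(), which is FALSE for NA, NaN, Inf, and -Inf. The following sketch uses a hypothetical helper name and error message.

```r
# Hypothetical sketch: reject NA, NaN, and +/-Inf in a numeric vector.
check_no_missing_vec_sketch <- function(vec, name, max_show = 5) {
  bad_rows <- which(!is.finite(vec))
  if (length(bad_rows) > 0) {
    stop(
      "Missing or infinite values in '", name, "' at row(s): ",
      paste(utils::head(bad_rows, max_show), collapse = ", "),
      call. = FALSE
    )
  }
  invisible(TRUE)
}

res_clean <- check_no_missing_vec_sketch(c(0.1, 0.5, 0.9), "rand_prob")
```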
Check that outcome is constant within each subject (required for distal outcomes)
.mcee_check_outcome_constant_within_id(data, id, outcome)
data |
Data frame |
id |
Column name for subject ID |
outcome |
Column name for outcome |
Invisibly TRUE if valid, otherwise stops with error
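Because a distal outcome is repeated on every row of a subject, the check amounts to counting distinct outcome values per subject. A minimal sketch, with a hypothetical helper name:

```r
# Hypothetical sketch: a distal outcome must take one value per subject.
check_outcome_constant_sketch <- function(data, id, outcome) {
  n_values <- tapply(
    data[[outcome]], data[[id]],
    function(y) length(unique(y))
  )
  bad <- names(n_values)[n_values > 1]
  if (length(bad) > 0) {
    stop(
      "Outcome varies within id(s): ", paste(bad, collapse = ", "),
      call. = FALSE
    )
  }
  invisible(TRUE)
}

ok <- data.frame(id = c(1, 1, 2, 2), Y = c(3.2, 3.2, 1.5, 1.5))
not_ok <- data.frame(id = c(1, 1, 2, 2), Y = c(3.2, 3.3, 1.5, 1.5))
res_ok <- check_outcome_constant_sketch(ok, "id", "Y")
```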
Validate time-varying effect formula structure
.mcee_check_time_varying_effect_form(time_varying_effect_form, dp)
time_varying_effect_form |
RHS-only formula for basis functions |
dp |
Decision point column name |
Invisibly TRUE if valid, otherwise stops with error
Generate compact one-line description of nuisance model object
.mcee_compact_model_info(obj)
obj |
Fitted model object or known value descriptor |
Character string describing the object
Implements the core MCEE estimating equations and sandwich variance estimation.
This function contains the mathematical heart of the MCEE method, solving
the weighted estimating equations for alpha (NDEE) and beta (NIEE).
.mcee_core_rows(n, f_nrows, omega_nrows, i_index, phi11_vec, phi10_vec, phi00_vec)
n |
Integer. Number of unique subjects. |
f_nrows |
Matrix |
omega_nrows |
Numeric vector of length |
i_index |
Integer vector of length |
phi11_vec, phi10_vec, phi00_vec
|
Numeric vectors of length |
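**MCEE Estimating Equations and Sandwich Variance:** The weighted estimating equations for the NDEE and NIEE parameters, and the sandwich variance formula (a bread matrix and a meat matrix built from subject-level score vectors), are given in the MCEE vignette appendix.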
**Mathematical Details:** The implementation follows the theoretical framework detailed in the MCEE vignette appendix. The estimating equations are based on efficient influence functions for the causal parameters of interest in the mediation analysis setting.
List containing:
- alpha_hat: Vector of length p, NDEE parameter estimates
- alpha_se: Vector of length p, NDEE standard errors
- beta_hat: Vector of length p, NIEE parameter estimates
- beta_se: Vector of length p, NIEE standard errors
- varcov: Matrix (2p x 2p), joint variance-covariance for (alpha, beta)
- alpha_varcov: Matrix (p x p), variance-covariance for alpha only
- beta_varcov: Matrix (p x p), variance-covariance for beta only
Select default GLM family based on nuisance parameter type
.mcee_default_family(target, method)
target |
Nuisance parameter name ("p", "q", "eta", "mu", "nu") |
method |
Learning method name |
GLM family object or NULL for non-GLM methods
Remove a variable from RHS-only formula
.mcee_drop_var_from_rhs(rhs_only_formula, var)
rhs_only_formula |
RHS-only formula |
var |
Variable name to remove |
Modified formula with variable removed
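Dropping a term from a one-sided formula can be done in base R with stats::update(). The sketch below is one possible approach, not necessarily what the package does; the helper name is hypothetical, and it only removes a term that appears as a plain variable (a wrapped term such as s(M) would need separate handling).

```r
# Hypothetical sketch: remove one variable from a RHS-only formula.
drop_var_from_rhs_sketch <- function(rhs_only_formula, var) {
  stats::update(rhs_only_formula, stats::as.formula(paste("~ . -", var)))
}

with_mediator <- ~ X + M
without_mediator <- drop_var_from_rhs_sketch(with_mediator, "M")
remaining_vars <- all.vars(without_mediator)
```

This mirrors the situation where a control formula is written with the mediator included and the mediator must be excluded for some nuisance models.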
Internal workhorse function that fits individual nuisance parameters using various machine learning methods or known constants. Handles the complexity of different learner APIs and provides consistent predictions.
.mcee_fit_nuisance(config, data_for_fitting, data_for_predicting, lhs_var, param_name, data_for_fitting_name)
config |
Configuration list describing how to fit the nuisance parameter.
Created by
|
data_for_fitting |
Data frame subset used to train the model (e.g., available rows only). |
data_for_predicting |
Data frame on which to generate predictions (usually full data). |
lhs_var |
Character. Column name of the response/outcome variable to model. |
param_name |
Character. Descriptive name for error messages (e.g., "p_t(1|H_t)"). |
data_for_fitting_name |
Character. Description of fitting data for model call display. |
**Supported Methods:**
"glm": Uses stats::glm() with automatic family detection
"lm": Uses stats::lm() (continuous outcomes only)
"gam": Uses mgcv::gam() supporting smooth terms
"rf": Uses randomForest::randomForest()
"ranger": Uses ranger::ranger() (faster random forest)
"sl": Uses SuperLearner::SuperLearner()
**Automatic Family Detection:**
When family=NULL in GLM/GAM configs:
- Binary outcomes (0/1 only): binomial()
- Continuous outcomes: gaussian()
**Known Values:**
If any of known, known_a1, known_a0 is provided, no model
is fitted. Returns constant predictions and a descriptor object.
List with components:
- pred: Numeric vector of length nrow(data_for_predicting) containing predictions/fitted values.
- model: Fitted model object (e.g., glm, gam, randomForest) or a list descriptor for known values.
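The automatic family detection rule described above (binomial for 0/1 responses, gaussian otherwise) can be sketched as follows. The helper name detect_family_sketch is hypothetical, and the toy data are invented for illustration.

```r
# Hypothetical sketch of the family-detection rule for GLM/GAM fits:
# responses taking only the values 0 and 1 get binomial(), others gaussian().
detect_family_sketch <- function(y) {
  if (all(y %in% c(0, 1))) stats::binomial() else stats::gaussian()
}

fam_binary <- detect_family_sketch(c(0, 1, 1, 0))
fam_continuous <- detect_family_sketch(c(0.2, 1.7, -0.4))

# Fit on one data set and predict on another, as in the
# data_for_fitting / data_for_predicting split described above
set.seed(1)
toy <- data.frame(x = rnorm(30))
toy$y <- 1 + 2 * toy$x + rnorm(30, sd = 0.1)
fit <- stats::glm(y ~ x, family = detect_family_sketch(toy$y), data = toy[1:20, ])
pred <- stats::predict(fit, newdata = toy, type = "response")
```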
Print informative message if no availability column provided
.mcee_message_if_no_availability_provided(availability, verbose)
availability |
Availability column name (or NULL) |
verbose |
Logical; whether to print messages |
Invisibly TRUE
Print formatted coefficient table for MCEE results
.mcee_print_coef_table(tab)
tab |
Data frame with coefficient estimates, standard errors, etc. |
Used for side effects (printing)
Check that required columns exist in data frame
.mcee_require_cols(data, cols, where = "data")
data |
Data frame to check |
cols |
Character vector of required column names |
where |
Context description for error messages |
Invisibly TRUE if all columns exist, otherwise stops with error
Resolve randomization probability from column name or scalar
.mcee_resolve_rand_prob(data, rand_prob, availability = NULL)
data |
Data frame |
rand_prob |
Either column name or numeric scalar |
availability |
Optional availability column name for validation |
Numeric vector of randomization probabilities
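Resolving a "column name or scalar" argument into one probability per row might look like the sketch below. The helper name is hypothetical, the availability-based validation is omitted, and the requirement that probabilities lie strictly in (0,1) is an assumption carried over from the dcee() notes.

```r
# Hypothetical sketch: turn `rand_prob` into a per-row numeric vector.
resolve_rand_prob_sketch <- function(data, rand_prob) {
  if (is.character(rand_prob) && length(rand_prob) == 1L) {
    stopifnot(rand_prob %in% names(data))
    p <- data[[rand_prob]] # column of row-specific probabilities
  } else if (is.numeric(rand_prob) && length(rand_prob) == 1L) {
    p <- rep(rand_prob, nrow(data)) # same probability on every row
  } else {
    stop("`rand_prob` must be a column name or a numeric scalar.", call. = FALSE)
  }
  # Assumption: probabilities must lie strictly between 0 and 1
  if (any(!is.finite(p)) || any(p <= 0 | p >= 1)) {
    stop("Randomization probabilities must lie strictly in (0, 1).", call. = FALSE)
  }
  p
}

dat <- data.frame(id = c(1, 1, 2, 2), prob_A = c(0.3, 0.5, 0.7, 0.3))
p_col <- resolve_rand_prob_sketch(dat, "prob_A")
p_const <- resolve_rand_prob_sketch(dat, 0.5)
```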
Validate clipping bounds for probability predictions
.mcee_validate_clipping(clipping)
clipping |
Numeric vector of length 2 with lower and upper bounds |
Invisibly TRUE if valid, otherwise stops with error
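Clipping keeps predicted probabilities away from 0 and 1 so that inverse-probability weights stay bounded. The sketch below shows a plausible validation and the clipping step itself; both helper names are hypothetical, and the exact validity rules (bounds inside [0, 1], lower below upper) are assumptions.

```r
# Hypothetical sketch: validate a length-2 clipping vector.
validate_clipping_sketch <- function(clipping) {
  ok <- is.numeric(clipping) && length(clipping) == 2L && !anyNA(clipping) &&
    clipping[1] >= 0 && clipping[2] <= 1 && clipping[1] < clipping[2]
  if (!ok) {
    stop("`clipping` must be c(lower, upper) with 0 <= lower < upper <= 1.",
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# Hypothetical sketch: truncate predictions to [lower, upper].
clip_prob_sketch <- function(p, clipping) {
  validate_clipping_sketch(clipping)
  pmin(pmax(p, clipping[1]), clipping[2])
}

clipped <- clip_prob_sketch(c(0.001, 0.5, 0.999), clipping = c(0.01, 0.99))
```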
Validate that learning method is supported
.mcee_validate_method(method)
method |
Method name to validate |
Invisibly TRUE if valid, otherwise stops with error
Extract variables from nuisance configuration formula
.mcee_vars_in_config(cfg)
cfg |
Configuration list (may contain formula element) |
Character vector of variable names from config formula
Extract variable names from RHS-only formula
.mcee_vars_in_rhs(rhs_only_formula)
rhs_only_formula |
RHS-only formula object |
Character vector of variable names (empty if not a valid formula)
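Base R's all.vars() already returns the variable names in a formula while skipping function names such as s() or I(). A minimal sketch of the documented behaviour, using a hypothetical helper name:

```r
# Hypothetical sketch: variable names in a RHS-only formula,
# or an empty character vector when the input is not a formula.
vars_in_rhs_sketch <- function(rhs_only_formula) {
  if (!inherits(rhs_only_formula, "formula")) {
    return(character(0))
  }
  all.vars(rhs_only_formula)
}

vars_smooth <- vars_in_rhs_sketch(~ X + s(Z)) # "X" "Z": s is a function, not a variable
vars_none <- vars_in_rhs_sketch("not a formula")
```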
Estimates the time-varying standardized proximal causal excursion effect for **continuous** proximal outcomes in a micro-randomized trial. The estimator uses inverse-probability weighting and can adjust for baseline and time-varying covariates to improve efficiency. Optionally, the effect and scale estimates are smoothed over decision points using LOESS, and participant-level bootstrap confidence intervals can be computed.
calculate_mrt_effect_size(
  data, id, outcome, treatment, time, rand_prob, availability,
  covariates = NULL, smooth = TRUE, loess_span = 0.25, loess_degree = 1,
  do_bootstrap = TRUE, boot_replications = 1000, confidence_alpha = 0.05
)
data |
A data.frame of MRT data (see 'data_example_for_standardized_effect') |
id |
Column name for participant id |
outcome |
Column name for the continuous proximal outcome |
treatment |
Column name for treatment indicator |
time |
Column name for time / decision point |
rand_prob |
Column name for randomization probability |
availability |
Column name for availability indicator |
covariates |
Optional character vector of covariate column names |
smooth |
Logical; apply LOESS smoothing across time |
loess_span |
Numeric; smoother span |
loess_degree |
Numeric; polynomial degree in LOESS |
do_bootstrap |
Logical; whether to perform bootstrap over participants |
boot_replications |
Integer; number of bootstrap replications |
confidence_alpha |
Numeric; two-sided alpha level for CIs |
A data.frame of class "mrt_effect_size" containing the
standardized effect for a continuous proximal outcome with columns:
- Decision point index.
- beta_hat: Raw (unsmoothed) estimated excursion effect at each time.
- s_hat: Raw (unsmoothed) estimated outcome scale at each time.
- beta_sm: Smoothed excursion effect across time (equals beta_hat if smooth = FALSE).
- s_sm: Smoothed outcome scale across time (equals s_hat if smooth = FALSE).
- estimate: Standardized effect beta_sm / s_sm.
- Lower confidence bound for estimate (NA if do_bootstrap = FALSE).
- Upper confidence bound for estimate (NA if do_bootstrap = FALSE).
Luers, B., Klasnja, P., and Murphy, S. (2019). Standardized effect sizes for preventive mobile health interventions in micro-randomized trials. *Prevention Science*, 20(1), 100–109.
data("data_example_for_standardized_effect")

ans_ci <- calculate_mrt_effect_size(
  data = data_example_for_standardized_effect,
  id = "id",
  outcome = "outcome",
  treatment = "treatment",
  time = "decision_point",
  rand_prob = "prob_treatment",
  availability = "availability",
  covariates = c("covariate1", "covariate2"),
  do_bootstrap = TRUE,
  boot_replications = 100
) # Note: use at least 1000 bootstrap replications for stable CIs.

summary(ans_ci)
plot(ans_ci)
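The smoothing and standardization step can be illustrated on made-up numbers. The sketch below assumes the per-time estimates beta_hat and s_hat are already available (here they are simulated, not estimated from data) and applies stats::loess() with the default span and degree listed above; it omits the inverse-probability-weighted estimation and the bootstrap, and is not the package's implementation.

```r
# Simulated per-decision-point inputs (invented for illustration only)
set.seed(1)
time <- 1:40
beta_hat <- 0.5 - 0.01 * time + rnorm(40, sd = 0.1) # raw excursion effect
s_hat <- 1 + rnorm(40, sd = 0.05)                   # raw outcome scale

# Hypothetical helper: LOESS-smoothed values at the observed time points
smooth_loess_sketch <- function(y, x, span = 0.25, degree = 1) {
  stats::predict(stats::loess(y ~ x, span = span, degree = degree))
}

beta_sm <- smooth_loess_sketch(beta_hat, time)
s_sm <- smooth_loess_sketch(s_hat, time)

# Standardized effect: smoothed effect divided by smoothed scale
estimate <- beta_sm / s_sm
```

Smoothing the numerator and denominator separately before taking the ratio keeps a noisy scale estimate at a single decision point from dominating the standardized effect at that time.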
Baseline model:
Treatment effect model:
Randomization probabilities cycle over 0.3, 0.5, 0.7 (with repetition).
Availability is exogenous at 0.8 for all time points.
data_binary
A data frame with 3000 observations and 10 variables:
Individual id number.
Decision point index.
Time-varying covariate 1, the "standardized time in study", defined as the current decision point index divided by the total number of decision points.
Time-varying covariate 2, indicator of "the second half of the study", defined as whether the current decision point index is greater than the total number of decision points divided by 2.
Binary proximal outcome.
Treatment assignment: whether the intervention is randomized to be delivered (=1) or not (=0) at the current decision point.
Randomization probability for the current decision point.
Availability indicator (=1 available, =0 not available) at the current decision point.
Simulated longitudinal dataset suitable for illustrating the 'dcee()' function. Each row corresponds to one decision point for one subject. The distal outcome 'Y' is constant within subject because it is measured at the end of the study; here it is appended to the long-format data as an extra column to conform with the 'dcee()' function requirement.
data_distal_continuous
a data frame with 1500 observations and 11 variables
Subject identifier
Decision point (1..T)
Endogenous continuous time-varying covariate
Endogenous binary time-varying covariate
Availability indicator (0/1)
Treatment (0/1)
Randomization probability P(A=1|H_t)
Lagged treatment
Distal continuous outcome (constant per subject)
A synthetic data set in long format, one row per participant-by-decision point, suitable for illustrating 'calculate_mrt_effect_size()'.
data_example_for_standardized_effect
A data frame with 5000 observations and 10 variables:
Individual id number.
Decision point index.
Availability indicator (=1 available, =0 not available).
Randomization probability for treatment.
Treatment assignment (=1 treated, =0 not treated).
Time-varying covariate 1.
Time-varying covariate 2.
Time-varying treatment effect used to generate outcome.
Outcome noise scale.
Continuous proximal outcome.
A synthetic data set that mimics the HeartSteps V1 data structure to illustrate the use of the [wcls()] function for continuous proximal outcomes.
data_mimicHeartSteps
a data frame with 7770 observations and 9 variables
individual id number
decision point index
day in the study
proximal outcome: the step count in the 30 minutes following the current decision point (log-transformed)
proximal outcome at the previous decision point (lag-1 outcome): the step count in the 30 minutes following the previous decision point (log-transformed)
the step count in the 30 minutes prior to the current decision point (log-transformed); used as a control variable
whether the individual is at home or work (=1) or at other locations (=0) at the current decision point
whether the intervention is randomized to be delivered (=1) or not (=0) at the current decision point
the randomization probability P(A=1) for the current decision point
whether the individual is available (=1) or not (=0) at the current decision point
A simulated long-format dataset used in the vignette and tests. Each row corresponds to one subject–decision point. The distal outcome 'Y' is constant within subject (repeated on every row for that subject).
data_time_varying_mediator_distal_outcome
A data frame with n * T_val rows and the following columns:
Subject identifier (integer).
Decision point index, strictly increasing within subject (integer).
Availability indicator at time dp (0/1).
Treatment at time dp (0/1).
Mediator at time dp (numeric; could be binary or continuous).
Time-varying covariate at time dp (numeric).
Lagged treatment at time dp-1 (0/1).
Lagged mediator at time dp-1 (numeric).
Lagged covariate at time dp-1 (numeric).
Lagged availability at time dp-1 (0/1).
Randomization probability for A at time dp (numeric in (0,1)).
Availability probability for I at time dp (numeric in (0,1)).
Conditional mean of M given history (numeric; from DGM).
Conditional mean of X given history (numeric; from DGM).
Conditional mean component for distal outcome Y (numeric; from DGM).
Distal outcome, constant within subject (numeric).
Generated by dgm_time_varying_mediator_distal_outcome() in the package
source. Intended for illustrating mcee usage. No missing values.
Simulated.
mcee, mcee_general, mcee_userfit_nuisance
data(data_time_varying_mediator_distal_outcome)
str(data_time_varying_mediator_distal_outcome)
Fits distal causal excursion effects in micro-randomized trials using a
**two-stage** estimator: (i) learn nuisance outcome regressions
with a specified learner (parametric/ML), optionally with
cross-fitting; (ii) solve estimating equations for the distal excursion
effect parameters (beta).
This wrapper standardizes inputs and delegates computation to [dcee_helper_2stage_estimation()].
dcee(
  data, id, outcome, treatment, rand_prob, moderator_formula, control_formula,
  availability = NULL,
  control_reg_method = c("gam", "lm", "rf", "ranger", "sl", "sl.user-specified-library", "set_to_zero"),
  cross_fit = FALSE, cf_fold = 10, weighting_function = NULL, verbose = TRUE, ...
)
data |
A data.frame in long format. |
id |
Character scalar: column name for subject identifier. |
outcome |
Character scalar: column name for proximal/distal outcome. |
treatment |
Character scalar: column name for binary treatment {0,1}. |
rand_prob |
Character scalar: column name for randomization probability
giving |
moderator_formula |
RHS-only formula of moderators of the excursion effect (e.g., '~ 1', '~ Z', or '~ Z1 + Z2'). |
control_formula |
RHS-only formula of covariates for learning nuisance outcome regressions. When 'control_reg_method = "gam"', 's(x)' terms are allowed (e.g., '~ x1 + s(x2)'). For SuperLearner methods, variables are extracted from this formula to build the design matrix 'X'. |
availability |
Optional character scalar: column name for availability indicator (0/1). If 'NULL', availability is taken as 1 for all rows. |
control_reg_method |
One of '"gam"', '"lm"', '"rf"', '"ranger"', '"sl"', '"sl.user-specified-library"', '"set_to_zero"'. See Details. |
cross_fit |
Logical; if 'TRUE', perform K-fold cross-fitting by subject id. |
cf_fold |
Integer; number of folds if 'cross_fit = TRUE' (default 10). |
weighting_function |
Either a single numeric constant applied to all
rows, or a character column name in 'data' giving decision-point weights
|
verbose |
Logical; print minimal preprocessing messages (default 'TRUE'). |
... |
Additional arguments passed through to the chosen learner (e.g., 'num.trees', 'mtry' for random forests; 'sl.library' when 'control_reg_method = "sl.user-specified-library"'). |
**Learners.**
- 'gam' uses mgcv and supports 's(.)' terms in 'control_formula'.
- 'lm' uses base stats::lm.
- 'rf' uses randomForest; 'ranger' uses ranger.
- 'sl' / 'sl.user-specified-library' use SuperLearner. For the former,
'sl.library = c("SL.mean", "SL.glm", "SL.earth")' are used. For the latter,
please provide 'sl.library = c("SL.mean", ...)' via '...'.
**Notes.**
- Treatment must be coded 0/1; 'rand_prob' must lie strictly in (0,1).
- 'control_formula = ~ 1' is only valid with 'control_reg_method = "set_to_zero"'.
An object of class '"dcee_fit"' with components:
- The matched call to dcee().
- fit: A list returned by the two-stage helper with elements:
  - beta_hat: Named numeric vector of distal causal excursion effect estimates (beta). Names are "Intercept" and the moderator names (if any) from moderator_formula.
  - beta_se: Named numeric vector of standard errors for beta_hat (same order/names).
  - beta_varcov: Variance-covariance matrix of beta_hat (square matrix; row/column names match names(beta_hat)).
  - conf_int: Matrix of large-sample (normal) Wald 95% confidence intervals for beta_hat; columns are "2.5 %" and "97.5 %".
  - conf_int_tquantile: Matrix of small-sample (t-quantile) 95% confidence intervals for beta_hat; columns are "2.5 %" and "97.5 %"; degrees of freedom are provided in $df of the "dcee_fit" object.
  - regfit_a0: Stage-1 nuisance regression fit for the outcome model among A=0, or NULL when control_reg_method = "set_to_zero". Note: when cross_fit = TRUE, this is the learner object from the last fold and is provided for inspection only (do not use for out-of-fold prediction).
  - regfit_a1: Stage-1 nuisance regression fit for the outcome model among A=1; same caveats as regfit_a0 regarding cross_fit.
- df: Small-sample degrees of freedom used for t-based intervals: number of unique subjects minus length(fit$beta_hat).
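The two interval types differ only in the quantile used. The sketch below builds both from made-up estimates and standard errors (the numbers and the subject count are invented for illustration), using degrees of freedom equal to the number of subjects minus the number of coefficients, as described above.

```r
# Made-up inputs for illustration; not output from dcee()
beta_hat <- c(Intercept = 0.30, Z = -0.10)
beta_se <- c(Intercept = 0.12, Z = 0.08)
n_subjects <- 50
df <- n_subjects - length(beta_hat) # small-sample degrees of freedom

# Hypothetical helper: symmetric 95% interval for a given quantile
make_ci <- function(quantile) {
  cbind(
    "2.5 %" = beta_hat - quantile * beta_se,
    "97.5 %" = beta_hat + quantile * beta_se
  )
}

conf_int <- make_ci(stats::qnorm(0.975))                 # normal (Wald)
conf_int_tquantile <- make_ci(stats::qt(0.975, df = df)) # t-quantile
```

The t-based intervals are always wider than the normal ones, and the gap shrinks as the number of subjects grows.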
Qian, T. (2025). Distal causal excursion effects: modeling long-term effects of time-varying treatments in micro-randomized trials. *Biometrics*, 81(4), ujaf134.
data(data_distal_continuous, package = "MRTAnalysis")

## Fast example: marginal effect with linear nuisance (CRAN-friendly)
fit_lm <- dcee(
  data = data_distal_continuous,
  id = "userid", outcome = "Y", treatment = "A", rand_prob = "prob_A",
  moderator_formula = ~1, # marginal (no moderators)
  control_formula = ~X, # simple linear nuisance
  availability = "avail",
  control_reg_method = "lm",
  cross_fit = FALSE
)
summary(fit_lm)
summary(fit_lm, show_control_fit = TRUE) # show Stage-1 fit info

## Moderated effect with GAM nuisance (allows smooth terms); may be slower
fit_gam <- dcee(
  data = data_distal_continuous,
  id = "userid", outcome = "Y", treatment = "A", rand_prob = "prob_A",
  moderator_formula = ~Z, # test moderation by Z
  control_formula = ~ s(X) + Z, # smooth in nuisance via mgcv::gam
  availability = "avail",
  control_reg_method = "gam",
  cross_fit = TRUE, cf_fold = 5
)
summary(fit_gam, lincomb = c(0, 1)) # linear combo: the Z coefficient
summary(fit_gam, show_control_fit = TRUE) # show Stage-1 fit info

## Optional: SuperLearner (runs only if installed)
if (requireNamespace("SuperLearner", quietly = TRUE)) {
  library(SuperLearner)
  fit_sl <- dcee(
    data = data_distal_continuous,
    id = "userid", outcome = "Y", treatment = "A", rand_prob = "prob_A",
    moderator_formula = ~1,
    control_formula = ~ X + Z,
    availability = "avail",
    control_reg_method = "sl",
    cross_fit = FALSE
  )
  summary(fit_sl)
}
Returns the estimated causal excursion effect (on log relative risk scale) and the estimated standard error. Small sample correction using the "Hat" matrix in the variance estimate is implemented.
emee(
  data, id, outcome, treatment, rand_prob, moderator_formula, control_formula,
  availability = NULL, numerator_prob = NULL, start = NULL, verbose = TRUE
)
data |
A data set in long format. |
id |
The subject id variable. |
outcome |
The outcome variable. |
treatment |
The binary treatment assignment variable. |
rand_prob |
The randomization probability variable. |
moderator_formula |
A formula for the moderator variables. This should
start with ~ followed by the moderator variables. When set to |
control_formula |
A formula for the control variables. This should
start with ~ followed by the control variables. When set to |
availability |
The availability variable. Use the default value ( |
numerator_prob |
Either a number between 0 and 1, or a variable name for
a column in data. If you are not sure what this is, use the default value ( |
start |
A vector of the initial value of the estimators used in the numerical
solver. If using default value ( |
verbose |
If default ('TRUE'), additional messages will be printed during data preprocessing. |
An object of type "emee_fit"
## estimating the fully marginal excursion effect by setting
## moderator_formula = ~ 1
emee(
  data = data_binary,
  id = "userid",
  outcome = "Y",
  treatment = "A",
  rand_prob = "rand_prob",
  moderator_formula = ~1,
  control_formula = ~ time_var1 + time_var2,
  availability = "avail"
)

## estimating the causal excursion effect moderated by time_var1
## by setting moderator_formula = ~ time_var1
emee(
  data = data_binary,
  id = "userid",
  outcome = "Y",
  treatment = "A",
  rand_prob = "rand_prob",
  moderator_formula = ~time_var1,
  control_formula = ~ time_var1 + time_var2,
  availability = "avail"
)
Returns the estimated causal excursion effect (on log relative risk scale) and the estimated standard error.
Small sample correction using the "Hat" matrix in the variance estimate is implemented.
This is a slightly altered version of emee(), where the treatment
assignment indicator is also centered in the residual term. It would have
similar (but not exactly the same) numerical output as emee(). This
is the estimator based on which the sample size calculator for binary outcome
MRT is developed. (See R package MRTSampleSizeBinary.)
emee2(
  data, id, outcome, treatment, rand_prob, moderator_formula, control_formula,
  availability = NULL, numerator_prob = NULL, start = NULL, verbose = TRUE
)
data |
A data set in long format. |
id |
The subject id variable. |
outcome |
The outcome variable. |
treatment |
The binary treatment assignment variable. |
rand_prob |
The randomization probability variable. |
moderator_formula |
A formula for the moderator variables. This should
start with ~ followed by the moderator variables. When set to |
control_formula |
A formula for the control variables. This should
start with ~ followed by the control variables. When set to |
availability |
The availability variable. Use the default value ( |
numerator_prob |
Either a number between 0 and 1, or a variable name for
a column in data. If you are not sure what this is, use the default value ( |
start |
A vector of the initial value of the estimators used in the numerical
solver. If using default value ( |
verbose |
If default ('TRUE'), additional messages will be printed during data preprocessing. |
An object of type "emee_fit"
## estimating the fully marginal excursion effect by setting
## moderator_formula = ~ 1
emee2(
  data = data_binary,
  id = "userid",
  outcome = "Y",
  treatment = "A",
  rand_prob = "rand_prob",
  moderator_formula = ~1,
  control_formula = ~ time_var1 + time_var2,
  availability = "avail"
)

## estimating the causal excursion effect moderated by time_var1
## by setting moderator_formula = ~ time_var1
emee2(
  data = data_binary,
  id = "userid",
  outcome = "Y",
  treatment = "A",
  rand_prob = "rand_prob",
  moderator_formula = ~time_var1,
  control_formula = ~ time_var1 + time_var2,
  availability = "avail"
)
Estimates the Natural Direct Excursion Effect (NDEE; ) and
Natural Indirect Excursion Effect (NIEE; ) for distal outcomes
in micro-randomized trials (MRTs). Assumes the randomization probability
is known (via rand_prob) and fits all nuisance functions using the
same learner specified by control_reg_method.
mcee(
  data,
  id,
  dp,
  outcome,
  treatment,
  mediator,
  availability = NULL,
  rand_prob,
  time_varying_effect_form,
  control_formula_with_mediator,
  control_reg_method = c("glm", "gam", "rf", "ranger", "sl"),
  weight_per_row = NULL,
  specific_dp_only = NULL,
  verbose = TRUE,
  SL.library = NULL
)
data |
A data.frame in long format (one row per id-by-decision point). |
id |
Character. Column name for subject identifier. |
dp |
Character. Column name for decision point index (must increase strictly within subject). |
outcome |
Character. Column name for distal outcome (constant within subject). |
treatment |
Character. Column name for treatment (coded 0/1). |
mediator |
Character. Column name for mediator. |
availability |
Optional character. Column name for availability (0/1). If |
rand_prob |
Either a column name in |
time_varying_effect_form |
RHS-only formula for the basis |
control_formula_with_mediator |
RHS-only formula for control variables used in nuisance models that may include the mediator (the wrapper will drop the mediator internally for nuisances that must exclude it). |
control_reg_method |
Learner for nuisance fits: one of |
weight_per_row |
Optional numeric vector of row weights (nonnegative, length |
specific_dp_only |
Optional numeric vector of decision points to target; internally converted to |
verbose |
Logical; print progress messages. |
SL.library |
Optional character vector of SuperLearner libraries (used when |
Requirements: rows grouped by subject, strictly increasing dp within subject,
and no missing or non-finite values (NA/NaN/Inf) in relevant variables. If availability
is supplied, the wrapper enforces p1 = q1 = 1 (and hence p0 = q0 = 1) at rows where availability is 0 in the nuisances.
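The data requirements above can be checked before calling the wrapper. The following is a minimal base-R sketch; `check_mrt_long_format()` is a hypothetical helper written for illustration, not a function from the package, and it assumes the listed columns are numeric.

```r
# Hypothetical helper (not part of the package): checks the stated requirements.
check_mrt_long_format <- function(data, id, dp, vars) {
  stopifnot(is.data.frame(data), all(c(id, dp, vars) %in% names(data)))
  ids <- as.character(data[[id]])
  # Rows grouped by subject: each id occupies one contiguous block of rows.
  grouped <- !anyDuplicated(rle(ids)$values)
  # Decision point index strictly increasing within each subject.
  increasing <- all(tapply(data[[dp]], ids, function(x) all(diff(x) > 0)))
  # No NA/NaN/Inf in the relevant (numeric) columns.
  finite <- all(vapply(data[c(dp, vars)],
                       function(x) all(is.finite(x)), logical(1)))
  c(grouped = grouped, increasing = increasing, finite = finite)
}

dat_ok <- data.frame(
  id = rep(1:3, each = 4),
  dp = rep(1:4, times = 3),
  A = rep(c(0, 1), 6),
  Y = rep(c(1.5, 2, 0.5), each = 4)
)
res_ok <- check_mrt_long_format(dat_ok, "id", "dp", c("A", "Y"))

# Swapping the first two rows breaks the ordering of dp within subject 1.
dat_bad <- dat_ok[c(2, 1, 3:12), ]
res_bad <- check_mrt_long_format(dat_bad, "id", "dp", c("A", "Y"))
```

Running such a check first gives a clearer message than an error raised deep inside the fitting routine.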
An object of class "mcee_fit" with elements:
mcee_fit: list with alpha_hat, beta_hat, alpha_se, beta_se,
varcov, alpha_varcov, beta_varcov.
nuisance_models: fitted Stage-1 models for p,q,eta,mu,nu.
nuisance_fitted: per-row fitted values for the nuisance functions.
meta: list with basis dimension, number of ids, per-id lengths, weights used.
call: the matched call.
summary.mcee_fit, mcee_general, mcee_userfit_nuisance
set.seed(1)
n <- 10
T <- 4
id <- rep(1:n, each = T)
dp <- rep(1:T, times = n)
A <- rbinom(n * T, 1, 0.5)
M <- rbinom(n * T, 1, plogis(-0.2 + 0.3 * A + 0.1 * dp))
Y <- ave(0.5 * A + 0.6 * M + 0.1 * dp + rnorm(n * T), id)
dat <- data.frame(id, dp, A, M, Y)
fit <- mcee(dat, "id", "dp", "Y", "A", "M",
  time_varying_effect_form = ~1,
  control_formula_with_mediator = ~ dp + M,
  control_reg_method = "glm",
  rand_prob = 0.5,
  verbose = TRUE
)
summary(fit)
Creates a configuration to fit nuisance parameters using generalized additive models
via mgcv::gam(). Supports smooth terms like s().
mcee_config_gam(target, formula, family = NULL, clipping = NULL)
target |
Character. Nuisance parameter name ("p", "q", "eta", "mu", "nu"). |
formula |
RHS-only formula (e.g., |
family |
Optional GLM family. Defaults to |
clipping |
Optional numeric vector |
A configuration list for use with mcee_general.
# GAM with smooth time effect
cfg_eta <- mcee_config_gam("eta", ~ X1 + s(dp, k = 4))

# GAM with multiple smooths
cfg_mu <- mcee_config_gam("mu", ~ s(dp) + s(M, X1, k = 10))
Creates a configuration to fit nuisance parameters using generalized linear models
via stats::glm().
mcee_config_glm(target, formula, family = NULL, clipping = NULL)
target |
Character. Nuisance parameter name ("p", "q", "eta", "mu", "nu"). |
formula |
RHS-only formula (e.g., |
family |
Optional GLM family. Defaults to |
clipping |
Optional numeric vector |
A configuration list for use with mcee_general.
# Binary outcome model for propensity
cfg_q <- mcee_config_glm("q", ~ dp + M, family = binomial())

# Gaussian outcome model
cfg_eta <- mcee_config_glm("eta", ~ dp + X1)
Creates a configuration for nuisance parameters with known constant values, bypassing model fitting. Useful for known randomization probabilities in MRTs.
mcee_config_known(target, value = NULL, a1 = NULL, a0 = NULL)
target |
Character. Nuisance parameter name ("p", "q", "eta", "mu", "nu"). |
value |
Numeric scalar. Single constant value for all observations. |
a1, a0
|
Numeric scalars. Arm-specific constants for A=1 and A=0 conditions.
If provided, these override |
A configuration list for use with mcee_general.
# Known randomization probability
cfg_p <- mcee_config_known("p", 0.6)

# Arm-specific known values
cfg_eta <- mcee_config_known("eta", a1 = 0.8, a0 = 0.2)
Creates a configuration to fit nuisance parameters using linear models
via stats::lm(). Only appropriate for continuous outcomes.
mcee_config_lm(target, formula)
target |
Character. Nuisance parameter name ("p", "q", "eta", "mu", "nu"). |
formula |
RHS-only formula (e.g., |
A configuration list for use with mcee_general.
# Linear model for continuous outcome
cfg_eta <- mcee_config_lm("eta", ~ dp + X1 + X2)
mcee_general()
Creates a configuration list describing **how to obtain a nuisance function**
used by mcee_general. You may either:
supply **known values** (bypasses learning), or
specify a **learning method** (e.g., GLM/GAM/RF/Ranger/SL) with a formula.
mcee_config_maker(
  target,
  method = NULL,
  formula = NULL,
  family = NULL,
  known = NULL,
  known_a1 = NULL,
  known_a0 = NULL,
  clipping = NULL,
  SL.library = NULL,
  ...
)
target |
Character; which nuisance to configure. One of
|
method |
Optional character learner name when *not* using known values.
Supported:
Ignored if any of |
formula |
RHS-only formula describing predictors for the learner
(used when |
family |
Optional GLM/GAM family. If |
known |
Optional numeric scalar/vector of **known values** for the nuisance.
Commonly used for |
known_a1, known_a0
|
Optional numeric scalar/vector providing known values
for the treatment-specific versions of a nuisance (e.g., |
clipping |
Optional numeric vector of length 2, |
SL.library |
Character vector of SuperLearner libraries (only used
when |
... |
Reserved for future extensions; currently ignored. |
If any of known, known_a1, or known_a0 is provided,
the returned configuration is of type “known” and **no learner will be fit**.
Otherwise, the configuration records the requested learner, formula, family,
optional clipping, and (for SL) the library.
Internally, helper validators ensure method is supported and
clipping (if provided) is sane. Family defaults are chosen when
family = NULL for GLM/GAM methods.
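As an illustration of what a length-2 clipping range is typically used for, the sketch below truncates predicted probabilities to a lower and upper bound. This is an assumption about the intent of `clipping`; `clip_prob()` is a hypothetical helper, not the package's internal code.

```r
# Hypothetical helper: truncate predicted probabilities to [lower, upper].
# Assumes clipping = c(lower, upper) with 0 <= lower < upper <= 1.
clip_prob <- function(pred, clipping = NULL) {
  if (is.null(clipping)) {
    return(pred) # no clipping requested
  }
  stopifnot(
    is.numeric(clipping), length(clipping) == 2,
    clipping[1] >= 0, clipping[2] <= 1, clipping[1] < clipping[2]
  )
  pmin(pmax(pred, clipping[1]), clipping[2])
}

clipped <- clip_prob(c(0.001, 0.4, 0.999), clipping = c(0.01, 0.99))
unclipped <- clip_prob(c(0.001, 0.4, 0.999))
```

Bounding fitted probabilities away from 0 and 1 keeps inverse-probability weights from becoming extreme, at the cost of some bias when the true probabilities lie outside the range.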
A named list describing the configuration. For known configs:
list( nuisance_parameter = <target>, known = <numeric or NULL>, known_a1 = <numeric or NULL>, known_a0 = <numeric or NULL>, clipping = <numeric length-2 or NULL> )
For learner configs:
list( nuisance_parameter = <target>, method = <character>, formula = <formula or NULL>, family = <family or NULL>, clipping = <numeric length-2 or NULL>, SL.library = <character vector; only when method == "sl"> )
mcee_general,
helper constructors like
mcee_config_known(),
mcee_config_glm(),
mcee_config_gam(),
mcee_config_lm(),
mcee_config_rf(),
mcee_config_ranger(),
mcee_config_sl(),
mcee_config_sl_user().
# Known p (MRT randomization), GLM for other nuisances
cfg_p <- mcee_config_maker("p", known = 0.5)
cfg_q <- mcee_config_maker("q", method = "glm", formula = ~ dp + M)
cfg_eta <- mcee_config_maker("eta", method = "glm", formula = ~dp)
cfg_mu <- mcee_config_maker("mu", method = "glm", formula = ~ dp + M)
cfg_nu <- mcee_config_maker("nu", method = "glm", formula = ~dp)

# SuperLearner with default library (set explicitly if you prefer)
# cfg_q_sl <- mcee_config_maker("q", method = "sl", formula = ~ dp + M,
#   SL.library = c("SL.mean","SL.glm","SL.ranger"))

# Known treatment-specific outcome regressions (e.g., from external source)
# cfg_eta_known <- mcee_config_maker("eta", known_a1 = rep(1, 100),
#   known_a0 = rep(0, 100))
Creates a configuration to fit nuisance parameters using ranger random forests
via ranger::ranger(). Faster alternative to randomForest.
mcee_config_ranger(target, formula)
target |
Character. Nuisance parameter name ("p", "q", "eta", "mu", "nu"). |
formula |
RHS-only formula (e.g., |
A configuration list for use with mcee_general.
# Ranger random forest for outcome model
cfg_eta <- mcee_config_ranger("eta", ~ dp + X1 + X2 + X3)
Creates a configuration to fit nuisance parameters using random forests
via randomForest::randomForest(). Good for nonlinear patterns.
mcee_config_rf(target, formula)
target |
Character. Nuisance parameter name ("p", "q", "eta", "mu", "nu"). |
formula |
RHS-only formula (e.g., |
A configuration list for use with mcee_general.
# Random forest for complex propensity model
cfg_q <- mcee_config_rf("q", ~ dp + M + X1 + X2)
Creates a configuration to fit nuisance parameters using SuperLearner
via SuperLearner::SuperLearner(). Automatically selects among
multiple learning algorithms.
mcee_config_sl(target, formula, SL.library = NULL, clipping = NULL)
target |
Character. Nuisance parameter name ("p", "q", "eta", "mu", "nu"). |
formula |
RHS-only formula (e.g., |
SL.library |
Optional character vector of learner names. If |
clipping |
Optional numeric vector |
A configuration list for use with mcee_general.
# SuperLearner with default library
cfg_q <- mcee_config_sl("q", ~ dp + M + X1)

# SuperLearner with custom library
cfg_eta <- mcee_config_sl("eta", ~ dp + X1,
  SL.library = c("SL.glm", "SL.rf", "SL.ranger")
)
Creates a configuration to fit nuisance parameters using SuperLearner with a user-specified library (required parameter).
mcee_config_sl_user(target, formula, SL.library, clipping = NULL)
target |
Character. Nuisance parameter name ("p", "q", "eta", "mu", "nu"). |
formula |
RHS-only formula (e.g., |
SL.library |
Character vector of learner names (required). |
clipping |
Optional numeric vector |
A configuration list for use with mcee_general.
# SuperLearner with specific library
cfg_mu <- mcee_config_sl_user("mu", ~ dp + M + X1,
  SL.library = c("SL.glm", "SL.earth", "SL.nnet")
)
Like mcee, but each nuisance function is configured explicitly
via config_* objects (formula/method/family or known).
mcee_general(
  data,
  id,
  dp,
  outcome,
  treatment,
  mediator,
  availability = NULL,
  time_varying_effect_form,
  config_p,
  config_q,
  config_eta,
  config_mu,
  config_nu,
  weight_per_row = NULL,
  verbose = TRUE
)
data |
A data.frame in long format (one row per id-by-decision point). |
id |
Character. Column name for subject identifier. |
dp |
Character. Column name for decision point index (must increase strictly within subject). |
outcome |
Character. Column name for distal outcome (constant within subject). |
treatment |
Character. Column name for treatment (coded 0/1). |
mediator |
Character. Column name for mediator. |
availability |
Optional character. Column name for availability (0/1). If |
time_varying_effect_form |
RHS-only formula for the basis |
config_p, config_q, config_eta, config_mu, config_nu
|
Lists created by
|
weight_per_row |
Optional numeric vector of row weights (nonnegative, length |
verbose |
Logical; print progress messages. |
Use this wrapper for observational studies (estimate p) or when you want
different learners per nuisance. The same data requirements as mcee apply.
An "mcee_fit" object; see mcee.
mcee, mcee_userfit_nuisance, mcee_config_maker
set.seed(1)
n <- 10
T <- 4
id <- rep(1:n, each = T)
dp <- rep(1:T, times = n)
A <- rbinom(n * T, 1, 0.5)
M <- rbinom(n * T, 1, plogis(-0.2 + 0.3 * A + 0.1 * dp))
Y <- ave(0.5 * A + 0.6 * M + 0.1 * dp + rnorm(n * T), id)
dat <- data.frame(id, dp, A, M, Y)
cfg <- list(
  p = mcee_config_known("p", 0.5),
  q = mcee_config_glm("q", ~ dp + M),
  eta = mcee_config_glm("eta", ~dp),
  mu = mcee_config_glm("mu", ~ dp + M),
  nu = mcee_config_glm("nu", ~dp)
)
fit_gen <- mcee_general(dat, "id", "dp", "Y", "A", "M",
  time_varying_effect_form = ~dp,
  config_p = cfg$p, config_q = cfg$q,
  config_eta = cfg$eta, config_mu = cfg$mu, config_nu = cfg$nu
)
Fits all nuisance components (Stage 1) and then computes the MCEE parameters (Stage 2) and their sandwich variance. This is a low-level driver used by the high-level wrapper; it assumes 'omega_nrows' and 'f_nrows' are already aligned to the rows of 'data'.
mcee_helper_2stage_estimation(
  data,
  id_var,
  dp_var,
  outcome_var,
  treatment_var,
  mediator_var,
  avail_var = NULL,
  config_p,
  config_q,
  config_eta,
  config_mu,
  config_nu,
  omega_nrows,
  f_nrows
)
data |
A long-format 'data.frame' (one row per subject-by-decision point). |
id_var |
Character scalar. Name of the subject ID column. |
dp_var |
Character scalar. Name of the decision point column (values need not be consecutive; they may vary in count across subjects). |
outcome_var |
Character scalar. Name of the distal outcome column. |
treatment_var |
Character scalar. Name of the binary treatment column (coded 0/1). |
mediator_var |
Character scalar. Name of the mediator column. |
avail_var |
Character scalar or 'NULL'. Name of the availability column (1 = available, 0 = unavailable). If 'NULL', availability is treated as all 1. |
config_p |
Configuration for
|
config_q |
Configuration for |
config_eta |
Configuration for |
config_mu |
Configuration for |
config_nu |
Configuration for |
omega_nrows |
Numeric vector of length |
f_nrows |
Numeric matrix with |
Availability handling:
When avail_var exists and equals 0, Stage 1 sets the working probabilities
to 1 for that row (e.g., p1 = 1, p0 = 1, and similarly
for q1 and q0). This prevents division-by-zero in the estimating equations.
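A minimal base-R sketch of this override is shown below. The vectors and the inverse-probability weight `w` are illustrative assumptions, not the package's internal objects; the point is only that forcing the probabilities to 1 on unavailable rows keeps any division by them finite.

```r
# Illustrative per-row inputs (hypothetical values).
avail <- c(1, 0, 1, 1, 0)       # 1 = available, 0 = unavailable
A     <- c(1, 0, 0, 1, 0)       # treatment is 0 whenever unavailable
p1    <- c(0.6, 0, 0.5, 0.7, 0) # user-supplied P(A = 1); 0 on unavailable rows

# Force the working probabilities to 1 on unavailable rows.
p1_adj <- ifelse(avail == 0, 1, p1)
p0_adj <- ifelse(avail == 0, 1, 1 - p1)

# An inverse-probability weight now stays finite on every row.
w <- ifelse(A == 1, 1 / p1_adj, 1 / p0_adj)
```

The same pattern would apply to q1 and q0.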
Auto-family rules:
If family is omitted in a GLM/GAM config, it defaults to binomial()
for config_p and config_q, and to gaussian() for
config_eta, config_mu, and config_nu.
Learners:
"glm": uses stats::glm().
"gam": uses mgcv::gam() (supports s() smooths).
"rf": uses randomForest::randomForest().
"ranger": uses ranger::ranger().
"sl": uses SuperLearner::SuperLearner().
If SL.library is not given, a simple default library is used:
c("SL.mean", "SL.glm", "SL.gam").
A list with components:
fit: A list with entries
alpha_hat, alpha_se, beta_hat, beta_se,
and varcov (the sandwich variance for (alpha, beta)).
nuisance_models: A list of fitted Stage-1 objects:
p, q, eta1, eta0, mu1, mu0, nu1, nu0.
(For known/known_a0/known_a1, a small descriptor list is returned.)
mcee_general for a high-level wrapper that constructs
omega_nrows and f_nrows from user-friendly arguments.
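For orientation, the two row-aligned inputs can be built by hand in base R. The sketch below assumes that `f_nrows` is the model matrix of a RHS-only formula evaluated on every row and that `omega_nrows` holds one weight per row; how the high-level wrapper actually constructs them (for example, any normalization of the weights) may differ.

```r
# Toy long-format data: 2 subjects, 3 decision points each.
dat <- data.frame(id = rep(1:2, each = 3), dp = rep(1:3, times = 2))

# Basis f(t) for a linear-in-time effect: columns (Intercept) and dp.
f_nrows <- model.matrix(~dp, data = dat)

# Fully marginal effect: a single column of ones.
f_marginal <- model.matrix(~1, data = dat)

# Equal weight on every row (an assumption made for this sketch).
omega_nrows <- rep(1, nrow(dat))
```

Both objects have one row (or element) per row of `dat`, which is the alignment the low-level driver expects.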
## Not run:
# Minimal sketch (assuming `df` has columns id, t, A, M, Y, I):
fit <- mcee_helper_2stage_estimation(
  data = df,
  id_var = "id",
  dp_var = "t",
  outcome_var = "Y",
  treatment_var = "A",
  mediator_var = "M",
  avail_var = "I",
  config_p = list(formula = ~t, method = "glm"), # binomial auto; p excludes the mediator
  config_q = list(formula = ~ t + M, method = "glm"), # binomial auto; q conditions on the mediator
  config_eta = list(formula = ~t, method = "gam"), # gaussian auto
  config_mu = list(formula = ~ t + s(M), method = "gam"), # gaussian auto
  config_nu = list(formula = ~t, method = "glm"), # gaussian auto
  omega_nrows = rep(1, nrow(df)),
  f_nrows = matrix(1, nrow = nrow(df), ncol = 1) # marginal (p = 1), aligned to rows of df
)
fit$fit$alpha_hat
fit$fit$beta_hat
## End(Not run)
Fits all five nuisance components required for MCEE estimation and returns both per-row predictions and fitted model objects. This is Stage 1 of the two-stage MCEE procedure.
mcee_helper_stage1_fit_nuisance(
  data,
  id_var,
  dp_var,
  outcome_var,
  treatment_var,
  mediator_var,
  avail_var,
  config_p,
  config_q,
  config_eta,
  config_mu,
  config_nu
)
data |
Data frame in long format. |
id_var, dp_var, outcome_var, treatment_var, mediator_var, avail_var
|
Character column names (same as in |
config_p, config_q, config_eta, config_mu, config_nu
|
Configuration lists
for each nuisance parameter (see |
**Nuisance Parameters Fitted:**
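p: Propensity score, fitted on available rows. (Technically,
this nuisance takes the value 1 at unavailable decision points, but the user is allowed to specify it ignoring availability,
and the function will automatically correct it by setting p1 = 1 when availability is 0.)
q: Conditional propensity, fitted on available rows. (Technically,
this nuisance takes the value 1 at unavailable decision points, but the user is allowed to specify it ignoring availability,
and the function will automatically correct it by setting q1 = 1 when availability is 0.)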
eta1, eta0: Outcome regression without mediator
mu1, mu0: Outcome regression with mediator
nu1, nu0: Cross-world regressions for counterfactual outcomes
**Data Subsets Used for Fitting:**
- p, q: Only rows where availability==1
- eta1, mu1: Rows where A==1 OR availability==0
- eta0, mu0: Rows where A==0
- nu1: Fitted on A==0 rows using mu1 predictions as outcome
- nu0: Fitted on A==1 or unavailable rows using mu0 predictions as outcome
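The nested fit for nu1 can be sketched with ordinary linear models. This is a simplified illustration under stated assumptions: every row is available, `lm()` stands in for whichever learner is configured, and the simulated data and column names are hypothetical.

```r
set.seed(1)
n <- 200
dat <- data.frame(dp = sample(1:4, n, replace = TRUE), A = rbinom(n, 1, 0.5))
dat$M <- rbinom(n, 1, plogis(-0.2 + 0.3 * dat$A))
dat$Y <- 0.5 * dat$A + 0.6 * dat$M + 0.1 * dat$dp + rnorm(n)

# mu1: outcome regression with the mediator, fitted on A == 1 rows,
# then predicted for every row.
fit_mu1 <- lm(Y ~ dp + M, data = dat[dat$A == 1, ])
dat$mu1_hat <- predict(fit_mu1, newdata = dat)

# nu1: regress the mu1 predictions on the non-mediator covariates,
# using A == 0 rows only, then predict for every row.
fit_nu1 <- lm(mu1_hat ~ dp, data = dat[dat$A == 0, ])
dat$nu1_hat <- predict(fit_nu1, newdata = dat)
```

The second regression averages the A == 1 outcome model over the mediator distribution observed under A == 0, which is what makes nu1 a cross-world quantity; nu0 follows the same pattern with the arms swapped.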
**Availability Handling:**
When availability==0, predictions are forced to p1=p0=q1=q0=1 to
prevent division by zero in Stage 2.
List with two components:
nuisance_fitted: List of numeric vectors (length nrow(data)) containing
per-row predictions: p1, p0, q1, q0, eta1,
eta0, mu1, mu0, nu1, nu0.
nuisance_models: List of fitted model objects or "known" descriptors:
p, q, eta1, eta0, mu1, mu0, nu1, nu0.
Computes the Natural Direct Excursion Effect (NDEE; alpha) and Natural Indirect
Excursion Effect (NIEE; beta) parameters using Stage-1 nuisance predictions.
This is Stage 2 of the two-stage MCEE procedure.
mcee_helper_stage2_estimate_mcee(
  data,
  id_var,
  dp_var,
  outcome_var,
  treatment_var,
  avail_var = NULL,
  p1,
  p0,
  q1,
  q0,
  eta1,
  eta0,
  mu1,
  mu0,
  nu1,
  nu0,
  omega_nrows,
  f_nrows
)
data |
Data frame in long format. |
id_var, dp_var, outcome_var, treatment_var, avail_var
|
Character column names. |
p1, p0, q1, q0, eta1, eta0, mu1, mu0, nu1, nu0
|
Numeric vectors of length |
omega_nrows |
Numeric vector of length |
f_nrows |
Numeric matrix with |
**MCEE Estimating Equations:**
The function constructs three influence functions for each row and
solves one estimating equation for the NDEE (alpha) and one for the NIEE (beta).
**Influence Functions:**
- Direct effect pathway influence function
- Mediated effect pathway influence function
- Control/reference pathway influence function
**Variance Estimation:** Uses sandwich variance estimation with subject-level clustering. The variance accounts for the two-stage estimation uncertainty.
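The idea of a subject-clustered sandwich variance can be sketched in base R for a generic weighted linear estimating equation. This is an illustration under assumptions, not the package's implementation: `phi` is a stand-in per-row pseudo-outcome, and the sketch ignores the Stage-1 estimation uncertainty and any small-sample correction.

```r
set.seed(2)
n_id <- 30
n_dp <- 5
id <- rep(seq_len(n_id), each = n_dp)
f <- cbind(1, rep(seq_len(n_dp), times = n_id)) # basis f(t): intercept and time
omega <- rep(1, length(id))                     # row weights
# Stand-in pseudo-outcome with within-subject correlation.
phi <- 0.3 + 0.1 * f[, 2] + rep(rnorm(n_id), each = n_dp) + rnorm(length(id))

# Solve sum_i sum_t omega_t f_t (phi_t - f_t' theta) = 0.
bread <- crossprod(f * omega, f)
theta_hat <- solve(bread, crossprod(f * omega, phi))

# Sum the per-row estimating functions within subject (the clustering step).
resid <- as.vector(phi - f %*% theta_hat)
U <- rowsum(f * omega * resid, group = id)

# Sandwich: bread^{-1} meat bread^{-1}; bread is symmetric here.
meat <- crossprod(U)
vcov_hat <- solve(bread) %*% meat %*% solve(bread)
se <- sqrt(diag(vcov_hat))
```

Summing scores within subject before forming the outer product is what lets the variance reflect correlation across a subject's decision points.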
List containing MCEE parameter estimates and inference:
alpha_hat: Vector of length p: NDEE parameter estimates
alpha_se: Vector of length p: NDEE standard errors
beta_hat: Vector of length p: NIEE parameter estimates
beta_se: Vector of length p: NIEE standard errors
varcov: Matrix (2p x 2p): Joint variance-covariance for (alpha, beta)
alpha_varcov: Matrix (p x p): Variance-covariance for alpha
beta_varcov: Matrix (p x p): Variance-covariance for beta
Skips Stage-1 model fitting and uses user-provided nuisance predictions.
mcee_userfit_nuisance(
  data,
  id,
  dp,
  outcome,
  treatment,
  mediator,
  availability = NULL,
  time_varying_effect_form,
  p1,
  q1,
  eta1,
  eta0,
  mu1,
  mu0,
  nu1,
  nu0,
  weight_per_row = NULL,
  verbose = TRUE
)
data |
A data.frame in long format (one row per id-by-decision point). |
id |
Character. Column name for subject identifier. |
dp |
Character. Column name for decision point index (must increase strictly within subject). |
outcome |
Character. Column name for distal outcome (constant within subject). |
treatment |
Character. Column name for treatment (coded 0/1). |
mediator |
Character. Column name for mediator. |
availability |
Optional character. Column name for availability (0/1). If |
time_varying_effect_form |
RHS-only formula for the basis |
p1, q1, eta1, eta0, mu1, mu0, nu1, nu0
|
Numeric vectors (or column names) of
per-row predictions aligned with |
weight_per_row |
Optional numeric vector of row weights (nonnegative, length |
verbose |
Logical; print progress messages. |
Nuisance definitions:
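p1: propensity score, the probability of treatment A = 1 given the history (known in MRTs). (Technically,
this nuisance takes the value 1 at unavailable decision points, but the user is allowed to input values that ignore availability,
and the function will automatically correct them by setting p1 = 1 when availability is 0.)
q1: conditional propensity, the probability of treatment A = 1 given the history and the mediator. (Technically,
this nuisance takes the value 1 at unavailable decision points, but the user is allowed to input values that ignore availability,
and the function will automatically correct them by setting q1 = 1 when availability is 0.)
eta1, eta0: outcome regressions without the mediator, under A = 1 and A = 0.
mu1, mu0: outcome regressions with the mediator, under A = 1 and A = 0.
nu1, nu0: cross-world regressions; see vignette and paper for definitions.
If availability is provided, rows with availability equal to 0 are coerced to p1=q1=1
(and hence p0=q0=1); a warning is emitted if overrides occur.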
An "mcee_fit" object; see mcee.
set.seed(1)
n <- 10
T <- 4
id <- rep(1:n, each = T)
dp <- rep(1:T, times = n)
A <- rbinom(n * T, 1, 0.5)
M <- rbinom(n * T, 1, plogis(-0.2 + 0.3 * A + 0.1 * dp))
Y <- ave(0.5 * A + 0.6 * M + 0.1 * dp + rnorm(n * T), id)
dat <- data.frame(id, dp, A, M, Y)
fit_usr <- mcee_userfit_nuisance(dat, "id", "dp", "Y", "A", "M",
  time_varying_effect_form = ~dp,
  p1 = rep(0.5, nrow(dat)),
  q1 = runif(nrow(dat), .3, .7),
  eta1 = rnorm(nrow(dat)), eta0 = rnorm(nrow(dat)),
  mu1 = rnorm(nrow(dat)), mu0 = rnorm(nrow(dat)),
  nu1 = rnorm(nrow(dat)), nu0 = rnorm(nrow(dat))
)
Plot the standardized effect estimate over time, with optional bootstrap confidence bounds if available.
## S3 method for class 'mrt_effect_size'
plot(
  x,
  show_ci = TRUE,
  col = "black",
  lwd = 1.5,
  ci_col = "red",
  ci_lty = 2,
  ...
)
x |
An object of class |
show_ci |
Logical; if |
col |
Color for the estimate line. |
lwd |
Line width for the estimate line. |
ci_col |
Color for CI lines. |
ci_lty |
Line type for CI lines. |
... |
Additional arguments passed to [plot()]. |
Prints formatted coefficient tables and inference results for mediated causal excursion effects, including alpha (Natural Direct Excursion Effect) and beta (Natural Indirect Excursion Effect) parameters.
## S3 method for class 'summary.mcee_fit'
print(x, ...)
x |
An object of class |
... |
Currently unused. |
Invisibly returns the input object x. Called for side effects.
Produce inference tables for distal causal excursion effects from a
[dcee()] model. By default uses small-sample t-tests with
df = object$df (subjects minus number of betas). If df
is missing or nonpositive, falls back to large-sample normal (z) inference.
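The fallback rule can be illustrated with a small base-R sketch. `wald_ci()` is a hypothetical helper written for this illustration; it is not the package's internal code, and the estimate, standard error, and degrees of freedom are made-up inputs.

```r
# Hypothetical helper: t-based inference when df is positive, otherwise z-based.
wald_ci <- function(est, se, df = NA, conf_level = 0.95) {
  a <- 1 - conf_level
  use_t <- !is.na(df) && df > 0
  crit <- if (use_t) qt(1 - a / 2, df) else qnorm(1 - a / 2)
  stat <- est / se
  p_value <- if (use_t) 2 * pt(-abs(stat), df) else 2 * pnorm(-abs(stat))
  data.frame(
    estimate = est,
    lcl = est - crit * se,
    ucl = est + crit * se,
    p_value = p_value,
    method = if (use_t) "t" else "z"
  )
}

ci_t <- wald_ci(0.4, 0.15, df = 8)  # small-sample t interval
ci_z <- wald_ci(0.4, 0.15, df = NA) # large-sample normal interval
```

With few subjects the t critical value exceeds the normal one, so the t-based interval is wider and the p-value larger for the same estimate and standard error.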
## S3 method for class 'dcee_fit'
summary(
  object,
  lincomb = NULL,
  conf_level = 0.95,
  show_control_fit = FALSE,
  ...
)
object |
An object of class |
lincomb |
Optional numeric vector or matrix specifying linear
combinations |
conf_level |
Confidence level for intervals (default |
show_control_fit |
Logical; if |
... |
Currently ignored. |
A list of class "summary.dcee_fit" with components:
call — the original call
df — degrees of freedom used for t-tests (may be NA)
conf_level — the confidence level
excursion_effect — data frame with coefficient table for beta
lincomb — optional data frame with linear-combination results
control_fit — optional list describing Stage-1 fits (only if show_control_fit)
summary method for class "emee_fit".
## S3 method for class 'emee_fit'
summary(
  object,
  lincomb = NULL,
  conf_level = 0.95,
  show_control_fit = FALSE,
  ...
)
object |
An object of class "emee_fit". |
lincomb |
A vector of length p (p is the number of moderators including intercept) or a matrix with p columns. When not set to 'NULL', the summary will include the specified linear combinations of the causal excursion effect coefficients and the corresponding confidence interval, standard error, and p-value. |
conf_level |
A numeric value indicating the confidence level for confidence intervals. Defaults to 0.95. |
show_control_fit |
A logical value of whether the fitted coefficients for the control variables will be printed in the summary. Default to FALSE. (Interpreting the fitted coefficients for control variables is not recommended.) |
... |
Further arguments passed to or from other methods. |
the original function call and the estimated causal excursion effect coefficients, confidence interval with conf_level, standard error, t-statistic value, degrees of freedom, and p-value.
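The arithmetic behind a `lincomb` row can be sketched in base R: for a coefficient vector with variance-covariance matrix V, the combination L beta has variance L V L'. The numbers below are made up for illustration, and the sketch does not reproduce the package's small-sample adjustments.

```r
# Illustrative estimates: intercept and one moderator coefficient.
beta_hat <- c(0.20, -0.05)
V <- matrix(c(0.010, -0.002,
              -0.002, 0.004), nrow = 2, byrow = TRUE)

# Each row of L is one linear combination (p = 2 columns).
L <- rbind(
  c(1, 0), # effect when the moderator equals 0
  c(1, 1)  # effect when the moderator equals 1
)

est <- as.vector(L %*% beta_hat)
se <- sqrt(diag(L %*% V %*% t(L)))
```

A vector passed as `lincomb` corresponds to a single row of `L`; a matrix with p columns requests several combinations at once.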
fit <- emee(
  data = data_binary,
  id = "userid",
  outcome = "Y",
  treatment = "A",
  rand_prob = "rand_prob",
  moderator_formula = ~time_var1,
  control_formula = ~ time_var1 + time_var2,
  availability = "avail",
  numerator_prob = 0.5,
  start = NULL
)
summary(fit)
Prints coefficient tables for the Natural Direct Excursion Effect (alpha) and Natural Indirect Excursion Effect (beta), with small-sample t inference. Optionally reports linear combinations and Stage-1 nuisance summaries.
## S3 method for class 'mcee_fit'
summary(
  object,
  lincomb_alpha = NULL,
  lincomb_beta = NULL,
  lincomb_joint = NULL,
  conf_level = 0.95,
  show_nuisance = FALSE,
  ...
)
object |
An object of class |
lincomb_alpha, lincomb_beta
|
Optional numeric vector or matrix specifying
linear combinations of |
lincomb_joint |
Optional numeric vector or matrix specifying linear
combinations of the stacked parameter |
conf_level |
Confidence level for Wald intervals (default 0.95). |
show_nuisance |
Logical; if |
... |
Unused. |
A list of class "summary.mcee_fit" with printed side effects.
# s <- summary(fit, lincomb_alpha = c(1, 9), lincomb_beta = c(1, 9))
Summarizes the time-varying standardized proximal effect size estimates produced by [calculate_mrt_effect_size()] for continuous proximal outcomes.
## S3 method for class 'mrt_effect_size'
summary(object, ...)
object |
An object of class |
... |
Currently ignored. |
A list of class "summary.mrt_effect_size" with components:
call — the original call
n_id — number of participants (if available)
n_time — number of decision points
smooth, loess_span, loess_degree — smoothing settings
do_bootstrap, boot_replications, confidence_alpha — bootstrap settings
effect_summary — data frame of summary statistics for estimate
ci_summary — optional data frame of CI-width statistics
Summary method for class "wcls_fit".
## S3 method for class 'wcls_fit'
summary(
  object,
  lincomb = NULL,
  conf_level = 0.95,
  show_control_fit = FALSE,
  ...
)
object |
An object of class "wcls_fit". |
lincomb |
A vector of length p (p is the number of moderators including intercept) or a matrix with p columns. When not set to 'NULL', the summary will include the specified linear combinations of the causal excursion effect coefficients and the corresponding confidence interval, standard error, and p-value. |
conf_level |
A numeric value indicating the confidence level for confidence intervals. Defaults to 0.95. |
show_control_fit |
A logical value indicating whether the fitted coefficients for the control variables are printed in the summary. Defaults to FALSE. (Interpreting the fitted coefficients for control variables is not recommended.) |
... |
Further arguments passed to or from other methods. |
The original function call and the estimated causal excursion effect coefficients, with their confidence intervals (95% by default), standard errors, t-statistic or Wald-statistic values (depending on whether the sample size is < 50), degrees of freedom, and p-values.
fit <- wcls(
  data = data_mimicHeartSteps,
  id = "userid",
  outcome = "logstep_30min",
  treatment = "intervention",
  rand_prob = 0.6,
  moderator_formula = ~1,
  control_formula = ~logstep_pre30min,
  availability = "avail",
  numerator_prob = 0.6
)
summary(fit)
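The lincomb argument is easiest to read with a moderator in the model. The sketch below is an illustration under assumptions, not an example from the package authors: it reuses only the column names that appear in the example above, adds logstep_pre30min as a moderator so that p = 2, and asks for the combination c(1, 2), which corresponds to the effect at logstep_pre30min = 2 (an arbitrary value chosen for illustration). The names fit_mod and lincomb_at_2 are hypothetical, and the fitting code runs only if the MRTAnalysis package is installed.

```r
# Linear combination: intercept + 2 * (moderator coefficient).
# The value 2 is an arbitrary illustrative level of logstep_pre30min.
lincomb_at_2 <- c(1, 2)

if (requireNamespace("MRTAnalysis", quietly = TRUE)) {
  # Load the example data set shipped with the package.
  data("data_mimicHeartSteps", package = "MRTAnalysis")

  # Same call as in the example above, except for the moderator formula.
  fit_mod <- MRTAnalysis::wcls(
    data = data_mimicHeartSteps,
    id = "userid",
    outcome = "logstep_30min",
    treatment = "intervention",
    rand_prob = 0.6,
    moderator_formula = ~logstep_pre30min,
    control_formula = ~logstep_pre30min,
    availability = "avail",
    numerator_prob = 0.6
  )

  # Request the coefficient table plus the specified linear combination.
  print(summary(fit_mod, lincomb = lincomb_at_2))
}
```

A matrix with two columns could be supplied instead of a vector to request several moderator values at once, one per row.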
Returns the estimated causal excursion effect (on the additive scale) and its estimated standard error. A small-sample correction using the "Hat" matrix is applied to the variance estimate.
wcls(
  data,
  id,
  outcome,
  treatment,
  rand_prob,
  moderator_formula,
  control_formula,
  availability = NULL,
  numerator_prob = NULL,
  verbose = TRUE
)
data |
A data set in long format. |
id |
The subject id variable. |
outcome |
The outcome variable. |
treatment |
The binary treatment assignment variable. |
rand_prob |
The randomization probability variable. |
moderator_formula |
A formula for the moderator variables. This should start with ~ followed by the moderator variables. When set to |
control_formula |
A formula for the control variables. This should start with ~ followed by the control variables. When set to |
availability |
The availability variable. Use the default value ( |
numerator_prob |
Either a number between 0 and 1, or a variable name for a column in data. If you are not sure what this is, use the default value ( |
verbose |
If 'TRUE' (the default), additional messages will be printed during data preprocessing. |
An object of class "wcls_fit".
wcls(
  data = data_mimicHeartSteps,
  id = "userid",
  outcome = "logstep_30min",
  treatment = "intervention",
  rand_prob = 0.6,
  moderator_formula = ~1,
  control_formula = ~logstep_pre30min,
  availability = "avail",
  numerator_prob = 0.6
)
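For readers unsure how rand_prob, numerator_prob, and availability interact, the following base-R sketch illustrates the weighting-and-centering idea behind WCLS (Boruvka et al., 2018) on simulated data. It is a simplified illustration under assumptions, not the package's implementation: the data, the variable names (d, p, p_tilde, beta_hat), and the true effect of 0.3 are all invented here, and only the point estimate is computed. The standard errors printed by lm() would not be appropriate for longitudinal data and do not include the small-sample correction that wcls() applies.

```r
set.seed(1)

# Simulated long-format data: 30 hypothetical participants, 20 decision points.
n_id <- 30
n_time <- 20
d <- data.frame(
  id = rep(seq_len(n_id), each = n_time),
  x = rnorm(n_id * n_time),   # a control variable
  avail = 1                   # everyone is always available in this sketch
)

p <- 0.6        # randomization probability (plays the role of rand_prob)
p_tilde <- 0.6  # numerator probability (plays the role of numerator_prob)

d$a <- rbinom(nrow(d), 1, p)                    # binary treatment
d$y <- 0.5 * d$x + 0.3 * d$a + rnorm(nrow(d))   # assumed true effect: 0.3

# Weight: numerator probability over randomization probability for the
# treatment actually received, multiplied by the availability indicator.
d$w <- d$avail * ifelse(d$a == 1, p_tilde / p, (1 - p_tilde) / (1 - p))

# Center the treatment indicator at the numerator probability.
d$a_centered <- d$a - p_tilde

# Weighted least squares with a marginal (intercept-only) moderator model.
fit_sketch <- lm(y ~ x + a_centered, data = d, weights = w)
beta_hat <- unname(coef(fit_sketch)["a_centered"])
print(beta_hat)
```

When numerator_prob equals a constant rand_prob, as in the example above, every weight equals 1 and only the centering matters; the weights become non-trivial when the randomization probability varies across decision points.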