Title: | Variable Length Markov Chain with Exogenous Covariates |
---|---|
Description: | Models categorical time series through a Markov Chain when a) covariates are predictors for transitioning into the next state/symbol and b) when the dependence in the past states has variable length. The probability of transitioning to the next state in the Markov Chain is defined by a multinomial regression whose parameters depend on the past states of the chain and, moreover, the number of states in the past needed to predict the next state also depends on the observed states themselves. See Zambom, Kim, and Garcia (2022) <doi:10.1111/jtsa.12615>. |
Authors: | Adriano Zanin Zambom [aut, cre, cph], Seonjin Kim [aut], Nancy Lopes Garcia [aut] |
Maintainer: | Adriano Zanin Zambom <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.0 |
Built: | 2025-03-06 02:46:25 UTC |
Source: | https://github.com/cran/VLMCX |
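The transition mechanism described above can be sketched in a few lines of R. All coefficient and covariate values below are illustrative, not output from the package:

```r
## Hedged sketch: multinomial (softmax) transition probabilities for one context.
## alpha, beta, and x are made-up values, not package output.
alpha <- c(-0.5, 0.5)               # intercepts for the two non-baseline states
beta  <- matrix(c(-1.0, 2.5,        # state 1: effects of covariates X1, X2
                  -2.0, -3.5),      # state 2: effects of covariates X1, X2
                nrow = 2, byrow = TRUE)
x <- c(0.3, -0.2)                   # observed covariates at the current time

eta  <- alpha + beta %*% x          # linear predictors for non-baseline states
prob <- c(1, exp(eta)) / (1 + sum(exp(eta)))  # baseline state comes first
prob                                # probabilities over the three states
```

The baseline state carries a linear predictor of zero, which is why its unnormalized weight is 1.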
VLMCX
objects that compose Variable Length Markov Chains with Exogenous Covariates
Computes the Akaike Information Criterion for the data using the estimated parameters of the multinomial logistic regression in the VLMCX fit.
AIC(fit)
fit |
a VLMCX object. |
a numeric value with the corresponding AIC.
Adriano Zanin Zambom <[email protected]>
set.seed(1)
n = 1000
d = 2
X = cbind(rnorm(n), rnorm(n))
p = 1/(1 + exp(0.5 - 2*X[,1] - 3.5*X[,2]))
y = c(sample(1:0, 1), rbinom(n, 1, p))
fit = maximum.context(y[1:n], X, max.depth = 3, n.min = 25)
draw(fit)
AIC(fit)  ## [1] 563.5249
fit = VLMCX(y[1:n], X, alpha.level = 0.001, max.depth = 3, n.min = 25)
draw(fit)
AIC(fit)  ## [1] 559.4967
VLMCX
objects that compose Variable Length Markov Chains with Exogenous Covariates
Computes the Bayesian Information Criterion for the data using the estimated parameters of the multinomial logistic regression in the VLMCX fit.
BIC(fit)
fit |
a VLMCX object. |
a numeric value with the corresponding BIC.
Adriano Zanin Zambom <[email protected]>
set.seed(1)
n = 1000
d = 2
X = cbind(rnorm(n), rnorm(n))
p = 1/(1 + exp(0.5 - 2*X[,1] - 3.5*X[,2]))
y = c(sample(1:0, 1), rbinom(n, 1, p))
fit = maximum.context(y[1:n], X, max.depth = 3, n.min = 25)
draw(fit)
BIC(fit)  ## [1] 696.0343
fit = VLMCX(y[1:n], X, alpha.level = 0.001, max.depth = 3, n.min = 25)
draw(fit)
BIC(fit)  ## [1] 588.9432
Extracts the estimated coefficients from a VLMCX object for a specific context (sequence of states in the past used to predict the next state/symbol of the chain).
coef(fit, context)
fit |
a VLMCX object. |
context |
the context whose coefficients are desired. |
an object with two items:
alpha |
a vector with the coefficients corresponding to the intercepts for the transitions into the states in the state space of y. |
beta |
a 3-dimensional array of estimated coefficients indexed by [step in the past, covariate, symbol (in the state space) to transition into]. |
Adriano Zanin Zambom <[email protected]>
set.seed(1)
n = 1000
d = 2
X = cbind(rnorm(n), rnorm(n))
y = rbinom(n, 1, .5)
fit = maximum.context(y, X)
coef(fit, c(0,0,1,0))  ## context in the order: y_{t-1} = 0, y_{t-2} = 0, y_{t-3} = 1, y_{t-4} = 0
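To make the [step in the past, covariate, symbol] layout of beta concrete, here is a small sketch with a hypothetical array (illustrative values, not actual coef output):

```r
## Hypothetical beta array for a context of depth 2, d = 2 covariates,
## and 2 non-baseline symbols to transition into.
beta <- array(1:8, dim = c(2, 2, 2))  # [step back, covariate, symbol]

## Effect of covariate 2, observed one step in the past, on transitioning
## into the first non-baseline symbol:
beta[1, 2, 1]
```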
Prunes the given tree according to the significance of the covariates and of the contexts, as determined by the multinomial regression.
context.algorithm(fit, node, alpha.level = 0.05, max.depth = 5, n.min = 5, trace = FALSE)
fit |
a VLMCX object |
node |
The topmost node up to which pruning is allowed. |
alpha.level |
the alpha level for rejection of each hypothesis in the algorithm. |
max.depth |
the maximum depth of the initial "maximal" tree. |
n.min |
the minimum number of observations required per parameter for the estimation of that context. |
trace |
if TRUE, information is printed while the pruning algorithm runs. |
context.algorithm returns an object of class "VLMCX". The generic functions coef, AIC, BIC, draw, and LogLik extract various useful features of the fitted object returned by VLMCX.
An object of class "VLMCX" is a list containing at least the following components:
y |
the time series data corresponding to the states input by the user. |
X |
the time series covariate data input by the user. |
tree |
the rooted context tree estimated by the algorithm. Each node contains the context, the estimated alpha and beta coefficients, and a list of its children. |
LogLik |
the log-likelihood of the data using the estimated context tree. |
baseline.state |
the state used as the baseline for the multinomial regression. |
Adriano Zanin Zambom <[email protected]>
n = 500
X = cbind(rnorm(n), rnorm(n))
y = rbinom(n, 1, .5)
fit = maximum.context(y, X, max.depth = 3)
pruned.fit = context.algorithm(fit, fit$tree)
draw(pruned.fit)
Draws the rooted tree corresponding to the estimated contexts in a VLMCX
object.
draw(fit, title = "VLMCX Context Tree", print.coef = TRUE)
fit |
a VLMCX object. |
title |
the title in the graph. |
print.coef |
If TRUE, the algorithm prints in the console the list of all contexts and their corresponding alpha and beta coefficients for the multinomial regression. If FALSE, it prints in the console a text version of the rooted context tree. |
The graph contains circles for the nodes of the contexts estimated by the algorithm, but does not display the structure or the covariate parameter vectors.
No return value, called for plotting only.
Adriano Zanin Zambom <[email protected]>
n = 1000
d = 2
set.seed(1)
X = cbind(rnorm(n), rnorm(n))
y = rbinom(n, 1, .2)
fit = maximum.context(y, X)
draw(fit)
fit = VLMCX(y, X, alpha.level = 0.0001, max.depth = 3, n.min = 15, trace = TRUE)
draw(fit)
draw(fit, print.coef = FALSE)
Estimates the parameters of the multinomial logistic model in the VLMCX tree for each context in the tree.
estimate(VLMCXtree, y, X)
VLMCXtree |
a VLMCX tree |
y |
a "time series" vector (numeric, character, or factor). |
X |
Numeric matrix of predictors with rows corresponding to the y observations (over time) and columns corresponding to covariates. |
A tree from an object of class VLMCX. The tree contains the following items:
context |
the context, or sequence of symbols. |
alpha |
a vector with the coefficients corresponding to the intercepts for the transitions into the states in the state space of y. |
beta |
a 3-dimensional array of estimated coefficients indexed by [step in the past, covariate, symbol (in the state space) to transition into]. |
child |
a list whose entries are the node's children, each with this same structure. |
Adriano Zanin Zambom <[email protected]>
set.seed(1)
n = 4000
d = 2
X = cbind(rnorm(n), rnorm(n))
alphabet = 0:2  ### state space
y = sample(alphabet, 2, replace = TRUE)
for (i in 3:n) {
  if (identical(as.numeric(y[(i-1):(i-2)]), c(0,0)))
    value = c(exp(-0.5 - 1*X[i-1,1] + 2.5*X[i-1,2]),
              exp(0.5 - 2*X[i-1,1] - 3.5*X[i-1,2]))
  else if (identical(as.numeric(y[(i-1):(i-2)]), c(0,1)))
    value = c(exp(-0.5), exp(0.5))
  else
    value = c(runif(1,0,3), runif(1,0,3))
  prob = c(1, value)/(1 + sum(value))  ## compute probs with baseline state probability
  y[i] = sample(alphabet, 1, prob = prob)
}

tree = NULL
tree$context = "x"  ## this is the root
tree$alpha = NULL
tree$beta = NULL
tree$child = list()

this_child = NULL
this_child$context = "0"
this_child$alpha = 0
this_child$child = list()
tree$child[[1]] = this_child

this_grandchild = NULL
this_grandchild$context = c(0, 0)
this_grandchild$alpha = 0
this_grandchild$beta = array(c(0,0,0,0), c(1, 2, 2))  ## steps, d, alphabet (state space)
this_grandchild$child = list()
tree$child[[1]]$child[[1]] = this_grandchild

this_other_grandchild = NULL
this_other_grandchild$context = c(0, 1)
this_other_grandchild$alpha = 0
this_other_grandchild$beta = NULL
this_other_grandchild$child = list()
tree$child[[1]]$child[[2]] = this_other_grandchild

estimate(tree, y, X)

fit = VLMCX(y, X, alpha.level = 0.0001, max.depth = 2, n.min = 15, trace = TRUE)
estimate(fit$tree, y, X)
Computes the log-likelihood of the data using the estimated parameters of the multinomial logistic regression based on contexts of variable length; that is, a finite suffix of the past, called a "context", is used to predict the next symbol, and contexts can have different lengths depending on the past observations themselves.
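Conceptually, this log-likelihood is the sum of the log transition probabilities the fitted model assigns to the observed symbols; a minimal sketch with made-up probabilities (not output of a VLMCX fit):

```r
## Transition probabilities a fitted model assigns to each observed symbol
## (illustrative values, not package output)
p.observed <- c(0.60, 0.30, 0.80, 0.50)
ll <- sum(log(p.observed))  # log-likelihood of the four observed transitions
ll
```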
LogLik(fit)
fit |
a VLMCX object. |
a numeric value with the corresponding log-likelihood.
Adriano Zanin Zambom <[email protected]>
n = 1000
d = 2
X = cbind(rnorm(n), rnorm(n))
y = rbinom(n, 1, .5)
fit = maximum.context(y, X)
LogLik(fit)
Builds the maximal context tree, that is, the largest context tree such that every element in it has been observed at least n.min times.
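The idea behind the n.min threshold can be sketched with base R: count how often each depth-k context occurs and keep only those seen often enough (an illustration, not the package's internal code):

```r
## Count occurrences of depth-k contexts in a binary series.
set.seed(1)
y <- rbinom(200, 1, 0.5)
k <- 2                                   # context depth
## rows of embed(y, k) are (y_t, y_{t-1}); paste them into context labels
contexts <- apply(embed(y, k), 1, paste, collapse = ",")
counts <- table(contexts)
counts

## keep only contexts observed at least n.min times
n.min <- 25
names(counts)[counts >= n.min]
```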
maximum.context(y, X, max.depth = 5, n.min = 5)
y |
a "time series" vector (numeric, character, or factor). |
X |
Numeric matrix of predictors with rows corresponding to the y observations (over time) and columns corresponding to covariates. |
max.depth |
Maximum depth of the desired tree. |
n.min |
Minimum number of observations per coefficient to be estimated. |
maximum.context returns an object of class "VLMCX". The generic functions coef, AIC, BIC, draw, and LogLik extract various useful features of the value returned by VLMCX.
An object of class "VLMCX" is a list containing at least the following components:
y |
the time series data corresponding to the states input by the user. |
X |
the time series covariate data input by the user. |
tree |
the rooted context tree estimated by the algorithm. Each node contains the context, the estimated alpha and beta coefficients, and a list of its children. |
LogLik |
the log-likelihood of the data using the estimated context tree. |
baseline.state |
the state used as the baseline for the multinomial regression. |
Adriano Zanin Zambom <[email protected]>
n = 1000
d = 2
X = cbind(rnorm(n), rnorm(n))
y = rbinom(n, 1, .5)
fit = maximum.context(y, X)
Uses the estimated coefficients from a VLMCX object to estimate the next state of the Markov Chain either using new data or the original data with which the model was fit.
predict(fit, new.y = NULL, new.X = NULL)
fit |
a VLMCX object. |
new.y |
the new sequence of observations of the "time series" as a vector (numeric, character, or factor). The values of new.y must be of the same type as the ones used to fit the VLMCX object. If new.y (or new.X) is NULL, the algorithm uses the original data used to fit the VLMCX object. |
new.X |
Numeric matrix of predictors with rows corresponding to the new.y observations (over time) and columns corresponding to covariates. |
the predicted symbol for the next state of the Markov Chain, of the same type as the input (numeric, character, or factor).
Adriano Zanin Zambom <[email protected]>
set.seed(1)
n = 1000
X = cbind(rnorm(n))
y = rbinom(n, 1, .5)
fit = maximum.context(y, X)
## using the original data
predict(fit)
## using new data
predict(fit, new.y = c(0,0,1,0,0), new.X = c(2.3, 1.1, -.2, -3, 1))
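Conceptually, prediction amounts to computing the multinomial probabilities for the matched context and picking the most probable state; a self-contained sketch with illustrative coefficients (not a call into the package):

```r
## Illustrative coefficients for the matched context (not package output)
alpha <- c(0.2, -0.1)                    # intercepts for non-baseline states
beta1 <- matrix(c(1.0, -0.5), nrow = 2)  # effects of a single covariate
x <- 0.7                                 # covariate value at the current time
states <- 0:2                            # state 0 is the baseline

eta  <- alpha + beta1 %*% x
prob <- c(1, exp(eta)) / (1 + sum(exp(eta)))
states[which.max(prob)]                  # predicted next state
```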
Simulates the states of a Markov Chain based on a VLMCX model.
simulate(VLMCXtree, nsim = 500, X = NULL, seed = NULL, n.start = 100)
VLMCXtree |
a VLMCX tree (a VLMCX object can also be used, in which case its tree is used). |
nsim |
non-negative integer, giving the length of the result. |
X |
A vector or matrix of exogenous variables. If a vector, its length must equal nsim + n.start; if a matrix, its first dimension must have length nsim + n.start; if NULL, a univariate vector of independent and identically distributed Normal variables is used. |
seed |
random seed initializer. |
n.start |
the number of initial values to be discarded (because of initial effects). |
a vector with nsim simulated states.
Adriano Zanin Zambom <[email protected]>
tree = NULL
tree$context = "x"  ## this is the root
tree$alpha = NULL
tree$beta = NULL
tree$child = list()

this_child = NULL
this_child$context = "left"
this_child$alpha = 0.5
this_child$child = list()
tree$child[[1]] = this_child

this_grandchild = NULL
this_grandchild$context = c("left", "left")
this_grandchild$alpha = 0.6
this_grandchild$beta = array(c(1.9, 1.6, 2.6, -1.6), c(2, 2, 1))  ## steps, d, alphabet
this_grandchild$child = list()
tree$child[[1]]$child[[1]] = this_grandchild

this_other_grandchild = NULL
this_other_grandchild$context = c("left", "right")
this_other_grandchild$alpha = -0.6
this_other_grandchild$beta = array(c(-1.3, -1.5, 2.3, -1.2), c(2, 2, 1))
this_other_grandchild$child = list()
tree$child[[1]]$child[[2]] = this_other_grandchild

other_child = NULL
other_child$context = "right"
other_child$alpha = -0.7
other_child$beta = array(c(1, -.3), c(1, 2, 1))  ## steps, d, alphabet
other_child$child = list()
tree$child[[2]] = other_child

set.seed(1)
X = cbind(rnorm(1100), rnorm(1100))
simulated.data = simulate(tree, nsim = 1000, X, seed = 1, n.start = 100)
fit = VLMCX(simulated.data$y, simulated.data$X, alpha.level = 0.001, max.depth = 4, n.min = 20, trace = TRUE)
draw(fit)
fit
Estimates a Variable Length Markov Chain model, which can also be seen as a categorical time series model, where exogenous covariates enter the multinomial regression that predicts the next state/symbol in the chain. This approach yields a parsimonious model in which only a finite suffix of the past, called a "context", is needed to predict the next symbol. The length of each context can differ depending on the past observations themselves.
VLMCX(y, X, alpha.level = 0.05, max.depth = 5, n.min = 5, trace = FALSE)
y |
a "time series" vector (numeric, character, or factor). |
X |
Numeric matrix of predictors with rows corresponding to the y observations (over time) and columns corresponding to covariates. |
alpha.level |
the alpha level for rejection of each hypothesis in the algorithm. |
max.depth |
the maximum depth of the initial "maximal" tree. |
n.min |
the minimum number of observations required per parameter for the estimation of that context. |
trace |
if TRUE, information is printed while the pruning algorithm runs. |
The algorithm is a backward selection procedure that starts with the maximal context tree, the largest tree such that every element in it has been observed at least n.min times. Final nodes (the most distant past state in each context) are then pruned according to the p-value of the likelihood ratio test for removing the covariates corresponding to that node and the significance of the node itself. The algorithm iterates until no node can be pruned because its covariates or the node context itself is significant.
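Each pruning decision rests on a likelihood ratio test; a generic sketch of one such test using base R's pchisq, with illustrative log-likelihood values (not package internals):

```r
## Hedged sketch of one likelihood-ratio pruning decision.
## The log-likelihoods and df below are illustrative numbers.
logLik.full    <- -550.2   # model keeping the node's covariates
logLik.reduced <- -552.8   # model with the node's covariates removed
df <- 4                    # number of coefficients dropped

lr.stat <- 2 * (logLik.full - logLik.reduced)
p.value <- pchisq(lr.stat, df = df, lower.tail = FALSE)

## prune the node if p.value >= alpha.level (covariates not significant)
p.value >= 0.05
```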
VLMCX returns an object of class "VLMCX". The generic functions coef, AIC, BIC, draw, and LogLik extract various useful features of the fitted object returned by VLMCX.
An object of class "VLMCX" is a list containing at least the following components:
y |
the time series data corresponding to the states input by the user. |
X |
the time series covariate data input by the user. |
tree |
the rooted context tree estimated by the algorithm. Each node contains the context, the estimated alpha and beta coefficients, and a list of its children. |
LogLik |
the log-likelihood of the data using the estimated context tree. |
baseline.state |
the state used as the baseline for the multinomial regression. |
Adriano Zanin Zambom <[email protected]>
Zambom, A. Z., Kim, S., and Garcia, N. L. (2022). Variable length Markov chain with exogenous covariates. Journal of Time Series Analysis, 43, 321-328. doi:10.1111/jtsa.12615
#### Example 1
set.seed(1)
n = 3000
d = 2
X = cbind(rnorm(n), rnorm(n))
alphabet = 0:2
y = sample(alphabet, 2, replace = TRUE)
for (i in 3:n) {
  if (identical(as.numeric(y[(i-1):(i-2)]), c(0,0)))
    value = c(exp(-0.5 - 1*X[i-1,1] + 2.5*X[i-1,2]),
              exp(0.5 - 2*X[i-1,1] - 3.5*X[i-1,2]))
  else if (identical(as.numeric(y[(i-1):(i-2)]), c(0,1)))
    value = c(exp(-0.5), exp(0.5))
  else if (identical(as.numeric(y[(i-1):(i-2)]), c(0,2)))
    value = c(exp(1), exp(1))
  else if (identical(as.numeric(y[(i-1):(i-2)]), c(2,0)))
    value = c(exp(0.5 + 1.2*X[i-1,1] + 0.5*X[i-1,2] + 2*X[i-2,1] + 1.5*X[i-2,2]),
              exp(-0.5 - 2*X[i-1,1] - .5*X[i-1,2] + 1.3*X[i-2,1] + 1.5*X[i-2,2]))
  else if (identical(as.numeric(y[(i-1):(i-2)]), c(2,1)))
    value = c(exp(-1 - X[i-1,1] + 2.5*X[i-1,2]),
              exp(0.1 - 0.5*X[i-1,1] - 1.5*X[i-1,2]))
  else if (identical(as.numeric(y[(i-1):(i-2)]), c(2,2)))
    value = c(exp(-0.5 - X[i-1,1] - 2.5*X[i-1,2]),
              exp(0.5 - 2*X[i-1,1] - 3.5*X[i-1,2]))
  else
    value = c(runif(1,0,3), runif(1,0,3))
  prob = c(1, value)/(1 + sum(value))  ## compute probs with baseline state probability
  y[i] = sample(alphabet, 1, prob = prob)
}
fit = VLMCX(y, X, alpha.level = 0.001, max.depth = 4, n.min = 15, trace = TRUE)
draw(fit)
## Note the only context that was estimated but not in the true
## model is (1): removing it or not does not change the likelihood,
## so the algorithm keeps it.
coef(fit, c(0,2))
predict(fit, new.y = c(0,0), new.X = matrix(c(1,1,1,1), nrow = 2))
# [1] 0.2259747309 0.7738175143 0.0002077548
predict(fit, new.y = c(0,0,0), new.X = matrix(c(1,1,1,1,1,1), nrow = 3))
# [1] 0.2259747309 0.7738175143 0.0002077548

#### Example 2
set.seed(1)
n = 2000
d = 1
X = rnorm(n)
alphabet = 0:1
y = sample(alphabet, 2, replace = TRUE)
for (i in 3:n) {
  if (identical(as.numeric(y[(i-1):(i-3)]), c(0, 0, 0)))
    value = c(exp(-0.5 - 1*X[i-1] + 2*X[i-2]))
  else if (identical(as.numeric(y[(i-1):(i-3)]), c(0, 0, 1)))
    value = c(exp(-0.5))
  else if (identical(as.numeric(y[(i-1):(i-2)]), c(1,0)))
    value = c(exp(0.5 + 1.2*X[i-1] + 2*X[i-2]))
  else if (identical(as.numeric(y[(i-1):(i-2)]), c(1,1)))
    value = c(exp(-1 - X[i-1] + 2*X[i-2]))
  else
    value = c(runif(1,0,3))
  prob = c(1, value)/(1 + sum(value))  ## compute probs with baseline state probability
  y[i] = sample(alphabet, 1, prob = prob)
}
fit = VLMCX(y, X, alpha.level = 0.001, max.depth = 4, n.min = 15, trace = TRUE)
draw(fit)
coef(fit, c(1,0))

#### Example 3
set.seed(1)
n = 4000
d = 1
X = cbind(rnorm(n))
alphabet = 0:3
y = sample(alphabet, 2, replace = TRUE)
for (i in 3:n) {
  if (identical(as.numeric(y[(i-1):(i-2)]), c(3, 3)))
    value = c(exp(-0.5 - 1*X[i-1] + 2.5*X[i-2]),
              exp(0.5 - 2*X[i-1] - 3.5*X[i-2]),
              exp(0.5 + 2*X[i-1] + 3.5*X[i-2]))
  else if (identical(as.numeric(y[(i-1):(i-2)]), c(3, 1)))
    value = c(exp(-0.5 + X[i-1]), exp(0.5 - 1.4*X[i-1]), exp(0.9 + 1.4*X[i-1]))
  else if (identical(as.numeric(y[(i-1):(i-2)]), c(1, 0)))
    value = c(exp(-.5), exp(.5), exp(.8))
  else if (identical(as.numeric(y[(i-1):(i-2)]), c(1, 2)))
    value = c(exp(.4), exp(-.5), exp(.8))
  else
    value = c(runif(1,0,3), runif(1,0,3), runif(1,0,3))
  prob = c(1, value)/(1 + sum(value))  ## compute probs with baseline state probability
  y[i] = sample(alphabet, 1, prob = prob)
}
fit = VLMCX(y, X, alpha.level = 0.00001, max.depth = 3, n.min = 15, trace = TRUE)
## The context (0, 1) was not identified because the
draw(fit)
coef(fit, c(3,1))