As is often the case with health data, chemical mixtures may have non-linear relationships with the outcome variable, meaning that standard additive models may not be well suited. With highly correlated exposures and complex interaction effects, the models covered previously can yield unstable estimates. In this tutorial, we explore Bayesian Kernel Machine Regression (BKMR), which offers greater flexibility for modeling complex relationships between the predictor and outcome variables.
Kernel Machine Regression provides a more flexible approach by using a kernel. Kernels are useful in regression because they estimate the relationship between the predictors and the outcome without assuming its form (linear, logistic, exponential, etc.); in this sense, kernel regression is a non-parametric technique. To estimate the outcome at a given point, the kernel weights data points whose predictor values are close to that point more heavily, and points at a greater distance less heavily. This allows the regression model to fit the data more closely and to capture complex relationships between variables.
Additionally, Bayesian Kernel Machine Regression embeds Kernel Machine Regression in a Bayesian framework, placing priors on the model parameters, providing uncertainty quantification through the posterior, and allowing for extensions such as hierarchical variable selection.
library(tidyverse)
library(tidymodels)
library(ggcorrplot)
library(corrplot)
To illustrate how BKMR enables modeling of highly correlated predictors, we will use the nhanes dataset, whose variables are described in the file nhanes-codebook.txt. Load this data with the load function, specifying the data file.
load(file='nhanes1518.rda')
We first explore the URX predictors (i.e. the ones related to phthalate concentrations), subsetting the dataset to the same predictors used in the weighted quantile sum module, and then filtering out NA values:
nhanes <- nhanes1518 |>
  select(BMXBMI, URXMHBP, URXMOH, URXMHP, URXMHH, URXMCOH, URXMHNC,
         URXECP, URXHIBP, URXMIB, RIDAGEYR) |>
  na.omit() |>
  mutate(BMXBMI = log(BMXBMI), RIDAGEYR = RIDAGEYR - 18)
We start exploring the data by plotting a correlation matrix of the variables of interest. We can use the corrplot function from the corrplot package and adjust the style to our desired aesthetics. We see that the predictors are highly correlated, especially URXMHH and URXECP:
corr_mat = cor(nhanes |> select(-RIDAGEYR), method = "s")
corrplot(corr_mat,
         addCoef.col = 1,   # display correlation coefficients
         number.cex = 0.5)  # font size of the coefficients
We also plot the distributions of the variables involved in the model and notice that they have long right tails on the original scale, which motivates the log transformation we perform later on.
df_long <- nhanes |> select(-RIDAGEYR) |> pivot_longer(everything())
# Plot histograms using ggplot
ggplot(df_long, aes(x = value)) +
geom_histogram() +
facet_wrap(~ name, scales = "free")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Bayesian Kernel Machine Regression (BKMR) models the relationship between the outcome and the predictors using two parts: a nonparametric function \(h(\cdot)\) and a linear component.
\[Y_i = h(\mathbf{Z}_i)+X_i\beta + \epsilon_i,\]
where \(\mathbf{Z}_i = (Z_{1i},\dots,Z_{Ki})\) contains the chemical mixture exposures for observation \(i\); \(h(\cdot)\), called the predictor-response function, is a nonparametric function capturing the nonlinear relationship between the chemical mixtures and the outcome; \(X_i\) represents other covariates that may have a linear relationship with the outcome, with corresponding coefficients \(\beta\); and \(\epsilon_i \overset{i.i.d.}{\sim} N(0,\sigma^2)\) is the error term.
The nonparametric function \(h(\cdot)\) is highly flexible and is usually represented using a kernel function:
\[h(\mathbf{Z}) = \sum_{i=1}^n K(Z_i,Z) \alpha_i,\]
where \(K(Z,Z')\) is a function measuring the similarity of \(Z\) and \(Z'\), and the \(\alpha_i\) are coefficients. The estimate at \(Z\) weights the observed data points \(\{Z_i, i = 1,\dots,n\}\) by their similarity to \(Z\): points that are closer to \(Z\) play a more important role in estimating the outcome. A common choice is the Gaussian kernel
\[K(Z,Z') = \exp\Big\{-\sum_{k=1}^K(Z_k-Z_k')^2/\rho\Big\}.\]
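To make the weighting concrete, here is a minimal R sketch of the Gaussian kernel above; the value of \(\rho\) is an illustrative choice, not a fitted quantity:

```r
# Gaussian kernel: K(z, z') = exp(-sum_k (z_k - z_k')^2 / rho)
gauss_kernel <- function(z, zprime, rho = 1) {
  exp(-sum((z - zprime)^2) / rho)
}

z1 <- c(0, 0)
gauss_kernel(z1, c(0, 0))      # identical points: similarity 1
gauss_kernel(z1, c(0.1, 0.1))  # nearby points: similarity near 1
gauss_kernel(z1, c(3, 3))      # distant points: similarity near 0
```

Points similar to \(Z\) thus receive weights near 1 and distant points weights near 0, which is exactly the behavior described above.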
BKMR can also allow for variable selection by slightly modifying the kernel:
\[K(Z,Z';r) = \exp\Big\{-\sum_{k=1}^K r_k(Z_k-Z_k')^2\Big\},\]
where the \(r_k\) are non-negative and given a "spike-and-slab" prior. If \(r_k=0\), the variable \(Z_k\) is excluded from the model. This variable selection method is similar in spirit to the frequentist LASSO shrinkage.
Similar to previous tutorials, we are interested in the research question of predicting log BMI from phthalate measurements in urine samples for individuals above 18 years of age. We assume age has a linear relationship with log BMI. For illustration, we randomly sample 10% of the dataset.
set.seed(123)
nhanes = nhanes |> sample_frac(size = 0.1)
y <- nhanes$BMXBMI                         # log BMI (transformed above)
Z <- nhanes |> select(-BMXBMI, -RIDAGEYR)  # chemical mixtures
Z <- log(Z)
X <- nhanes |> select(RIDAGEYR)            # covariate with assumed linear effect
Z <- as.matrix(Z) |> scale()
X <- as.matrix(X)
To fit the BKMR model, we use the kmbayes function in the R package bkmr.
#install.packages("bkmr")
library(bkmr)
## For guided examples, go to 'https://jenfb.github.io/bkmr/overview.html'
In the model:
- y: a vector of responses
- Z: the matrix of chemical mixtures (variables with a nonlinear relationship to the outcome)
- X: the matrix of other covariates with a linear relationship to the outcome
- iter: the number of iterations of the MCMC sampler
- varsel: whether to conduct variable selection on the predictors \(Z\)
- verbose: whether to print details during model fitting
In this example, we set varsel = TRUE to conduct
variable selection while fitting the model.
set.seed(1)
fitkm <- kmbayes(y = y, Z = Z, X = X, iter = 5000, varsel = TRUE, verbose = FALSE)
ExtractEsts() provides summary statistics (mean, standard deviation, quantiles) of the posterior samples of the parameters \(\beta\), \(r_k\), and \(\sigma^2\). We can use q_2.5 and q_97.5 (the 2.5% and 97.5% quantiles) to construct a 95% credible interval.
ExtractEsts(fitkm)
## $sigsq.eps
## mean sd q_2.5 q_25 q_50 q_75
## sigsq.eps 0.05759287 0.003491528 0.0511592 0.05518165 0.05746192 0.05984281
## q_97.5
## sigsq.eps 0.06461115
##
## $beta
## mean sd q_2.5 q_25 q_50 q_75
## beta 0.006593113 0.0004329955 0.005764133 0.006298141 0.006597041 0.006885753
## q_97.5
## beta 0.007436542
##
## $lambda
## mean sd q_2.5 q_25 q_50 q_75 q_97.5
## lambda 33.64641 12.98202 14.68944 24.16913 31.62367 40.48474 67.08331
##
## $r
## mean sd q_2.5 q_25 q_50 q_75
## r1 0.0007126409 0.002991421 0.00000000 0.00000000 0.00000000 0.00000000
## r2 0.0004297215 0.002402648 0.00000000 0.00000000 0.00000000 0.00000000
## r3 0.0000000000 0.000000000 0.00000000 0.00000000 0.00000000 0.00000000
## r4 0.0000000000 0.000000000 0.00000000 0.00000000 0.00000000 0.00000000
## r5 0.0000000000 0.000000000 0.00000000 0.00000000 0.00000000 0.00000000
## r6 0.0000000000 0.000000000 0.00000000 0.00000000 0.00000000 0.00000000
## r7 0.0004761129 0.003591938 0.00000000 0.00000000 0.00000000 0.00000000
## r8 0.0145516694 0.004500822 0.01011490 0.01106806 0.01285694 0.01742771
## r9 0.0137789790 0.004082672 0.01027884 0.01203941 0.01277530 0.01413881
## q_97.5
## r1 0.01375564
## r2 0.01011203
## r3 0.00000000
## r4 0.00000000
## r5 0.00000000
## r6 0.00000000
## r7 0.01087760
## r8 0.02660138
## r9 0.01982509
For the selected sample, holding other variables constant, each additional year of age is associated with a percentage change in BMI whose 95% credible interval is 0.578% to 0.746% (obtained by exponentiating the interval for \(\beta\), since the outcome is log BMI).
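Because the outcome is on the log scale, the percentage interpretation comes from exponentiating the credible interval for \(\beta\); a quick sketch using the interval endpoints from the ExtractEsts() output above:

```r
# 95% credible interval for beta (log-BMI scale), taken from the output above
beta_ci <- c(0.005764133, 0.007436542)

# Back-transform to percentage change in BMI per additional year of age
pct_change <- (exp(beta_ci) - 1) * 100
round(pct_change, 3)  # approximately 0.578 and 0.746
```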
If we conduct variable selection (varsel = TRUE) while fitting the model, we can use ExtractPIPs() to extract the posterior inclusion probabilities (PIPs). A PIP is the posterior probability that a specific predictor is included in the model, providing a quantitative measure of the predictor's importance.
For the selected sample, URXHIBP and URXMIB have PIPs of 1, suggesting both have potential effects on BMI. URXMHBP, URXMOH, and URXECP have non-zero but small PIPs, suggesting these three chemicals may have some potential effect on BMI. The chemicals with PIPs equal to 0 may have no effect on BMI when the other chemicals are present.
ExtractPIPs(fitkm)
## variable PIP
## 1 URXMHBP 0.0544
## 2 URXMOH 0.0340
## 3 URXMHP 0.0000
## 4 URXMHH 0.0000
## 5 URXMCOH 0.0000
## 6 URXMHNC 0.0000
## 7 URXECP 0.0256
## 8 URXHIBP 1.0000
## 9 URXMIB 1.0000
Alternatively, we can use summary() to summarize BKMR model fits. Here we set show_MH = FALSE to avoid printing acceptance rates from the Metropolis-Hastings algorithm. The acceptance rate can be used to evaluate convergence and mixing of the MCMC sampler; we will discuss it in the MCMC Diagnostics section.
summary(fitkm,show_MH = FALSE)
## Fitted object of class 'bkmrfit'
## Iterations: 5000
## Outcome family: gaussian
## Model fit on: 2025-03-26 22:52:43.671329
## Running time: 11.39405 mins
##
##
## Parameter estimates (based on iterations 2501-5000):
## param mean sd q_2.5 q_97.5
## 1 beta 0.00659 0.00043 0.00576 0.00744
## 2 sigsq.eps 0.05759 0.00349 0.05116 0.06461
## 3 r1 0.00071 0.00299 0.00000 0.01376
## 4 r2 0.00043 0.00240 0.00000 0.01011
## 5 r3 0.00000 0.00000 0.00000 0.00000
## 6 r4 0.00000 0.00000 0.00000 0.00000
## 7 r5 0.00000 0.00000 0.00000 0.00000
## 8 r6 0.00000 0.00000 0.00000 0.00000
## 9 r7 0.00048 0.00359 0.00000 0.01088
## 10 r8 0.01455 0.00450 0.01011 0.02660
## 11 r9 0.01378 0.00408 0.01028 0.01983
## 12 lambda 33.64641 12.98202 14.68944 67.08331
##
## Posterior inclusion probabilities:
## variable PIP
## 1 URXMHBP 0.0544
## 2 URXMOH 0.0340
## 3 URXMHP 0.0000
## 4 URXMHH 0.0000
## 5 URXMCOH 0.0000
## 6 URXMHNC 0.0000
## 7 URXECP 0.0256
## 8 URXHIBP 1.0000
## 9 URXMIB 1.0000
## NULL
Notice that we cannot directly interpret the parameters \(r\). Instead, we can estimate \(h(\cdot)\) to interpret the relationship between the outcome and the chemical mixtures.
The function \(h(\mathbf{Z})=h((Z_1,\dots,Z_K))\) is called the predictor-response function, capturing how the predictors (chemical mixtures) influence the response (log(BMI)). This function allows us to account for nonlinear relationships as well as potential interactions.
Notice that \(h(\cdot)\) is a high-dimensional surface which cannot be visualized directly, and it takes a nonparametric form which is hard to interpret. Here we introduce three different ways to explore \(h(\cdot)\):
Estimate \(h\) at a given value of the predictors: \(h(Z=z)\)
Uni/bi-variable predictor-response function: holding the other predictors constant, explore the relationship of one or two predictors (chemical mixtures) with the outcome, i.e. \(h(Z_k = z_k\mid Z_{(-k)}=Z_{(-k),med})\) or \(h(Z_k = z_k, Z_m = z_m\mid Z_{(-k,-m)}=Z_{(-k,-m),med})\)
Overall risk summaries: calculate the overall effect of the predictors by comparing \(h(Z=z)-h(Z=Z_{med})\)
Notice that the BKMR model can be rewritten as follows:
\[y_i \sim N(h_i+x_i^T\beta,\sigma^2),\qquad \mathbf{h}=(h_1,\dots,h_n)^T \sim N(0,\tau K),\]
where \(K\) is an \(n \times n\) kernel matrix with \((i,j)\)-element \(K(z_i,z_j)=\exp\{-\sum_{k=1}^K r_k(z_{ik}-z_{jk})^2\}\).
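As an illustration of this representation, a small R sketch builds the kernel matrix \(K\) from an exposure matrix; the weights \(r_k\) are all set to 1 here for simplicity, whereas in BKMR they are sampled from their posterior:

```r
# Build the n x n kernel matrix with K[i, j] = exp(-sum_k r_k (z_ik - z_jk)^2)
kernel_matrix <- function(Z, r = rep(1, ncol(Z))) {
  n <- nrow(Z)
  K <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      K[i, j] <- exp(-sum(r * (Z[i, ] - Z[j, ])^2))
    }
  }
  K
}

set.seed(1)
Zsmall <- matrix(rnorm(15), nrow = 5)  # 5 observations, 3 exposures
K <- kernel_matrix(Zsmall)
all(diag(K) == 1)  # each point is maximally similar to itself
isSymmetric(K)     # the kernel matrix is symmetric
```

Setting an \(r_k\) to 0 drops the \(k\)-th exposure from every pairwise distance, which is exactly how the spike-and-slab prior performs variable selection.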
Derived from this model, the posterior distribution of \(h(\cdot)\) is normally distributed with mean \(\mu_h(\theta)\) and variance \(V_h(\theta)\), where \(\theta=(\beta,r,\lambda)\) are parameters.
The bkmr package can estimate \(h(\cdot)\) using three methods:
(1) ComputePostmeanHnew(method = "approx"): \(h \sim N(\hat{\mu}_h,\hat{V}_h)\), where \(\hat{\mu}_h\) and \(\hat{V}_h\) are approximated by plugging the posterior mean of \(\theta\) into \(\mu_h(\hat{\theta})\) and \(V_h(\hat{\theta})\).
(2) ComputePostmeanHnew(method = "exact"): \(h \sim N(\hat{\mu}_h,\hat{V}_h)\), where \(\hat{\mu}_h\) and \(\hat{V}_h\) are \(E(\mu_h(\theta))\) and \(E(V_h(\theta))+Var(\mu_h(\theta))\), estimated from the posterior samples.
(3) SamplePred(): \(h \sim N(\mu_h,V_h)\). Instead of estimating the posterior mean and variance, we directly sample \(h\) given the posterior samples of \(\theta\).
The first approach is fast but only an approximation. The second and third approaches are exact methods that provide unbiased estimates of posterior summaries, and the third also provides the full posterior distribution; however, the exact methods take much longer, especially the third. We recommend the third approach for small datasets, the second for moderately sized datasets, and the first for large datasets.
The three methods are demonstrated below using the median values of the phthalates from the NHANES dataset.
# calculate the median of the Z's
med <- apply(Z, 2, median)
Znew <- matrix(med, nrow = 1)
# approach 1:
h_est1 <- ComputePostmeanHnew(fitkm, Znew = Znew, method = "approx")
# approach 2:
h_est2 <- ComputePostmeanHnew(fitkm, Znew = Znew, method = "exact")
# approach 3:
set.seed(1)
samps3 <- SamplePred(fitkm, Znew = Znew, Xnew = cbind(0))
compare <- data.frame(
method = c(1:3),
post_mean = c(h_est1$postmean, h_est2$postmean, mean(samps3)),
post_sd = c(sqrt(h_est1$postvar), sqrt(h_est2$postvar), sd(samps3))
)
print(compare)
## method post_mean post_sd
## 1 1 3.106074 0.01287843
## 2 2 3.102730 0.01472242
## 3 3 3.103042 0.01406543
The second and third methods produced similar posterior means. The first ("approx") method differed slightly from the other two and underestimated the posterior standard deviation.
PredictorResponseUnivar() captures the relationship between \(h\) and a single predictor while fixing the other predictors at their medians. As demonstrated in the previous section, there are three different ways to estimate \(h\); the default method is "approx", and we can switch to the other methods via the method argument.
We observe that for predictors with small PIPs, the posterior means (blue lines) of \(h(\cdot)\) are horizontal lines at 0, indicating that they do not have a large impact on the response.
Here we focus on URXHIBP and URXMIB since their PIPs equal 1. Holding the other variables at their median levels, people with higher values of URXHIBP are likely to have lower BMI, while people with higher values of URXMIB are likely to have higher BMI, consistent with the single-variable risk summaries computed later in this tutorial.
pred.resp.univar <- PredictorResponseUnivar(fit = fitkm)
ggplot(pred.resp.univar, aes(z, est, ymin = est - 1.96*se, ymax = est + 1.96*se)) +
geom_smooth(stat = "identity") +
facet_wrap(~ variable) +
ylab("h(z)")
To investigate the relationship between \(h\) and two predictors, we can plot the relationship between the outcome and the first predictor with the second predictor fixed at varying quantile levels, holding the other predictors at a fixed quantile (the median by default). We apply PredictorResponseBivar() to obtain the relationship between the outcome and each pair of predictors, then apply PredictorResponseBivarLevels() to obtain the relationship between the outcome and the first predictor conditional on the second.
Visualizing the relationship between \(h\) and two predictors helps us check for interactions between them. Here we only keep predictors with non-zero PIPs.
We observe that only for URXHIBP and URXMIB are there clear differences among the lines of different colors, consistent with the PIP results; the trends are similar to the uni-variable predictor-response visualization.
We can check from the plot whether interactions exist between predictors: if the lines of different colors cross, an interaction may be present. Here most lines are parallel to each other, except those for URXHIBP and URXMIB: \(h\) grows faster in URXMIB when URXHIBP is at a relatively small value (0.1 quantile, red line), compared to the median or high levels of URXHIBP.
pred.resp.bivar <- PredictorResponseBivar(fit = fitkm, min.plot.dist = 1)
pred.resp.bivar.levels <- PredictorResponseBivarLevels(
pred.resp.df = pred.resp.bivar, Z = Z, qs = c(0.1, 0.5, 0.9))
var_nonzero <- ExtractPIPs(fitkm)[ExtractPIPs(fitkm)[, 2] > 0, 1]
pred.resp.bivar.levels <- pred.resp.bivar.levels %>%
  filter(variable1 %in% var_nonzero, variable2 %in% var_nonzero)
ggplot(pred.resp.bivar.levels, aes(z1, est)) +
geom_smooth(aes(col = quantile), stat = "identity") +
facet_grid(variable2 ~ variable1) +
ggtitle("h(expos1 | quantiles of expos2)") +
xlab("expos1") +
theme_bw()
Summary statistics of the exposure-response function \(h(z)\) can also be examined and visualized. Instead of considering the effect of one predictor conditional on the others, we can quantify the total effect of the predictors by comparing \(h(\cdot)\) with all predictors at a given quantile relative to the median.
For example, we calculate \(h(Z=Z_q)-h(Z=Z_{0.5})\), the overall contribution of the phthalates at their 0.2, 0.4, 0.6, and 0.8 quantiles (qs) minus the response at their 0.5 quantiles. Notice that we can select the reference quantile via q.fixed and the estimation method (default method = "approx").
We observe a decreasing trend, suggesting that people with higher total phthalate levels tend to have lower BMI, holding age constant. For people of the same age, we are 95% confident that those with total phthalates at a higher level (each phthalate at its 0.8 quantile) have a change in log BMI between -0.037 and 0.017 compared to people with total phthalates at a moderate level (each phthalate at its median).
risks.overall <- OverallRiskSummaries(fit = fitkm, y = y, Z = Z, X = X,
qs = seq(0.2, 0.8, by = 0.2),
q.fixed = 0.5)
risks.overall %>%
mutate(lowerCI = est - 1.96*sd, upperCI = est + 1.96*sd)
## quantile est sd lowerCI upperCI
## 1 0.2 0.004784706 0.017205518 -0.028938109 0.038507522
## 2 0.4 0.001747921 0.004013293 -0.006118133 0.009613975
## 3 0.6 0.001436503 0.004407295 -0.007201796 0.010074802
## 4 0.8 -0.010267960 0.013665622 -0.037052579 0.016516658
ggplot(risks.overall, aes(quantile, est, ymin = est - 1.96*sd, ymax = est + 1.96*sd)) +
geom_pointrange()
SingVarRiskSummaries() focuses on \(h(z_{j,q_2}\mid Z_{(-j)}=Z_{(-j),0.5})-h(z_{j,q_1}\mid Z_{(-j)}=Z_{(-j),0.5})\), the difference in \(h(\cdot)\) when a single predictor is at its \(q_2\)-quantile versus its \(q_1\)-quantile, conditional on the other predictors held at a fixed quantile. The qs.diff argument indicates which quantiles are compared for the target predictor, and the q.fixed argument indicates the quantile at which the other predictors are fixed.
risks.singvar <- SingVarRiskSummaries(fit = fitkm, y = y, Z = Z, X = X,
qs.diff = c(0.25, 0.75),
q.fixed = c(0.25, 0.50, 0.75),
method = "exact")
risks.singvar
## # A tibble: 27 × 4
## q.fixed variable est sd
## <fct> <fct> <dbl> <dbl>
## 1 0.25 URXMHBP -0.00569 0.0248
## 2 0.25 URXMOH -0.00138 0.00824
## 3 0.25 URXMHP 0 0
## 4 0.25 URXMHH 0 0
## 5 0.25 URXMCOH 0 0
## 6 0.25 URXMHNC 0 0
## 7 0.25 URXECP -0.00111 0.00757
## 8 0.25 URXHIBP -0.237 0.0505
## 9 0.25 URXMIB 0.167 0.0428
## 10 0.5 URXMHBP -0.00219 0.0126
## # ℹ 17 more rows
ggplot(risks.singvar, aes(variable, est, ymin = est - 1.96*sd,
ymax = est + 1.96*sd, col = q.fixed)) +
geom_pointrange(position = position_dodge(width = 0.75)) +
coord_flip()
SingVarIntSummaries() calculates \(\big[h(z_{j,q_2}\mid Z_{(-j)}=Z_{(-j),p_2})-h(z_{j,q_1}\mid Z_{(-j)}=Z_{(-j),p_2})\big]-\big[h(z_{j,q_2}\mid Z_{(-j)}=Z_{(-j),p_1})-h(z_{j,q_1}\mid Z_{(-j)}=Z_{(-j),p_1})\big]\), i.e. how the single-predictor contrast changes when the other predictors move from their \(p_1\)-quantile to their \(p_2\)-quantile (set by qs.fixed).
risks.int <- SingVarIntSummaries(fit = fitkm, y = y, Z = Z, X = X,
qs.diff = c(0.25, 0.75),
qs.fixed = c(0.25, 0.75),
method = "exact")
risks.int
## # A tibble: 9 × 3
## variable est sd
## <fct> <dbl> <dbl>
## 1 URXMHBP 0.00654 0.0290
## 2 URXMOH 0.00125 0.00825
## 3 URXMHP 0 0
## 4 URXMHH 0 0
## 5 URXMCOH 0 0
## 6 URXMHNC 0 0
## 7 URXECP 0.000871 0.00697
## 8 URXHIBP 0.0759 0.0531
## 9 URXMIB 0.0745 0.0478
We apply SamplePred() to obtain posterior predictive samples, and construct a 95% credible interval from the 0.025 and 0.975 quantiles of those samples. For example, for an individual aged 18 with typical (median) exposure values, log BMI will on average be between 3.07 and 3.13.
med <- apply(Z, 2, median)
Znew <- matrix(med, nrow = 1)
y_pred <- SamplePred(fitkm, Znew = Znew, Xnew = 0)
quantile(y_pred,c(0.025,0.975))
## 2.5% 97.5%
## 3.070586 3.133297
Below, we visualize the trace of the Markov chain Monte Carlo samples for the \(\beta\) parameter of the BKMR model.
TracePlot(fit = fitkm, par = "beta")
Examining the trace plot, the chain shows no noticeable patterns across iterations (no apparent wandering trends), and the samples span a consistent range of parameter values. This suggests the model has converged and exhibits favorable mixing.
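Numeric diagnostics can supplement the visual check. The sketch below uses a simulated autocorrelated chain as a stand-in for posterior draws (for the real model, the stored draws of \(\beta\) would be used instead) and computes a rough effective sample size under an AR(1) approximation:

```r
set.seed(1)
# Simulated AR(1) chain standing in for posterior draws of a parameter
chain <- as.numeric(arima.sim(model = list(ar = 0.6), n = 5000))

# Lag-1 autocorrelation: values near 1 indicate slow mixing
lag1 <- acf(chain, lag.max = 1, plot = FALSE)$acf[2]

# Rough effective sample size under an AR(1) approximation
n_eff <- length(chain) * (1 - lag1) / (1 + lag1)
c(lag1 = lag1, n_eff = n_eff)
```

A strongly autocorrelated chain carries far fewer effective draws than its nominal length, which is why mixing matters for the credible intervals reported above.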
One of the appeals of BKMR is the ability to use hierarchical variable selection. This method assesses which variables are critical to the model by grouping them: typically, the chemical exposures are grouped by their relationships with one another or by prior knowledge. Each exposure in the Z matrix can be assigned to a group using, for example, groups = c(1,1,2,2), which assigns the first two variables in Z to group 1 and the remaining variables to group 2. If the variables are not grouped, component-wise selection evaluates each variable individually (varsel = TRUE with no groups argument).
Here we use the previous example, but with three groups: (1) "URXMHBP", "URXMOH", "URXECP"; (2) "URXMHP", "URXMHH", "URXMCOH", "URXMHNC"; (3) "URXHIBP", "URXMIB".
set.seed(1)
fitkm_hier <- kmbayes(y = y, Z = Z, X = X, iter = 5000, varsel = TRUE,
groups = c(1,1,2,2,2,2,1,3,3), verbose = FALSE)
We can contrast the posterior inclusion probabilities from the two models:
From component-wise selection (each exposure evaluated individually):
ExtractPIPs(fitkm)
## variable PIP
## 1 URXMHBP 0.0544
## 2 URXMOH 0.0340
## 3 URXMHP 0.0000
## 4 URXMHH 0.0000
## 5 URXMCOH 0.0000
## 6 URXMHNC 0.0000
## 7 URXECP 0.0256
## 8 URXHIBP 1.0000
## 9 URXMIB 1.0000
For the grouped exposures, the first and third groups have group PIPs greater than 0.5, suggesting they should be included in the model. Within the first group, URXMHBP is the most likely to be associated with BMI; within the third group, URXMIB is the most likely to be associated with BMI.
ExtractPIPs(fitkm_hier)
## variable group groupPIP condPIP
## 1 URXMHBP 1 0.5396 0.68346924
## 2 URXMOH 1 0.5396 0.16530764
## 3 URXMHP 2 0.0160 0.40000000
## 4 URXMHH 2 0.0160 0.12500000
## 5 URXMCOH 2 0.0160 0.47500000
## 6 URXMHNC 2 0.0160 0.00000000
## 7 URXECP 1 0.5396 0.15122313
## 8 URXHIBP 3 0.6852 0.06830123
## 9 URXMIB 3 0.6852 0.93169877
Another feature of the bkmr package is that the tuning parameters of the fitting algorithm can be adjusted manually. The model is fit by Markov chain Monte Carlo, combining Metropolis-Hastings and Gibbs steps to update the parameters. The tuning parameters are accessed through the control.params argument. For example, lambda.jump sets the standard deviation of the proposal distribution for \(\lambda = \tau / \sigma^2\), where \(\tau\) scales the kernel matrix and controls the smoothness of the exposure-response function. Likewise, r.jump sets the standard deviation of the proposal distribution for the \(r_k\). When variable selection is used, the proposal can differ depending on whether a variable is entering the model or remaining in it: r.jump1 sets the proposal standard deviation for a variable moving from excluded to included, r.jump2 for a variable that stays in the proposed model, and r.muprop sets the mean of the proposal distribution for a variable moving from excluded to included.
In addition to the tuning parameters, the prior distributions can be adjusted. The prior on the \(r_k\) is specified by r.prior, with options including "gamma", "invunif", and "unif". The mean and standard deviation of the gamma prior on \(\lambda\) can be adjusted via mu.lambda and sigma.lambda. The prior on \(\sigma^2\) is adjusted through a.sigsq and b.sigsq. The shape parameters of the beta prior on \(\pi\) (the inclusion probability) are set through a.p0 and b.p0, which are used when varsel = TRUE. Additional options can be found on the package's GitHub page.
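A hedged sketch of passing such options to kmbayes() via control.params; the numeric values below are purely illustrative, not recommendations, and the fitting call is commented out since it is as slow as the earlier fits:

```r
# Illustrative tuning and prior settings passed via control.params
ctrl <- list(
  lambda.jump = 10,        # proposal sd for lambda = tau / sigma^2
  r.jump1     = 2,         # proposal sd when a variable enters the model
  r.jump2     = 0.2,       # proposal sd when a variable stays in the model
  r.muprop    = 1,         # proposal mean when a variable enters the model
  r.prior     = "invunif", # prior on the r_k
  a.p0        = 1,         # shape parameters of the beta prior on pi,
  b.p0        = 1          # used when varsel = TRUE
)

# fitkm_tuned <- kmbayes(y = y, Z = Z, X = X, iter = 5000,
#                        varsel = TRUE, control.params = ctrl)
```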