| Title: | Variable Selection using the Pivotal Information Criterion |
|---|---|
| Description: | Sparse regression and classification via the Pivotal Information Criterion (PIC), an alternative to the Bayesian Information Criterion (BIC), cross-validation, and Lasso-based tuning. The regularization parameter is selected from a pivotal null-distribution statistic, eliminating the need for cross-validation and yielding sharper support recovery. Provides Fast Iterative Shrinkage-Thresholding Algorithm (FISTA) optimization for the L1, Smoothly Clipped Absolute Deviation (SCAD), and Minimax Concave Penalty (MCP) penalties across six response distributions: Gaussian, binomial, Poisson, exponential, Gumbel, and Cox. Under standard sparsity assumptions, the selector achieves a phase transition for exact support recovery, analogous to results in compressed sensing. See Sardy, van Cutsem and van de Geer (2026) <doi:10.48550/arXiv.2603.04172>. |
| Authors: | Maxime van Cutsem [aut, cre], Sylvain Sardy [aut] |
| Maintainer: | Maxime van Cutsem <[email protected]> |
| License: | GPL-2 |
| Version: | 0.1.3 |
| Built: | 2026-06-04 14:07:38 UTC |
| Source: | https://github.com/vcmaxouuu/picreg |
pic fit.Given a test set, reports a small set of family-appropriate predictive metrics. Optionally appends support-recovery diagnostics when the true active set is known.
assess(object, ...) ## S3 method for class 'pic' assess(object, newx, newy, true_features = NULL, ...)assess(object, ...) ## S3 method for class 'pic' assess(object, newx, newy, true_features = NULL, ...)
object |
A fitted |
... |
Unused; present for S3 method consistency. |
newx |
Numeric design matrix at which predictions are evaluated, with the same columns as the training data. |
newy |
Response on the new observations. Numeric vector for
all families except Cox; for Cox, a two-column matrix
|
true_features |
Optional integer or character vector listing the indices (or names) of the true active variables. When supplied, support-recovery metrics are appended. |
The metrics depend on the family:
"gaussian"MSE, MAE, R-squared.
"binomial"accuracy, AUC, binomial deviance.
"poisson"MSE, MAE, Poisson deviance.
"exponential"MSE, MAE, Exponential deviance.
"gumbel"MSE, MAE, deviance computed from the per-sample
negative log-likelihood with a moment estimate of the scale
parameter .
"cox"Harrell's C-index and the Breslow partial
log-likelihood (negative, normalized by n).
When true_features is non-NULL, four support-recovery metrics
are appended (independent of newx / newy):
exact_recovery, tpr (true-positive rate / sensitivity), fdr
(false-discovery rate) and f1 (harmonic mean of precision and
recall). Names are accepted in addition to integer positions when
the fit carries column names.
A two-column data.frame with columns metric (character)
and value (numeric).
data(QuickStartExample) X <- QuickStartExample$X y <- QuickStartExample$y fit <- pic(X, y, family = "gaussian", penalty = "scad") assess(fit, X, y, true_features = paste0("gene_", 1:5))data(QuickStartExample) X <- QuickStartExample$X y <- QuickStartExample$y fit <- pic(X, y, family = "gaussian", penalty = "scad") assess(fit, X, y, true_features = paste0("gene_", 1:5))
Breslow baseline cumulative hazard and survival.
baseline_functions(times, events, predictions)baseline_functions(times, events, predictions)
times |
Survival times (length |
events |
Event indicators (0 or 1; length |
predictions |
Risk scores; higher = higher risk = shorter survival. |
A data.frame with columns time, cumulative_hazard, survival.
A synthetic logistic-regression dataset used to illustrate pic() with
the Binomial family. It contains observations of
predictors, of which carry signal; the
remaining 45 are noise.
BinomialExampleBinomialExample
A list with two components:
XNumeric matrix of dimension
with column names gene_* and noise_*.
yInteger vector of length containing
class labels.
Column names follow the same convention as
QuickStartExample: active variables are labeled
gene_1, ..., gene_5 and inactive ones noise_1, ...,
noise_45, interleaved in random order.
The binary response is generated as
with non-zero coefficients drawn uniformly in with
random sign. The class balance is roughly of positives.
data(BinomialExample) fit <- pic(BinomialExample$X, BinomialExample$y, family = "binomial") fit$selecteddata(BinomialExample) fit <- pic(BinomialExample$X, BinomialExample$y, family = "binomial") fit$selected
Returns the fitted coefficients as a one-column sparse matrix of class
"dgCMatrix" (from the Matrix package), mirroring the output of
coef() for glmnet. The first row is the intercept, labeled
"(Intercept)" (value 0 when no intercept was fitted); the remaining
rows are the predictors. Row names are taken from the column names of X
when available (matrix colnames(X) or data-frame column names) and
otherwise default to V1, ..., Vp. Zero coefficients are stored implicitly
and printed as ".", which keeps the display compact in high dimensions.
## S3 method for class 'pic' coef(object, standardized = FALSE, ...)## S3 method for class 'pic' coef(object, standardized = FALSE, ...)
object |
A fitted |
standardized |
Logical; if |
... |
Unused; present for S3 method consistency. |
Internally the model is fitted on a standardized design matrix, so the
raw coefficients live on the standardized scale. By default this method
rescales them back to the original scale of X — the values to
plug into the un-standardized design for prediction — via
where m and s are the column mean and standard deviation. Pass
standardized = TRUE to skip the rescaling and return the raw fit
values (identical to fit$beta).
A sparse (p + 1) by 1 matrix of class "dgCMatrix", with row
names c("(Intercept)", <variables>) and column name "coefficient".
Use as.numeric() to obtain a plain numeric vector, or
which(coef(fit) != 0) to list the selected entries.
Harrell's concordance index.
concordance_index(times, events, predictions)concordance_index(times, events, predictions)
times |
Survival times (length |
events |
Event indicators (0 or 1; length |
predictions |
Risk scores; higher = higher risk = shorter survival. |
Numeric scalar in [0, 1].
n).Breslow partial log-likelihood (negative, normalized by n).
cox_partial_log_likelihood(times, events, predictions)cox_partial_log_likelihood(times, events, predictions)
times |
Survival times (length |
events |
Event indicators (0 or 1; length |
predictions |
Risk scores; higher = higher risk = shorter survival. |
A numeric scalar: the negative Breslow partial log-likelihood
for the Cox proportional-hazards model, normalized by the sample
size n. Lower values indicate a better fit.
A synthetic survival dataset used to illustrate pic() with the Cox
family. It contains subjects observed on
covariates, of which carry signal; the remaining 45 are
noise.
CoxExampleCoxExample
A list with two components:
XNumeric matrix of dimension
with column names gene_* and noise_*.
yNumeric matrix of dimension
with columns time and event.
Column names follow the same convention as
QuickStartExample: active variables are labeled
gene_1, ..., gene_5 and inactive ones noise_1, ...,
noise_45, interleaved in random order.
Event times are drawn from an exponential proportional-hazards model
and independent censoring times from . The observed response is the standard two-column
. The censoring
rate is roughly .
data(CoxExample) fit <- pic(CoxExample$X, CoxExample$y, family = "cox") fit$selecteddata(CoxExample) fit <- pic(CoxExample$X, CoxExample$y, family = "cox") fit$selected
Builds the family of survival curves obtained by varying a single
covariate while holding the others at their column means. The
returned object has the same shape as predict_survival_function(),
so it composes directly with plot_survival_curves().
feature_effects_on_survival(object, idx, values = NULL)feature_effects_on_survival(object, idx, values = NULL)
object |
A fitted |
idx |
Index of the feature to vary. Either an integer column
position in the training |
values |
Optional numeric vector of values to evaluate. When
|
Both the per-column mean row and a default grid of representative
values (used when values = NULL) are cached on the fit by pic()
under attr(fit, "preproc"), so the training design matrix does
not need to be passed back in. The cached default grid uses the
unique values of the column when there are at most five of them
(handy for ordinal / categorical covariates), and the four
equispaced empirical quantiles (0, 1/3, 2/3, 1) otherwise.
A list with components time (length K) and survival
(matrix K x length(values), one column per evaluated value).
Column names of survival are formatted as
"<feature_name> = <value>" and are picked up automatically by
plot_survival_curves() for the legend.
predict_survival_function(), plot_survival_curves().
Computes the data-driven regularization parameter
using the Pivotal Detection Boundary (PDB) principle. The selected value is
defined as the empirical quantile of a null-distribution gradient statistic
where denotes the loss evaluated under the null model.
lambda_pdb( X, family, n_simu = 5000L, alpha = 0.05, method = c("mc_exact", "mc_gaussian", "analytical") )lambda_pdb( X, family, n_simu = 5000L, alpha = 0.05, method = c("mc_exact", "mc_gaussian", "analytical") )
X |
Numeric design matrix. Columns should typically be standardized to zero mean and unit variance. |
family |
A PIC family specification. Can be either a family object
or a character string accepted by |
n_simu |
Number of Monte Carlo simulations used by the stochastic
methods. Ignored when |
alpha |
Nominal tail probability used to define the quantile level. |
method |
One of "mc_exact", "mc_gaussian", or "analytical". |
Under the null , the gradient of the loss carries only sampling noise.
The smallest large enough to dominate this noise is the natural threshold
separating signal from noise, i.e. the value above which a coefficient should be
kept rather than shrunk to zero. Calibrating this way has two
consequences. First, the quantile depends only on the design matrix ,
the family, and the level (not on the response ), so
cross-validation is no longer needed. Second, and more importantly, it leads
to sharper support recovery than prediction-error-based selectors such as
cross-validated lasso: by targeting the noise level of the gradient directly,
PDB controls the inclusion of noise variables and more reliably identifies
the true non-zero coefficients. Computing the quantile requires only the
distribution of , which lambda_pdb() estimates
through one of the three methods below.
method optionThe empirical quantile is obtained using one of the three following methods:
"mc_exact"Family-aware Monte Carlo. For each of the n_simu draws, a
response vector is sampled under the null model
() of the chosen family; the family-specific
gradient of the loss at is evaluated on the
fixed design , and its supremum norm is recorded. The
empirical quantile of the n_simu recorded
norms is returned. Most accurate but slowest of the three.
"mc_gaussian"Monte Carlo under the Gaussian approximation of the null
gradient. A central-limit argument gives
with . Each of the n_simu draws
samples directly from this Gaussian and records its supremum
norm — no family-specific evaluation needed. Family-agnostic
and noticeably faster than "mc_exact"; valid in the regime
where the CLT kicks in (moderate to large ).
"analytical"Closed-form Bonferroni bound on the Gaussian tail. Combining a
union bound over the coordinates of the gradient with
the standard Gaussian tail bound gives
Deterministic and — no simulation. Conservative
(slightly over-estimates the true quantile) and gets looser as
grows, but useful when speed matters or when Monte
Carlo is overkill.
The "mc_gaussian" and "analytical" methods use a variance scaling
factor depending on the family:
Gaussian / Binomial / Poisson / Exponential:
Gumbel: , with the Euler-Mascheroni constant.
Cox:
Both Monte Carlo methods are dominated by a single
matrix product , where stacks the simulated
residuals. This product is dispatched to BLAS as a single ,
which is essentially as fast as a Monte Carlo selector can be on dense
designs. For very large or , however, the constant
becomes large and "mc_exact" may be unnecessarily expensive:
As grows, the central-limit approximation tightens
and "mc_gaussian" gives essentially the same
as "mc_exact" at a fraction of the cost (no family-specific
residual generation, fewer dependencies on -draws).
"analytical" is and useful as a quick upper
bound — slightly conservative, but accurate enough for triage
and for very high where the Bonferroni tail is tight.
Practical rule of thumb: prefer "mc_exact" for small to moderate
problems (default), "mc_gaussian" once grows and the
design is dense, and "analytical" when even the Monte Carlo cost is
a concern.
alpha optionThe level is the nominal Type-I error of the test of
. By construction of the quantile,
so under the null model no variable enters the active set with
probability at most . With the default
, this caps the false-discovery rate at
under the null: when the data carry no signal, picreg returns the
empty support of the time.
An object of class "pic.lambda_pdb".
value |
Selected regularization parameter |
statistics |
Simulated null statistics used to estimate the quantile.
|
method |
Estimation method used. |
alpha |
Quantile level parameter. |
n_simu |
Number of Monte Carlo simulations. |
X <- scale(matrix(rnorm(100 * 20), 100, 20)) lam <- lambda_pdb( X, family = "gaussian", method = "mc_exact" ) print(lam)X <- scale(matrix(rnorm(100 * 20), 100, 20)) lam <- lambda_pdb( X, family = "gaussian", method = "mc_exact" ) print(lam)
For each n in n_grid, draws a standardized Gaussian design matrix
of shape (n, p) and computes the null gradient-norm statistic via
the three available selectors: "mc_exact", "mc_gaussian", and
"analytical". Stores the simulated Monte Carlo statistics and the
three resulting values per n.
pdb_asymptotic( n_grid, p, type = c("gaussian", "binomial", "poisson", "exponential", "gumbel", "cox"), alpha = 0.05, n_simu = 5000L, verbose = FALSE )pdb_asymptotic( n_grid, p, type = c("gaussian", "binomial", "poisson", "exponential", "gumbel", "cox"), alpha = 0.05, n_simu = 5000L, verbose = FALSE )
n_grid |
Integer vector of sample sizes to evaluate. |
p |
Number of features (scalar integer). |
type |
Family name: |
alpha |
Nominal level used for the (1 - alpha) quantile. |
n_simu |
Monte Carlo size for each selector. |
verbose |
Logical; if |
The intended use is to visualize the convergence of the exact
family-specific null distribution to the Gaussian approximation as
n grows — i.e., to check empirically that mc_gaussian is a valid
substitute for mc_exact in the asymptotic regime.
An object of class c("pic.pdb_asymptotic", "pic.diagnostic").
n_grid, p, type, alpha, n_simu
|
Configuration. |
stats_exact, stats_gaussian
|
Lists of length |
lambda_exact, lambda_gaussian, lambda_analytical
|
Numeric
vectors of length |
call |
The call. |
plot.pic.pdb_asymptotic() for visualization.
as_ <- pdb_asymptotic(n_grid = c(50, 200, 1000), p = 200, type = "poisson") plot(as_)as_ <- pdb_asymptotic(n_grid = c(50, 200, 1000), p = 200, type = "poisson") plot(as_)
Prints a formatted summary of the Pivotal Detection Boundary selector
used by pic() to choose : method, nominal level,
Monte Carlo size, selected , and a compact view of
the null distribution when Monte Carlo was run. For models fitted
with method = "analytical" or with a user-supplied lambda, only
the selector metadata is shown.
pdb_summary(model, digits = 4L)pdb_summary(model, digits = 4L)
model |
A fitted |
digits |
Number of significant digits used in the distribution table. |
Invisibly returns NULL. Called for its side effect (printing).
data(QuickStartExample) fit <- pic(QuickStartExample$X, QuickStartExample$y, family = "gaussian", penalty = "lasso") pdb_summary(fit)data(QuickStartExample) fit <- pic(QuickStartExample$X, QuickStartExample$y, family = "gaussian", penalty = "lasso") pdb_summary(fit)
For a fixed (n, p) configuration, varies the sparsity level s
from 0 to s_max and, for each s, estimates by Monte Carlo
(m replications) the probability that pic() recovers exactly
the support of the true coefficient vector. If several penalties are
supplied, one result is produced for each (n, p, penalty) combination.
phase_transition( n, p, type = c("gaussian", "binomial", "poisson", "exponential", "gumbel", "cox"), s_max, m = 50, penalty = "lasso", beta_value = 3, lambda_method = "mc_exact", lambda_alpha = 0.05, lambda_n_simu = 5000L, verbose = FALSE, parallel = FALSE, workers = parallel::detectCores() - 1L )phase_transition( n, p, type = c("gaussian", "binomial", "poisson", "exponential", "gumbel", "cox"), s_max, m = 50, penalty = "lasso", beta_value = 3, lambda_method = "mc_exact", lambda_alpha = 0.05, lambda_n_simu = 5000L, verbose = FALSE, parallel = FALSE, workers = parallel::detectCores() - 1L )
n |
Integer or integer vector — number of observations per configuration. |
p |
Integer or integer vector of the same length as |
type |
Family name: |
s_max |
Largest sparsity level evaluated. Must satisfy |
m |
Number of Monte Carlo replications per |
penalty |
One or more penalties for |
beta_value |
Magnitude of the non-zero coefficients used to generate
the response. The sign is fixed to |
lambda_method |
Passed to |
lambda_alpha |
Nominal level for the PDB selector. |
lambda_n_simu |
Monte Carlo size for the PDB selector. |
verbose |
Logical; if |
parallel |
Logical; if |
workers |
Integer; number of background R processes to use when
|
At each Monte Carlo replicate a design matrix is drawn from a
standard Gaussian, s features are sampled uniformly at random,
the response is generated under the chosen type, and one pic
fit is run for each requested penalty. The selected support is
compared to the truth and three metrics are stored:
exact_recovery1 if the selected set equals the true set, 0 otherwise.
tpr - true positive rate. For s = 0, this is set to 1 when no true feature exists.
fdr - false discovery rate.
At each replicate, the design is drawn iid
, the true support is sampled
uniformly at random, and beta_value for
, otherwise. The linear predictor
is . The response is then drawn
conditionally on according to the requested family:
"gaussian" with
.
"binomial",
where is the logistic
function.
"poisson".
"exponential".
"gumbel" with
, drawn as
for .
"cox"Event times
and independent censoring times
. The response is the 2-column
matrix .
An object of class c("pic.phase_transition", "pic.diagnostic").
s_grid |
|
exact_recovery, tpr, fdr
|
Matrices of shape |
curve_n, curve_p, curve_penalty
|
Curve descriptors aligned with the rows of the metric matrices. |
config |
A list of all configuration arguments for downstream plotting / reporting. |
call |
The call. |
plot.pic.phase_transition() for visualization.
pt <- phase_transition(n = 50, p = 100, type = "gaussian", s_max = 8, m = 20, penalty = c("lasso", "scad"), parallel = TRUE) plot(pt)pt <- phase_transition(n = 50, p = 100, type = "gaussian", s_max = 8, m = 20, penalty = c("lasso", "scad"), parallel = TRUE) plot(pt)
Fits sparse regression, classification, and survival models built on a linear predictor, using a family-specific loss whose gradient at the null is (asymptotically) pivotal, combined with a sparsity-inducing penalty. Supported families are Gaussian, binomial, Poisson, exponential, Gumbel, and Cox proportional hazards.
pic( X, y, family = c("gaussian", "binomial", "poisson", "exponential", "gumbel", "cox"), penalty = "lasso", standardize = TRUE, intercept = TRUE, lambda_method = c("mc_exact", "mc_gaussian", "analytical"), relax = FALSE, mcp_gamma = 3, scad_a = 3.7, lambda = NULL, lambda_alpha = 0.05, lambda_n_simu = 2000, tol = 1e-05, path_length = 10L, maxit = 1000 )pic( X, y, family = c("gaussian", "binomial", "poisson", "exponential", "gumbel", "cox"), penalty = "lasso", standardize = TRUE, intercept = TRUE, lambda_method = c("mc_exact", "mc_gaussian", "analytical"), relax = FALSE, mcp_gamma = 3, scad_a = 3.7, lambda = NULL, lambda_alpha = 0.05, lambda_n_simu = 2000, tol = 1e-05, path_length = 10L, maxit = 1000 )
X |
Numeric design matrix of shape |
y |
Response variable with length |
family |
A character name determining the response distribution and
the associated loss function used during optimization. See
Details on family-specific losses below for the exact
objective functions associated with each family. Default |
penalty |
Penalty name: one of |
standardize |
Whether to standardize columns of |
intercept |
Whether to fit an unpenalized intercept. Forced to
|
lambda_method |
One of |
relax |
If |
mcp_gamma |
MCP concavity parameter when |
scad_a |
SCAD concavity parameter when |
lambda |
Optional user-supplied regularization parameter. When |
lambda_alpha |
Nominal level for PDB. See |
lambda_n_simu |
Number of Monte Carlo draws for PDB. See |
tol |
FISTA convergence tolerance at the final path step. |
path_length |
Number of points in the warm-start regularization
path running from |
maxit |
Maximum number of iterations for FISTA if convergence is not yet reached. |
Minimises the objective function
where the regularization parameter is selected automatically using
the Pivotal Detection Boundary (PDB) principle implemented in
lambda_pdb(). The PDB choice is not just a substitute for
cross-validation: by calibrating against a pivotal
null-distribution statistic rather than out-of-sample prediction
error, it yields sharper support recovery - fewer false positives
on noise variables and more accurate identification of the true
non-zero coefficients. Here, and are
family-specific transformations chosen so that the gradient at the
null has a distribution free of the nuisance parameter, which is
precisely what makes the use of
valid. Optimization is performed using a warm-started FISTA
regularization path.
"gaussian"Uses the mean squared error for , with
and ,
giving the square-root lasso loss
This makes the gradient at scale-free, so the PDB
threshold needs no estimate of the noise standard deviation.
"binomial"A variance-stabilized transformation of the Bernoulli likelihood.
With and the logistic link
, the loss is
The classical logistic link is preserved; only the loss itself is modified to obtain a pivotal gradient at the null.
"poisson"The same pivotalization applied to the Poisson likelihood. With
and the canonical log link
, the loss is
The canonical log link is kept unchanged; pivotality is obtained through the loss rather than through the link.
"exponential"Uses the standard Exponential negative log-likelihood directly (no
transformation). With and
, the loss is
"gumbel"A location-scale model with extreme-value noise. The base log-likelihood is
With and the identity link
, the objective is
. The scale
is re-estimated internally by maximum likelihood at
every iteration; the user does not supply it.
"cox"A square-root-transformed Cox partial log-likelihood. With
and the identity link
, the objective is
where is the event indicator and the risk
set at event time . The Breslow approximation is used for
tied event times, and the model is always fitted without an intercept.
An object of class c("pic.<family>", "pic").
beta |
Fitted coefficients (length |
intercept |
Fitted intercept or |
df |
The number of nonzero coefficients. |
selected |
Indices of the nonzero coefficients. |
lambda |
|
family |
The family object used. |
penalty |
The penalty object used. |
lambda_pdb |
Result from |
n_samples |
The number of observations in the training set. |
n_features |
The number of variables in the training set. |
For family = "cox", the fit additionally carries:
baseline_cumulative_hazard |
Estimated Breslow baseline cumulative hazard function. |
baseline_survival |
Estimated baseline survival function. |
unique_times |
Sorted unique event times used in the baseline estimation. |
censoring_rate |
Percentage of censored observations in the training set. |
data(QuickStartExample) X <- QuickStartExample$X y <- QuickStartExample$y fit <- pic(X, y, family = "gaussian", penalty = "scad") fit$beta fit$selecteddata(QuickStartExample) X <- QuickStartExample$X y <- QuickStartExample$y fit <- pic(X, y, family = "gaussian", penalty = "scad") fit$beta fit$selected
Each family is a thin descriptor carrying:
name - used by the dispatchers to route to the C++ implementation.
g - mean-function link (object with name and callable fn).
g$fn(eta) is applied at predict time to obtain the mean
response.
phi - variance-stabilizing transform (object with name).
Purely informational on the R side; the actual stabilization
happens inside the C++ math.
Typing fit$family gives a one-glance summary of the model's family
(see print.pic.family()). The actual loss / gradient math lives in C++:
src/family_*.cpp — Gaussian, Binomial, Poisson, Exponential, Gumbel.
src/cox.cpp — Cox.
The fitted object has class c("pic.<family>", "pic") —
methods dispatch on the second class for shared behavior
and on the first class for family-specific overrides.
Three penalties are supported, identified by lowercase name to match
the C++ registry. Each penalty enters the pic() objective as
where depends on the penalty.
"lasso"L1 (soft-thresholding) penalty:
Convex, gives the strongest shrinkage on large coefficients
bias does not vanish as .
"scad" (Smoothly Clipped Absolute Deviation, Fan & Li 2001)Non-convex penalty with concavity parameter scad_a > 2
(default 3.7):
Behaves like the lasso for small , then tapers off so
large coefficients are barely penalized - yields nearly unbiased
estimates on strong signals.
"mcp" (Minimax Concave Penalty, Zhang 2010)Non-convex penalty with concavity parameter mcp_gamma > 1
(default 3.0):
Similar motivation as SCAD but a smoother transition: starts at
the lasso derivative for small and tapers linearly to
zero at .
The actual evaluation and proximal operators live in C++
(src/penalty_*.cpp). Larger scad_a / mcp_gamma make the
penalty closer to the lasso; smaller values amplify the
non-convexity (and the bias reduction on strong signals).
Entry points:
plot(fit) - lollipop plot of the non-zero coefficients.
plot(fit$lambda_pdb) - histogram of the PDB null distribution
with a vertical line at the selected .
plot_baseline() - Cox-only: baseline cumulative hazard and
baseline survival, two panels.
plot(phase_transition(...)) - recovery curves.
plot(pdb_asymptotic(...)) - null-distribution convergence panels.
Provides Harrell's C-index, the Breslow partial log-likelihood, baseline cumulative hazard / survival, and feature-effect curves.
Two-panel step plot for a fitted pic.cox model: the Breslow
baseline cumulative hazard on top and the baseline
survival below.
plot_baseline(model)plot_baseline(model)
model |
A fitted |
Invisibly returns NULL.
Step-line visualization of the output of
predict_survival_function() or feature_effects_on_survival():
one survival curve per subject (or per feature value) on a common
time grid. To keep curves distinguishable, each curve is drawn with
its own line type and a sparse set of marker glyphs is overlaid at
regular time points.
plot_survival_curves( sf, subjects = NULL, max_subjects = 10L, labels = NULL, n_marks = 8L, main = "Individual survival curves", ... )plot_survival_curves( sf, subjects = NULL, max_subjects = 10L, labels = NULL, n_marks = 8L, main = "Individual survival curves", ... )
sf |
A list as returned by |
subjects |
Optional integer vector selecting which columns of
|
max_subjects |
Maximum number of curves drawn when |
labels |
Optional character vector of labels used in the
legend, one per plotted curve. Defaults to the column names of
|
n_marks |
Number of marker glyphs overlaid on each curve to
help distinguish them. Default |
main |
Plot title. |
... |
Additional graphical parameters forwarded to
|
Invisibly returns NULL.
One row per selected variable, sorted by descending absolute coefficient value (largest at the top). Each variable is drawn as a horizontal segment from zero to its fitted value.
## S3 method for class 'pic' plot(x, standardized = TRUE, max_features = NULL, ...)## S3 method for class 'pic' plot(x, standardized = TRUE, max_features = NULL, ...)
x |
A fitted |
standardized |
Logical; if |
max_features |
Optional cap on the number of features displayed (the strongest are kept). |
... |
Additional graphical parameters forwarded to
|
Invisibly returns the plotted (named) coefficient vector.
Histogram of the simulated null gradient-norm statistics, with a
vertical line at the selected .
## S3 method for class 'pic.lambda_pdb' plot(x, breaks = 40L, ...)## S3 method for class 'pic.lambda_pdb' plot(x, breaks = 40L, ...)
x |
A |
breaks |
Number of histogram bins (default 40). |
... |
Unused; present for S3 method consistency. |
Invisibly returns x.
Multi-panel histogram comparison of the simulated null gradient-norm
statistic under the "mc_exact" (light grey fill) and "mc_gaussian"
(dashed outline) selectors, one panel per n in n_grid. Two
vertical lines are added per panel:
- solid navy, the empirical (1 - alpha)
quantile of "mc_exact" (what pic would actually use).
- dashed black, the Bonferroni
closed-form bound.
## S3 method for class 'pic.pdb_asymptotic' plot(x, breaks = 40L, ...)## S3 method for class 'pic.pdb_asymptotic' plot(x, breaks = 40L, ...)
x |
An object returned by |
breaks |
Number of histogram bins (default 40). |
... |
Unused; present for S3 method consistency. |
Invisibly returns x.
pic.phase_transition object.Plots the chosen recovery metric as a function of the sparsity
level s. Curves are distinguished by line type (grayscale ramp
beyond five curves). When multiple (n, p) configurations are
compared across several penalties, panels are laid out in a grid
with at most three columns per row.
## S3 method for class 'pic.phase_transition' plot(x, metric = c("exact_recovery", "tpr", "fdr"), ...)## S3 method for class 'pic.phase_transition' plot(x, metric = c("exact_recovery", "tpr", "fdr"), ...)
x |
An object returned by |
metric |
One of |
... |
Additional graphical parameters forwarded to |
Invisibly returns x.
Survival curves for new data from a fitted Cox pic model.
predict_survival_function(object, newx)predict_survival_function(object, newx)
object |
A fitted |
newx |
New design matrix (rows = subjects). |
A list with time (length K) and survival (matrix K x m,
one column per subject).
Linear predictor / response prediction for a pic fit.
## S3 method for class 'pic' predict(object, newx, type = c("response", "link", "class"), ...)## S3 method for class 'pic' predict(object, newx, type = c("response", "link", "class"), ...)
object |
A fitted |
newx |
Matrix of new values at which predictions are to be made. |
type |
|
... |
Unused; present for S3 method consistency. |
A numeric vector.
Three-line summary showing the family name and its (g, phi) link pair.
## S3 method for class 'pic.family' print(x, ...)## S3 method for class 'pic.family' print(x, ...)
x |
A |
... |
Ignored. |
Invisibly returns x.
Print PDB asymptotic diagnostic.
## S3 method for class 'pic.pdb_asymptotic' print(x, ...)## S3 method for class 'pic.pdb_asymptotic' print(x, ...)
x |
A |
... |
Unused; present for S3 method consistency. |
Invisibly returns x.
Print phase-transition analysis.
## S3 method for class 'pic.phase_transition' print(x, ...)## S3 method for class 'pic.phase_transition' print(x, ...)
x |
A |
... |
Unused; present for S3 method consistency. |
Invisibly returns x.
summary.pic object.Print a summary.pic object.
## S3 method for class 'summary.pic' print(x, digits = 4L, ...)## S3 method for class 'summary.pic' print(x, digits = 4L, ...)
x |
A |
digits |
Number of significant digits for the printed values. |
... |
Unused; present for S3 method consistency. |
Invisibly returns x.
A synthetic Gaussian regression dataset used to illustrate pic()
throughout the introductory vignette. It contains
observations of predictors, of which only
carry signal; the remaining 25 are pure noise.
QuickStartExampleQuickStartExample
A list with two components:
XNumeric matrix of dimension
with column names gene_* and noise_*.
yNumeric vector of length .
Column names are chosen to make the underlying support obvious at a glance:
gene_1, ..., gene_5: the five active variables, whose
non-zero coefficients are drawn uniformly in
with random sign.
noise_1, ..., noise_25: the remaining inactive
variables, with true coefficient .
The columns are interleaved in random order; column names are the only indicator of which features are part of the true support.
The response is generated as with
.
data(QuickStartExample) fit <- pic(QuickStartExample$X, QuickStartExample$y) fit$selecteddata(QuickStartExample) fit <- pic(QuickStartExample$X, QuickStartExample$y) fit$selected
Returns a structured summary of the fit: the family, penalty, selected
, the problem dimensions, the number of selected
variables, the intercept, and a table of the non-zero coefficients
ordered by decreasing absolute value. Coefficients are returned on the
original scale of X by default (pass standardized = TRUE for the
internal standardized scale, matching fit$beta).
## S3 method for class 'pic' summary(object, standardized = FALSE, ...)## S3 method for class 'pic' summary(object, standardized = FALSE, ...)
object |
A fitted |
standardized |
Logical; if |
... |
Unused; present for S3 method consistency. |
An object of class "summary.pic": a list with elements
family, penalty, lambda, n_samples, n_features, df,
intercept, standardized, and coefficients (a two-column
data.frame of the non-zero coefficients).
data(QuickStartExample) fit <- pic(QuickStartExample$X, QuickStartExample$y) summary(fit)data(QuickStartExample) fit <- pic(QuickStartExample$X, QuickStartExample$y) summary(fit)