biglm.big.matrix, bigglm.big.matrix {bigmemory}    R Documentation

Use Thomas Lumley's “biglm” package with a “big.matrix”

Description

These functions are wrappers for Thomas Lumley's biglm package, allowing its use with data stored in big.matrix objects.

Usage

biglm.big.matrix(formula, data, fc=NULL, chunksize=NULL, weights=NULL, sandwich=FALSE)
bigglm.big.matrix(formula, data, family=gaussian(), fc=NULL, chunksize=NULL,
          weights=NULL, sandwich=FALSE, maxit=8, tolerance=1e-7, start=NULL)

Arguments

formula     a model formula.
data        a big.matrix.
fc          either column indices or names of the variables that are factors.
chunksize   an integer giving the maximum size of the chunks of data to process iteratively.
weights     a one-sided, single-term formula specifying weights (see biglm for more information).
sandwich    TRUE to compute the Huber/White sandwich covariance matrix (see biglm for more information).
family      a glm family object.
maxit       the maximum number of Fisher scoring iterations.
tolerance   the tolerance for the change in coefficients (as a multiple of the standard error).
start       optional starting values for the coefficients. If NULL, maxit should be at least 2, as some quantities will not be computed on the first iteration.
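
The forms accepted by fc and weights are easy to get wrong, so a minimal base R sketch may help. The column names come from the iris data used in the Examples; choosing Petal.Width as a weight column is purely an illustrative assumption, not something the package prescribes.

```r
# fc accepts either column names or column indices of the factor variables.
cols <- names(iris)
fc_by_name  <- "Species"
fc_by_index <- match(fc_by_name, cols)   # position of Species among the columns

# weights is a one-sided formula with a single term: nothing to the left of ~.
# Petal.Width is a hypothetical choice of weight column, for illustration only.
w <- ~ Petal.Width
is_one_sided  <- length(w) == 2          # a two-sided formula has length 3
weight_column <- all.vars(w)             # the single column the formula names
```

Either fc_by_name or fc_by_index could then be passed as fc, and w as weights.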

Details

See Thomas Lumley's biglm package for more information. When chunksize is NULL,
it defaults to floor(nrow(data)/ncol(data)^2).
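
As a worked check of the default stated above, the following base R sketch evaluates the formula for the iris data from the Examples. A plain matrix stands in for the big.matrix, and the chunk boundaries are only an illustration of what processing in chunks of that size could look like; they are not the package's internal code.

```r
# A plain matrix stands in for the big.matrix here, so only base R is needed.
x <- matrix(unlist(iris), ncol = 5)

# Default from the Details section: floor(nrow(data) / ncol(data)^2).
default_chunksize <- floor(nrow(x) / ncol(x)^2)   # floor(150 / 25)

# Illustration only: the row ranges that chunks of this size would cover.
starts   <- seq(1, nrow(x), by = default_chunksize)
ends     <- pmin(starts + default_chunksize - 1, nrow(x))
n_chunks <- length(starts)
```

For iris (150 rows, 5 columns) this gives 6 rows per chunk and 25 chunks, so passing a larger chunksize explicitly may be preferable for data with many columns.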

Value

an object of class biglm.

Author(s)

Michael J. Kane

References

Algorithm AS274. Applied Statistics (1992), Vol. 41, No. 2.

Thomas Lumley (2005). biglm: bounded memory linear and generalized linear models. R package version 0.4.

See Also

biglm, big.matrix

Examples

# This example is quite silly, using the iris data, but it shows that
# our wrapper to Lumley's biglm() function produces the same answer as
# the plain old lm() function.

## Not run: 
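# Flatten iris into a numeric matrix; unlist() turns the Species factor
# into its integer codes (1, 2, 3).
x <- matrix(unlist(iris), ncol=5)
colnames(x) <- names(iris)
x <- as.big.matrix(x)
head(x)

# Fit with the wrapper; fc marks Species as a factor column.
silly.biglm <- biglm.big.matrix(Sepal.Length ~ Sepal.Width + Species, data=x, fc="Species")
summary(silly.biglm)

# Copy the data into an ordinary data frame for the reference fit.
y <- data.frame(x[,])
y$Species <- as.factor(y$Species)
head(y)

silly.lm <- lm(Sepal.Length ~ Sepal.Width + Species, data=y)
summary(silly.lm)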
## End(Not run)
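
The example above exercises only biglm.big.matrix. A similar sketch for bigglm.big.matrix follows, using the arguments listed under Usage. It assumes the bigmemory and biglm packages are installed; the fallback to the biganalytics package is an assumption for newer installations where the wrappers are distributed there. The chunksize of 50 is an arbitrary illustrative value.

```r
library(bigmemory)
# Assumption: in newer releases the wrappers ship in biganalytics instead.
if (!exists("bigglm.big.matrix")) library(biganalytics)

x <- matrix(unlist(iris), ncol = 5)
colnames(x) <- names(iris)
x <- as.big.matrix(x)

# Gaussian family, so the fit should agree with lm() up to numerical error.
silly.bigglm <- bigglm.big.matrix(Sepal.Length ~ Sepal.Width + Species,
                                  data = x, family = gaussian(),
                                  fc = "Species", chunksize = 50, maxit = 8)
summary(silly.bigglm)

# Reference fit on an ordinary data frame.
y <- data.frame(x[,])
y$Species <- as.factor(y$Species)
silly.lm <- lm(Sepal.Length ~ Sepal.Width + Species, data = y)

# Compare coefficients by position; names are dropped in case labels differ.
same_fit <- isTRUE(all.equal(unname(coef(silly.bigglm)),
                             unname(coef(silly.lm)),
                             tolerance = 1e-6))
same_fit
```

Swapping gaussian() for another glm family object, such as binomial() with a suitable response, would follow the same pattern.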

[Package bigmemory version 2.3 Index]