
Journal of Statistical Software

Vol. 40, Issue 10, Apr 2011

Abstract: In this paper, we present an R package that combines feature-based (X) data and graph-based (G) data for prediction of the response Y. In this particular case, Y is observed for a subset of the observations (labeled) and missing for the remainder (unlabeled). We examine an approach for fitting Y = Xβ + f(G), where β is a coefficient vector and f is a function over the vertices of the graph. The procedure is semi-supervised in nature (trained on the labeled and unlabeled sets), requiring iterative algorithms for fitting this estimate. The package provides several key functions for fitting and evaluating an estimator of this type. The package is illustrated on a text analysis data set, where the observations are text documents (papers), the response is the category of paper (either applied or theoretical statistics), the X information is the name of the journal in which the paper resides, and the graph is a co-citation network, with each vertex an observation and each edge the number of times that the two papers cite a common paper. An application involving classification of protein location using a protein interaction graph and an application involving classification on a manifold with part of the feature data converted to a graph are also presented.

Vol. 40, Book Review 3, Apr 2011

R Cookbook, by Paul Teetor. O'Reilly, 2011. ISBN: 978-0-569-80915-7.

Vol. 40, Issue 9, Apr 2011

Abstract: We introduce an R package SPECIES for species richness or diversity estimation. This package provides simple R functions to compute point and confidence interval estimates of species number from a few nonparametric and semi-parametric methods. For the methods based on nonparametric maximum likelihood estimation, the R functions are wrappers for Fortran code for better efficiency. All functions in this package are illustrated using real data sets.

Vol. 40, Book Review 2, Apr 2011

Exploratory Multivariate Analysis by Example Using R, by François Husson, Sébastien Lê, Jérôme Pagès. Chapman & Hall/CRC Press, 2011. ISBN: 978-1439835807.

Vol. 40, Issue 8, Apr 2011

Abstract: The Rcpp package simplifies integrating C++ code with R. It provides a consistent C++ class hierarchy that maps various types of R objects (vectors, matrices, functions, environments, ...) to dedicated C++ classes. Object interchange between R and C++ is managed by simple, flexible and extensible concepts which include broad support for C++ Standard Template Library idioms. C++ code can be compiled, linked and loaded on the fly, or added via packages. Flexible error and exception code handling is provided. Rcpp substantially lowers the barrier for programmers wanting to combine C++ code with R.
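To give a flavor of this interface, here is a minimal sketch of on-the-fly compilation. Note that `cppFunction()` is a convenience wrapper from later Rcpp versions (at the time of this paper, on-the-fly compilation typically went through the inline package), so treat this as an illustration of the current API rather than of the paper's examples:

```r
# Compile a small C++ function and expose it to R in one step.
library(Rcpp)

cppFunction('
  double sumSquares(NumericVector x) {
    // NumericVector is the dedicated C++ class mapped to an R numeric vector
    double total = 0;
    for (int i = 0; i < x.size(); i++) total += x[i] * x[i];
    return total;
  }
')

sumSquares(c(1, 2, 3))  # 14
```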

Vol. 40, Issue 7, Apr 2011

Abstract: The wgaim (whole genome average interval mapping) package developed in the R system for statistical computing (R Development Core Team 2011) builds on linear mixed modelling techniques by incorporating a whole genome approach to detecting significant quantitative trait loci (QTL) in bi-parental populations. Much of the sophistication is inherited through the well established linear mixed modelling package ASReml-R (Butler et al. 2009). As wgaim uses an extension of interval mapping to incorporate the whole genome into the analysis, functions are provided which allow conversion of genetic data objects created with the qtl package of Broman and Wu (2010) available in R. Results of QTL analyses are available using summary and print methods as well as diagnostic summaries of the selection method. In addition, the package features a flexible linkage map plotting function that can be easily manipulated to produce an attractive, readable genetic map. As a visual summary, QTL obtained from one or more models can also be added to the linkage map.

Vol. 40, Issue 6, Apr 2011

Abstract: This article describes the R package DEoptim, which implements the differential evolution algorithm for global optimization of a real-valued function of a real-valued parameter vector. The implementation of differential evolution in DEoptim interfaces with C code for efficiency. The utility of the package is illustrated by case studies in fitting a Parratt model for X-ray reflectometry data and a Markov-switching generalized autoregressive conditional heteroskedasticity model for the returns of the Swiss Market Index.
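A hedged sketch of the basic calling convention: minimize an objective over box constraints given by `lower` and `upper`. The Rastrigin test function below is illustrative, not one of the paper's case studies:

```r
# Global minimization of the 2-D Rastrigin function, whose many local
# minima make it a standard stress test for evolutionary optimizers.
library(DEoptim)

rastrigin <- function(x) 10 * length(x) + sum(x^2 - 10 * cos(2 * pi * x))

out <- DEoptim(rastrigin, lower = c(-5, -5), upper = c(5, 5),
               control = DEoptim.control(NP = 40, itermax = 200, trace = FALSE))
out$optim$bestmem  # should land near (0, 0), the global minimum
out$optim$bestval  # objective value at the best member found
```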

Vol. 40, Issue 5, Apr 2011

Abstract: Data analysis sometimes requires the relaxation of parametric assumptions in order to gain modeling flexibility and robustness against mis-specification of the probability model. In the Bayesian context, this is accomplished by placing a prior distribution on a function space, such as the space of all probability distributions or the space of all regression functions. Unfortunately, posterior distributions ranging over function spaces are highly complex and hence sampling methods play a key role. This paper provides an introduction to a simple, yet comprehensive, set of programs for the implementation of some Bayesian nonparametric and semiparametric models in R, DPpackage. Currently, DPpackage includes models for marginal and conditional density estimation, receiver operating characteristic curve analysis, interval-censored data, binary regression data, item response data, longitudinal and clustered data using generalized linear mixed models, and regression data using generalized additive models. The package also contains functions to compute pseudo-Bayes factors for model comparison and for eliciting the precision parameter of the Dirichlet process prior, and a general purpose Metropolis sampling algorithm. To maximize computational efficiency, the actual sampling for each model is carried out using compiled C, C++ or Fortran code.

Vol. 40, Issue 4, Apr 2011

Abstract: This article describes the many capabilities offered by the TraMineR toolbox for categorical sequence data. It focuses more specifically on the analysis and rendering of state sequences. Addressed features include the description of sets of sequences by means of transversal aggregated views, the computation of longitudinal characteristics of individual sequences and the measure of pairwise dissimilarities. Special emphasis is put on the multiple ways of visualizing sequences. The core element of the package is the state sequence object in which we store the set of sequences together with attributes such as the alphabet, state labels and the color palette. The functions can then easily retrieve this information to ensure presentation homogeneity across all printed and graphical displays. The article also demonstrates how TraMineR's outcomes give access to advanced analyses such as clustering and statistical modeling of sequence data.
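The state sequence object is the pivot of the workflow. A minimal sketch using the mvad data shipped with TraMineR (column range 17:86 holds the monthly states in that data set, per the package documentation):

```r
# Build a state sequence object, then request a transversal view and a
# longitudinal characteristic from it.
library(TraMineR)
data(mvad)

mvad.seq <- seqdef(mvad, var = 17:86)  # alphabet/labels inferred from the data
seqdplot(mvad.seq, border = NA)        # transversal state distribution plot
head(seqlength(mvad.seq))              # longitudinal: length of each sequence
```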

Vol. 40, Issue 3, Apr 2011

Abstract: This paper presents the lubridate package for R, which facilitates working with dates and times. Date-times create various technical problems for the data analyst. The paper highlights these problems and offers practical advice on how to solve them using lubridate. The paper also introduces a conceptual framework for arithmetic with date-times in R.
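A brief sketch of the parsing and arithmetic helpers, as they behave in current lubridate versions:

```r
# Parse, shift, and measure dates with lubridate's helper functions.
library(lubridate)

d <- ymd("2011-04-11")        # parse year-month-day into a Date
d + days(10)                  # period arithmetic: "2011-04-21"
wday(d, label = TRUE)         # extract the day of the week
interval(d, ymd("2011-12-31")) / ddays(1)  # interval length in days
```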

Vol. 40, Issue 2, Apr 2011

Abstract: This paper describes an R package which produces tours of multivariate data. The package includes functions for creating different types of tours, including grand, guided, and little tours, which project multivariate data (p-D) down to 1, 2, 3, or, more generally, d (≤ p) dimensions. The projected data can be rendered as densities or histograms, scatterplots, anaglyphs, glyphs, scatterplot matrices, parallel coordinate plots, time series or images, and viewed using an R graphics device, passed to GGobi, or saved to disk. A tour path can be stored for visualisation or replay. With this package it is possible to quickly experiment with different, and new, approaches to tours of data. This paper contains animations that can be viewed using the Adobe Acrobat PDF viewer.

Vol. 40, Issue 1, Apr 2011

Abstract: Many data analysis problems involve the application of a split-apply-combine strategy, where you break up a big problem into manageable pieces, operate on each piece independently and then put all the pieces back together. This insight gives rise to a new R package that allows you to smoothly apply this strategy, without having to worry about the type of structure in which your data is stored.

The paper includes two case studies showing how these insights make it easier to work with batting records for veteran baseball players and a large 3d array of spatio-temporal ozone measurements.
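The strategy above can be sketched with plyr's `ddply()` (the package this abstract describes, by Hadley Wickham); the grouping and summary below use the built-in mtcars data, not the paper's case studies:

```r
# Split mtcars by cylinder count, apply a per-group summary, and
# combine the pieces back into a single data frame.
library(plyr)

ddply(mtcars, .(cyl), summarise,
      n        = length(mpg),   # group size
      mean_mpg = mean(mpg))     # per-group average fuel economy
```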

Vol. 40, Book Review 1, Apr 2011

R in a Nutshell, by Joseph Adler. O'Reilly, 2009. ISBN: 978-0-596-80170-0.

Vol. 39, Issue 13, Mar 2011

Abstract: The R package HGLMMM has been developed to fit generalized linear models with random effects using the h-likelihood approach. The response variable is allowed to follow a binomial, Poisson, Gaussian or gamma distribution. The distribution of random effects can be specified as Gaussian, gamma, inverse-gamma or beta. Complex structures such as multi-membership designs or multilevel designs can be handled. Further, dispersion parameters of random components and the residual dispersion (overdispersion) can be modeled as a function of covariates. The overdispersion parameter can be fixed or estimated. Fixed effects in the mean structure can be estimated using extended likelihood or a first order Laplace approximation to the marginal likelihood. Dispersion parameters are estimated using first order adjusted profile likelihood.

Vol. 39, Issue 12, Mar 2011

Abstract: In this paper we elaborate on the potential of the lmer function from the lme4 package in R for item response (IRT) modeling. In line with the package, an IRT framework is described based on generalized linear mixed modeling. The aspects of the framework refer to (a) the kind of covariates: their mode (person, item, person-by-item), and whether they are external vs. internal to responses, and (b) the kind of effects the covariates have: fixed vs. random, and if random, the mode across which the effects are random (persons, items). Based on this framework, three broad categories of models are described: item covariate models, person covariate models, and person-by-item covariate models, and within each category three types of more specific models are discussed. The models in question are explained and the associated lmer code is given. Examples of models are the linear logistic test model with an error term, differential item functioning models, and local item dependency models. Because the lme4 package is for univariate generalized linear mixed models, neither the two- and three-parameter models nor the item response models for polytomous response data can be estimated with the lmer function.
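To illustrate the GLMM formulation of a simple IRT model, here is a hedged sketch of a Rasch model with fixed item effects and random person effects. Note that current lme4 versions fit binomial responses with `glmer()` rather than `lmer()` with a family argument, and the toy data below is purely illustrative:

```r
# Rasch model as a GLMM: fixed item easiness, random person ability.
library(lme4)

# Toy long-format data: one row per person-item response (illustrative only).
set.seed(1)
dat <- expand.grid(person = factor(1:50), item = factor(1:10))
dat$resp <- rbinom(nrow(dat), 1, 0.5)

fit <- glmer(resp ~ 0 + item + (1 | person), data = dat, family = binomial)
fixef(fit)  # fixed item parameters on the logit scale
```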

Vol. 39, Issue 11, Mar 2011

Abstract: We propose an algorithm to compute the cumulative distribution function of the two-sided Kolmogorov-Smirnov test statistic Dn and its complementary distribution in a fast and reliable way. Different approximations are used in different regions of n, x. Java and C programs are available.
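This distribution is what turns an observed Dn into a p-value; base R's `ks.test()` exposes exactly that computation, so a quick sanity check needs no extra packages:

```r
# One-sample KS test of simulated data against the standard normal:
# the p-value is computed from the distribution of the statistic D_n.
set.seed(42)
x <- rnorm(200)
res <- ks.test(x, "pnorm")   # H0: x ~ N(0, 1)
res$statistic                # observed D_n (supremum distance to pnorm)
res$p.value                  # should be well above 0.05 here
```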

Vol. 39, Issue 10, Mar 2011

Abstract: Data from many scientific areas often come with measurement error. Density or distribution function estimation from contaminated data and nonparametric regression with errors in variables are two important topics in measurement error models. In this paper, we present a new software package decon for R, which contains a collection of functions that use the deconvolution kernel methods to deal with the measurement error problems. The functions allow the errors to be either homoscedastic or heteroscedastic. To make the deconvolution estimators computationally more efficient in R, we adapt the fast Fourier transform algorithm for density estimation with error-free data to the deconvolution kernel estimation. We discuss the practical selection of the smoothing parameter in deconvolution methods and illustrate the use of the package through both simulated and real examples.

Vol. 39, Issue 9, Mar 2011

Abstract: The package nacopula provides procedures for constructing nested Archimedean copulas in any dimension and with any kind of nesting structure, generating vectors of random variates from the constructed objects, computing function values and probabilities of falling into hypercubes, as well as evaluating characteristics such as Kendall's tau and the tail-dependence coefficients. As by-products, algorithms for various distributions, including exponentially tilted stable and Sibuya distributions, are implemented. Detailed examples are given.

Vol. 39, Issue 8, Mar 2011

Abstract: Logistic regression provides a flexible framework for detecting various types of differential item functioning (DIF). Previous efforts extended the framework by using item response theory (IRT) based trait scores, and by employing an iterative process using group-specific item parameters to account for DIF in the trait scores, analogous to purification approaches used in other DIF detection frameworks. The current investigation advances the technique by developing a computational platform integrating both statistical and IRT procedures into a single program. Furthermore, a Monte Carlo simulation approach was incorporated to derive empirical criteria for various DIF statistics and effect size measures. For purposes of illustration, the procedure was applied to data from a questionnaire of anxiety symptoms for detecting DIF associated with age from the Patient-Reported Outcomes Measurement Information System.

Vol. 39, Issue 7, Mar 2011

Abstract: In order to facilitate teaching complex topics in an interactive way, the authors developed a computer-assisted teaching system, a graphical user interface named TGUI (Teaching Graphical User Interface). TGUI was introduced at the beginning of 2009 in the Austrian Journal of Statistics (Dinges and Templ 2009) as an effective instrument to train and teach staff on mathematical and statistical topics. While the fundamental principles were retained, the current TGUI system has undergone a complete redesign. The ultimate goal behind the reimplementation was to share the advantages of TGUI and provide teachers and people who need to hold training courses with a strong tool that can enrich their lectures with interactive features. The idea was to go a step beyond the current modular blended-learning systems (see, e.g., Da Rin 2003) or the related teaching techniques of classroom-voting (see, e.g., Cline 2006). In this paper the authors have attempted to exemplify the basic idea and concept of TGUI by means of statistics seminars held at Statistics Austria. The powerful open source software R (R Development Core Team 2010a) is the backend for TGUI, which can therefore be used to process even complex statistical contents. However, with specifically created contents the interactive TGUI system can be used to support a wide range of courses and topics. The open source R packages TGUICore and TGUITeaching are freely available from the Comprehensive R Archive Network.

Vol. 39, Issue 6, Mar 2011

Abstract: Maximum likelihood estimation of a log-concave density has attracted considerable attention over the last few years. Several algorithms have been proposed to estimate such a density. Two of those algorithms, an iterative convex minorant and an active set algorithm, are implemented in the R package logcondens. While these algorithms are discussed elsewhere, we describe in this paper the use of the logcondens package and discuss functions and datasets related to log-concave density estimation contained in the package. In particular, we provide functions to (1) compute the maximum likelihood estimate (MLE) as well as a smoothed log-concave density estimator derived from the MLE, (2) evaluate the estimated density, distribution and quantile functions at arbitrary points, (3) compute the characterizing functions of the MLE, (4) sample from the estimated distribution, and finally (5) perform a two-sample permutation test using a modified Kolmogorov-Smirnov test statistic. In addition, logcondens makes two datasets available that have been used to illustrate log-concave density estimation.

Vol. 39, Issue 5, Mar 2011

Abstract: We introduce a pathwise algorithm for the Cox proportional hazards model, regularized by convex combinations of l1 and l2 penalties (elastic net). Our algorithm fits via cyclical coordinate descent, and employs warm starts to find a solution along a regularization path. We demonstrate the efficacy of our algorithm on real and simulated data sets, and find that it is considerably faster than competing methods.
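As a hedged sketch of this kind of fit: the authors' coordinate-descent elastic net is, to the best of my knowledge, exposed through the glmnet package's `family = "cox"` option (an assumption based on the authors' related work, not stated in this abstract), with simulated data standing in for the paper's examples:

```r
# Elastic-net-regularized Cox regression along a full lambda path.
library(glmnet)
library(survival)

set.seed(1)
x <- matrix(rnorm(100 * 20), 100, 20)        # 100 observations, 20 features
y <- Surv(rexp(100), rbinom(100, 1, 0.7))    # toy survival times + events

fit <- glmnet(x, y, family = "cox", alpha = 0.5)  # alpha mixes l1 and l2
plot(fit)  # coefficient profiles along the regularization path
```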

Vol. 39, Issue 4, Mar 2011

Abstract: This paper describes the R package mhsmm which implements estimation and prediction methods for hidden Markov and semi-Markov models for multiple observation sequences. Such techniques are of interest when observed data is thought to be dependent on some unobserved (or hidden) state. Hidden Markov models only allow a geometrically distributed sojourn time in a given state, while hidden semi-Markov models extend this by allowing an arbitrary sojourn distribution. We demonstrate the software with simulation examples and an application involving the modelling of the ovarian cycle of dairy cows.

Vol. 39, Code Snippet 2, Mar 2011

Vol. 39, Book Review 1, Mar 2011

Statistical Methods in e-Commerce Research, by Wolfgang Jank, Galit Shmueli. John Wiley & Sons, 2008. ISBN: 978-0-470-1202-5.

Vol. 39, Issue 3, Mar 2011

Abstract: We introduce a new MATLAB software package that implements several recently proposed likelihood-based methods for sufficient dimension reduction. Current capabilities include estimation of reduced subspaces with a fixed dimension d, as well as estimation of d by use of likelihood-ratio testing, permutation testing and information criteria. The methods are suitable for preprocessing data for both regression and classification. Implementations of related estimators are also available. Although the software is more oriented to command-line operation, a graphical user interface is also provided for prototype computations.

Vol. 39, Issue 2, Mar 2011

Abstract: Support in R for state space estimation via Kalman filtering was limited to one package, until fairly recently. In the last five years, the situation has changed with no less than four additional packages offering general implementations of the Kalman filter, including in some cases smoothing, simulation smoothing and other functionality. This paper reviews some of the offerings in R to help the prospective user to make an informed choice.

Vol. 39, Issue 1, Mar 2011

Abstract: The estimation of kernel-smoothed relative risk functions is a useful approach to examining the spatial variation of disease risk. Though there exist several options for performing kernel density estimation in statistical software packages, there have been very few contributions to date that have focused on estimation of a relative risk function per se. Use of a variable or adaptive smoothing parameter for estimation of the individual densities has been shown to provide additional benefits in estimating relative risk and specific computational tools for this approach are essentially absent. Furthermore, little attention has been given to providing methods in available software for any kind of subsequent analysis with respect to an estimated risk function. To facilitate analyses in the field, the R package sparr is introduced, providing the ability to construct both fixed and adaptive kernel-smoothed densities and risk functions, identify statistically significant fluctuations in an estimated risk function through the use of asymptotic tolerance contours, and visualize these objects in flexible and attractive ways.

Vol. 38, Issue 8, Jan 2011

Abstract: Panel data are observations of a continuous-time process at arbitrary times, for example, visits to a hospital to diagnose disease status. Multi-state models for such data are generally based on the Markov assumption. This article reviews the range of Markov models and their extensions which can be fitted to panel-observed data, and their implementation in the msm package for R. Transition intensities may vary between individuals, or with piecewise-constant time-dependent covariates, giving an inhomogeneous Markov model. Hidden Markov models can be used for multi-state processes which are misclassified or observed only through a noisy marker. The package is intended to be straightforward to use, flexible and comprehensively documented. Worked examples are given of the use of msm to model chronic disease progression and screening. Assessment of model fit, and potential future developments of the software, are also discussed.
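A hedged sketch of the basic msm workflow, using the cav heart-transplant data shipped with the package (a disease-progression example of the kind the paper works through; the initial intensity values below are arbitrary starting guesses):

```r
# Fit a four-state Markov model (three illness states plus death) to
# panel-observed data; Q's nonzero off-diagonals declare allowed transitions.
library(msm)

Q <- rbind(c(0,     0.25,  0,     0.25),
           c(0.166, 0,     0.166, 0.166),
           c(0,     0.25,  0,     0.25),
           c(0,     0,     0,     0))     # state 4 (death) is absorbing

fit <- msm(state ~ years, subject = PTNUM, data = cav,
           qmatrix = Q, deathexact = 4)   # death times observed exactly
pmatrix.msm(fit, t = 5)                   # 5-year transition probabilities
```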

Vol. 38, Issue 7, Jan 2011

Abstract: Multi-state models are a very useful tool for answering a wide range of questions in survival analysis that classical models cannot answer, or can answer only in a more complicated way. They are suitable for both biomedical and other applications in which time-to-event variables are analyzed. However, they are still not frequently applied. So far, an important reason for this has been the lack of available software. To overcome this problem, we have developed the mstate package in R for the analysis of multi-state models. The package covers all steps of the analysis of multi-state models, from model building and data preparation to estimation and graphical representation of the results. It can be applied to non- and semi-parametric (Cox) models. The package is also suitable for competing risks models, as they are a special category of multi-state models.

This article offers guidelines for the actual use of the software by means of an elaborate multi-state analysis of data describing post-transplant events of patients with blood cancer. The data have been provided by the EBMT (the European Group for Blood and Marrow Transplantation). Special attention will be paid to the modeling of different covariate effects (the same for all transitions or transition-specific) and different baseline hazard assumptions (different for all transitions or equal for some).

Vol. 38, Issue 6, Jan 2011

Abstract: The Lexis class in the R package Epi provides tools for creation, manipulation and display of data from multi-state models. Transitions between states are described by rates (intensities); Lexis objects represent this kind of data and provide tools to show states and transitions annotated by relevant summary numbers. Data can be transformed to a form that allows modelling of several transition rates with common parameters.

Vol. 38, Issue 5, Jan 2011

Abstract: The Lexis class in the R package Epi provides an object-based framework for managing follow-up time on multiple time scales, which is an important feature of prospective epidemiological studies with long duration. Follow-up time may be split either into fixed time bands, or on individual event times and the split data may be used in Poisson regression models that account for the evolution of disease risk on multiple time scales. The summary and plot methods for Lexis objects allow inspection of the follow-up times.

Vol. 38, Issue 4, Jan 2011

Abstract: Multi-state models provide a relevant framework for modelling complex event histories. Quantities of interest are the transition probabilities that can be estimated by the empirical transition matrix, which is also referred to as the Aalen-Johansen estimator. In this paper, we present the R package etm that computes and displays the transition probabilities. etm also features a Greenwood-type estimator of the covariance matrix. The use of the package is illustrated through a prominent example in bone marrow transplant for leukaemia patients.

Vol. 38, Issue 3, Jan 2011

Abstract: In longitudinal studies of disease, patients can experience several events across a follow-up period. Analysis of such studies can be successfully performed by multi-state models. In the multi-state framework, issues of interest include the study of the relationship between covariates and disease evolution, estimation of transition probabilities, and survival rates. This paper introduces p3state.msm, a software application for R which performs inference in an illness-death model. It describes the capabilities of the program for estimating semi-parametric regression models and for implementing nonparametric estimators for several quantities. The main feature of the package is its ability to obtain non-Markov estimates for the transition probabilities. Moreover, the methods can also be used in progressive three-state models. In such a model, estimators for other quantities, such as the bivariate distribution function (for sequentially ordered events), are also given. The software is illustrated using data from the Stanford Heart Transplant Study.

Vol. 38, Issue 2, Jan 2011

Abstract: In this paper we describe flexible competing risks regression models using the comp.risk() function available in the timereg package for R based on Scheike et al. (2008). Regression models are specified for the transition probabilities, that is the cumulative incidence in the competing risks setting. The model contains the Fine and Gray (1999) model as a special case. This can be used to perform a goodness-of-fit test for the subdistribution hazards' proportionality assumption (Scheike and Zhang 2008). The program can also construct confidence bands for predicted cumulative incidence curves.

We apply the methods to data on follicular cell lymphoma from Pintilie (2007), where the competing risks are disease relapse and death without relapse. There is important non-proportionality present in the data, and it is demonstrated how one can analyze these data using the flexible regression models.

Vol. 38, Issue 1, Jan 2011

Abstract: There is clearly growing interest, at least in the statistical literature, in competing risks and multi-state models, and with this rising interest a number of software packages have been developed for the analysis of such models. The present special issue of the Journal of Statistical Software introduces a selection of R packages devoted to competing risks and multi-state models. This introduction to the special issue contains some background and highlights the contents of the contributions.

Vol. 37, Book Review 3, Dec 2010

SAS and R, by Ken Kleinman and Nicholas J. Horton. Chapman & Hall/CRC, 2010. ISBN: 978-1-4200-7057-6.

Vol. 37, Book Review 2, Dec 2010

Design and Analysis of Experiments with SAS, by John Lawson. Chapman & Hall/CRC, 2010. ISBN: 978-1-4200-6060-7.

Vol. 37, Issue 8, Dec 2010

Abstract: Graphical user interfaces (GUIs) are growing in popularity as a complement or alternative to the traditional command line interfaces to R. RGtk2 is an R package for creating GUIs in R. The package provides programmatic access to GTK+ 2.0, an open-source GUI toolkit written in C. To construct a GUI, the R programmer calls RGtk2 functions that map to functions in the underlying GTK+ library. This paper introduces the basic concepts underlying GTK+ and explains how to use RGtk2 to construct GUIs from R. The tutorial is based on simple and practical programming examples. We also provide more complex examples illustrating the advanced features of the package. The design of the RGtk2 API and the low-level interface from R to GTK+ are discussed at length. We compare RGtk2 to alternative GUI toolkits for R.
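A minimal sketch of this mapping, under the assumption that GTK+ 2.0 is installed on the system: each R constructor (`gtkWindow`, `gtkButton`) corresponds to a GTK+ widget, and signals are wired to R callbacks with `gSignalConnect()`:

```r
# A window containing one button; clicking it runs an R function.
library(RGtk2)

win <- gtkWindow("toplevel", show = FALSE)
win$setTitle("Hello from R")

btn <- gtkButton("Click me")
gSignalConnect(btn, "clicked",
               function(widget) cat("Button clicked\n"))  # R-side handler

win$add(btn)
win$showAll()
gtkMain()  # enter the GTK+ event loop (blocks until gtkMainQuit())
```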