Journal of Statistical Software

Vol. 49, Issue 10, Jun 2012
Abstract: The gWidgetsWWW package provides a framework for easily developing interactive web pages from within R. It uses the API developed in the gWidgets programming interface to specify the layout of the controls and the relationships between them. The web pages may be served locally under R's built-in web server for help pages or from an rApache-enabled web server.
2012/07/20 - 22:49

Vol. 49, Issue 9, Jun 2012
Abstract: R is a free open-source implementation of the S statistical computing language and programming environment. The current status of R is a command line driven interface with no advanced cross-platform graphical user interface (GUI), but it includes tools for building such. Over the past years, proprietary and non-proprietary GUI solutions have emerged, based on internal or external tool kits, with different scopes and technological concepts. For example, Rgui.exe and R.app have become the de facto GUI on the Microsoft Windows and Mac OS X platforms, respectively, for most users. In this paper we discuss RKWard which aims to be both a comprehensive GUI and an integrated development environment for R. RKWard is based on the KDE software libraries. Statistical procedures and plots are implemented using an extendable plugin architecture based on ECMAScript (JavaScript), R, and XML. RKWard provides an excellent tool to manage different types of data objects, even allowing for seamless editing of certain types. The objective of RKWard is to provide a portable and extensible R interface for both basic and advanced statistical and graphical analysis, while not compromising on flexibility and modularity of the R programming environment itself.
2012/07/20 - 22:49

Vol. 49, Issue 8, Jun 2012
Abstract: While R has proven itself to be a powerful and flexible tool for data exploration and analysis, it lacks the ease of use present in other software such as SPSS and Minitab. An easy to use graphical user interface (GUI) can help new users accomplish tasks that would otherwise be out of their reach, and improves the efficiency of expert users by replacing fifty key strokes with five mouse clicks. With this in mind, Deducer presents dialogs that are understandable for the beginner, and yet contain all (or most) of the options that an experienced statistician, performing the same task, would want. An Excel-like spreadsheet is included for easy data viewing and editing. Deducer is based on Java's Swing GUI library and can be used on any common operating system. The GUI is independent of the specific R console and can easily be used by calling a text-based menu system. Graphical menus are provided for the JGR console and the Windows R GUI.
2012/07/20 - 22:49

Vol. 49, Issue 7, Jun 2012
Abstract: The R Commander graphical user interface to R is extensible via plug-in packages, which integrate seamlessly with the R Commander's menu structure, data, and model handling. The paper describes the RcmdrPlugin.survival package, which makes many of the facilities of the survival package for R available through the R Commander, including Cox and parametric survival models. We explain the structure, capabilities, and limitations of this plug-in package and illustrate its use.
2012/07/20 - 22:49

Vol. 49, Issue 6, Jun 2012
Abstract: This paper describes a graphical user interface (GUI) for the tourr package in R. The tour is a dynamic graphical method for viewing multivariate data. The GUI allows users to interact with a tour in order to explore the data for structures such as clustering, outliers, and nonlinear dependence. Users can pause the tour, choose a subset of variables, color points by other variables, and switch between several different types of tours.
2012/07/20 - 22:49

Vol. 49, Issue 5, Jun 2012
Abstract: The R environment provides a natural platform for developing new statistical methods due to the mathematical expressiveness of the language, the large number of existing libraries, and the active developer community. One drawback to R, however, is the learning curve; programming is a deterrent to non-technical users, who typically prefer graphical user interfaces (GUIs) to command line environments. Thus, while statisticians develop new methods in R, practitioners are often behind in terms of the statistical techniques they use as they rely on GUI applications. Meta-analysis is an instructive example; cutting-edge meta-analysis methods are often ignored by the overwhelming majority of practitioners, in part because they have no easy way of applying them. This paper proposes a strategy to close the gap between the statistical state-of-the-science and what is applied in practice. We present open-source meta-analysis software that uses R as the underlying statistical engine, and Python for the GUI. We present a framework that allows methodologists to implement new methods in R that are then automatically integrated into the GUI for use by end-users, so long as the programmer conforms to our interface. Such an approach allows an intuitive interface for non-technical users while leveraging the latest advanced statistical methods implemented by methodologists.
2012/07/20 - 22:49

Vol. 49, Issue 4, Jun 2012
Abstract: For many ecological analyses, powerful statistical tools are required for a profound analysis of spatial and time-based data sets. In order to avoid many common errors of analysis and data acquisition, a graphical user interface can help to focus on the task of the analysis and minimize the time needed to complete certain tasks in a programming language like R. In this paper we present a graphical user interface for R embedded in the ecological modeling software Bio7, which is based on an Eclipse rich client platform. We demonstrate that within the Bio7 platform R can be effectively combined not only with Java but also with the powerful components of Eclipse. Additionally we present some custom Bio7 components which interact with R and make use of some useful advanced concepts and libraries of this connection. Our overview of the Bio7 R interface also emphasizes a broad applicability for disciplines beyond ecological modeling.
2012/07/20 - 22:49

Vol. 49, Issue 3, Jun 2012
Abstract: In this work the software application called Glotaran is introduced as a Java-based graphical user interface to the R package TIMP, a problem solving environment for fitting superposition models to multi-dimensional data. TIMP uses a command-line user interface for the interaction with data, the specification of models and viewing of analysis results. Instead, Glotaran provides a graphical user interface which features interactive and dynamic data inspection, easier model specification assisted by the user interface, and interactive viewing of results. The interactivity component is especially helpful when working with large, multi-dimensional datasets as often result from time-resolved spectroscopy measurements, allowing the user to easily pre-select and manipulate data before analysis and to quickly zoom in to regions of interest in the analysis results. Glotaran has been developed on top of the NetBeans rich client platform and communicates with R through the Java-to-R interface Rserve. The background and the functionality of the application are described here. In addition, the design, development and implementation process of Glotaran is documented in a generic way.
2012/07/20 - 22:49

Vol. 49, Issue 2, Jun 2012
Abstract: Degradation models are widely used to assess the lifetime information for highly reliable products with quality characteristics whose degradation over time can be related to reliability. The performance of a degradation model largely depends on an appropriate model description of the product's degradation path. The cross-platform package iDEMO (integrated degradation models) is developed in R and the interface is built using the Tcl/Tk bindings provided by the tcltk and tcltk2 packages included with R. It is a tool to build a linear degradation model which can simultaneously consider the unit-to-unit variation, time-dependent structure and measurement error in the degradation paths. The package iDEMO provides the maximum likelihood estimates of the unknown parameters, mean-time-to-failure and q-th quantile, and their corresponding confidence intervals based on the different information matrices. In addition, degradation model selection and goodness-of-fit tests are provided to determine and diagnose the degradation model for the user's current data by the commonly used criteria. By only enabling user interface elements when necessary, input errors are minimized.
2012/07/20 - 22:49

Vol. 47, Issue 14, May 2012
Abstract: The detection and determination of clusters has been of special interest among researchers from different fields for a long time. In particular, assessing whether the clusters are significant is a question that has been asked by a number of experimenters. In Fuentes and Casella (2009), the authors put forth a new methodology for analyzing clusters. It tests the hypothesis H0 : κ = 1 versus H1 : κ = k in a Bayesian setting, where κ denotes the number of clusters in a population. The bayesclust package implements this approach in R. Here we give an overview of the algorithm and a detailed description of the functions available in the package. The routines in bayesclust allow the user to test for the existence of clusters, and then pick out optimal partitionings of the data. We demonstrate the testing procedure with simulated datasets.
2012/05/19 - 12:13

Vol. 47, Issue 13, May 2012
Abstract: Trimmed regions are a powerful tool of multivariate data analysis. They describe a probability distribution in Euclidean d-space regarding location, dispersion, and shape, and they order multivariate data with respect to their centrality. Dyckerhoff and Mosler (2011) have introduced the class of weighted-mean trimmed regions, which possess attractive properties regarding continuity, subadditivity, and monotonicity.
We present an exact algorithm to compute the weighted-mean trimmed regions of a given data cloud in arbitrary dimension d. These trimmed regions are convex polytopes in R^d. To calculate them, the algorithm builds on methods from computational geometry. A characterization of a region’s facets is used, and information about the adjacency of the facets is extracted from the data. A key problem consists in ordering the facets. It is solved by the introduction of a tree-based order, by which the whole surface can be traversed efficiently with the minimal number of computations. The algorithm has been programmed in C++ and is available as the R package WMTregions.
2012/05/19 - 12:13

Vol. 47, Issue 12, May 2012
Abstract: Outlying data can heavily influence standard clustering methods. At the same time, clustering principles can be useful when robustifying statistical procedures. These two reasons motivate the development of feasible robust model-based clustering approaches. With this in mind, an R package for performing non-hierarchical robust clustering, called tclust, is presented here. Instead of trying to “fit” noisy data, a proportion α of the most outlying observations is trimmed. The tclust package efficiently handles different cluster scatter constraints. Graphical exploratory tools are also provided to help the user make sensible choices for the trimming proportion as well as the number of clusters to search for.
2012/05/19 - 12:13

Vol. 47, Issue 11, May 2012
Abstract: The pcalg package for R can be used for two purposes: causal structure learning and estimation of causal effects from observational data. In this document, we give a brief overview of the methodology, and demonstrate the package’s functionality in both toy examples and applications.
2012/05/19 - 12:13

Vol. 47, Issue 10, May 2012
Abstract: This paper introduces rpartScore (Galimberti, Soffritti, and Di Maso 2012), a new R package for building classification trees for ordinal responses, that can be employed whenever a set of scores is assigned to the ordered categories of the response. This package has been created to overcome some problems that produced unexpected results from the package rpartOrdinal (Archer 2010). Explanations for the causes of these unexpected results are provided. The main functionalities of rpartScore are described, and its use is illustrated through some examples.
2012/05/19 - 12:13

Vol. 47, Issue 9, Apr 2012
Abstract: For survival data with a large number of explanatory variables, lasso penalized Cox regression is a popular regularization strategy. However, a penalized Cox model may not always provide the best fit to data and can be difficult to estimate in high dimension because of its intrinsic nonlinearity. The semiparametric additive hazards model is a flexible alternative which is a natural survival analogue of the standard linear regression model. Building on this analogy, we develop a cyclic coordinate descent algorithm for fitting the lasso and elastic net penalized additive hazards model. The algorithm requires no nonlinear optimization steps and offers excellent performance and stability. An implementation is available in the R package ahaz. We demonstrate this implementation in a small timing study and in an application to real data.
2012/04/28 - 00:20

Vol. 47, Issue 8, Apr 2012
Abstract: Alignment of mass spectrometry (MS) chromatograms is sometimes required prior to sample comparison and data analysis. Without alignment, direct comparison of chromatograms would lead to inaccurate results. We demonstrate a new method for computing a high quality alignment of full length MS chromatograms using variable penalty dynamic time warping. This method aligns signals using local linear shifts without excessive warping that can alter the shape (and area) of chromatogram peaks. The software is available as the R package VPdtw on the Comprehensive R Archive Network and we highlight how one can use this package here.
2012/04/28 - 00:20
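
To illustrate the idea of penalizing warping in dynamic time warping, here is a minimal Python sketch. It is not the VPdtw algorithm: it assumes a single constant penalty (the hypothetical parameter `penalty`) charged on every non-diagonal step, whereas the abstract describes a variable penalty. The function name and the absolute-difference local cost are also assumptions made for illustration.

```python
def dtw_with_penalty(query, reference, penalty=0.0):
    """Cost of the cheapest alignment of two numeric sequences.

    Classic dynamic time warping, except that every non-diagonal step
    (stretching or compressing the query) adds a constant `penalty`.
    """
    n, m = len(query), len(reference)
    inf = float("inf")
    # cost[i][j] = best cost of aligning query[:i] with reference[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(query[i - 1] - reference[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j - 1],          # diagonal step: no warping
                cost[i - 1][j] + penalty,    # non-diagonal step, penalized
                cost[i][j - 1] + penalty,    # non-diagonal step, penalized
            )
    return cost[n][m]
```

With `penalty=0.0` this reduces to ordinary dynamic time warping; raising the penalty makes the alignment prefer plain shifts over repeated stretching, which is the behaviour the abstract motivates.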

Vol. 47, Issue 7, Apr 2012
Abstract: Manufacturing processes are often based on more than one quality characteristic. When these variables are correlated the process capability analysis should be performed using multivariate statistical methodologies. Although there is a growing interest in methods for evaluating the capability of multivariate processes, little attention has been given to developing user friendly software for supporting multivariate capability analysis. In this work we introduce the package MPCI for R, which allows users to compute multivariate process capability indices. MPCI aims to provide a useful tool for dealing with multivariate capability assessment problems. We illustrate the use of the MPCI package through both simulated and real examples.
2012/04/28 - 00:20

Vol. 47, Issue 6, Apr 2012
Abstract: We consider the problem of nonparametric density estimation where estimates are constrained to be unimodal. Though several methods have been proposed to achieve this end, each of them has its own drawbacks and none of them has readily available computer code. The approach of Braun and Hall (2001), where a kernel density estimator is modified by data sharpening, is one of the most promising options, but optimization difficulties make it hard to use in practice. This paper presents a new algorithm and MATLAB code for finding good unimodal density estimates under the Braun and Hall scheme. The algorithm uses a greedy, feasibility-preserving strategy to ensure that it always returns a unimodal solution. Compared to the incumbent method of optimization, the greedy method is easier to use, runs faster, and produces solutions of comparable quality. It can also be extended to the bivariate case.
2012/04/28 - 00:20

Vol. 47, Issue 5, Apr 2012
Abstract: The R package bclust is useful for clustering high-dimensional continuous data. The package uses a parametric spike-and-slab Bayesian model to downweight the effect of noise variables and to quantify the importance of each variable in agglomerative clustering. We take advantage of the existence of closed-form marginal distributions to estimate the model hyper-parameters using empirical Bayes, thereby yielding a fully automatic method. We discuss computational problems arising in implementation of the procedure and illustrate the usefulness of the package through examples.
2012/04/20 - 22:56

Vol. 47, Software Review 1, Apr 2012
mathStatica 2.5, version 2.5
mathStatica
2012/04/20 - 22:56

Vol. 47, Code Snippet 2, Apr 2012
2012/04/20 - 22:56

Vol. 47, Book Review 1, Apr 2012
Modern Fortran: Style and Usage
Norman S. Clerman and Walter Spector
Cambridge University Press, New York, NY, 2012
ISBN: 978-0-521-73052-5
2012/04/20 - 22:56

Vol. 47, Issue 4, Apr 2012
Abstract: Frailty models are very useful for analysing correlated survival data, when observations are clustered into groups or for recurrent events. The aim of this article is to present the new version of an R package called frailtypack. This package allows users to fit Cox models and four types of frailty models (shared, nested, joint, additive) that could be useful for several issues within biomedical research. It is well adapted to the analysis of recurrent events such as cancer relapses and/or terminal events (death or loss to follow-up). The approach uses maximum penalized likelihood estimation. Right-censored or left-truncated data are considered. It also allows stratification and time-dependent covariates during analysis.
2012/04/20 - 22:56

Vol. 47, Issue 3, Apr 2012
Abstract: Classical supervised learning enjoys the luxury of accessing the true known labels for the observations in a modeled dataset. Real life, however, poses an abundance of problems where the labels are only partially defined, i.e., are uncertain and given only for a subset of observations. Such partial labels can occur regardless of the knowledge source. For example, an experimental assessment of labels may have limited capacity and is prone to measurement errors. Also, expert knowledge is often restricted to a specialized area and is thus unlikely to provide trustworthy labels for all observations in the dataset. Partially supervised mixture modeling is able to process such sparse and imprecise input. Here, we present an R package called bgmm, which implements two partially supervised mixture modeling methods: soft-label and belief-based modeling. For completeness, we also equipped the package with the functionality of unsupervised, semi- and fully supervised mixture modeling. On real data we present the usage of bgmm for basic model-fitting in all modeling variants. The package can also be applied to selection of the best-fitting model from a set of models with different component numbers or constraints on their structures. This functionality is presented on an artificial dataset, which can be simulated in bgmm from a distribution defined by a given model.
2012/04/20 - 22:56

Vol. 47, Issue 2, Apr 2012
Abstract: We present GeoXp, an R package implementing interactive graphics for exploratory spatial data analysis. We use a data set concerning public schools of the French Midi-Pyrénées region to illustrate the use of these exploratory techniques based on the coupling between a statistical graph and a map. Besides elementary plots like boxplots, histograms or simple scatterplots, GeoXp also couples maps with Moran scatterplots, variogram clouds, Lorenz curves and other graphical tools. In order to make the most of the multidimensionality of the data, GeoXp includes dimension reduction techniques such as principal components analysis and cluster analysis whose results are also linked to the map.
2012/04/20 - 22:56

Vol. 47, Issue 1, Apr 2012
Abstract: splm is an R package for the estimation and testing of various spatial panel data specifications. We consider the implementation of both maximum likelihood and generalized moments estimators in the context of fixed as well as random effects spatial panel data models. This paper is a general description of splm and all functionalities are illustrated using a well-known example taken from Munnell (1990) with productivity data on 48 US states observed over 17 years. We perform comparisons with other available software; and, when this is not possible, Monte Carlo results support our original implementation.
2012/04/20 - 22:56

Vol. 46, Issue 14, Mar 2012
Abstract: This article describes the benchden package which implements a set of 28 example densities for nonparametric density estimation in R. In addition to the usual functions that evaluate the density, distribution and quantile functions or generate random variates, a function designed to be specifically useful for larger simulation studies has been added. After describing the set of densities and the usage of the package, a small toy example of a simulation study conducted using the benchden package is given.
2012/04/12 - 20:21

Vol. 46, Issue 13, Mar 2012
Abstract: In many medical studies, patients can experience several events. The times between consecutive events (gap times) are often of interest and lead to problems that have received much attention recently. In this work we consider the estimation of the bivariate distribution function for censored gap times, using survivalBIV, a software application for R. Some related problems, such as the estimation of the marginal distribution of the second gap time, are also discussed. The paper describes the capabilities of the program for estimating these quantities using four different approaches, all using the Kaplan-Meier estimator of survival. One of these estimators is based on Bayes’ theorem and the Kaplan-Meier survival function. Two estimators were recently proposed using the Kaplan-Meier estimator pertaining to the distribution of the total time to weight the bivariate data (de Uña-Álvarez and Meira-Machado 2008 and de Uña-Álvarez and Amorim 2011). The software can also be used to implement the estimator proposed in Lin, Sun, and Ying (1999), which is based on inverse probability of censoring weighting. The software is illustrated using data from a bladder cancer study.
2012/04/12 - 20:21
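
Since all four approaches above build on the Kaplan-Meier estimator, a minimal Python sketch of that estimator for ordinary right-censored data may help. It is not taken from survivalBIV and does not handle the bivariate gap-time setting; the function name and the input encoding (`events[i]` is 1 for an observed event and 0 for a censored time) are assumptions made for illustration.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate for right-censored data.

    times  -- observed times (event or censoring)
    events -- 1 if the event was observed at that time, 0 if censored
    Returns a list of (time, survival) pairs, one per distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all observations that share the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Events and censored observations both leave the risk set.
        at_risk -= removed
    return curve
```

For example, `kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])` steps down at times 1, 3 and 4; the censored observation at time 2 only shrinks the risk set.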

Vol. 46, Issue 12, Mar 2012
Abstract: We present two natural generalizations of the multinomial and multivariate binomial distributions, which arise from the multiplicative binomial distribution of Altham (1978). The resulting two distributions are discussed and we introduce an R package, MM, which includes associated functionality.
2012/04/12 - 20:21

Vol. 46, Issue 11, Mar 2012
Abstract: Many high-throughput biological data analyses require the calculation of large correlation matrices and/or clustering of a large number of objects. The standard R function for calculating Pearson correlation can handle calculations without missing values efficiently, but is inefficient when applied to data sets with a relatively small number of missing values. We present an implementation of Pearson correlation calculation that can lead to substantial speedup on data with a relatively small number of missing entries. Further, we parallelize all calculations and thus achieve further speedup on systems where parallel processing is available. A robust correlation measure, the biweight midcorrelation, is implemented in a similar manner and provides comparable speed. The functions cor and bicor for fast Pearson and biweight midcorrelation, respectively, are part of the updated, freely available R package WGCNA.
The hierarchical clustering algorithm implemented in R function hclust is an order n^3 (n is the number of clustered objects) version of a publicly available clustering algorithm (Murtagh 2012). We present the package flashClust that implements the original algorithm which in practice achieves order approximately n^2, leading to substantial time savings when clustering large data sets.
2012/04/12 - 20:21
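
As a point of reference for what a correlation "with missing entries" computes, here is a minimal Python sketch of Pearson correlation over pairwise-complete observations. It is a plain, unoptimized illustration and not the WGCNA implementation; the function name and the use of `None` to mark a missing value are assumptions.

```python
import math


def pearson_pairwise(x, y):
    """Pearson correlation using only positions where both values are present.

    Missing values are encoded as None. Returns None when fewer than two
    complete pairs remain or when either variable is constant.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return None
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)
```

The means and sums of squares have to be recomputed for every pair of columns, because each pair may have a different set of complete observations. That per-pair work is why missing data makes a naive implementation slow on large matrices.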

Vol. 46, Book Review 2, Feb 2012
R in Action
Robert I. Kabacoff
Manning, Shelter Island, NY, 2011
ISBN: 978-1-935-18239-9
2012/04/12 - 20:21

Vol. 46, Issue 10, Feb 2012
Abstract: In this paper we present the R package gRain for propagation in graphical independence networks (of which Bayesian networks are a special instance). The paper includes a description of the theory behind the computations. The main part of the paper is an illustration of how to use the package. The paper also illustrates how to turn a graphical model and data into an independence network.
2012/04/12 - 20:21

Vol. 46, Issue 9, Feb 2012
Abstract: We present the R package bild for the parametric and graphical analysis of binary longitudinal data. The package performs logistic regression for binary longitudinal data, allowing for serial dependence among observations from a given individual and a random intercept term. Estimation is via maximization of the exact likelihood of a suitably defined model. Missing values and unbalanced data are allowed, with some restrictions. The code of bild is written partly in the R language and partly in Fortran 77, interfaced through R. The package is built following the S4 formulation of R methods.
2012/04/12 - 20:21

Vol. 46, Issue 8, Feb 2012
Abstract: A multivariate generalization of the emulator technique described by Hankin (2005) is presented in which random multivariate functions may be assessed. In the standard univariate case (Oakley 1999), a Gaussian process, a finite number of observations is made; here, observations of different types are considered. The technique has the property that marginal analysis (that is, considering only a single observation type) reduces exactly to the univariate theory. The associated software is used to analyze datasets from the field of climate change.
2012/04/12 - 20:21

Vol. 46, Issue 7, Jan 2012
Abstract: Neural networks are important standard machine learning procedures for classification and regression. We describe the R package RSNNS that provides a convenient interface to the popular Stuttgart Neural Network Simulator SNNS. The main features are (a) encapsulation of the relevant SNNS parts in a C++ class, for sequential and parallel usage of different networks, (b) accessibility of all of the SNNS algorithmic functionality from R using a low-level interface, and (c) a high-level interface for convenient, R-style usage of many standard neural network procedures. The package also includes functions for visualization and analysis of the models and the training procedures, as well as functions for data input/output from/to the original SNNS file formats.
2012/04/12 - 20:21

Vol. 46, Issue 6, Jan 2012
Abstract: This paper presents the R package HDclassif which is devoted to the clustering and the discriminant analysis of high-dimensional data. The classification methods proposed in the package result from a new parametrization of the Gaussian mixture model which combines the idea of dimension reduction and model constraints on the covariance matrices. The supervised classification method using this parametrization is called high dimensional discriminant analysis (HDDA). In a similar manner, the associated clustering method is called high dimensional data clustering (HDDC) and uses the expectation-maximization algorithm for inference. In order to correctly fit the data, both methods estimate the specific subspace and the intrinsic dimension of the groups. Due to the constraints on the covariance matrices, the number of parameters to estimate is significantly lower than in other model-based methods and this allows the methods to be stable and efficient in high dimensions. Two introductory examples illustrated with R code allow the user to discover the hdda and hddc functions. Experiments on simulated and real datasets also compare HDDC and HDDA with existing classification methods on high-dimensional datasets. HDclassif is free software distributed under the general public license, as part of the R software project.
2012/04/12 - 20:21

Vol. 46, Issue 5, Jan 2012
Abstract: Setting the free parameters of classifiers to different values can have a profound impact on their performance. For some methods, specialized tuning algorithms have been developed. These approaches mostly tune parameters according to a single criterion, such as the cross-validation error. However, it is sometimes desirable to obtain parameter values that optimize several concurrent - often conflicting - criteria. The TunePareto package provides a general and highly customizable framework to select optimal parameters for classifiers according to multiple objectives. Several strategies for sampling and optimizing parameters are supplied. The algorithm determines a set of Pareto-optimal parameter configurations and leaves the ultimate decision on the weighting of objectives to the researcher. Decision support is provided by novel visualization techniques.
2012/04/12 - 20:21
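
The notion of a Pareto-optimal set of parameter configurations can be made concrete with a short Python sketch. It is not the TunePareto API: the function name, the dictionary layout, and the convention that every objective is to be minimized are assumptions made for illustration.

```python
def pareto_front(configs):
    """Return the names of the non-dominated configurations.

    configs maps a configuration name to a tuple of objective values,
    all of which are assumed to be minimized (e.g. error rates).
    """
    def dominates(a, b):
        # a dominates b if it is no worse everywhere and better somewhere.
        no_worse = all(x <= y for x, y in zip(a, b))
        better = any(x < y for x, y in zip(a, b))
        return no_worse and better

    return [
        name
        for name, objectives in configs.items()
        if not any(dominates(other, objectives) for other in configs.values())
    ]
```

For instance, with hypothetical objectives (cross-validation error, false-negative rate), `pareto_front({"k=1": (0.30, 0.10), "k=3": (0.20, 0.15), "k=5": (0.25, 0.20)})` keeps `k=1` and `k=3` and drops `k=5`, which is worse than `k=3` on both objectives. Choosing between the two survivors is the weighting decision the abstract leaves to the researcher.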

Vol. 46, Issue 4, Jan 2012
Abstract: Exploratory factor analysis is a widely used statistical technique in the social sciences. It attempts to identify underlying factors that explain the pattern of correlations within a set of observed variables. A statistical software package is needed to perform the calculations. However, there are some limitations with popular statistical software packages, like SPSS. The R programming language is a free software package for statistical and graphical computing. It offers many packages written by contributors from all over the world and programming resources that allow it to overcome the dialog limitations of SPSS. This paper offers an SPSS dialog written in the R programming language with the help of some packages, so that researchers with little or no knowledge in programming, or those who are accustomed to making their calculations based on statistical dialogs, have more options when applying factor analysis to their data and hence can adopt a better approach when dealing with ordinal, Likert-type data.
2012/04/12 - 20:21