IS.18: Longitudinal Modelling of Compositional data

18 August 2020 | 12:00 UTC

Compositional data describe the amounts of the components of a specimen when the size of each specimen is constant. Such relative information with respect to a whole arises in measurements involving probabilities and percentages, where the parts sum to 100%. Because of this constant-sum constraint, direct application of standard multivariate statistical methods may lead to spurious conclusions. In medical and health research, this artifact might cause disagreement among study results regarding associations between the microbiome and other factors.
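A minimal sketch of how the constant-sum constraint arises, assuming simulated data rather than any dataset from the session: three independent positive abundances are generated (the lognormal choice, sample size and seed are purely illustrative), converted to proportions, and the covariance matrix of the proportions is inspected. Because every row of proportions sums to 1, each row of that covariance matrix sums to zero, so each component must have at least one negative covariance with another component even though the underlying abundances were independent.

```python
import numpy as np

# Illustrative simulation only: independent positive "absolute" abundances.
rng = np.random.default_rng(0)
abundances = rng.lognormal(mean=0.0, sigma=1.0, size=(500, 3))

# Closure: divide each row by its total so that the parts sum to 1.
props = abundances / abundances.sum(axis=1, keepdims=True)

# Covariance matrix of the proportions (columns are components).
cov_props = np.cov(props, rowvar=False)

# Each row sums to zero because cov(p_i, p_1 + p_2 + p_3) = cov(p_i, 1) = 0.
row_sums = cov_props.sum(axis=1)
print(row_sums)
```

Since the variances on the diagonal are positive, the off-diagonal entries in each row must sum to a negative number, which is the artifact behind spurious negative associations.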

Awareness of the problems posed by compositional data is spreading through various scientific fields, e.g., ecology, microbiology, econometrics and chemometrics. There are two main challenges in developing statistical methods specifically for microbiome studies: one is to fully acknowledge the compositional nature of the data, and the other is to develop longitudinal models that capture the dynamic associations among microbiome, environment and host. In this session we address these two methodological aspects, compositional data analysis and longitudinal data analysis, jointly.

To deal with this type of data, one can apply a suitable transformation. This maps the data from the constrained sample space to an unconstrained multivariate space, which enables the use of off-the-shelf multivariate techniques. Dr. Lawrence David et al. introduced the PhILR transform [1], which combines microbial evolutionary models with the isometric log-ratio (ILR) transform. Using the bacterial phylogenetic tree as a natural and informative sequential binary partition for the ILR transform captures evolutionary relationships between neighboring bacterial groups. Doing so, and fitting multinomial logistic-normal dynamic linear models (MALLARDs) to densely sampled longitudinal data [2], demands a combined approach in bioinformatics and biostatistics.
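The ILR transform with a sequential binary partition can be sketched as follows. This is an illustrative implementation of the plain (unweighted) ILR balances only; it is not the PhILR software, which derives the partition from a phylogenetic tree and offers additional weighting options. The function names and the hand-written sign matrix `sbp` are assumptions made for this example: each row of `sbp` marks one split, with +1 for parts in one group, -1 for parts in the other, and 0 for parts not involved. All parts are assumed strictly positive, since the logarithm is undefined at zero.

```python
import numpy as np

def closure(x):
    """Rescale each row so that its parts sum to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)

def ilr_basis(sbp):
    """Build balance coefficients from a (D-1, D) sign matrix of +1, -1, 0."""
    sbp = np.asarray(sbp, dtype=float)
    basis = np.zeros_like(sbp)
    for i, row in enumerate(sbp):
        r = np.sum(row > 0)  # number of parts in the "+" group
        s = np.sum(row < 0)  # number of parts in the "-" group
        basis[i, row > 0] = np.sqrt(s / (r * (r + s)))
        basis[i, row < 0] = -np.sqrt(r / (s * (r + s)))
    return basis

def ilr(x, sbp):
    """Map compositions (rows of x) to D-1 unconstrained balance coordinates."""
    return np.log(closure(x)) @ ilr_basis(sbp).T

# Hypothetical partition of four taxa: {1,2} vs {3,4}, then 1 vs 2, then 3 vs 4.
sbp = [[1, 1, -1, -1],
       [1, -1, 0, 0],
       [0, 0, 1, -1]]
x = np.array([[10.0, 20.0, 30.0, 40.0]])
coords = ilr(x, sbp)
print(coords.shape)
```

Each coordinate is a scaled log-ratio of the geometric means of the two groups in a split, so the result does not depend on the total of each row, and standard multivariate methods can be applied to `coords`.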

Alternatively, Dr. Janice Scealy et al. proposed transforming compositional data to directional data by the square-root transformation and then modelling the data using the Kent distribution defined on the hypersphere [3]. In a longitudinal setting, a directional mixed effects model for compositional expenditure data is proposed. One of the advantages of this approach is that zero components are accounted for in the models [4]. Moreover, random effects representing correlation within a person can be included straightforwardly. The third way to deal with longitudinal compositional data is based on the Dirichlet multinomial regression model with random effects (Prof. Jeanine Houwing-Duistermaat et al., [5]). These models provide straightforward interpretations but are challenging to fit. This method will be demonstrated using real-world data: microbiome measurements from an epidemiological study carried out in a helminth-endemic rural area in Indonesia. The stool samples were collected from the subjects in a randomized placebo-controlled design; hence the inclusion of microbiome data from infected subjects who received placebo treatment makes this study unique.
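The square-root transformation mentioned above can be illustrated with a short sketch. It covers only the transformation step, not the Kent regression or the mixed effects model; the function name and the example values are assumptions made for illustration. Taking the square root of each proportion yields a vector whose squared entries sum to 1, i.e. a point on the unit hypersphere, and a zero component simply maps to zero without any special handling.

```python
import numpy as np

def sqrt_transform(x):
    """Map compositions (rows of x) to the unit hypersphere via square roots."""
    x = np.asarray(x, dtype=float)
    u = x / x.sum(axis=-1, keepdims=True)  # proportions summing to 1
    return np.sqrt(u)                      # squared entries sum to 1

# Illustrative values; the first row contains a zero component.
z = sqrt_transform([[5, 0, 15],
                    [1, 1, 2]])
print(np.linalg.norm(z, axis=1))
```

In contrast to log-ratio transforms, no replacement of zeros is needed before the transformation is applied.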

To sum up, the presentations will cover methodological developments as well as practical applications in longitudinal modelling of compositional data.


  1. Silverman JD,…, David LA. A phylogenetic transform enhances analysis of compositional microbiota data. eLife 2017; 6: e21887.
  2. Silverman JD,…, David LA. Dynamic linear models guide design and analysis of microbiota studies within artificial human guts. Microbiome 2018; 6: 202.
  3. Scealy JL, Welsh AH. Regression for compositional data by using distributions defined on the hypersphere. J. R. Statist. Soc. B 2011; 73: 351–375.
  4. Scealy JL, Welsh AH. A Directional Mixed Effects Model for Compositional Expenditure Data. Journal of the American Statistical Association 2017; 112(517): 24–36.
  5. Martin I,…, Houwing-Duistermaat JJ. The mixed model for the analysis of a repeated-measurement multivariate count data. Statistics in Medicine 2019; in print.

Session Chair & Discussant: 
Dr. Hae-Won Uh, Department of Biostatistics and Research Support, Div. Julius Center, UMC Utrecht

Session Speakers:

Kimberly Roche, Department of Molecular Genetics and Microbiology, Duke University
Fast and accurate inference of Bayesian multinomial logistic-normal models with application to the analysis of microbiome data

Janice Lea Scealy, Research School of Finance, Actuarial Studies & Statistics, ANU College of Business and Economics
Directional mixed effects models for compositional data

Jeanine Houwing-Duistermaat, Data Analytics and Statistics, Leeds University
Joint models for longitudinally measured multivariate categorical count data and a continuous outcome