The paper reports a study of different adjuvant formulations of a vaccine targeting the malarial circumsporozoite protein, in a trial involving 22 rhesus monkeys allocated to 9 experimental conditions. A large number of immunological variables were analysed using multivariate methods (e.g., hierarchical clustering and principal component analysis) and machine learning techniques (e.g., random forest).
The main claim of the paper is that “broad immune-profiling in combination with machine learning methods enabled the reliable and clear definition of immune signatures for different adjuvant formulations”.
Our group discussion raised the following three main issues.
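First, multiple testing was controlled at a false discovery rate (FDR) of 0.20, which allows the expected proportion of false positives among the findings declared significant to reach 20%, i.e., on average about one in five. Given the small sample size and the large number of variables analysed, it seems that this permissive threshold was chosen so that there would be statistically significant findings to report in the paper.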
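To make the effect of the threshold concrete, the sketch below implements the Benjamini-Hochberg step-up procedure (assumed here to be the FDR method in question; the paper's exact procedure is not restated in this review) and applies it to ten p-values that are invented for illustration, not taken from the paper. The function name `benjamini_hochberg` is hypothetical.

```python
def benjamini_hochberg(p_values, q):
    """Return the (sorted) indices of hypotheses rejected by the
    Benjamini-Hochberg step-up procedure at FDR level q."""
    m = len(p_values)
    # Indices of the hypotheses ordered by increasing p-value
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k such that p_(k) <= k * q / m (0 if none qualifies)
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank
    # Reject the k_max hypotheses with the smallest p-values
    return sorted(order[:k_max])


# Hypothetical p-values for ten immunological variables (illustration only)
p = [0.001, 0.008, 0.012, 0.03, 0.045, 0.06, 0.09, 0.2, 0.5, 0.8]
print(benjamini_hochberg(p, 0.05))  # [0, 1, 2]
print(benjamini_hochberg(p, 0.20))  # [0, 1, 2, 3, 4, 5, 6]
```

With these invented p-values, relaxing q from 0.05 to 0.20 more than doubles the number of "significant" variables (3 versus 7), which is why the choice of threshold deserves an explicit justification.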
Second, it was unclear why data from transient time points needed to be included in the analysis. If the vaccine is intended to elicit a memory immune response, the resulting cellular and humoral immunity should reach a steady state at some point during the trial. Why, then, not compare the adjuvant formulations only at the end of the trial, once the steady state should have been reached?
Third, some methods were not clearly described (e.g., which linkage method was used in the hierarchical clustering shown in Figure 2?), and some plots were of lower quality than is expected in a scientific publication (see the example below).
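To illustrate why the linkage method matters, the following sketch runs a naive agglomerative clustering on a toy one-dimensional data set (invented for illustration, not the paper's data) with single and with complete linkage. The helper `agglomerate` is hypothetical; in practice one would use a library routine such as SciPy's `scipy.cluster.hierarchy.linkage`, whose `method` parameter selects the linkage.

```python
from itertools import combinations


def agglomerate(points, k, linkage):
    """Naive agglomerative clustering of 1-D points down to k clusters.

    `linkage` reduces the list of pairwise distances between two clusters
    to one number: min gives single linkage, max gives complete linkage.
    """
    clusters = [[p] for p in points]  # every point starts in its own cluster
    while len(clusters) > k:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            pair_dists = [abs(a - b) for a in clusters[i] for b in clusters[j]]
            d = linkage(pair_dists)
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        # Merge the closest pair (i < j, so deleting j leaves index i valid)
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]
    return sorted(clusters)


# Toy data with slowly increasing gaps (hypothetical, illustration only)
points = [0.0, 1.0, 2.1, 3.3, 4.6, 6.0]
print(agglomerate(points, 2, min))  # single:   [[0.0, 1.0, 2.1, 3.3, 4.6], [6.0]]
print(agglomerate(points, 2, max))  # complete: [[0.0, 1.0, 2.1, 3.3], [4.6, 6.0]]
```

On the same data, single linkage chains the closely spaced points together and isolates the last one, whereas complete linkage produces two more compact groups. Without knowing which linkage was used for Figure 2, the clusters shown there are hard to reproduce or assess.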
In conclusion, the conclusions drawn from the statistical approach adopted by the authors should be taken with a grain of salt.
Example of poor artwork (Figure 4).
In this plot, the upper whiskers of the boxplots for the ALFA and ALFQA adjuvant formulations are not shown.