Every fortnight we run a group meeting to catch up on our projects. Once a month we ask a friend, a colleague, or a collaborator to present his or her research in a friendly and informal environment.

Follow the link below if you would like to attend the meetings.

Science is all about sharing ideas!

20.05.2022 (10:00 CET) - Challenges and recent advances in measuring the statistical hypothesis tests' performances

by Mustafa Cavus, Faculty of Mathematics & Information Science, Poland & Department of Statistics, Eskisehir Technical University, Turkey

Link (Microsoft Teams):

Abstract: Proposing a new parametric procedure for testing a null hypothesis is a hot topic in statistics. It is a critical process consisting of many steps. One of the most crucial steps is the Monte Carlo simulation study conducted to demonstrate the performance of the proposed test. It is very important (1) to determine the configuration of the Monte Carlo simulation study and (2) to choose the metric(s) used to measure the performance of the proposed test. The most well-known and frequently used metrics are the power of the test and the Type I error probability. However, when the probability of Type I error differs from the nominal level, this potentially contaminates any power comparisons. This creates a problem when comparing tests in terms of power if their Type I error probabilities differ. To handle this problem, two alternative groups of metrics can be used: significance level-dependent and significance level-independent metrics. The power of the test, adjusted power, intrinsic power, and penalized power are significance level-dependent metrics, whereas the expected p-value and the median p-value are significance level-independent metrics. Nevertheless, none of these metrics is uniformly better. Another problem in this step is the inconsistency of the configurations used in simulation studies and the ambiguity of the expressions used in reporting them. This prevents the comparison of performance results obtained for similar tests proposed in the literature. In such cases, an additional simulation study is needed, which is not practical. Standardizing the simulation configurations would solve this problem, but it is not always practical either.

This talk first discusses performance measures in the process of proposing a hypothesis test, the problems encountered in the simulation studies conducted to measure performance, and possible solutions. Then, a novel approach is presented that addresses these problems from a different perspective, based on an interaction point between statistics and machine learning, which is thought to be more practical.
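
As a rough illustration of the kind of Monte Carlo study described in the abstract, the sketch below estimates the Type I error probability, the power, and the expected p-value of a test by simulation. It is not the speaker's method: the one-sample z-test with known variance, the sample size, the effect size, and all function names are assumptions chosen only to keep the example short.

```python
import random
from statistics import NormalDist


def z_test_p_value(sample, mu0=0.0, sigma=1.0):
    """Two-sided p-value of a one-sample z-test with known sigma (assumed test)."""
    n = len(sample)
    z = (sum(sample) / n - mu0) / (sigma / n ** 0.5)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def simulate(true_mean, n=30, reps=5000, alpha=0.05, seed=0):
    """Return (rejection rate, mean p-value) over `reps` simulated samples."""
    rng = random.Random(seed)
    p_values = []
    for _ in range(reps):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        p_values.append(z_test_p_value(sample))
    rejection_rate = sum(p < alpha for p in p_values) / reps
    return rejection_rate, sum(p_values) / reps


# Null hypothesis true: the rejection rate estimates the Type I error probability.
type1_error, _ = simulate(true_mean=0.0)
# Alternative true (hypothetical effect size 0.5): the rejection rate estimates the power.
power, expected_p = simulate(true_mean=0.5)

print(f"Type I error ~ {type1_error:.3f}")
print(f"power ~ {power:.3f}, expected p-value ~ {expected_p:.3f}")
```

The rejection rates depend on the chosen significance level `alpha`, while the mean p-value does not, which is the distinction between the two groups of metrics mentioned above.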

15.04.2022 (10:00 CET) - Immunogenetic studies in ME/CFS

by Riad Hajdarevic, Faculty of Medicine, University of Oslo & Oslo University Hospital, Norway

Link (Microsoft Teams):

Abstract: Myalgic encephalopathy/chronic fatigue syndrome (ME/CFS) is a chronic and debilitating disease that affects about 0.1-0.2% of the general population. The core symptoms are persistent debilitating fatigue, post-exertional malaise (PEM) and cognitive dysfunction. Most symptoms of ME/CFS are not disease specific. Additionally, there is a lack of both biomarkers and diagnostic tests for the disease, which makes accurate diagnosis difficult. More than 20 different patient classifications and diagnostic criteria have emerged over the last four decades. Due to this, the patient population can be quite heterogeneous in terms of clinical symptoms and the extent to which the disease impacts quality of life. There are several different theories that aim to explain the disease development of ME/CFS. In this thesis, we have taken as our starting point the growing evidence for an immunological background for ME/CFS pathogenesis. Several studies have pointed to altered NK cells, autoantibodies and T cell abnormalities in ME/CFS patients. In addition, several genetic studies reported significant associations in various immunologically relevant genes. Most of these previous studies have been suboptimal and included heterogeneous patient populations and/or few patients in total. Therefore, we aimed to gain a better understanding of the role of immunologically relevant genes and disease development of ME/CFS. To do this, we employed known strategies from genetic studies in autoimmune disease and applied them to ME/CFS. We used strict quality control and included, to the best of our knowledge, the largest cohort diagnosed with the Canadian consensus criteria.

We first followed up previous work by our group that reported associations between ME/CFS and the HLA-C*07:04 and HLA-DQB1*03:03 alleles. The HLA (human leukocyte antigen) region contains a multitude of immunologically relevant genes in addition to the HLA genes, and there is extensive and complex linkage disequilibrium (LD) in the region. The previously observed association signals in the HLA region were fine-mapped by genotyping five additional classical HLA loci and 5,342 SNPs (single nucleotide polymorphisms) in 427 Norwegian ME/CFS patients, diagnosed according to the Canadian consensus criteria, and 480 healthy Norwegian controls. The analysis revealed two independent association signals (p ≤ 0.001) represented by the genetic variants rs4711249 in the HLA class I region and rs9275582 in the HLA class II region. The primary association signal in the HLA class II region was located in the vicinity of the HLA-DQ genetic region, most likely due to the HLA-DQB1 gene. In particular, amino acid position 57 (aspartic acid / alanine) in the peptide-binding groove of HLA-DQB1, or a SNP upstream of HLA-DQB1, seemed to explain the association signal we observed in the HLA class II region. In the HLA class I region, the putative primary locus was not as clear and could possibly lie outside the classical HLA genes (the association signal spans several genes: DDR1, GTF2H4, VARS2, SFTA2 and DPCR1) with expression levels influenced by the ME/CFS-associated SNP genotypes. Interestingly, we also observed that more than 60% of the patients who responded to cyclophosphamide treatment for ME/CFS carried the rs4711249 risk allele and/or DQB1*03:03, versus 12% of the patients who did not respond to the treatment. Our findings suggest the involvement of the HLA region, and in particular the HLA-DQB1 gene, in ME/CFS. Although our study is the largest to date, it is still a relatively small study in the context of genetic studies.
Our findings need to be replicated in much larger, statistically more representative, cohorts. In particular, it is necessary to investigate the involvement of HLA-DQB1, a gene that contains alleles that increase the risk of several established autoimmune diseases such as celiac disease. Additionally, we aimed to investigate immunologically relevant genes using a genotyping array (iChip) targeting immunological gene regions previously associated with different autoimmune diseases. In addition to the Norwegian cohort of 427 ME/CFS patients (Canadian consensus criteria), we also analyzed data from two replication cohorts, a Danish one of 460 ME/CFS patients (Canadian consensus criteria) and a data set from the UK Biobank of 2,105 self-reported CFS patients. To the best of our knowledge, this is the first ME/CFS genetic association study of this magnitude, and it included more than 2,900 patients in total (of whom 887 were diagnosed according to the Canadian consensus criteria). We found no ME/CFS risk variants at the genome-wide significance level (p < 5 × 10^-8), but we identified six gene regions (TPPP, LINC00333, RIN3, IGFBP/IGFBP3, IZUMO1/MAMSTR and ZBTB46/STMN3) with possible association with ME/CFS, which require further follow-up in future studies in order to assess whether they are real findings or not. Interestingly, these genes are expressed in disease-relevant tissue, e.g. brain, nerve, skeletal muscle and blood, including immune cells (subgroups of T cells, B cells, NK cells and monocytes). Furthermore, several of the ME/CFS-associated SNP genotypes are associated with differential expression levels of these genes. Although we could not identify statistically convincing associations with genetic variants across the three cohorts, we believe that our data sets and analysis represent an important step in the ME/CFS research field.
Our study demonstrated that, for the future understanding of the genetic architecture of ME/CFS, much larger studies are required to establish reliable associations. As the last part of our study, we wanted to investigate previous findings from a genome-wide association study of 42 ME/CFS patients that reported significant association with two SNPs in the T cell receptor alpha (TRA) locus (p < 5 × 10^-8). In order to replicate these previously reported findings, we used a large Norwegian ME/CFS cohort (409 cases and 810 controls) and data from the UK Biobank (2,105 cases and 4,786 controls). We examined a number of SNPs in the TRA locus, including the two previously ME/CFS-associated variants, rs11157573 and rs17255510. No statistically significant associations were observed in either the Norwegian cohort or the UK Biobank cohort. Nevertheless, other SNPs in the region showed weak signs of association (p < 0.05) in the UK Biobank cohort and in meta-analyses of the Norwegian and UK Biobank cohorts, but did not remain associated after applying correction for multiple testing. Thus, we could not confirm associations with genetic variants in the TRA locus in this study.
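
To make the last point concrete, the sketch below shows how nominally significant p-values can stop being significant after correction for multiple testing. The abstract does not say which correction was applied, so the Bonferroni rule used here is an assumption, and the p-values are hypothetical numbers invented for illustration, not results from the study.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag p-values that stay below alpha after a Bonferroni correction."""
    threshold = alpha / len(p_values)  # divide alpha by the number of tests
    return [p < threshold for p in p_values]


# Hypothetical p-values for 10 SNPs (illustrative only, not data from the study).
p_values = [0.012, 0.030, 0.20, 0.45, 0.049, 0.61, 0.08, 0.33, 0.74, 0.95]

nominal = [p < 0.05 for p in p_values]       # uncorrected, per-test threshold
corrected = bonferroni_significant(p_values)  # corrected threshold 0.05 / 10

print(sum(nominal), "nominally significant;", sum(corrected), "after correction")
```

The same logic underlies the conventional genome-wide threshold of 5 × 10^-8, which corresponds to a 0.05 level divided by roughly one million independent tests.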

16.03.2022 (12:00 CET) - Recent advances in nonparametric circular regression estimation

by Andrea Meilán Vila, Carlos III University of Madrid, Spain


Abstract: The analysis of a variable of interest that depends on other variable(s) is a typical issue appearing in many practical problems. Regression analysis provides the statistical tools to address this type of problem. This topic has been deeply studied, especially when the variables under study are of Euclidean type. However, there are situations where the data present certain kinds of complexity, for example, when the variables involved are of circular or functional type, and the classical regression procedures designed for Euclidean data may not be appropriate. In these scenarios, the techniques have to be suitably modified to provide useful results. Moreover, the variables of interest may exhibit some type of dependence. For example, they can be spatially correlated, where observations that are close in space tend to be more similar than observations that are far apart.

This work aims to design and study new approaches to deal with regression function estimation for models with a circular response and different types of covariates. For an R^d-valued covariate, nonparametric proposals to estimate the circular regression function are provided and studied, under the assumption of independence and also for spatially correlated errors. These estimators are also adapted for regression models with a functional covariate. In the above-mentioned frameworks, the asymptotic bias and variance of the proposed estimators are calculated. Some guidelines for their practical implementation are provided, checking their sample performance through simulations. Finally, the behavior of the estimators is also illustrated with real data sets.
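
As a hedged illustration of why Euclidean regression has to be adapted for a circular response, the sketch below implements one simple nonparametric estimator: a kernel-weighted circular mean, obtained by smoothing the sine and cosine of the response separately and recombining them with atan2. This is a generic Nadaraya-Watson-type construction and not necessarily the estimator studied in the talk; the Gaussian kernel, the bandwidth, the simulated data, and all names are assumptions.

```python
import math
import random


def wrap(angle):
    """Map an angle in radians to the interval (-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def circular_kernel_regression(x0, xs, thetas, h):
    """Kernel-weighted circular mean of the angles `thetas` around x0."""
    weights = [math.exp(-0.5 * ((x0 - x) / h) ** 2) for x in xs]
    s = sum(w * math.sin(t) for w, t in zip(weights, thetas))
    c = sum(w * math.cos(t) for w, t in zip(weights, thetas))
    return math.atan2(s, c)


def true_m(x):
    """Hypothetical regression function used only to simulate data."""
    return 3.0 * x


rng = random.Random(1)
xs = [rng.uniform(0.0, 1.0) for _ in range(300)]
thetas = [wrap(true_m(x) + rng.gauss(0.0, 0.3)) for x in xs]

estimate = circular_kernel_regression(0.5, xs, thetas, h=0.1)
print(f"estimate at x = 0.5: {estimate:.3f} rad (simulated truth: {true_m(0.5):.3f} rad)")
```

Averaging the angles directly would fail near the wrap-around point, where angles close to pi and close to -pi are neighbours on the circle; working with sine and cosine avoids this.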

16.02.2022 (11:00 CET) - Statistical and mathematical tools used to help inform public health policy-making in Portugal during the COVID-19 epidemic

by Constantino Pereira Caetano, Instituto Nacional de Saúde Doutor Ricardo Jorge & Center for Computational and Stochastic Mathematics (CEMAT), Portugal


Abstract: The COVID-19 epidemic presented several challenges to public health policy-makers in Portugal. These included evaluating levels of disease transmission in the community and the impact of non-pharmaceutical interventions, and forecasting different epidemic phases and the disease burden on healthcare services, among others. In this talk, we will present an array of statistical and mathematical tools used by the team at the Portuguese National Institute of Health in order to provide insight into these questions. The methods adopted range from statistical techniques to nowcast and forecast the epidemic curve by disease onset, to the calculation of the effective reproduction number, to a compartmental susceptible-exposed-infected-recovered model used to create "what if" scenarios for future pandemic phases.
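
The sketch below shows, under stated assumptions, how a compartmental susceptible-exposed-infected-recovered (SEIR) model can be used to compare "what if" scenarios. It is a textbook SEIR model integrated with forward Euler steps, not the model used by the team; all parameter values, the population size, and the function names are hypothetical.

```python
def simulate_seir(beta, sigma, gamma, population, initial_infectious, days, dt=0.1):
    """Integrate a basic SEIR model with Euler steps; returns daily (S, E, I, R)."""
    s = float(population - initial_infectious)
    e, i, r = 0.0, float(initial_infectious), 0.0
    steps_per_day = round(1 / dt)
    daily = []
    for _day in range(days + 1):
        daily.append((s, e, i, r))
        for _step in range(steps_per_day):
            new_exposed = beta * s * i / population * dt  # S -> E
            new_infectious = sigma * e * dt               # E -> I
            new_recovered = gamma * i * dt                # I -> R
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_recovered
            r += new_recovered
    return daily


def effective_reproduction_number(beta, gamma, s, population):
    """Model-based R_t for this simple SEIR model: (beta / gamma) * S / N."""
    return beta / gamma * s / population


N = 1_000_000  # hypothetical population size
# Scenario A: no intervention. Scenario B: contact rate halved (hypothetical).
baseline = simulate_seir(0.5, 0.2, 0.2, N, 10, days=300)
intervention = simulate_seir(0.25, 0.2, 0.2, N, 10, days=300)

peak_baseline = max(state[2] for state in baseline)
peak_intervention = max(state[2] for state in intervention)
rt_day0 = effective_reproduction_number(0.5, 0.2, baseline[0][0], N)

print(f"peak infectious: {peak_baseline:.0f} (baseline) vs {peak_intervention:.0f} (intervention)")
print(f"R_t at day 0 (baseline): {rt_day0:.2f}")
```

In practice the effective reproduction number is usually estimated from surveillance data rather than read off a model; the formula here only holds inside this simplified model.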

30.07.2021 - Antigen receptor allelic exclusion did not arise to avoid autoimmunity but to avoid genome damage

by Jorge Carneiro, Instituto Gulbenkian de Ciência, Portugal

The adaptive immune system of the vertebrates deploys a co-opted transposon machinery, the Rag endonuclease, to generate unlimited diversity by random recombination of the antigen receptor genes. The antigen receptor diversity allows the vertebrate organism to acquire specific immunity to mutating, fast-evolving microorganisms. It has been argued theoretically that specific immunity requires that each lymphocyte bears a unique antigen receptor and that dual receptor lymphocytes would be ambiguous and a potential cause of autoimmune diseases. Since vertebrates are diploid, with maternal and paternal antigen receptor gene alleles, lymphocytes bearing two distinct antigen receptors could arise from independent random recombination of these alleles. Although the vast majority of the lymphocytes show antigen receptor gene allelic exclusion, the conspicuous observation of lymphocytes with two distinct antigen receptors in healthy humans or rodents is puzzling. Using a simple mathematical model of the antigen receptor gene recombination and its potential evolutionary constraints, we demonstrate that allelic exclusion is expected to evolve indirectly from purifying selection against genome damage caused by Rag-mediated illegitimate recombination. This result voids the argument that allelic exclusion is a fundamental property of the adaptive immune system.

This is a joint work of Delphine Pessoa and Jorge Carneiro.

25.06.2021 - Where are my modes?

by Jose Ameijeiras-Alonso, Universidade de Santiago de Compostela, Spain

The main topic of this talk is modes, which, for continuous data, can be seen as the peaks or maxima of probability densities. Using nonparametric approaches, we will discuss different exploratory and testing tools for determining the number of modes and their estimated locations. Kernel density estimators will be employed to estimate the mode locations, and we will discuss the role of the smoothing parameter. Alongside the latter, the concept of excess mass will be reviewed to introduce a new method for determining the number of modes. Contrary to previous proposals, this new method has good calibration behavior and presents good power results when testing a general number of modes.
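
To illustrate the role of the smoothing parameter, the sketch below locates the modes of a Gaussian kernel density estimate on a grid for two bandwidths. It covers only the exploratory step; it does not implement the excess mass test from the talk. The simulated bimodal sample, the bandwidth values, and the function names are assumptions.

```python
import math
import random


def kde(x, data, h):
    """Gaussian kernel density estimate at x with bandwidth h."""
    norm = 1.0 / (len(data) * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - d) / h) ** 2) for d in data)


def find_modes(data, h, grid_size=400):
    """Return grid points where the estimated density has a local maximum."""
    lo, hi = min(data) - 3 * h, max(data) + 3 * h
    step = (hi - lo) / (grid_size - 1)
    grid = [lo + k * step for k in range(grid_size)]
    dens = [kde(x, data, h) for x in grid]
    return [grid[k] for k in range(1, grid_size - 1)
            if dens[k - 1] < dens[k] > dens[k + 1]]


# Hypothetical sample from a mixture of two normal distributions centred at -3 and 3.
rng = random.Random(42)
data = [rng.gauss(-3.0, 1.0) for _ in range(200)] + [rng.gauss(3.0, 1.0) for _ in range(200)]

modes_narrow = find_modes(data, h=0.8)  # moderate smoothing
modes_wide = find_modes(data, h=5.0)    # heavy smoothing merges the two peaks

print("h = 0.8:", [round(m, 2) for m in modes_narrow])
print("h = 5.0:", [round(m, 2) for m in modes_wide])
```

For a Gaussian kernel, the number of modes of the estimate does not increase as the bandwidth grows, which is why the bandwidth choice directly drives the apparent number of modes.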