2019-2020

Unless otherwise indicated, all STATQAM seminars take place at 3:30 p.m. in room PK-5115 of the Pavillon Président-Kennedy (PK), 201, avenue du Président-Kennedy, Montréal (QC) H2X 2J6.

Fall 2019 Session

Thursday, September 5: Paul Doukhan (Univ. Cergy, France)

Title: Non-stationarity and applications

Abstract: Stationarity is a common assumption in time series statistics. It is reasonable when the dynamics of a phenomenon do not change over time. In fact, the truly useful condition is ergodicity, since it yields laws of large numbers that justify the consistency of natural estimators. In practice, however, dynamics are often not homogeneous in time. The goal of this talk is to propose conditions suited to real phenomena. Besides regime changes, the local stationarity conditions introduced by Dahlhaus seem a well-suited approach. We will try to identify models (for example Markov chains, or more general models with memory), techniques (dependence properties, or specific inequalities) and applications (in statistical learning, astronomy, meteorology, or online retail) that fit this very general framework.
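To make local stationarity concrete (a hypothetical illustration of ours; the abstract names no specific model), here is a minimal simulation of a time-varying AR(1) process in the spirit of Dahlhaus, in which the autoregressive coefficient changes smoothly in rescaled time:

```python
import numpy as np

# Hypothetical illustration: a locally stationary, time-varying AR(1) process
# X_t = a(t/n) * X_{t-1} + eps_t, where a(.) varies smoothly on [0, 1].
rng = np.random.default_rng(0)
n = 1000
u = np.arange(n) / n                # rescaled time in [0, 1]
a = 0.3 + 0.5 * np.sin(np.pi * u)   # smoothly varying AR coefficient, |a| < 1

x = np.zeros(n)
for t in range(1, n):
    x[t] = a[t] * x[t - 1] + rng.standard_normal()

# A local (windowed) variance estimate tracks the changing dynamics,
# whereas a single global estimate averages over regimes and can mislead.
window = 100
local_var = np.array([x[max(0, t - window):t + 1].var() for t in range(n)])
print(x.var(), local_var[200], local_var[700])
```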

Thursday, September 12: Janosch Ortmann (ESG, UQAM)

Title: KPZ universality: last passage percolation, polymers and particles

Abstract: KPZ universality describes a scaling behaviour that differs from the central limit theorem in both the size of the fluctuations ($n^{1/3}$ instead of $n^{1/2}$) and the limiting distribution: instead of the Gaussian, the Tracy-Widom distributions from random matrix theory appear in the limit. It is a long-standing conjecture that the KPZ universality class contains a large class of models, including particle systems, last-passage percolation and polymer models. Beyond its physical motivation, the study of KPZ universality involves a surprising range of mathematical tools, including algebra, combinatorics, analysis and stochastic calculus. In this talk, I will give an overview of the KPZ universality class and discuss some specific models, based on joint work with Duncan Dauvergne, Nicos Georgiou, Neil O’Connell, Jeremy Quastel, Daniel Remenik and Bálint Virág.
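As a hedged illustration (our addition, not material from the talk), the following sketch simulates last-passage percolation with i.i.d. exponential weights, one of the exactly solvable models in the KPZ class; the last-passage time to $(n, n)$ grows linearly with fluctuations of order $n^{1/3}$:

```python
import numpy as np

def last_passage_time(n, rng):
    """Last-passage time over up-right paths in an n x n grid of
    i.i.d. Exponential(1) weights, computed by dynamic programming."""
    w = rng.exponential(size=(n, n))
    g = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            best = 0.0
            if i > 0:
                best = g[i - 1, j]
            if j > 0:
                best = max(best, g[i, j - 1])
            g[i, j] = best + w[i, j]
    return g[-1, -1]

rng = np.random.default_rng(1)
n = 200
samples = np.array([last_passage_time(n, rng) for _ in range(50)])
# For Exponential(1) weights, E[G(n, n)] ~ 4n, and the fluctuations are
# O(n^{1/3}) with a Tracy-Widom GUE limit after centering and scaling.
print(samples.mean() / n, samples.std() / n ** (1 / 3))
```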

Thursday, September 19: Paquito Bernard (Sc. de l’activité physique, UQAM)

Title: Generalized additive models: why are they useful in physical activity science?

Abstract: This presentation will describe the characteristics and usefulness of generalized additive models (GAMs). The speaker will explain how GAMs allow him to address new research questions, and will share several applications, notably the modelling of physical activity patterns in relation to various health indicators.
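For readers unfamiliar with GAMs, here is a minimal sketch (our addition, assuming the third-party pygam package; the variables `activity` and `health_score` are hypothetical stand-ins for the kinds of indicators mentioned) that fits a smooth, nonlinear dose-response curve:

```python
import numpy as np
from pygam import LinearGAM, s  # third-party package: pip install pygam

# Hypothetical data: a nonlinear relationship between daily physical
# activity (minutes) and a health score, plus noise.
rng = np.random.default_rng(2)
activity = rng.uniform(0, 120, size=500)
health_score = np.log1p(activity) + rng.normal(scale=0.3, size=500)

# A GAM replaces the linear term beta * x with a smooth function f(x)
# estimated from the data (here, a penalized spline term s(0)).
gam = LinearGAM(s(0)).fit(activity.reshape(-1, 1), health_score)
grid = np.linspace(0, 120, 100).reshape(-1, 1)
print(gam.predict(grid)[:5])  # fitted smooth: strongly nonlinear near zero
```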

Thursday, September 26: Francisco Cuevas Pacheco (UQAM)

Title: A family of covariance functions for random fields on spheres

Abstract: The Matérn family of isotropic covariance functions has been central to the theoretical development and application of statistical models for geospatial data. For global data defined over the whole sphere representing planet Earth, the natural distance between any two locations is the great circle distance. In this setting, the Matérn family of covariance functions has a restriction on the smoothness parameter, making it an unappealing choice for modelling smooth data. Finding a suitable analogue for modelling data on the sphere is still an open problem. This work proposes a new family of isotropic covariance functions for random fields defined over the sphere. The proposed family has four parameters, one of which indexes the mean square differentiability of the corresponding Gaussian field, and also allows for any admissible range of fractal dimension. We apply the proposed model to a dataset of precipitable water content over a large portion of the Earth, and show that the model gives more precise predictions of the underlying process at unsampled locations than does the Matérn model using chordal distances.
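As background (our addition, with the standard formulas rather than the paper's new family), the following sketch computes the Matérn correlation at a great-circle distance; with this distance, the Matérn model remains valid only for smoothness $\nu \le 1/2$, which rules out mean-square differentiable fields and motivates the proposed family:

```python
import numpy as np
from scipy.special import gamma, kv  # kv: modified Bessel function, 2nd kind

def matern(d, nu=0.5, phi=1.0):
    """Standard Matérn correlation at distance d (range phi, smoothness nu)."""
    d = np.atleast_1d(np.asarray(d, dtype=float))
    out = np.ones_like(d)
    pos = d > 0
    z = d[pos] / phi
    out[pos] = (2 ** (1 - nu) / gamma(nu)) * z ** nu * kv(nu, z)
    return out

def great_circle(lat1, lon1, lat2, lon2, radius=6371.0):
    """Great-circle distance in km between two points given in degrees."""
    p1, l1, p2, l2 = map(np.radians, (lat1, lon1, lat2, lon2))
    c = (np.sin(p1) * np.sin(p2)
         + np.cos(p1) * np.cos(p2) * np.cos(l1 - l2))
    return radius * np.arccos(np.clip(c, -1.0, 1.0))

d = great_circle(45.5, -73.6, 48.9, 2.4)   # Montréal to Paris
print(d, matern(d, nu=0.5, phi=2000.0))    # nu = 0.5: still valid on the sphere
# With great-circle distance, validity requires nu <= 1/2, hence the
# restriction on smoothness discussed in the abstract.
```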

Friday, October 4: STATLAB fall day

  • 2:00 p.m. Linda Mhalla (Gerad)
  • 2:45 p.m. Florian Maire (UdM)
  • 4:00 p.m. Johanna Nešlehová (McGill, CRM-SSC Prize)
    Free registration (but mandatory for the first two talks)
    http://www.crm.umontreal.ca/2019/StatLabMethods19/

Thursday, October 10: Sebastian Engelke (Univ. Genève, Switzerland)

Title: Causal discovery in heavy-tailed models

Abstract: Causal questions are omnipresent in many scientific problems. While much progress has been made in the analysis of causal relationships between random variables, these methods are not well suited if the causal mechanisms manifest themselves only in extremes. This work aims to connect the two fields of causal inference and extreme value theory. We define the causal tail coefficient, which captures asymmetries in the extremal dependence of two random variables. In the population case, the causal tail coefficient is shown to reveal the causal structure if the distribution follows a linear structural causal model. This holds even in the presence of latent common causes that have the same tail index as the observed variables. Based on a consistent estimator of the causal tail coefficient, we propose a computationally highly efficient algorithm that infers the causal structure from finitely many samples. We prove that our method consistently estimates the causal order, and we compare it to other well-established and non-extremal approaches to causal discovery on synthetic data. This is joint work with Nicola Gnecco, Nicolai Meinshausen and Jonas Peters; a preprint is available at https://arxiv.org/abs/1908.05097
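As a hedged sketch of the kind of statistic involved (one natural rank-based estimator of an extremal asymmetry coefficient; see the preprint for the authors' exact definition), one can average the rank-transformed values of Y over the largest observations of X, and then compare with the roles reversed:

```python
import numpy as np
from scipy.stats import rankdata

def tail_coefficient(x, y, k):
    """Average of rank-transformed y over the k largest values of x.
    Values near 1 suggest that extremes of x force extremes of y;
    asymmetry between the two directions hints at the causal direction
    in heavy-tailed linear structural causal models."""
    n = len(x)
    fy = rankdata(y) / n          # empirical CDF of y, evaluated at y
    top = np.argsort(x)[-k:]      # indices of the k largest x values
    return fy[top].mean()

# Hypothetical heavy-tailed linear SCM in which X causes Y.
rng = np.random.default_rng(3)
n = 10_000
x = rng.pareto(2.5, size=n)
y = 2.0 * x + rng.pareto(2.5, size=n)
k = int(np.sqrt(n))
print(tail_coefficient(x, y, k))  # near 1: extremes of X force extremes of Y
print(tail_coefficient(y, x, k))  # noticeably smaller: the asymmetry
```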

Thursday, October 17: Gabor Lugosi (Pompeu Fabra University, Spain)

Title: Network archeology: on revealing the past of random trees

Abstract: Networks are often naturally modeled by random processes in which nodes of the network are added one by one, according to some random rule. Uniform and preferential attachment trees are among the simplest examples of such dynamically growing networks. The statistical problems we address in this talk concern discovering the past of the network when a present-day snapshot is observed. Such problems are sometimes termed “network archeology”. We present a few results showing that, even in gigantic networks, a lot of information is preserved from the very early days.
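To make the two tree models concrete (our illustration, not code from the talk): under uniform attachment each new node picks its parent uniformly at random among existing nodes, while under preferential attachment the parent is chosen with probability proportional to its degree:

```python
import random

def grow_tree(n, preferential=False, seed=0):
    """Grow a random tree on nodes 0..n-1, adding nodes one by one.
    Returns (parent, degree), with parent[i] defined for i >= 1."""
    rng = random.Random(seed)
    parent = [None]
    degree = [1]                  # degree of node 0 (the root)
    pool = [0]                    # multiset of nodes, weighted by degree
    for i in range(1, n):
        if preferential:
            p = rng.choice(pool)  # P(parent = v) proportional to degree(v)
        else:
            p = rng.randrange(i)  # uniform attachment
        parent.append(p)
        degree.append(1)
        degree[p] += 1
        pool.extend([p, i])       # keep pool multiplicities equal to degrees
    return parent, degree

parent, degree = grow_tree(10_000, preferential=True)
# Early nodes accumulate very high degree: one of the traces of the past
# that "network archeology" exploits to locate old nodes such as the root.
print(max(range(10_000), key=degree.__getitem__), max(degree))
```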

Thursday, October 24: Denis Larocque (HEC Montréal)

Title: Prediction Intervals with Random Forests

Abstract: The classical and most commonly used approach to building prediction intervals is the parametric approach. However, its main drawback is that its validity and performance depend heavily on the assumed functional link between the covariates and the response. This research investigates new methods that improve the performance of prediction intervals with random forests. Two aspects are explored: the method used to build the forest and the method used to build the prediction interval. Four methods to build the forest are investigated: three from the CART paradigm and the transformation forest method. For CART forests, in addition to the default least-squares splitting rule, two alternative splitting criteria are investigated. We also present and evaluate the performance of five flexible methods for constructing prediction intervals. This yields 20 distinct method variations. To reliably attain the desired confidence level, we include a calibration procedure performed on the out-of-bag information provided by the forest. The 20 method variations are thoroughly investigated and compared to five alternative methods through simulation studies and in real data settings. The results show that the proposed methods are very competitive: they outperform commonly used methods both in simulation settings and with real data. This is joint work with Marie-Hélène Roy.
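As a hedged sketch of the general idea (one simple out-of-bag calibration, not any of the paper's twenty variants), one can fit a random forest, compute residuals on the out-of-bag predictions, and set the interval half-width so that it covers the desired fraction of those residuals:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(4)
X = rng.uniform(-2, 2, size=(2000, 5))
y = X[:, 0] ** 2 + np.sin(3 * X[:, 1]) + rng.normal(scale=0.5, size=2000)

# oob_score=True makes sklearn store an out-of-bag prediction for each
# training point (from the trees that did not see that point).
rf = RandomForestRegressor(n_estimators=500, oob_score=True, random_state=0)
rf.fit(X, y)

# Calibrate: take the half-width as an empirical quantile of the absolute
# out-of-bag residuals, so the interval covers ~90% of them.
oob_resid = np.abs(y - rf.oob_prediction_)
half_width = np.quantile(oob_resid, 0.90)

X_new = rng.uniform(-2, 2, size=(5, 5))
pred = rf.predict(X_new)
for lo, hi in zip(pred - half_width, pred + half_width):
    print(f"[{lo:.2f}, {hi:.2f}]")
```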

Friday, November 1 (ISM-CRM colloquium, Burnside Hall room 1104, McGill Univ., 4:00-5:00 p.m.): Stephen Walker

Title: General Bayesian modeling

Abstract: The work is motivated by the inflexibility of Bayesian modeling, in that only parameters of probability models can be connected with data. The idea is to generalize this by allowing arbitrary unknowns to be connected with data via loss functions. An updating process is then detailed, which can be viewed as arising in at least a couple of ways, one being purely axiomatically driven. The further exploration of replacing probability-model-based approaches to inference with loss functions is ongoing. Joint work with Chris Holmes, Pier Giovanni Bissiri and Simon Lyddon.
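In its simplest form (our addition, following the published Bissiri, Holmes and Walker framework rather than slides from the talk), the general Bayesian update replaces the log-likelihood with a loss $\ell(\theta, x)$ scaled by a learning rate $w > 0$:

$$\pi(\theta \mid x_{1:n}) \;\propto\; \exp\Bigl(-w \sum_{i=1}^{n} \ell(\theta, x_i)\Bigr)\, \pi(\theta).$$

Taking $\ell$ to be the negative log-likelihood and $w = 1$ recovers the usual posterior, so classical Bayesian updating is a special case.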

Thursday, November 7: Tibor Schuster (Department of Family Medicine, McGill)

CANCELLED

Thursday, November 14: Léo Belzile (HEC Montréal)

Title: Human life expectancy at extreme ages

Abstract: Is there a limit to the human lifespan? This study addresses the mortality of Italian and French supercentenarians using a combination of extreme value theory, survival analysis and computer-intensive inference methods. Once the sampling design is taken into account, a model with constant hazard beyond age 108 is plausible, and no difference between cohorts or sexes is detected. These conclusions agree with previous work on the survival of supercentenarians, and support the hypothesis that there is no limit to the human lifespan, or that any such limit is so high that it is unlikely to be observed.

This is joint work with Anthony Davison, Holger Rootzén and Dmitri Zholud.
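For intuition (our addition, standard survival-analysis background rather than a claim from the paper): a constant hazard $h$ beyond age 108 is equivalent to exponentially distributed excess lifetimes,

$$S(t \mid T > 108) = \exp\{-h\,(t - 108)\}, \qquad t > 108,$$

so the conditional probability of surviving one more year is the same at every age past 108, and the model itself imposes no finite upper limit on lifespan.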

Friday, November 22 (mathematical sciences colloquium, 4:00 p.m., PK-5115): Don Estep (SFU, CANSSI Director)

Title: Formulation and solution of stochastic inverse problems for science and engineering models

Abstract: The stochastic inverse problem of determining probability structures on input parameters for a physics model, corresponding to a given probability structure on the output of the model, forms the core of scientific inference and engineering design. We describe a formulation and solution method for stochastic inverse problems that is based on functional analysis, differential geometry, and probability/measure theory. This approach yields a computationally tractable problem while avoiding alterations of the model like regularization and ad hoc assumptions about the probability structures. We present several examples, including a high-dimensional application to determination of parameter fields in storm surge models. We also describe work aimed at defining a notion of condition for stochastic inverse problems and tackling the related problem of designing sets of optimal observable quantities.

Thursday, November 28: Tibor Schuster (Department of Family Medicine, McGill)

Title: Importance of collider stratification bias when estimating variable importance using Random Forests

Abstract: Only recently have advancements in causal inference provided a sound explanation for two phenomena that perplexed the scientific community for decades: the ‘birthweight paradox’ and the ‘obesity paradox’. The former indicates that maternal smoking may be linked to lower infant mortality among low birthweight infants; the latter suggests that obesity confers a protective effect on mortality in certain subpopulations. In both cases, selection bias due to conditioning on a post-exposure collider variable, i.e. collider-stratification bias (CSB), has been determined to be a plausible explanation. Now that machine learning (ML) methods are increasingly used to identify “predictive factors” in large datasets, the lessons learned from solving these paradoxes are more important than ever. While CSB has become a widely recognized concern when estimating exposure-outcome effects, its impact on variable importance measures (VIMs) in ML is not completely understood. Applying the causal inference framework, we investigated the effect of collider stratification bias on the estimation of variable importance and the ranking of candidate predictors using random forests.
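A minimal simulation (our hypothetical illustration, not the speaker's analysis) shows the mechanism: X and Y are independent, both affect a collider C, and selecting a stratum of C induces a spurious X-Y association that inflates the apparent importance of X for predicting Y:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(5)
n = 20_000
x = rng.normal(size=n)                       # "exposure": independent of y
y = rng.normal(size=n)                       # outcome
c = x + y + rng.normal(scale=0.5, size=n)    # collider: caused by both

# Conditioning on the collider: keep only the stratum with large c.
keep = c > 1.0
x_s, y_s = x[keep], y[keep]

# Within the stratum, x and y are negatively correlated even though
# they are independent in the full population.
print(np.corrcoef(x, y)[0, 1], np.corrcoef(x_s, y_s)[0, 1])

# Consequence for variable importance: in the selected data a random
# forest ranks x as "important" for y, a pure artefact of selection.
z = rng.normal(size=n)[keep]                 # irrelevant comparison variable
X_s = np.column_stack([x_s, z])
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_s, y_s)
imp = permutation_importance(rf, X_s, y_s, n_repeats=5, random_state=0)
print(imp.importances_mean)                  # importance of x dwarfs that of z
```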

Thursday, December 5: Ayoub Belhadji (Univ. Lille, France)

Title: Kernel quadrature with DPPs

Abstract: We study quadrature rules for smooth functions living in an RKHS, using nodes sampled from a projection determinantal point process (DPP) whose kernel is a truncated and saturated version of the RKHS kernel. This coupling between the two kernels leads to a fast quadrature error rate, which depends on the spectrum of the RKHS kernel. The analysis gives new insight into the rates of DPP-based quadratures, especially for high-dimensional numerical integration problems.
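For background (our addition, a standard RKHS fact rather than a result from the talk): for nodes $x_1, \dots, x_N$ with weights $w_i$, the worst-case quadrature error over the unit ball of the RKHS with kernel $k$ and target measure $\mu$ has the closed form

$$\mathrm{wce}^2 = \iint k(x, y)\, d\mu(x)\, d\mu(y) - 2 \sum_{i=1}^{N} w_i \int k(x_i, y)\, d\mu(y) + \sum_{i,j=1}^{N} w_i w_j\, k(x_i, x_j),$$

which is why error rates end up governed by the spectrum of $k$, the quantity the projection DPP is built from.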

Winter 2020 Session

Thursday, January 9: TBA

TBA

Thursday, January 16: Éric Marchand (Université de Sherbrooke)

Title: Predictive density estimation: recent results

Abstract: In this talk, I will discuss predictive density estimation and measures of efficiency based on frequentist risk. In particular, for Kullback-Leibler, alpha-divergence, L1 and L2 losses, we present several dominance results that exploit scale-expansion techniques, duality links with point estimation and point prediction, and Stein estimation for concave losses. The models studied include the multivariate normal with known and unknown covariance structures, normal mixtures, Gamma models, and models with restrictions or additional information on the parameters.
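To fix notation (our addition), the Kullback-Leibler loss of a predictive density $\hat q$ for a future observation $Y \sim p(\cdot \mid \theta)$ is

$$L_{\mathrm{KL}}(\theta, \hat q) = \int p(y \mid \theta)\, \log \frac{p(y \mid \theta)}{\hat q(y)}\, dy,$$

and a dominance result exhibits a predictive density whose frequentist risk $\mathbb{E}_{X \mid \theta}\bigl[L_{\mathrm{KL}}(\theta, \hat q(\cdot\,; X))\bigr]$ is no larger than that of a benchmark for every $\theta$, and strictly smaller for some $\theta$.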

Thursday, January 23: Philippe Gagnon (Université de Montréal)

Title: Non-reversible jump algorithms for Bayesian nested model selection

Abstract: This presentation will focus on a Markov chain Monte Carlo method recently introduced by Gagnon and Doucet (see https://arxiv.org/abs/1911.01340). The presentation will begin with an introduction to this class of methods and their applications, followed by a description of the Gagnon-Doucet method and of its advantages and disadvantages. An example application will then be presented.

Friday, January 31 (CRM-ISM colloquium): Ana-Maria Staicu (NCSU)

TBA

Thursday, February 6: Milica Miocevic (Department of Psychology, McGill University)

Title: Increasing power and data synthesis using Bayesian methods for mediation analysis

Abstract: Mediation analysis is used to study intermediate variables (M) that transmit the effect of an independent variable (X) on a dependent variable (Y). For example, an intervention designed to reduce unhealthy habits (X) might affect fruit and vegetable consumption (M), which in turn might affect general health (Y). In this hypothetical study, the quantity of interest is the indirect effect of the intervention on general health through fruit and vegetable consumption.
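In the standard single-mediator model (our addition, in the field's classical notation), two linear equations define the effects:

$$M = i_1 + aX + e_1, \qquad Y = i_2 + c'X + bM + e_2,$$

and the indirect effect of $X$ on $Y$ through $M$ is the product $ab$. The sampling distribution of $\hat a \hat b$ is the asymmetric product distribution discussed below.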

Mediation analysis can be performed using both classical (frequentist) and Bayesian approaches. In recent years, social science researchers have turned to Bayesian methods when they encounter convergence issues (Chen, Choi, Weiss, & Stapleton, 2014), when samples are small (Lee & Song, 2004), and when they wish to report the probability that a parameter lies within a certain interval (Rindskopf, 2012).

The distribution of the mediated effect is often asymmetric (Craig, 1936; Lomnicki, 1967; Springer & Thompson, 1966), and the best classical methods for evaluating the significance of the mediated effect either take the asymmetric distribution of the product into account or make no distributional assumptions at all (Cheung, 2007, 2009; MacKinnon, Fritz, Williams, & Lockwood, 2007; MacKinnon, Lockwood, & Williams, 2004; MacKinnon, Lockwood, Hoffmann, West, & Sheets, 2002; MacKinnon et al., 1995; Shrout & Bolger, 2002; Tofighi & MacKinnon, 2011; Valente, Gonzalez, Miočević, & MacKinnon, 2016; Yuan & MacKinnon, 2009).

Bayesian methods can easily accommodate the asymmetric distributions of the mediated effect and of other functions of the mediated effect, e.g., effect size measures and causal estimates of indirect and direct effects. Furthermore, Bayesian methods provide an intuitive framework for the inclusion of relevant prior information in the statistical analysis. In this talk I will discuss the advantages of Bayesian mediation analysis, summarize recommendations that can be made for applied researchers based on the methodological literature on Bayesian mediation analysis thus far, and conclude with future directions for this line of research.

Thursday, February 13: Mireille Schnitzer (Faculté de pharmacie, Université de Montréal)

Title: A model for effect modification using targeted learning with observational data arising from multiple studies

Abstract: When the effect of treatment may vary from individual to individual, precision medicine can be improved by identifying patient covariates that predict the size and direction of the effect at the individual level. However, this task is statistically very challenging and typically requires large amounts of data so that treatment effects may be well estimated for different combinations of covariate values. One may also impose a working model in order to smooth (or summarize) the covariate-specific effect, rather than estimate the effect separately for all possible patient subgroups. When working with observational data, one must also adjust for all potential confounders of the treatment-outcome relationship, which can be accomplished with propensity score and/or outcome regression modeling.

Because of the large data requirements, investigators may be interested in using the individual patient data from multiple studies to estimate these treatment effect models. In our study, the data arise from a systematic review of observational studies contrasting different treatment regimens for patients with multidrug-resistant tuberculosis, where multiple antibiotics are taken concurrently over a long period to cure the infection.

Our specific contribution is the use of targeted maximum likelihood estimation (TMLE) to develop a doubly robust estimator for a marginal structural model representing the treatment effect model. When the observational data come from multiple studies, any given treatment may not be observed in all studies. We describe our algorithm and the assumptions necessary for consistent estimation in both the single-study and the meta-analytic settings.

Thursday, February 20: Mohamed Ouhourane (UQAM)

Title: Group penalized expectile regression

Abstract: Asymmetric least squares (expectile) regression makes it possible to estimate unknown expectiles of the conditional distribution of a response variable as a function of a set of predictors, and can handle heteroscedasticity issues. High-dimensional data, such as omics data, are error-prone and usually display heterogeneity; such heterogeneity is often of scientific interest. In this work, we propose the Group Penalized Expectile Regression (GPER) approach for high-dimensional settings. GPER implements sparse expectile regression with the group Lasso penalty and with the nonconvex group penalties SCAD and MCP. However, GPER may fail to tell which groups of variables are important for the conditional mean and which groups of variables are important for the conditional scale/variance. To that end, we further propose a COupled Group Penalized Expectile Regression (COGPER), which can be solved efficiently by an algorithm similar to the one for GPER. We establish theoretical properties of the proposed approaches; in particular, GPER and COGPER with the SCAD penalty or MCP are shown to consistently identify the two important subsets for the mean and scale simultaneously. We demonstrate the empirical performance of GPER and COGPER on simulated and real data.
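A minimal sketch of plain (unpenalized) expectile regression may help fix ideas; the asymmetric squared loss weights positive and negative residuals differently, and gradient descent on it recovers the $\tau$-th conditional expectile (this is our illustration of the base loss only, not the GPER/COGPER algorithms):

```python
import numpy as np

def expectile_loss_grad(beta, X, y, tau):
    """Asymmetric least squares: a residual u gets weight tau if u >= 0
    and (1 - tau) otherwise; returns the loss and its gradient in beta."""
    u = y - X @ beta
    w = np.where(u >= 0, tau, 1 - tau)
    loss = np.mean(w * u ** 2)
    grad = -2 * X.T @ (w * u) / len(y)
    return loss, grad

rng = np.random.default_rng(6)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
# Heteroscedastic data: the noise scale grows with |x|, so expectiles at
# different tau levels have genuinely different slopes.
y = 1.0 + 2.0 * X[:, 1] + (0.5 + np.abs(X[:, 1])) * rng.normal(size=n)

for tau in (0.1, 0.5, 0.9):
    beta = np.zeros(2)
    for _ in range(2000):                 # plain gradient descent
        _, grad = expectile_loss_grad(beta, X, y, tau)
        beta -= 0.1 * grad
    print(tau, beta.round(2))             # tau = 0.5 recovers least squares
```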

Friday, February 28 (CRM-ISM colloquium, Concordia)

TBA

Thursday, March 5: Louis-Paul Rivest (Université Laval)

Title: How to sample within a matrix?

Abstract: This talk is concerned with sampling from a population of size $MN$ whose units correspond to the entries of an $N \times M$ matrix. The goal of the survey is to estimate the mean of a variable of interest $y$ over all units in the population. For instance, in a survey of anglers, the rows of the matrix are fishing sites, the columns are days, and $y$ is the fishing effort in angler-hours. A sample is an $N \times M$ matrix $Z$ of 0s (for unsampled units) and 1s (for units in the sample); we want to select a sample whose column totals are fixed ($Z_{\cdot j} = n$) and whose row totals $Z_{i \cdot}$ equal predetermined values $\{m_i\}$. We are interested in a sampling design that assigns equal probabilities to all possible samples. Three selection algorithms are compared: the balanced sampling of Deville and Tillé, a method based on the multivariate hypergeometric distribution, and one based on MCMC. The properties of the design are established: expressions for the first- and second-order inclusion probabilities are presented, as well as a formula for the variance of the Horvitz-Thompson estimator of the mean of $y$. A Monte Carlo study of several estimators of this variance is presented, together with an application to the sampling of anglers to estimate the fishing effort for striped bass in Gaspésie. This is joint work with Sergio Ewane Ebouele.
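As a hedged sketch of the MCMC option (the classical "checkerboard swap" chain on binary matrices with fixed margins, which may differ in its details from the authors' algorithm), each move picks a 2x2 submatrix with pattern [[1,0],[0,1]] or [[0,1],[1,0]] and flips it, which preserves every row and column total:

```python
import numpy as np

def swap_chain(Z, n_steps, seed=0):
    """Random walk over binary matrices with fixed row and column totals.
    Each accepted move flips a 2x2 'checkerboard' submatrix, leaving all
    margins unchanged; run long enough, the symmetric chain moves toward
    the uniform distribution over matrices with these margins."""
    rng = np.random.default_rng(seed)
    Z = Z.copy()
    N, M = Z.shape
    for _ in range(n_steps):
        i1, i2 = rng.choice(N, size=2, replace=False)
        j1, j2 = rng.choice(M, size=2, replace=False)
        sub = Z[np.ix_([i1, i2], [j1, j2])]
        if sub[0, 0] == sub[1, 1] and sub[0, 1] == sub[1, 0] \
                and sub[0, 0] != sub[0, 1]:
            Z[np.ix_([i1, i2], [j1, j2])] = 1 - sub  # flip the checkerboard
    return Z

# Hypothetical 4 x 6 design: row totals m_i = 3, column totals n = 2.
Z0 = np.array([[1, 1, 1, 0, 0, 0],
               [1, 1, 0, 1, 0, 0],
               [0, 0, 1, 0, 1, 1],
               [0, 0, 0, 1, 1, 1]])
Z = swap_chain(Z0, 10_000)
print(Z.sum(axis=1), Z.sum(axis=0))  # margins preserved: [3 3 3 3], [2 2 2 2 2 2]
```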

Thursday, March 12: Shirin Golchi (Department of Epidemiology, Biostatistics, & Occupational Health, McGill University)

Title: Design and Analysis of Modern Clinical Trials

Abstract: The traditional approach to the design and analysis of clinical trials is no longer sufficient for addressing the complex clinical problems of today. Flexible and efficient trial designs are needed to collect information-rich data, together with creative inference methodology to combine information from all available sources. This talk covers a number of interesting problems that arise in the field, together with proposed methodology within the Bayesian framework.

Thursday, March 19: Marzia A. Cremona (Département d’opérations et systèmes de décision, Université Laval)

Title: Probabilistic K-mean with local alignment to locally cluster curves and discover functional motifs

Abstract: In this work, we develop a new method to locally cluster misaligned curves and to address the problem of discovering functional motifs, i.e. typical “shapes” that may recur several times along and across a set of curves, capturing important local characteristics of these curves.

We formulate probabilistic K-mean with local alignment, a novel algorithm that leverages ideas from functional data analysis (joint clustering and alignment of curves), bioinformatics (local alignment through the extension of high similarity “seeds”) and fuzzy clustering (curves belonging to more than one cluster, if they contain more than one typical “shape”). Our methodology can employ various dissimilarity measures and incorporate derivatives in the discovery process, in order to capture different shape characteristics.

After demonstrating the performance of our method on simulated data, and showing how it generalizes other clustering methods for functional data, we apply it to discover functional motifs in “Omics” signals related to mutagenesis and genome dynamics.

Joint work with Francesca Chiaromonte.

Thursday, March 26: Yousri Henchiri (Université de la Manouba, Université de Tunis El Manar (ENIT-LAMSIN))

Friday, April 3 (CRM-ISM colloquium, Université de Montréal)

TBA

Thursday, April 16: Jonathan Jalbert (Poly Montréal)

TBA