Ken (Kangyi) Peng
Title: Characterizing Stochastic Processes with Applications in Infectious Disease and Sports Analytics
Date: August 1st, 2025
Time: 10:30am
Location: LIB 2020 & Zoom
Supervised by: X. Joan Hu & Tim Swartz
Abstract:
This thesis develops statistical models related to stochastic processes. We present practical strategies for analyses under the proposed models and demonstrate statistical learning through innovative applications with real-world datasets. The work is motivated by two research programs: one on infectious disease monitoring using wastewater surveillance data, and the other on sports analytics with corner kicks in soccer. The presentation of the thesis is organized accordingly, with three projects in the wastewater surveillance part and two projects in the soccer corner kick part.
In the wastewater surveillance part, we begin by studying the time-varying association between COVID-19 hospitalizations and wastewater viral concentrations using a nonlinear model with distributed lag, which is often referred to as a distributed lag nonlinear model (DLNM) in the literature. We then develop an extension of distributed lag models incorporating Markov-modulated random lasting times. A “soft” stratification strategy is proposed for situations where the boundary between strata is ambiguous. In the third project, we jointly model viral signals and hospitalizations as two stochastic processes, connected through a partially hidden infection process. This framework enables indirect inference from aggregated data while the model is constructed at the individual level.
In the sports analytics part, we study the dynamics of corner kick occurrences in soccer, where one corner may lead to quick follow-ups. The first project models corner kick waiting times as a mixture of short and long durations and shows that long waiting times and the time to the first corner in a half follow similar distributions. The second project introduces a novel self-exciting point process with random memory, where the influence of past events persists for a random duration.
Keywords: distributed lag models; joint inference; latent variables; semi-Markov process; self-exciting point process; wastewater surveillance