Project overview
Missing data are a common problem that arises in many fields and can significantly complicate analysis. While there are established ways to handle data that are missing at random, missing readings are often structured in some way. Such structured missingness is substantially more challenging to deal with, yet it occurs commonly in healthcare and other contexts. Consequently, there is a need for rigorous ways to understand structured missingness and for tools to handle it appropriately.
In our project, we develop mathematical and computational tools to extract and analyse the patterns of missingness in complex, heterogeneous datasets. We use networks and their higher-order generalisations to represent potentially very intricate patterns of missingness in a way that is both scalable and interpretable, in order to uncover the hidden organisational structure of the missing data.
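To make the network idea concrete, here is a minimal sketch, assuming the missingness of a dataset is summarised as a boolean mask (rows are records, columns are variables, True means the value is missing). The function names comissingness_network and missingness_hyperedges and the toy variables are hypothetical illustrations, not the project's own tools. Pairwise edges count how often two variables are missing together; the higher-order view counts each distinct set of jointly missing variables as a weighted hyperedge.

```python
from collections import Counter
from itertools import combinations


def comissingness_network(mask, names):
    """Pairwise co-missingness network (hypothetical helper).

    mask: list of rows, each a list of booleans (True = value missing).
    names: one variable name per column.
    Returns {frozenset({a, b}): number of records where a and b are both missing}.
    """
    edges = Counter()
    for row in mask:
        missing = [names[j] for j, m in enumerate(row) if m]
        # Every pair of variables missing in the same record gets +1 edge weight.
        for a, b in combinations(missing, 2):
            edges[frozenset((a, b))] += 1
    return dict(edges)


def missingness_hyperedges(mask, names):
    """Higher-order view (hypothetical helper): each distinct set of two or
    more jointly missing variables is a hyperedge weighted by its count."""
    patterns = Counter()
    for row in mask:
        missing = frozenset(names[j] for j, m in enumerate(row) if m)
        if len(missing) >= 2:
            patterns[missing] += 1
    return dict(patterns)


# Toy mask with made-up variables, for illustration only.
names = ["age", "bp", "glucose", "bmi"]
mask = [
    [False, True, True, False],
    [False, True, True, False],
    [False, False, False, True],
    [True, True, True, False],
]
net = comissingness_network(mask, names)
hyper = missingness_hyperedges(mask, names)
print(net[frozenset({"bp", "glucose"})])             # 3
print(hyper[frozenset({"age", "bp", "glucose"})])    # 1
```

Keeping the hyperedges alongside the pairwise edges matters because pairwise counts alone cannot tell whether the age–bp and age–glucose co-missingness come from one record where all three are missing or from separate records.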
We will provide a geometric theory of “missingness” and robust computational tools for assessing the extent to which a dataset contains “structured missingness”. As such, our project will contribute to the fundamental understanding of how missing data are structured, both directly and by providing simple metrics that can serve as inputs to downstream models (e.g., machine learning tools that predict patient outcomes). Our basic-science methodology is general enough to be applied to other complex, heterogeneous datasets to quantify and analyse general patterns in the organisation of missing data.
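As an illustration of what a simple, single-number metric of structured missingness could look like, the sketch below computes the total-variation distance between the observed distribution of missingness patterns and the distribution expected if each variable were missing independently at its own marginal rate. This is an assumed, illustrative metric chosen for clarity, not the project's own measure; the name structure_score is hypothetical. Unobserved patterns are handled in closed form, so the cost grows with the number of distinct observed patterns rather than with all 2^d possible ones.

```python
from collections import Counter
from math import prod


def structure_score(mask):
    """Total-variation distance between the empirical distribution of
    missingness patterns and the product of per-variable missingness rates.

    mask: list of equal-length rows of booleans (True = value missing).
    Returns a float in [0, 1]; larger values mean the joint patterns depart
    more from independent per-variable missingness.
    """
    n = len(mask)
    d = len(mask[0])
    # Marginal missingness rate of each variable (booleans sum as 0/1).
    rates = [sum(row[j] for row in mask) / n for j in range(d)]
    # Empirical count of each observed missingness pattern.
    observed = Counter(tuple(row) for row in mask)

    diff = 0.0
    expected_mass_seen = 0.0
    for pattern, count in observed.items():
        # Probability of this exact pattern under independent missingness.
        expected = prod(r if m else 1.0 - r for r, m in zip(rates, pattern))
        diff += abs(count / n - expected)
        expected_mass_seen += expected
    # Patterns never observed have empirical frequency 0, so together they
    # contribute their total expected probability, 1 - expected_mass_seen.
    return 0.5 * (diff + (1.0 - expected_mass_seen))


# Two variables that are always missing together: clearly structured.
print(structure_score([[True, True], [True, True], [False, False], [False, False]]))  # 0.5
```

Even when missingness is truly independent across variables, the score will typically be above zero in a finite sample, so in practice it would need to be compared against a baseline, for example the scores obtained after shuffling each column of the mask independently. It also only measures dependence among the missingness indicators themselves; it says nothing about whether missingness depends on the unobserved values.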