2020 Cohort
Matt Allen
PhD project title: Few-shot learning for scalable forest monitoring from remote sensing data
Supervisors: Dr Emily Lines (Dept of Geography), Dr Stuart Grieve (QMUL)
PhD project description: My project focuses on applying existing and novel techniques from the remote sensing and machine learning literature to remote sensing data, to provide a comprehensive exploration of the process of large-scale tree mortality. Tree mortality - the death of trees in forest and woodland - is a key measure of forest ecosystem health, determining community dynamics, carbon residence times and forest turnover rates. Over the last 50 years, human activity has substantially changed the global climate, and this is predicted to increase global forest mortality, triggered by rising temperatures, longer and more frequent droughts, and more common insect or pathogen outbreaks - with some change in mortality rates having already been observed. Assessing changes in mortality rates and their causes is challenged by the typical scale at which mortality is assessed, so the understanding of the drivers and spatial patterns of large-scale tree mortality remains limited. The literature indicates that the sampling effort required to detect even a change as large as a doubling in stem mortality is extensive. For my PhD I will examine the effects of climate-driven mortality at the level of individual stems, by tracking changes in growth rate during crown deterioration observed in UAV data. I will also explore climate-related mortality at a larger scale, by observing changes in tree mortality from satellite data and inferring the causes of such mortality through spatial and temporal correlations with climate-related stressors. Based on these results, I also aim to predict the extent of future mortality, informed by projections from contemporary climate models. Finally, I intend to use uncertainty estimates from machine learning models to inform ground data acquisition, by selecting sampling locations that yield the greatest improvements in downstream processes such as land surface modelling.
Prior to joining the CDT, I completed an undergraduate and Master's degree in Electrical and Information Sciences at the University of Cambridge.
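The last aim, uncertainty-guided ground sampling, can be illustrated with a small sketch. The example below is hypothetical: it assumes synthetic features and labels, and uses the per-tree spread of a random forest as the uncertainty estimate, which is only one of several possible choices and not necessarily the project's.

```python
# Hypothetical sketch: uncertainty-guided selection of ground sampling sites.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic stand-ins for remote sensing features and observed mortality rates.
X_train = rng.normal(size=(200, 5))        # e.g. spectral indices, climate covariates
y_train = rng.uniform(size=200)            # e.g. observed stem mortality fraction
X_candidates = rng.normal(size=(1000, 5))  # unlabelled candidate sampling locations

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)

# Per-tree predictions give a cheap ensemble spread to use as an uncertainty proxy.
per_tree = np.stack([tree.predict(X_candidates) for tree in model.estimators_])
uncertainty = per_tree.std(axis=0)

# Select the k most uncertain locations for the next field campaign.
k = 10
next_sites = np.argsort(uncertainty)[-k:]
print(next_sites)
```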
Herbie Bradley
PhD project title: Inference Efficiency for Climate Foundation Models
Supervisors: Dr Samuel Albanie (Dept of Engineering)
PhD project description: In recent years, the field of machine learning research has been gravitating towards the development of larger and more complex models, known as foundation models, which entail the use of greater computational power and resources. This trend has resulted in a significant increase in the energy consumption and cost of these models, with no indication of a slowdown. Consequently, the cumulative emissions resulting from training models for both ML research and commercial deployments have increased considerably over the last three years. Furthermore, as models such as ChatGPT are deployed to hundreds of millions of users worldwide, inference has become a more substantial contributor to the total emissions of ML, with some estimates attributing 80-90% of an average ML model's workload to inference. For the most commonly used large language models, the environmental and financial costs of inference now outweigh those incurred during the training phase. Due to these factors, there are significant gains to be made in making large foundation model inference more efficient and accessible, both in terms of computational intensity (which translates into emissions) and in terms of hardware cost to run (which determines accessibility). The focus of my project will be on developing efficient inference techniques for large transformer foundation models, which are commonly used in climate AI applications such as weather nowcasting and emulation. The primary aim is to enhance the sustainability of these models by minimising their energy consumption and associated emissions during inference. The project will entail the creation and implementation of fast open-source inference frameworks, which will enable climate AI model developers and researchers to deploy their models more efficiently. Techniques such as weight and gradient quantisation, distillation of emergent capabilities of large models, parameter-efficient fine-tuning, speculative sampling, and sparse attention modules will be used to achieve this objective. In doing so, this project will contribute to a more sustainable future for AI and ensure that foundation models can be used at scale in a responsible and environmentally conscious manner.
Prior to joining the CDT, I graduated with a Master's degree in machine learning from the University of Warwick, having also completed my undergraduate degree at Warwick in Computer Science and Maths.
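As a small illustration of one of the techniques listed above, the sketch below shows symmetric 8-bit weight quantisation in pure NumPy. It is illustrative only and assumes a toy weight matrix; production inference frameworks use per-channel scales, calibration and fused low-precision kernels.

```python
# Illustrative sketch: symmetric int8 quantisation of a weight matrix.
import numpy as np

def quantize_int8(w):
    """Map float weights to int8 with a single symmetric scale factor."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 representation."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(scale=0.02, size=(512, 512)).astype(np.float32)  # toy weight matrix
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

# 4 bytes per weight shrink to 1, at the cost of a small reconstruction error.
print("max abs error:", np.abs(w - w_hat).max())
```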
Luke Cullen
PhD project title: Reducing emissions uncertainties using data fusion within graph representations
Supervisors: Prof Jonathan Cullen (Dept of Engineering), Prof Srinivasan Keshav (Computer Lab)
PhD project description: Accurate global greenhouse gas (GHG) emissions accounting is key for targeting climate change mitigation strategies. The United Nations Framework Convention on Climate Change (UNFCCC) GHG inventory summarises bottom-up emissions estimates divided by country, industry and gas. However, this summary contains significant gaps within industry breakdowns and in data from non “Annex I” countries, including those the UN classifies as least developed, where reporting mechanisms are often inadequate. In Annex I countries with abundant proxy data, notably on energy production and consumption, estimating CO2 emissions indirectly has achieved some success, but current global assessments still rely on extrapolation from often inaccurate, old or assumed data. I propose to systematically and justifiably fill emissions knowledge gaps using machine learning whilst quantifying prediction uncertainty. Graph representation learning is a rapidly expanding field underpinned by message passing between nodes, representing entities such as countries or individual facilities, connected by edges assigned between similar nodes. The flexible nature of graph structures allows for node embeddings that avoid the need for elaborate feature engineering and easily accommodate multimodal data inputs, both distinct advantages for this application relative to other machine learning techniques. The primary aim of my PhD project is to fill gaps in the UNFCCC GHG emissions inventory with a systematic mapping of emissions. This will inform international policy-making, help identify carbon leakage and unreported emissions, and with further development could augment and improve Sankey flows, which map supply chain and product flow emissions. The flexible and end-user-friendly framework will also be an excellent basis for finer-grained emission source accounting.
Prior to joining the CDT, I graduated from the University of Leeds in 2017 with a Master's in geophysics, after which I spent 2 years working in the mining and energy industries in Australia and 1 year as a software developer in London. In my free time I play rugby, surf and compete in triathlons.
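To make the message-passing idea concrete, the sketch below performs one round of mean aggregation over a toy graph whose nodes stand in for reporting entities (countries or facilities) and whose edges link similar nodes. The graph, the features and the aggregation rule are all illustrative assumptions, not the project's actual model.

```python
# Illustrative sketch: one round of mean-aggregation message passing.
import numpy as np

rng = np.random.default_rng(0)
n_nodes, n_features = 6, 4
x = rng.normal(size=(n_nodes, n_features))  # node embeddings (proxy data per entity)
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]  # undirected similarity edges

# Symmetric adjacency matrix with self-loops, so each node keeps its own signal.
A = np.eye(n_nodes)
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

# Each node averages its own embedding with those of its neighbours.
deg = A.sum(axis=1, keepdims=True)
x_updated = (A @ x) / deg

print(x_updated.shape)  # (6, 4): embeddings after one message-passing step
```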
Arduin Findeis
PhD project title: Automating the evaluation of machine learning applications and their environmental impact
Supervisors: Prof Srinivasan Keshav (Dept of Computer Science and Technology), Prof Jon Crowcroft (Dept of Computer Science and Technology)
PhD project description: I work on approaches to automate the evaluation of machine learning (ML) applications, aiming to improve our understanding of the potential impact of these applications – especially with respect to the environment. Evaluating ML applications is a time-consuming process involving many people: experts curating benchmarks, red teams searching for failure modes, crowd-workers labelling vast model logs, and so on. The limitations of current evaluation methods are leading to a growing gap between ML models' progress and our understanding of the implications of this progress – especially with respect to ML models' intended and unintended environmental impacts. My work aims to address this gap by providing automation tools that make ML model evaluation more effective. Currently, I am working on automatic evaluation approaches for language and multimodal models. Previous work introduced tools that automate access to, and generation of, evaluation tasks for ML-based low-emission building control.
Prior to joining the CDT, I completed an undergraduate degree in Mathematics at the University of Edinburgh and most recently an MPhil in Machine Learning and Machine Intelligence at Cambridge.
Seb Hickman
PhD project title: Determining the risks and drivers of extreme ozone events during heatwaves with machine learning and causal inference
Supervisors: Prof Alex Archibald (Dept of Chemistry), Dr Peer Nowack (UEA)
PhD project description: Surface ozone is a major pollutant affecting human health, crop yields and the carbon cycle. Ozone pollution increases the likelihood of respiratory illness in humans, contributing to an estimated 1 million deaths worldwide annually and an estimated cost of billions of dollars per year in crop losses. Extreme surface ozone events typically occur during heatwaves, and the combined effect of these risks has been estimated as a 0.33% increase in daily deaths. As heatwaves become more common in a warming climate, it is increasingly important to understand and quantify their effects on ozone pollution. Currently, there is considerable uncertainty surrounding the mechanisms driving the relationship between heatwaves and ozone, including the relative contributions of chemical, meteorological and physical processes. A better understanding of the drivers of this relationship will result in clearer guidance for policy-makers concerned with reducing the adverse effects of ozone pollution. While there is a clear relationship between ozone and temperature established by climate model and observational studies, the complexity of the system means there is uncertainty surrounding the physical mechanisms that drive this relationship. The core aims of my PhD project are to identify possible drivers, to quantify their relative strengths, and to establish a causal model of the system. In the first part of my project I will build an improved machine learning model, including predictors such as anthropogenic and biogenic emissions, atmospheric stagnation and solar radiation in addition to temperature, to better predict ozone concentrations. In the second part of my project I will aim to determine the causal relationships linking variables during extreme ozone events, which have not been treated rigorously in the literature. Using existing knowledge of the physical system, I will build a structural causal model of factors contributing to extreme ozone events. Causal discovery algorithms (e.g. LPCMCI), which seek to uncover lagged causal relationships in time series data, may also be used.
Prior to joining the CDT, I studied Natural Sciences at Cambridge, specialising in atmospheric chemistry.
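A minimal sketch of the first, predictive step is shown below: regressing ozone on meteorological and emissions predictors with a nonlinear model. All data here are synthetic placeholders, and the choice of gradient boosting is an assumption for illustration; the causal-inference step would build on top of a model of this kind.

```python
# Illustrative sketch: nonlinear regression of ozone on synthetic predictors.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "temperature": rng.normal(20, 8, n),
    "solar_radiation": rng.uniform(0, 900, n),
    "nox_emissions": rng.lognormal(0.0, 0.5, n),
    "stagnation_index": rng.uniform(0, 1, n),
})
# Toy target: ozone rises nonlinearly with temperature and radiation.
df["ozone"] = (0.05 * df["temperature"] ** 2 + 0.02 * df["solar_radiation"]
               + 5 * df["stagnation_index"] + rng.normal(0, 5, n))

X_train, X_test, y_train, y_test = train_test_split(
    df.drop(columns="ozone"), df["ozone"], random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)
print("held-out R^2:", round(model.score(X_test, y_test), 3))
```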
Yilin Li
PhD project title: Using advanced sensor technologies, detailed health outcomes and AI techniques to investigate the underlying mechanisms of air pollution on health
Supervisors: Dr Chiara Giorio (Dept of Chemistry), Prof Rod Jones (Dept of Chemistry), Dr Lia Chatzidiakou (Dept of Chemistry), Prof Mark Girolami (Dept of Engineering)
PhD project description: This is an interdisciplinary PhD that brings together computer science and atmospheric science to address some significant uncertainties in the field of air pollution epidemiology. Recent advances in sensor technologies enable us to expand the coverage of exposure assessment to previously under-researched environments in large-scale health studies in China, Bangladesh, the UK and elsewhere. Together with detailed health and wellbeing investigations, we have collected integrated databases that offer insights into environmental health risks during daily life in a way that has not been possible before. These complex, rapidly expanding databases call for a new methodological framework to answer important epidemiological questions and provide the underpinning science necessary to guide policy and evaluate air quality interventions. We expect that during this PhD we will further develop the analytical tools to revolutionise multi-pollutant personal exposure assessments, as part of the INGENIOUS project (Strategic Priority Fund, NERC) and other forthcoming projects. For my PhD, I will first use the AIRLESS (Effects of AIR pollution on cardiopuLmonary disEaSe in urban and peri-urban reSidents in Beijing) databases to develop new analytical methodologies, including machine learning and novel exposure source disaggregation methods, to investigate the underlying mechanisms of environmental risks on health. Much of this work is intended to be exploratory but will form the basis of an ensemble of tools with wider applicability for use in the later phases of the PhD. During the second and third years, the expectation is that the analytical tools will be developed further and that more exposure and health data will become available, which will be fed into different models to evaluate the potential of different source attribution and data analysis methodologies.
Prior to joining the CDT, I completed a Bachelor's in Environmental Engineering jointly provided by Nanjing University of Information Science and Technology and the University of Reading. In Reading, my dissertation focused on the assessment of carbonaceous aerosols in atmospheric fine particles in Nanjing, south China: their seasonal variations, components and potential sources.
Joycelyn Longdon
PhD project title: Monitoring Ghanaian Forests with Bioacoustics, Machine Learning and Indigenous Knowledge
Supervisors: Prof Alan Blackwell (Dept of Computer Science and Technology), Prof Jennifer Gabrys (Dept of Sociology)
PhD project description: In recent years, remote sensing and machine learning have emerged as invaluable tools supporting the understanding and monitoring of forest ecosystems to aid conservation efforts. The vast majority of forest conservation research is centred on Indigenous lands, yet it often operates at a significant remove from Indigenous communities themselves, who are the traditional custodians of the conservation priority areas. Although remote sensing and machine learning techniques have supported scalability within the field, these methods can widen the disconnect between conservation projects and Indigenous communities and weaken important links to essential local knowledge. There is an urgent need for conservation to make use of state-of-the-art data science, but it is imperative that the benefits gained from including local knowledge and participation in the conservation process are not lost. Through community co-creation, the deployment of acoustic recorders and the subsequent application of machine learning (ML) techniques, my project investigates the intersection, or lack thereof, between the biodiversity and forest health indicators identified by Indigenous Knowledge and those produced by ML analysis and classification, and aims to develop a framework for ethical AI in forest conservation. I am looking to ground my research in supporting new modes of engagement with environmental risk among globally marginalised communities. I am interested in exploring how visualisation, citizen science, mixed-initiative interaction, crowdsourcing, and distributed cognition can be used, for example, to ensure that Indigenous knowledge systems are accommodated in algorithms, infrastructure, and representations. I also run ClimateInColour, a platform at the intersection of climate science and social justice, making climate conversations more diverse and accessible.
Simon Mathis
PhD project title: How will cold-adapted life respond to climate change? – Using artificial intelligence to decipher life in the cold
Supervisors: Prof Pietro Lio (Dept of Computer Science and Technology), Dr Melody Clark (British Antarctic Survey)
PhD project description: Most natural proteins lose their ability to function as temperature deviates from their optimal operating point. While the stability and function of proteins at ambient and high temperatures is well understood, the way in which proteins and proteostasis work in cold-adapted multicellular life is unknown. Consequently, it is unclear how cold-adapted organisms will respond to even slight changes in temperature. My PhD project aims to help understand the stability of cold-adapted (psychrophilic) proteins and proteostasis in multicellular organisms under shifts in environmental conditions. I will (1) computationally model cold adaptation for single proteins with sequence modelling methods from natural language processing and subsequently (2) extend to the cellular level by modelling the interplay of these proteins in a metabolic model, leveraging ideas from graph machine learning, to answer the project's research questions.
I will use the experimental system Harpagifer antarcticus (a small and abundant Antarctic fish for which the genome sequence is available) to investigate these questions. Besides helping to decipher the fundamental question of how life works at low temperatures, an understanding of proteins and proteostasis in cold environments has wide-ranging environmental consequences, including understanding the risk to Antarctic biodiversity and combatting climate change with biotechnology.
Prior to joining the CDT, I completed a Master's in Physics at ETH Zurich, where I worked on simulating quantum field theory on quantum computers, and spent 1 year working as a strategy consultant at BCG and as a machine learning engineer at a Swiss tech start-up. Twitter: @SimMat20
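Purely as an illustration of the sequence-modelling angle, the sketch below represents protein sequences by their amino-acid composition and fits a simple classifier to separate cold-adapted from warm-adapted examples. The sequences, labels and composition features are all toy assumptions, far simpler than the NLP-style models the project envisages.

```python
# Toy sketch: classify proteins from amino-acid composition features.
import numpy as np
from sklearn.linear_model import LogisticRegression

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
rng = np.random.default_rng(0)

def composition(seq):
    """Fraction of each of the 20 standard amino acids in a sequence."""
    counts = np.array([seq.count(a) for a in AMINO_ACIDS], dtype=float)
    return counts / max(len(seq), 1)

# Random sequences standing in for cold-adapted (1) and warm-adapted (0) proteins.
seqs = ["".join(rng.choice(list(AMINO_ACIDS), size=120)) for _ in range(200)]
labels = rng.integers(0, 2, size=200)

X = np.stack([composition(s) for s in seqs])
clf = LogisticRegression(max_iter=1000).fit(X, labels)
print("training accuracy on toy data:", round(clf.score(X, labels), 2))
```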
Ira Shokar
PhD project title: Data-Driven Exploration of Parameterisation Schemes within Models of the Tropospheric Mid-Latitudes
Supervisors: Prof Peter Haynes (Dept of Applied Math and Theoretical Physics), Prof Rich Kerswell (Dept of Applied Math and Theoretical Physics)
PhD project description: Seasonal and longer-term prediction of mid-latitude weather and climate remains a major challenge due to the complexity of the mechanisms that drive the dynamics as well as their chaotic nature. This PhD project will extend my MRes research, which looked at encapsulating the dynamics of a simplified model of atmospheric circulation by using a neural network to find a reduced-order model of the system. During the PhD I will investigate two research questions: (1) how do stochastic parameterisations drive the variability of the system, and (2) can deep learning be used to emulate stochastic parameterisations efficiently?
Naturally the two research questions are coupled, as the ability to accurately encapsulate the dynamics of a system will allow us to explore questions regarding its nature, variability, and the tendencies of the parameterisation due to how the model represents these processes, while understanding the variability will inform how best to produce a representation. My project is motivated by the fact that Global Circulation Models (GCMs), which describe oceanic and atmospheric dynamics, are extremely computationally expensive, and as a result not all scales can be simulated. Processes that take place on length scales smaller than the spatial resolution of the GCM, as well as fast-scale dynamics, must be approximated, and these approximations are known as sub-grid parameterisations. A large source of model uncertainty arises from these parameterisation schemes, due to the chaotic nature of the dynamics, leading to questions regarding the fidelity of these models and thus their usefulness. To explore the system variability due to the stochastic parameterisations (addressing question 1), ML will be used to find a mapping between a system state and how stable the system is with regard to the attractor driving it. By quantifying stability this way, one can begin to understand the impact of the stochastic forcing on the dynamics of the system and how observed phenomena are driven. This will also be compared to a stratified system, to observe how the dynamics driven by the parameterisations compare with those of a system that exhibits its own internally generated turbulence. Finally, we will look to develop emulations of stochastic parameterisations using Deep Learning (question 2), with the goal of utilising the computational speed-up of ML networks to include more complexity without increasing the time needed to run whole models, leading to better projections.
Prior to joining the CDT, I completed a Bachelor's degree in Theoretical Physics at University College London, writing my thesis on ‘Using Domain Adversarial Networks for Model Classification Robustness’.
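The emulation idea in question 2 can be sketched in a few lines: learn a cheap mapping from the resolved model state to the sub-grid tendency that an expensive parameterisation would otherwise supply. The data, network size and deterministic set-up below are illustrative assumptions; a stochastic parameterisation would require a generative emulator rather than a plain regressor.

```python
# Illustrative sketch: a neural-network emulator of a sub-grid parameterisation.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n_samples, state_dim = 5000, 16

# Resolved-scale state snapshots and the sub-grid tendency they induce
# (a toy nonlinear function standing in for the expensive scheme).
state = rng.normal(size=(n_samples, state_dim))
weights = rng.normal(size=state_dim)
tendency = np.tanh(state @ weights) + 0.1 * rng.normal(size=n_samples)

emulator = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500, random_state=0)
emulator.fit(state[:4000], tendency[:4000])

# At run time the emulator replaces the expensive scheme inside the model loop.
print("held-out R^2:", round(emulator.score(state[4000:], tendency[4000:]), 3))
```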
Simon Thomas
PhD project title: Hybrid tropical cyclone hazard modelling
Supervisors: Dr Dave Munday (British Antarctic Survey), Dr John Taylor (Dept of Applied Math and Theoretical Physics)
PhD project description: Tropical cyclones are a devastating natural hazard. As heat engines running between the sea surface and the tropopause, they are expected to become more intense with global warming. Their most deadly and costly aspect is the storm surge. Storm surge models are computationally expensive, and this limits the reliability of the information that can be gathered on the exposure of a point on the coast to storm surges. A recent study by UK researchers on tsunami models has shown that replacing expensive physical models with machine learning emulations can decrease the computational cost by several orders of magnitude, and thus lead to more robust estimates of the return periods of events of a given size. This has yet to be attempted for storm surge models, and so a goal of this PhD will be to see whether this or a similar approach can work. In particular, we will investigate whether ‘physics-aware’ machine learning approaches can improve the generalisability of the model. We hope that this work will be useful as we try to understand how the risks from tropical cyclone storm surges will change under different climate change scenarios in the coming decades.
Prior to joining the CDT, I studied Natural Sciences (Physics) at the University of Cambridge.
Leyu (Natalie) Yao
PhD project title: Machine Learning for Studying Ocean Submesoscale Eddies and Ocean Particle Trajectories
Supervisors: Dr John Taylor (Dept of Applied Math and Theoretical Physics), Dr Dan Jones (British Antarctic Survey)
PhD project description: My project will start by developing a new method that uses machine learning techniques (specifically the Gaussian Mixture Model) to identify the presence of submesoscale eddies from vertical density profiles of the upper ocean. Next, the method will be applied to a large global observational database. Beyond that, the plan is to develop a technique to identify coherent structures from data sampled along Lagrangian particle trajectories from models and observations. This will be used to study the influence of submesoscale eddies on ‘active’ particles (e.g. phytoplankton, degrading microplastics), with the ultimate aim of improving parameterizations of these processes in ocean models.
Prior to joining the CDT, I finished my undergraduate studies in the US through a 3/2 Engineering Program between Haverford College and the California Institute of Technology (Caltech). I obtained a Bachelor's degree in Math and Physics from Haverford and a Bachelor's degree in Applied and Computational Math (ACM) from Caltech. At Caltech, I worked on a research project on reducing the damage caused by cascading failures in power grids.
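A minimal sketch of the profile-classification step is given below: reduce synthetic vertical density profiles to a few principal components and cluster them with a Gaussian Mixture Model. The profiles, the PCA pre-processing and the number of mixture components are illustrative assumptions rather than the project's actual configuration.

```python
# Illustrative sketch: cluster vertical density profiles with a Gaussian Mixture Model.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
n_profiles, n_depths = 500, 50

# Toy upper-ocean density profiles: a smooth stratification plus noise.
depth = np.linspace(0, 200, n_depths)
profiles = 1025 + 0.01 * depth + rng.normal(0, 0.05, size=(n_profiles, n_depths))

# Reduce each profile to a few principal components, then fit the mixture model.
pcs = PCA(n_components=3).fit_transform(profiles)
gmm = GaussianMixture(n_components=4, random_state=0).fit(pcs)
labels = gmm.predict(pcs)

print(np.bincount(labels))  # number of profiles assigned to each cluster
```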