2020 Cohort (MRes)
Matt Allen
PhD project: Automated tree species classification from forest Terrestrial Laser Scanning Data
Supervisors: Dr Emily Lines (Dept of Geography), Dr Stuart Grieve (QMUL)
My project focuses on applying existing and novel techniques from the remote sensing and machine learning literature to remote sensing data, to provide a comprehensive exploration of the process of large-scale tree mortality. Tree mortality - the death of trees in forest and woodland - is a key measure of forest ecosystem health, determining community dynamics, carbon residence times and forest turnover rates. Over the last 50 years, human activity has substantially changed global climate and this is predicted to increase global forest mortality, triggered by increasing temperatures, extended and more frequent drought, and more common insect or pathogen outbreaks - with some change in mortality rates having already been observed. Assessing changes in mortality rates and their causes is made difficult by the scale at which mortality is typically assessed, so understanding of the drivers and spatial patterns of large-scale tree mortality is limited. The literature indicates that detecting even a doubling in stem mortality is expected to require an extensive sampling strategy. For my PhD I will examine the effects of climate-driven mortality at the level of individual stems, using UAV data to observe changes in growth rate during crown deterioration. I will also explore climate-related mortality at a larger scale, by observing changes in tree mortality from satellite data and inferring the causes of such mortality through spatial and temporal correlations with climate-related stressors. Based on these results, I also aim to predict the extent of future mortality, informed by projections from contemporary climate models.
Finally, I intend to use uncertainty estimates from machine learning models to inform ground data acquisition, by selecting sampling locations that most improve the output of other processes, such as land surface modelling. Prior to joining the CDT, I completed an undergraduate and Master's degree in Electrical and Information Sciences at the University of Cambridge.
Herbie Bradley
PhD project: Chemical mechanism emulation in climate model simulations for exascale supercomputing applications
Supervisors: Dr Luke Abraham (Dept of Chemistry), Dr Alex Archibald (Dept of Chemistry)
Atmospheric chemistry mechanisms consist of a set of chemical equations and reactions that can be translated into a system of coupled ordinary differential equations. Given a set of initial conditions, numerical models will integrate this system forward in time to produce output concentrations for each chemical species at each time step. These models can be very computationally expensive, particularly those with the thousands or tens of thousands of chemical species and reactions that are required to achieve the most accurate models. As a result, atmospheric chemistry is an expensive component of Earth system models, yet this chemistry is important and necessary in order to accurately model chemicals such as ozone, which are not directly emitted into the atmosphere but have widespread effects on climate. The study of emulators (also known as surrogate models) for climate modelling is a rapidly growing area of research that aims to build fast, machine-learned approximations for a large array of computationally expensive and complex numerical models. These emulators can often produce their output orders of magnitude faster than the numerical model they were trained on, and as a result can be used to efficiently quantify the uncertainty in models, correct their biases, or just to make fast predictions using different input parameters. Being able to use fast, reliable emulators is particularly useful for domains which typically require large ensembles of climate projections, such as climate policy analysis or uncertainty quantification. My PhD project will aim to build an emulator for a full state-of-the-art atmospheric chemical mechanism - the UKCA (UK Chemistry and Aerosols) model - in order to replace this model in the UKESM1 Earth system model.
If sufficiently accurate, this emulator would be able to significantly accelerate simulations of past and future climate change. Data for training emulators will primarily come from UKCA model outputs, although large additional datasets are available via the CMIP6 (Coupled Model Intercomparison Project Phase 6) archive, data from UKESM1 simulation runs, and numerous observational datasets. The project aim will be accomplished by exploring three major challenges and opportunities (chemically informed machine learning, data efficiency and stability) to improve emulation in this space, and working on new techniques that can address them. I have a Master's degree in machine learning from the University of Warwick, where I also completed my undergraduate degree, majoring in Computer Science and Maths.
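As a rough illustration of the step described above, where a chemical mechanism becomes a system of coupled ordinary differential equations that is integrated forward in time, the sketch below uses a hypothetical two-species reversible reaction A <-> B. It is a toy example only and not the UKCA mechanism: the species, the rate constants `k_forward` and `k_backward`, the time step and the choice of a classical fourth-order Runge-Kutta integrator are all assumptions made for illustration.

```python
# Toy illustration only: a hypothetical reversible mechanism A <-> B.
# The rate constants below are made-up values, not real chemistry.

def rates(conc, k_forward=0.5, k_backward=0.1):
    """Right-hand side of the coupled ODE system: (d[A]/dt, d[B]/dt)."""
    a, b = conc
    flux = k_forward * a - k_backward * b  # net conversion of A into B
    return (-flux, flux)


def rk4_step(conc, dt):
    """Advance all concentrations by one time step with classical RK4."""
    def shifted(base, slope, scale):
        return tuple(c + scale * s for c, s in zip(base, slope))

    k1 = rates(conc)
    k2 = rates(shifted(conc, k1, dt / 2))
    k3 = rates(shifted(conc, k2, dt / 2))
    k4 = rates(shifted(conc, k3, dt))
    return tuple(
        c + dt / 6 * (s1 + 2 * s2 + 2 * s3 + s4)
        for c, s1, s2, s3, s4 in zip(conc, k1, k2, k3, k4)
    )


def integrate(initial, dt=0.1, n_steps=200):
    """Return the concentration of each species at every time step."""
    trajectory = [tuple(initial)]
    for _ in range(n_steps):
        trajectory.append(rk4_step(trajectory[-1], dt))
    return trajectory


trajectory = integrate((1.0, 0.0))
final_a, final_b = trajectory[-1]
print(f"final [A] = {final_a:.4f}, final [B] = {final_b:.4f}")
```

A real mechanism has thousands of species and stiff dynamics, which is why each time step is costly; an emulator in the sense described above would be trained to approximate the mapping that `rk4_step` computes here.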
Luke Cullen
PhD project: Reducing emissions uncertainties using data fusion within graph representations
Supervisors: Dr Jonathan Cullen (Dept of Engineering), Prof Srinivasan Keshav (Computer Lab)
Accurate global greenhouse gas (GHG) emissions accounting is key for targeting climate change mitigation strategies. The United Nations Framework Convention on Climate Change (UNFCCC) GHG inventory summarises bottom-up emissions estimates broken down by country, industry and gas. However, this summary contains significant gaps within industry breakdowns and in data from non-Annex I countries, including those the UN classifies as least developed, where reporting mechanisms are often inadequate. In Annex I countries with abundant proxy data, notably concerning energy production and consumption, estimating CO2 emissions indirectly has achieved some success, but current global assessments still rely on extrapolation from data that are often inaccurate, old or assumed. I propose to fill emissions knowledge gaps systematically and justifiably using machine learning, whilst quantifying prediction uncertainty. Graph representation learning is a rapidly expanding field underpinned by message-passing between nodes, which represent entities such as countries or individual facilities and are connected by edges assigned between similar nodes. The flexible nature of graph structures allows for node embeddings which avoid the need for elaborate feature engineering and easily accommodate multimodal data inputs, both distinct advantages for this application relative to other machine learning techniques. The primary aim of my PhD project is to fill gaps in the UNFCCC GHG emissions inventory with a systematic mapping of emissions. This will inform international policy-making, help identify carbon leakage and unreported emissions, and with further development could augment and improve Sankey flows, which map supply chain and product flow emissions.
The flexible and end-user-friendly framework will also be an excellent basis for finer-detailed emission source accounting. I graduated from the University of Leeds in 2017 with a Master's in Geophysics, after which I spent 2 years working in the mining and energy industries in Australia, and 1 year as a software developer in London. In my free time, pandemic permitting, I play rugby, surf and compete in triathlons.
Arduin Findeis
PhD project: Applying reinforcement learning to grid-connected energy systems
Supervisors: Prof Srinivasan Keshav (Computer Lab), Prof Jon Crowcroft (Computer Lab)
My PhD project focuses on applying reinforcement learning (RL) to building energy optimisation. The make-up of electrical grids is changing: there is an increasing number of energy systems that involve renewable energy generation, energy storage, smart controllable devices, electric vehicle charging and other recently improved technologies. As an example of the rapid rate of change, the global capacity of solar photovoltaic installations has been estimated to have increased by over 700 percent between 2011 and 2019. The energy systems introduced by these changes raise complex control problems. If controlled well, the systems may be able to effectively replace emission-intensive grid energy with local renewable energy and prevent demand peaks that would need to be covered by fossil-fuel generators. Controllers that are well adapted for these systems therefore have the potential to help mitigate climate change. Existing control methods, such as model predictive control, often lack the flexibility to fully capture the potential cost and emission savings enabled by these systems. In this PhD project I aim to investigate the use of RL in place of such conventional energy system controllers. RL is a general machine learning-based control method that may provide more flexibility than other existing methods. Within the last ten years, the integration of deep neural networks into RL methods has allowed RL to exceed human-level performance for the first time on several tasks, including Atari games and the board game Go. Building on this work, and other work applying RL to energy systems, this project aims to investigate how RL can best be used to improve energy efficiency in buildings.
The initial focus of the project is on a specific kind of residential energy system that combines solar photovoltaic panels with a home battery. Based on the findings from this specific case, more general solutions in the space will be investigated. I completed an undergraduate degree in Mathematics at the University of Edinburgh and most recently an MPhil in Machine Learning and Machine Intelligence at Cambridge, prior to joining the AI4ER CDT.
Katie Green
PhD project: Understanding ecosystem dynamics to protect marine ecosystems
Supervisors: Dr Simeon Hill (British Antarctic Survey), Bianca Dumitrascu (Computer Lab)
My PhD project will focus on improving understanding of the dynamics of marine ecosystems, and how these are impacted by climate change, through the application of machine learning. The motivation behind this project is the importance of ocean conservation and healthy ocean ecosystems for combatting the climate crisis and biodiversity decline, as well as the reliance of billions of people worldwide on ocean ecosystems for their livelihoods and food security. Bringing machine learning methods into this field has the potential to improve understanding of the complex and nonlinear dynamics of these systems and their drivers, such as spatial variability and climate variables. The primary challenge of this project will be adapting appropriate machine learning methods to the data, which are often sparse and noisy. The results of this work will have implications for policy decisions regarding ocean conservation and fishing. The integration of these findings into policy recommendations will form a key part of this project. Prior to joining the CDT, I completed a Physics degree at Durham University and graduated in 2020.
Seb Hickman
PhD project: Determining the risks and drivers of extreme ozone events during heatwaves with machine learning and causal inference
Supervisors: Dr Alex Archibald (Dept of Chemistry), Dr Peer Nowack (UEA)
Surface ozone is a major pollutant affecting human health, crop yields and the carbon cycle. Ozone pollution increases the likelihood of respiratory illness in humans, contributing to an estimated 1 million deaths worldwide annually, and causes crop losses estimated at billions of dollars per year. Extreme surface ozone events typically occur during heatwaves, and the combined effect of these risks has been estimated as a 0.33% increase in daily deaths. As heatwaves become more common in a warming climate, it is increasingly important to understand and quantify their effects on ozone pollution. Currently, there is considerable uncertainty surrounding the mechanisms driving the relationship between heatwaves and ozone, including the relative contributions of chemical, meteorological and physical processes. A better understanding of the drivers of the relationship between heatwaves and ozone will result in clearer guidance for policy-makers concerned with reducing the adverse effects of ozone pollution. While there is a clear relationship between ozone and temperature established by climate model and observational studies, due to the complexity of the system there is uncertainty surrounding the physical mechanisms that drive this relationship. The core aims of my PhD project are to identify possible drivers, to quantify their relative strengths, and to establish a causal model of the system. In the first part of my project I will build an improved machine learning model, including predictors such as anthropogenic and biogenic emissions, atmospheric stagnation and solar radiation in addition to temperature, to better predict ozone concentrations.
In the second part of my project I will aim to determine the causal relationships linking variables during extreme ozone events, a question that has not been treated rigorously in the literature. Using existing knowledge of the physical system, I will build a structural causal model of factors contributing to extreme ozone events. Causal discovery algorithms (e.g. LPCMCI), which seek to uncover lagged causal relationships in time series data, may also be used. Before joining the CDT I studied Natural Sciences at Cambridge, specialising in atmospheric chemistry.
Yilin Li
PhD project: Using advanced sensor technologies, detailed health outcomes and AI techniques to investigate the underlying mechanisms of air pollution on health
Supervisors: Dr Chiara Giorio (Dept of Chemistry), Prof Rod Jones (Dept of Chemistry), Dr Lia Chatzidiakou (Dept of Chemistry), Prof Mark Girolami (Dept of Engineering)
This is an interdisciplinary PhD that brings together computer science and atmospheric science to address some significant uncertainties in the field of air pollution epidemiology. Recent advancements in sensor technologies enable us to expand the coverage of exposure in previously under-researched environments in large-scale health studies in China, Bangladesh, the UK and elsewhere. Together with detailed health and wellbeing investigations, we have collected integrated databases that offer insights on environmental health risks during daily life in a way that has not been possible before. These complex, rapidly expanding databases call for a new methodological framework to answer important epidemiological questions and provide the underpinning science necessary to guide policy and evaluate air quality interventions. We expect that during this PhD we will further develop the analytical tools to revolutionise multi-pollutant personal exposure assessments, as part of the INGENIOUS (Strategic Priority Fund, NERC) and other forthcoming projects. For my PhD, I will first use the AIRLESS (Effects of AIR pollution on cardiopuLmonary disEaSe in urban and peri-urban reSidents in Beijing) databases to develop new analytical methodologies including machine learning, and novel exposure source disaggregation methodologies, to investigate the underlying mechanisms of environmental risks on health. Much of this work is intended to be exploratory but will develop the basis of an ensemble of tools with wider applicability for use in the later phases of the PhD.
During the second and third years, the expectation is that the analytical tools will be developed further and that more exposure and health data will become available. These data will be fed into different models to evaluate the potential of different source attribution and data analysis methodologies. Before coming to Cambridge, I completed a Bachelor's in Environmental Engineering jointly provided by Nanjing University of Information Science and Technology and the University of Reading. At Reading, my dissertation assessed carbonaceous aerosols in atmospheric fine particles in Nanjing in south China: their seasonal variations, components, and potential sources.
Joycelyn Longdon
PhD project: Monitoring Ghanaian Forests with Bioacoustics, Machine Learning and Indigenous Knowledge
Supervisors: Prof Alan Blackwell (Computer Lab), Prof Jennifer Gabrys (Dept of Sociology)
In recent years, remote sensing and machine learning have emerged as invaluable tools supporting the understanding and monitoring of forest ecosystems to aid conservation efforts. The vast majority of forest conservation research is centred on Indigenous lands, yet it often operates at a significant remove from Indigenous communities themselves, who are the traditional custodians of the conservation priority areas. Although remote sensing and machine learning techniques have supported scalability within the field, these methods can widen the disconnect between conservation projects and Indigenous communities and weaken important links and connections to essential local knowledge. There is an urgent need for conservation to make use of state-of-the-art data science, but it is imperative that the benefits gained from including local knowledge and participation in the conservation process are not lost. Through community co-creation, the deployment of acoustic recorders and the subsequent application of machine learning (ML) techniques, my project looks to investigate the intersection, or lack thereof, of biodiversity and forest health indicators between Indigenous Knowledge and ML analysis and classification, and to develop a framework for ethical AI in forest conservation. I joined the AI4ER CDT in the 2020 Cohort and am looking to ground my research in supporting new modes of engagement with environmental risk, from global marginalised communities. I'm interested in exploring how visualisation, citizen science, mixed-initiative interaction, crowdsourcing, and distributed cognition can be utilised, for example, in ensuring that Indigenous knowledge systems are accommodated in algorithms, infrastructure, and representations.
I also run ClimateInColour, a platform at the intersection of climate science and social justice, making climate conversations more diverse and accessible.
Simon Mathis
PhD project: How will cold-adapted life respond to climate change? – Using artificial intelligence to decipher life in the cold
Supervisors: Prof Pietro Lio (Computer Lab), Dr Melody Clark (British Antarctic Survey)
Most natural proteins lose their ability to function as temperature deviates from their optimal operating point. While the stability and function of proteins at ambient and high temperatures are well understood, the way in which proteins and proteostasis work in cold-adapted multicellular life is unknown. Consequently, it is unclear how cold-adapted organisms will respond to even slight changes in temperature. My PhD project aims to help understand the stability of cold-adapted (psychrophilic) proteins and proteostasis in multicellular organisms under shifts in environmental conditions. I will (1) computationally model cold adaptation for single proteins with sequence modelling methods from natural language processing and subsequently (2) extend to the cellular level by modelling the interplay of these proteins in a metabolic model and leveraging ideas from graph machine learning, to answer the following research questions:
I will be using the experimental system Harpagifer antarcticus (a small and abundant Antarctic fish, for which the genome sequence is available) to investigate these research questions. Besides helping to answer the fundamental question of how life works at low temperatures, an understanding of proteins and proteostasis in cold environments has wide-ranging environmental consequences, including understanding the risk to Antarctic biodiversity and combatting climate change with biotechnology. I joined the CDT in 2020. Before joining, I completed a Master's in Physics at ETH Zurich, where I worked on simulating quantum field theory on quantum computers, and spent 1 year working as a strategy consultant at BCG and as a machine learning engineer at a Swiss tech start-up. Twitter @SimMat20
Ira Shokar
PhD project: Data-Driven Exploration of Parameterisation Schemes within Models of the Tropospheric Mid-Latitudes
Supervisors: Prof Peter Haynes (Dept of Applied Math and Theoretical Physics), Prof Rich Kerswell (Dept of Applied Math and Theoretical Physics)
Seasonal and longer-term prediction of mid-latitude weather and climate remains a major challenge due to the complexity of the mechanisms that drive the dynamics as well as their chaotic nature. This PhD project will extend my MRes research, which looked at encapsulating the dynamics of a simplified model of atmospheric circulation by using a neural network to find a reduced-order model of the system. During the PhD I will investigate the following research questions:
Naturally the two research questions are coupled, as the ability to accurately encapsulate the dynamics of a system will allow us to explore questions regarding its nature, variability, and the tendencies of the parameterisation due to how the model represents these processes, while understanding the variability will inform how best to produce a representation. My project is motivated by the fact that Global Circulation Models (GCMs), which describe oceanic and atmospheric dynamics, are so computationally expensive that not all scales can be simulated. Processes that take place on length scales smaller than the spatial resolution of the GCM, as well as fast-scale dynamics, must be approximated, with these approximations known as sub-grid parameterisations. Because of the chaotic nature of the dynamics, these parameterisation schemes are a large source of model uncertainty, leading to questions regarding the fidelity of these models and thus their usefulness. To explore the system variability due to the stochastic parameterisations (addressing question 1), ML will be used to find a mapping between a system state and how stable the system is with regard to the attractor that is driving the system. By quantifying stability this way one can begin to understand the impact of the stochastic forcing on the dynamics of the system, and how observed phenomena are driven. This will also be compared with a stratified system, to observe how a system driven by the parameterisations compares with one that exhibits its own internally generated turbulence. Finally, we will look to develop emulations of stochastic parameterisations using Deep Learning (question 2), with the goal of using the computational speed-up of ML networks to include more complexity without increasing the time to run whole models, leading to better projections.
Prior to coming to Cambridge I completed a Bachelor’s degree in Theoretical Physics at University College London, writing my thesis on ‘Using Domain Adversarial Networks for Model Classification Robustness’.
Simon Thomas
PhD project: Hybrid tropical cyclones hazard modelling
Supervisors: Dr Dan Jones (British Antarctic Survey), Dr John Taylor (Dept of Applied Math and Theoretical Physics)
Tropical cyclones are a devastating natural hazard. Because they act as heat engines running between the sea surface and the tropopause, global warming will lead to them becoming more intense. Their most deadly and costly aspect is the storm surge. Storm surge models are computationally expensive, and this limits how much reliable information can be gathered on the exposure of a point on the coast to storm surges. A recent study by UK researchers on tsunami models has shown that replacing expensive physical models with machine learning emulations can decrease the computational cost by several orders of magnitude, and thus lead to more robust information as to the return periods of events of a given size. This has yet to be attempted for storm surge models, and so a goal of this PhD will be to see if this or a similar approach can work. In particular, we will try to see if ‘physics-aware’ machine learning approaches can improve the generalisability of the model. We hope that this work will be useful as we try to understand how the risks from tropical cyclone storm surges will change under different climate change scenarios in the coming decades. Before joining the AI4ER programme, I studied Natural Sciences (Physics) at the University of Cambridge.
Leyu (Natalie) Yao
PhD project: Machine Learning for Studying Ocean Submesoscale Eddies and Ocean Particle Trajectories
Supervisors: Dr John Taylor (Dept of Applied Math and Theoretical Physics), Dr Dan Jones (British Antarctic Survey)
My project will start by developing a new method that uses Machine Learning techniques (specifically the Gaussian Mixture Model) to identify the presence of submesoscale eddies from vertical density profiles of the upper ocean. Next, the method will be applied to a large global observational database. Beyond that, the plan is to develop a technique to identify coherent structures from data sampled along Lagrangian particle trajectories from models and observations. This will be used to study the influence of submesoscale eddies on ‘active’ particles (e.g. phytoplankton, degrading microplastics) with the ultimate aim of improving parameterizations of these processes in ocean models. Before coming to Cambridge, I finished my undergraduate studies in the US through a 3/2 Engineering Program between Haverford College and the California Institute of Technology (Caltech). I obtained a Bachelor’s degree in Math and Physics from Haverford and a Bachelor’s degree in Applied and Computational Math (ACM) from Caltech. At Caltech, I worked on a research project on reducing the damage caused by cascading failures in power grids.