## Machine Learning and Climate

Spring 2022 | Columbia University

Instructor: Alp Kucukelbir
Course Assistant(s): Nicolas Beltran (nb2838@columbia.edu)

Day and Time: Tuesdays 4:10 – 6:00 p.m.
Location: Zoom (first two sessions), then Chandler 401

### Overview

In this course, we will study two aspects of how ML interacts with Earth's climate.

First, we will investigate how ML can be used to tackle climate change. We will focus on use cases from transportation, manufacturing, food and agriculture, waste management, and atmospheric studies. We will ask questions like: what are the requirements for applying ML to such problems? How can we evaluate the effectiveness of our analyses?

Second, we will consider ML’s own impact on the climate. We will focus on the energy and computation that go into designing, training, and deploying modern ML systems. We will ask questions like: how can we accurately track and account for ML’s own energy footprint? What strategies can we employ to minimize it?

By the end of this course, you will have learned about modern statistical and causal ML methods and their applications to the climate. We will emphasize modeling real-world phenomena with probability models, with particular attention to vision, time series forecasting, uncertainty quantification, and causality. In addition, you will gain a deeper understanding of the carbon footprint of ML itself and explore how to mitigate it.

### Prerequisites

This is a graduate-level seminar course. The course is open to select senior undergraduate, master's, and doctoral students. Auditing and pass/fail enrollment are not permitted.

You should be familiar with machine learning and statistics (for example, you took a class where you learned how to do data analysis). You will be conducting independent data analysis in this course; as such, you must be comfortable programming in Python or R.

You should already have a good climate-related dataset in hand. This may be a dataset published alongside a relevant paper, or a dataset that hasn't been used for ML research yet. If you do not have a dataset readily available, you should have a strategy for simulating data for a relevant use case.

### Structure

This course is based around an individual project that you will summarize in a final technical paper. You will be expected to present a relevant use case of either ML’s application to the climate or a study of ML’s own carbon footprint. Your final paper must contain some form of data analysis using ML.

Each class is built around discussion of the readings and of your projects. In the first part of each session, we will discuss the readings. In the second part, one or two students will lead a discussion of an aspect of the applied side of the material. This discussion can include:

• technical issues (modeling, inference, criticism), or
• an important paper or idea that we are not otherwise covering.

The last two classes are dedicated to project presentations. Each student will record a 5-minute video presenting 2 slides that highlight an insight the rest of the class would appreciate.

There is no textbook for this course. Instead, we will read 2 papers per week. One paper will cover an application area: either an application of ML to the climate or a study of ML’s own carbon footprint. The other paper(s) will be technical; they will provide the foundation needed to understand and explore the ML technique under consideration.

You are graded on completing weekly responses to the readings, working consistently on the final project, and participating in class. Your course grade will be calculated as follows:

| Component | Percentage |
| --- | --- |
| Weekly responses to the readings | 25% |
| Weekly progress reports | 10% |
| Participation in class | 10% |
| Final paper | 55% |
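
The weighting in the table is a simple weighted sum. The sketch below shows that arithmetic. It assumes every component is scored on a common 0-100 scale, which the syllabus does not specify. The component keys and the function name are hypothetical.

```python
from collections.abc import Mapping

# Weights taken from the grading table (they sum to 1.0, i.e. 100%).
WEIGHTS = {
    "weekly_responses": 0.25,
    "progress_reports": 0.10,
    "participation": 0.10,
    "final_paper": 0.55,
}


def course_grade(scores: Mapping[str, float]) -> float:
    """Return the weighted course grade.

    Assumes every component score is on the same 0-100 scale, so the
    weighted sum is on that scale as well.
    """
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return sum(weight * scores[name] for name, weight in WEIGHTS.items())
```

Take hypothetical component scores of 90, 100, 80, and 85. They give 0.25 × 90 + 0.10 × 100 + 0.10 × 80 + 0.55 × 85 = 87.25. Because the final paper carries 55% of the weight, it dominates the result.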

Your final paper will be evaluated for its relevance to the course material, technical correctness, and writing quality. There is no expectation to have a positive result in your data analysis by the end of the class; it is perfectly acceptable to reach a negative conclusion (e.g., such and such technique is not as good as the state of the art in forecasting water usage) as a result of your exploration.

### Final Project

You will work on an individual project and summarize it in a final technical paper. You will showcase and document your work through a private git repository with the following top-level layout:

• abstract.md
• journal.md
• doc/
• src/
• etc/

This repository will document your exploration and coding through the semester.

The file abstract.md simply contains an abstract of the project. At first, it is an aspirational abstract, one that describes the research program you want to complete. You will refine it through the semester.

The file journal.md is a diary of your progress. It contains dated entries with a description of what you are doing, what you found, what you are thinking, and so on. It is mainly a resource for you, but I will glance at it too (at the end of the semester). Please update and commit it at least once per week.
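
One low-friction way to keep dated entries is a small helper that appends to journal.md. This is a hypothetical sketch. The `## YYYY-MM-DD` heading format and the names `format_entry` and `add_journal_entry` are assumptions, not a course requirement.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def format_entry(text: str, day: date) -> str:
    """Render one dated journal entry as a Markdown section (assumed format)."""
    return f"\n## {day.isoformat()}\n\n{text.strip()}\n"


def add_journal_entry(
    text: str, journal: Path = Path("journal.md"), day: Optional[date] = None
) -> str:
    """Append a dated entry to `journal` (dated today by default) and return it."""
    entry = format_entry(text, day or date.today())
    with journal.open("a", encoding="utf-8") as f:
        f.write(entry)
    return entry
```

After appending an entry, commit the updated journal.md as usual. Keeping the formatting in its own function makes the entry layout easy to change without touching the file-handling code.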

The doc/ directory contains the LaTeX document that you are writing. We will provide a template for your final paper.

The src/ directory contains the code you are writing. The data you are analyzing should live here too.

The etc/ directory contains anything else — materials, notes, photos of whiteboards, and so on — that you want to keep track of.

There should be nothing else in the top-level directory of your repository.
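
The top level is restricted to these five entries, so a quick check before committing can catch stray files. The following is a hypothetical sketch. `EXPECTED`, `layout_problems`, and `check_repo` are illustrative names. Tolerating git's own `.git` directory is an assumption, since it always exists on disk in a git repository.

```python
from pathlib import Path

# Expected top-level entries: name -> whether it should be a directory.
EXPECTED = {
    "abstract.md": False,
    "journal.md": False,
    "doc": True,
    "src": True,
    "etc": True,
}
# Assumption: git's own metadata directory is tolerated at the top level.
TOLERATED = {".git"}


def layout_problems(entries: dict[str, bool]) -> list[str]:
    """Compare top-level entries (name -> is_dir) against the expected layout."""
    problems = []
    for name, should_be_dir in EXPECTED.items():
        if name not in entries:
            problems.append(f"missing: {name}")
        elif entries[name] != should_be_dir:
            kind = "directory" if should_be_dir else "file"
            problems.append(f"{name} should be a {kind}")
    # Anything that is neither expected nor tolerated is reported as unexpected.
    for name in sorted(entries.keys() - EXPECTED.keys() - TOLERATED):
        problems.append(f"unexpected: {name}")
    return problems


def check_repo(root: Path = Path(".")) -> list[str]:
    """List the top level of `root` and report deviations from the layout."""
    entries = {p.name: p.is_dir() for p in root.iterdir()}
    return layout_problems(entries)
```

Running `print(check_repo(Path(".")))` from the repository root prints an empty list when the layout matches. The comparison lives in a pure function, separate from the directory listing, so it can be checked without touching the file system.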

Commit often, at least every week. You are graded on the quality of the project and the path that you took to get there.