Alp Kucukelbir

Machine Learning and Climate

Fall 2023 | Columbia University

Instructor: Alp Kucukelbir
Course Assistant(s): Clayton Sanford (clayton@cs.columbia.edu)

Day and Time: Tuesdays 4:10 – 6:00 p.m.
Location: Uris Hall 326

Overview

In this course, we will study two aspects of how ML interacts with Earth's climate.

First, we will investigate how ML can help mitigate climate change. We will focus on use cases from transportation, manufacturing, food and agriculture, waste management, and atmospheric studies. We will ask questions like: what are the requirements for applying ML to such problems? How can we evaluate the effectiveness of our analyses?

Second, we will consider ML’s own impact on the climate. We will focus on the energy and computation that go into designing, training, and deploying modern ML systems. We will ask questions like: how can we accurately track and account for ML’s own energy footprint? What strategies can we employ to minimize it?

By the end of this course, you will have learned about modern statistical and causal ML methods and their applications to the climate. We will model real-world phenomena using probability models, with an emphasis on vision, time series forecasting, uncertainty quantification, and causality. You will also gain a deeper understanding of the carbon footprint of ML itself and explore how to minimize it.

Prerequisites

This is a graduate-level seminar course. The course is open to select senior undergraduate, master's, and doctoral students; auditing and pass/fail registration are not permitted. All students must submit an application.

You should be familiar with machine learning and statistics (for example, you took a class where you learned how to do data analysis). You will be conducting independent data analysis in this course; as such, you must be comfortable programming in Python or R.

You should already have a good climate-related dataset in hand. This may be a dataset published alongside a relevant paper, or a dataset that hasn't been used for ML research yet. If you do not have a dataset readily available, you should have a strategy for simulating data for a relevant use case.

Here are a few resources for climate-related projects:

Structure

This course is based around an individual project that you will summarize in a final technical paper. You will be expected to present a relevant use case of either ML’s application to the climate or a study of ML’s own carbon footprint. Your final paper must contain some form of data analysis using ML and will be accompanied by a GitHub repository with Python/R code.

Each class is based around discussion. The discussion will focus on the readings and on your projects. In the first part of each session, we will discuss the readings. In the second part, one or two students will present a tidbit. This may include:

background about your project,
technical issues (modeling, inference, criticism), or
an important paper or idea that we are not otherwise covering.

The last class is dedicated to project presentations. Each student will record a short video covering two slides, highlighting a lesson learned that the rest of the class would appreciate.

Readings

There is no textbook for this course. Instead, we will read about two papers per week. One paper will focus on a domain: an application of ML to the climate or ML’s own carbon footprint. The other paper(s) will be technical; they will provide the foundation necessary for understanding and exploring the ML technique under consideration.

Course Grade

You are graded on completing weekly responses to the readings, working consistently on the final project, and participating in class. There is one problem set. Your course grade will be calculated as follows.

Component	Percentage
problem set	10 %
weekly responses to the readings	20 %
weekly progress reports	10 %
participation in class	10 %
final paper	50 %
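
As an illustration of how these weights combine, here is a minimal sketch. It assumes each component is scored on a 0 to 100 scale, which the table does not specify; the function name `course_grade` and the example scores are hypothetical.

```python
# Weights copied from the table above; they sum to 100 %.
WEIGHTS = {
    "problem set": 0.10,
    "weekly responses to the readings": 0.20,
    "weekly progress reports": 0.10,
    "participation in class": 0.10,
    "final paper": 0.50,
}


def course_grade(scores):
    """Return the weighted course grade.

    `scores` maps each component name to a score in [0, 100].
    The 0-100 scale is an assumption made for this sketch.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    example = {
        "problem set": 80,
        "weekly responses to the readings": 100,
        "weekly progress reports": 100,
        "participation in class": 90,
        "final paper": 85,
    }
    # 0.10*80 + 0.20*100 + 0.10*100 + 0.10*90 + 0.50*85 = 89.5
    print(f"{course_grade(example):.1f}")
```

Because the final paper carries half the weight, a 10-point change in the paper score moves the course grade by 5 points, as much as a 50-point change in the problem set.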

There is only one problem set for this class; it is due by the end of the third lecture. This problem set gauges your comfort level with the statistical fundamentals this course builds upon. If you struggle with it, you may want to consider taking this course at a later point in your studies.

Your weekly responses and progress reports are graded solely on whether you submit them on time. We will also score your weekly responses as 0 for "below expectations", 1 for "meets expectations", and 2 for "exceeds expectations", but these scores are simply meant to help you calibrate your engagement with the material.

Your final paper will be evaluated for its relevance to the course material, technical correctness, and writing quality. There is no expectation to have a positive result in your data analysis by the end of the class; it is perfectly acceptable to reach a negative conclusion (e.g., such and such technique is not as good as the state of the art in forecasting water usage) as a result of your exploration.

Your final paper should be at most 8 pages long, including images, tables, and bibliography, and should be prepared with the course LaTeX template.

Final Project

Each student will work on an individual project, summarized in a final technical paper. You will showcase and document your work through a private GitHub repository.

Please organize your repository as follows:

abstract.md
journal.md
doc/
src/
etc/

This repository will document your exploration and coding through the semester.

The file abstract.md simply contains an abstract of the project. At first, it is an aspirational abstract, one that describes the research program you want to complete. You will refine it through the semester.

The file journal.md is a diary of your progress. It contains dated entries with a description of what you are doing, what you found, what you are thinking, and so on. It is mainly a resource for you, but I will glance at it too (at the end of the semester). Please update and commit it at least once per week.

The doc/ directory contains the LaTeX document that you are writing. We will provide a template for your final paper.

The src/ directory contains the code you are writing. The data you are analyzing should live here too.

The etc/ directory contains anything else — materials, notes, photos of whiteboards, and so on — that you want to keep track of.

There should be nothing else in the top-level directory of your repository.
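
The layout described above can be created and checked with a short script. This is a sketch, not a course-provided tool: the function names `scaffold` and `check_layout` are hypothetical, and it assumes that hidden entries such as `.git` and `.gitignore` do not count as top-level clutter.

```python
from pathlib import Path

# Names taken from the repository layout listed above.
EXPECTED_FILES = ("abstract.md", "journal.md")
EXPECTED_DIRS = ("doc", "src", "etc")


def scaffold(root):
    """Create the expected files and directories if they do not exist yet."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    for name in EXPECTED_DIRS:
        (root / name).mkdir(exist_ok=True)
    for name in EXPECTED_FILES:
        (root / name).touch(exist_ok=True)


def check_layout(root):
    """Return a list of problems; an empty list means the layout matches."""
    root = Path(root)
    problems = []
    for name in EXPECTED_FILES:
        if not (root / name).is_file():
            problems.append(f"missing file: {name}")
    for name in EXPECTED_DIRS:
        if not (root / name).is_dir():
            problems.append(f"missing directory: {name}/")
    allowed = set(EXPECTED_FILES) | set(EXPECTED_DIRS)
    for entry in sorted(root.iterdir()):
        # Assumption: hidden entries (.git, .gitignore, ...) are ignored.
        if entry.name.startswith("."):
            continue
        if entry.name not in allowed:
            problems.append(f"unexpected top-level entry: {entry.name}")
    return problems


if __name__ == "__main__":
    for problem in check_layout("."):
        print(problem)
```

Run from the repository root, the script prints one line per deviation and nothing if the layout matches. Note that Git does not track empty directories, so `doc/`, `src/`, and `etc/` will only appear in the remote repository once each contains at least one file.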

Commit often: at least once a week, to update your journal. You are graded on the quality of the project and the path that you took to get there.