Jae-Won Chung

Ph.D. Candidate @ UMich CSE

Summary

I'm a third-year PhD candidate in CSE at the University of Michigan. I build efficient software systems for deep learning, with a recent focus on efficiently managing not only time but also energy.

I view energy as a new first-class systems resource. I am particularly interested in understanding how energy differs from other resources and in building software systems that reduce energy consumption in a manner orthogonal to hardware advancements.

I lead the ML.ENERGY initiative. I am fortunate to be advised by Professor Mosharaf Chowdhury and to be part of SymbioticLab.

Publications

Perseus: Removing Energy Bloat from Large Model Training

Preprint, 2023

Jae-Won Chung, Yile Gu, Insu Jang, Luoxi Meng, Nikhil Bansal, Mosharaf Chowdhury

Chasing Low‑Carbon Electricity for Practical and Sustainable DNN Training

ICLR Workshop (Tackling Climate Change with Machine Learning), 2023

Zhenning Yang, Luoxi Meng, Jae-Won Chung, Mosharaf Chowdhury

Zeus: Understanding and Optimizing GPU Energy Consumption of DNN Training

USENIX NSDI, 2023 (Acceptance rate = 18.38%)

Jie You*, Jae-Won Chung*, Mosharaf Chowdhury (* Equal Contribution)

ShadowTutor: Distributed Partial Distillation for Mobile Video DNN Inference

ACM ICPP, 2020 (Acceptance rate = 28.99%)

Jae-Won Chung, Jae-Yun Kim, Soo-Mook Moon

Experience

Graduate Student Research Assistant

Sep 2021 - Present

Advisor: Prof. Mosharaf Chowdhury

Building energy-efficient software systems for machine learning. I created Zeus, the first energy optimization system for DNN training on GPUs. Zeus is a PyTorch Ecosystem project and serves as the foundation for Chase, a carbon-efficient DNN training solution; the ML.ENERGY Leaderboard, the first energy benchmark for LLM inference; the ML.ENERGY Colosseum, an interactive service that lets users compare LLM responses in terms of both quality and energy consumption; and Perseus, a large model training energy optimizer that reduces per-iteration energy consumption by up to 30% without slowing down training.

Keywords:

  • MLSys
  • Energy
  • LLM
  • Training
  • Inference
  • Open Source

Research Intern

Mar 2020 - May 2022

Advisor: Prof. Byung-Gon Chun

Developed Crane, a GPU cluster manager for elastic AutoML jobs. Wrote components for automatic cluster bootstrapping on Docker Swarm and enabled full operation on top of Kubernetes. Worked on efficient AutoML scheduling policies on GPU clusters.

Keywords:

  • MLSys
  • AutoML
  • Training
  • Cluster Management
  • Scheduling

Dec 2019 - Jun 2020

Advisor: Prof. Soo-Mook Moon

Created ShadowTutor, a server-client collaborative DNN inference system that distills knowledge from a server-side large DNN to a small DNN on the client in an online fashion.

Keywords:

  • MLSys
  • Inference
  • Knowledge Distillation

Research Intern

Jun 2019 - Dec 2019

Advisor: Prof. Kyoung Mu Lee

Worked on finding better meta-initialization points for Model-Agnostic Meta-Learning (MAML) using LSTM-based neural memory modules. Also worked on embedding images of the same class into a single class embedding vector and on augmenting MAML with self-attention scores derived from these class embeddings.

Keywords:

  • ML
  • Computer Vision
  • Meta-Learning
  • Few-Shot Classification
  • Optimization

Jun 2019 - Aug 2019

Advisor: Prof. Jongho Lee

Designed and implemented CAD-QSMNet, an end-to-end deep learning pipeline for Quantitative Susceptibility Mapping (QSM) of brain MRI images, including a new U-Net variant.

Keywords:

  • ML
  • Computer Vision
  • Medical Imaging
  • Data Engineering

Open Source Projects

Numbers of stars and forks are as of March 6, 2024.

BERT4Rec-VAE-Pytorch (318 stars, 78 forks)

Implementation of BERT4Rec and Netflix VAE recommendation models.

  • Python
  • PyTorch
  • RecSys

Reason (184 stars, 4 forks)

A shell for research papers. Supports UNIX-like commands that operate on a collection of research papers instead of files.

Zeus (117 stars, 16 forks)

An energy measurement and optimization framework for deep learning. A PyTorch Ecosystem project.

Pegasus (27 stars, 3 forks)

An SSH command runner with a focus on simplicity. Useful when you have a bunch of commands to run and a bunch of SSH nodes available.

Education

  • PhD, Computer Science and Engineering
    (In progress)
    University of Michigan
    Sep 2021 - Present
  • MS, Computer Science and Engineering
    University of Michigan
    Sep 2021 - Apr 2023
  • BS, Electrical and Computer Engineering
    Summa cum laude
    Seoul National University
    Mar 2015 - Aug 2021

Proficiency

Languages

  • Python
  • Rust
  • Go, C++, CUDA, Verilog
  • Zig, JavaScript

Tools and Frameworks

  • FastAPI, MkDocs, pandas, NumPy
  • PyTorch, Kubernetes, LaTeX

Others

  • Command line
  • Neovim
  • GitHub
  • Open Source
  • Documentation

Honors & Awards

  • Second Best Solution in Carbon Hack '22
    $25,000 prize with Chase.
  • Kwanjeong Overseas Scholarship
    $100,000 awarded over four years.
  • Best Tutor Award
    SNU computer architecture, Fall 2020.
  • Kwanjeong Undergraduate Scholarship
    $20,000 awarded over two years.

Teaching

  • Undergrad Operating Systems
    Provided Linux kernel lectures, four Linux-based term projects, and team design reviews.
    Spring 2021
  • Undergrad Computer Architecture
    Gave 30 hours of online lectures as a peer tutor. Received the Best Tutor Award.
    Fall 2020

Community Service

English Proficiency

Interests

  • Software Systems
  • Deep Learning
  • Fingerstyle Guitar