Seminar: (Auto-)ML for tabular data
What is tabular data? And which model would you use for it? Why is tabular data challenging for machine learning? And how would you compare learning approaches on tabular data?
TL;DR Tabular data is omnipresent and tabular ML offers many solutions.
This seminar will navigate the landscape of ML models for tabular data (which is the ideal playground for AutoML). We will read recent research papers in the field of tabular ML with a focus on the large and pretrained neural networks that define modern tabular ML. To get excited, you can have a look at this position paper on why we need more tabular foundation models.
Course Title | (Auto-)ML for tabular data | |
---|---|---|
Course ID | ML4501f | |
Registration | ILIAS | |
ECTS | 3 | |
Time | Thursdays, 14:15-15:45 | |
Language | English | |
#participants | max 14 | |
Location | in-person at Maria-von-Linden-Straße 6; seminar room ground floor | |
organized by | Katharina Eggensperger, Amir Rezaei Balef, Mykhailo Koshil |
Why should you attend this seminar?
Tabular data is everywhere, and you have probably heard about it in your first machine learning lecture. But what is tabular data? Why is it challenging for machine learning? And what are recent models for this modality?
In this seminar, we will discuss these and many more questions. Besides learning about this topic and practicing your scientific communication skills, you will also
- learn about key contributions in the field of tabular machine learning
- learn how to assess the experimental setup of empirical comparisons
- be able to discuss recent research on large and pre-trained models for tabular data
- gain experience in reading, understanding and presenting research papers
Requirements
We strongly recommend that you know the foundations of machine learning and deep learning, including modern neural architectures and transformer models. To get the most out of this seminar, you should ideally also have some experience in applying ML.
Topics
The seminar focuses on understanding the challenges of learning from tabular representations. We will discuss research papers that investigate what makes tabular data a challenging modality for some model classes, as well as state-of-the-art ML methods built to excel on this modality.
Date | Content |
---|---|
17.10.2024 | Orga / How to give a good presentation |
24.10.2024 | no meeting |
31.10.2024 | Intro I |
07.11.2024 | no meeting |
21.11.2024 | #1 Tabular Foundation Models [Position / Elephant] |
28.11.2024 | no meeting |
05.12.2024 | #2 Interpretability [GAM X LLM / TabNet] |
12.12.2024 | #3 In-Context Learning [ForestPFN / MotherNet] |
19.12.2024 | no meeting |
26.12.2024 | 🌲 no meeting |
02.01.2025 | 🎆 no meeting |
09.01.2025 | no meeting |
16.01.2025 | #4 Wrap-Up |
23.01.2025 | buffer / no meeting |
30.01.2025 | buffer / no meeting |
06.02.2025 | buffer / no meeting |
- [Position] Van Breugel et al. Why Tabular Foundation Models Should Be a Research Priority
- [Elephant] Bordt et al. Elephants Never Forget: Memorization and Learning of Tabular Data in Large Language Models (arxiv’24)
- [GAM X LLM] Bordt et al. Data Science with LLMs and Interpretable Models (XAI@AAAI’24); Lou et al. Accurate intelligible models with pairwise interactions (KDD’13)
- [TabNet] Arik et al. TabNet: Attentive Interpretable Tabular Learning (AAAI’21)
- [ForestPFN] Breejen et al. Why In-Context Learning Transformers are Tabular Data Classifiers (arxiv’24)
- [MotherNet] Müller et al. MotherNet: A Foundational Hypernetwork for Tabular Classification (arxiv’23)
What will the seminar look like?
We will meet each week (with a few exceptions). In the first few weeks, we will start with introductory lectures on ML for tabular data (why this is an exciting data modality and why we need AutoML for it) and on how to critically review and present research papers. After that, each week, we will have presentations, followed by discussions.
Other important information
Registration: Please register on ILIAS. Registration opens on September 30th at 12:00 noon. The signup will be kept open and unlimited until the first meeting. In the first meeting, I will give an introduction to the topic and the papers. Afterward, we will do the final, binding registration and assignment. So, please come to the first lecture!
Grading/Presentations: Grades will be based on your presentation, slides, active participation and a short report. Further details will be discussed in the introductory sessions.