Seminar: (Auto-)ML for tabular data

What is tabular data? And which model would you use for it? Why is tabular data challenging for machine learning? And how would you compare learning approaches on tabular data?

TL;DR Tabular data is omnipresent and tabular ML offers many solutions.
This seminar will navigate the landscape of ML models for tabular data (which is the ideal playground for AutoML). We will read recent research papers in the field of tabular ML, with a focus on the large and pretrained neural networks that define modern tabular ML. To get excited, you can have a look at this position paper on why we need more tabular foundation models.

Course Title	(Auto-)ML for tabular data
Course ID	ML4501f
Registration	ILIAS
ECTS	3
Time	Thursdays, 14:15-15:45
Language	English
#participants	max 14
Location	in-person at Maria-von-Linden-Straße 6; seminar room ground floor
Organized by	Katharina Eggensperger, Amir Rezaei Balef, Mykhailo Koshil

Why should you attend this seminar?

Tabular data is everywhere, and you have probably heard about it in your first machine learning lecture. But what is tabular data? Why is it challenging for machine learning? And what are the recent models for this modality?

In this seminar, we will discuss these and many more questions. Besides learning about this topic and practicing your scientific communication skills, you will also

learn about key contributions in the field of tabular machine learning
learn how to assess the experimental setup of empirical comparisons
be able to discuss recent research on large and pre-trained models for tabular data
gain experience in reading, understanding and presenting research papers

Requirements

We strongly recommend that you know the foundations of machine learning and deep learning, including modern neural architectures and transformer models. Ideally, you also have some experience in applying ML to get the most out of this seminar.

Topics

The seminar focuses on understanding the challenges of learning from tabular representations. We will discuss research papers that try to understand what makes tabular data a challenging data modality for some model classes, as well as state-of-the-art ML methods built to excel on this data modality.

Date	Content
17.10.2024	Orga / How to give a good presentation
24.10.2024	no meeting
31.10.2024	Intro I
07.11.2024	no meeting
21.11.2024	#1 Tabular Foundation Models [Position / Elephant]
28.11.2024	no meeting
05.12.2024	#2 Interpretability [GAM X LLM / TabNet]
12.12.2024	#3 In-Context Learning [ForestPFN / MotherNet]
19.12.2024	no meeting
26.12.2024	🌲 no meeting
02.01.2025	🎆 no meeting
09.01.2025	no meeting
16.01.2025	#4 Wrap-Up
23.01.2025	buffer / no meeting
30.01.2025	buffer / no meeting
06.02.2025	buffer / no meeting

[Position] Van Breugel et al. Why Tabular Foundation Models Should Be a Research Priority
[Elephant] Bordt et al. Elephants Never Forget: Memorization and Learning of Tabular Data in Large Language Models (arxiv’24)
[GAM X LLM] Bordt et al. Data Science with LLMs and Interpretable Models (XAI@AAAI’24); Lou et al. Accurate intelligible models with pairwise interactions (KDD’13)
[TabNet] Arik et al. TabNet: Attentive Interpretable Tabular Learning (AAAI’21)
[ForestPFN] Breejen et al. Why In-Context Learning Transformers are Tabular Data Classifiers (arxiv’24)
[MotherNet] Müller et al. MotherNet: A Foundational Hypernetwork for Tabular Classification (arxiv’23)

What will the seminar look like?

We will meet each week (with a few exceptions). In the first few weeks, we will start with introductory lectures on ML for tabular data (why this is an exciting data modality and why we need AutoML for it) and on how to critically review and present research papers. After that, each meeting will consist of presentations followed by discussions.

Other important information

Registration: Please register on ILIAS. Registration opens on September 30th at 12:00 noon. The signup will be kept open and unlimited until the first meeting. In the first meeting, I will give an introduction to the topic and the papers. Afterward, we will do the final, binding registration and the paper assignment. So, please come to the first meeting!

Grading/Presentations: Grades will be based on your presentation, slides, active participation, and a short report. Further details will be discussed in the introductory sessions.