Seminar: Tabular Machine Learning
Tabular data is everywhere and often at the core of data science tasks, from healthcare to e-commerce and the natural sciences. Yet it comes with unique challenges and research questions for machine learning:
- What makes tabular data different from text or images?
- Which models work best, and why is it hard to beat simple baselines?
- How do recent advances in large and pre-trained models reshape the field?
- What role do LLMs play in tabular tasks?
In this seminar, we will explore the evolving landscape of ML for tabular data, with a special focus on predictive tasks and the rise of foundation models. We will read and discuss recent research papers and critically examine approaches. As I am new to the department, I am especially excited to use this seminar to dive into an active research area and get to know many of you.
Interested in a teaser? Check out this position paper on why we need more tabular foundation models.
| Course Title | Tabular Machine Learning |
|---|---|
| Course ID | INF-MSc-102 |
| Registration | drop me an email |
| ECTS | 3 |
| Time | Wednesdays, 10:15-11:45 |
| Language | English |
| #participants | max 10 |
| Location | in person; Jv25, seminar room, 4th floor |
| organized by | Katharina Eggensperger w/ Amir Rezaei Balef, Mykhailo Koshil |
Requirements
Familiarity with basic machine learning concepts (e.g., supervised learning, training/validation/test splits, overfitting), standard ML models, and modern DL architectures. Motivation to read (state-of-the-art) research papers in machine learning.
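To make the expected prerequisites concrete, here is a minimal, dependency-free sketch of a supervised-learning experiment on toy tabular data: a train/test split plus a majority-class baseline, the kind of simple reference point that tabular methods in the papers must beat. The dataset and all names are purely illustrative, not taken from any of the readings.

```python
import random
from collections import Counter

# Toy tabular dataset: each row is (feature vector, binary label).
# Synthetic and purely illustrative; labels are ~70% True.
random.seed(0)
rows = [([random.random(), random.random()], random.random() < 0.7)
        for _ in range(100)]

# Supervised-learning basics: hold out a test split before fitting anything.
random.shuffle(rows)
split = int(0.8 * len(rows))
train, test = rows[:split], rows[split:]

# Majority-class baseline: always predict the most frequent training label.
majority = Counter(label for _, label in train).most_common(1)[0][0]

# Evaluate only on the held-out test split.
accuracy = sum(label == majority for _, label in test) / len(test)
print(f"majority-class test accuracy: {accuracy:.2f}")
```

In the literature we will read, such trivial baselines (and slightly stronger ones like tuned gradient-boosted trees) are exactly what new tabular models are benchmarked against.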
Topics
The seminar focuses on understanding the challenges of learning from tabular representations. We will discuss research papers that try to understand what makes tabular data a challenging modality for some model classes, as well as state-of-the-art ML methods built to excel on this data modality.
| Date | Content |
|---|---|
| 15.10.2025 | intro I |
| 22.10.2025 | no meeting |
| 29.10.2025 | no meeting |
| 05.11.2025 | no meeting |
| 12.11.2025 | no meeting |
| 19.11.2025 | intro II |
| 26.11.2025 | #1: Benchmarking / TabPFN |
| 03.12.2025 | no meeting |
| 10.12.2025 | no meeting |
| 17.12.2025 | #2: Scale TabPFN / Other TFMs |
| 24.12.2025 | 🌲 no meeting |
| 31.12.2025 | 🎆 no meeting |
| 07.01.2026 | ⛄ no meeting |
| 14.01.2026 | #3: LLMs / Agents |
| 21.01.2026 | buffer / no meeting |
| 28.01.2026 | buffer / no meeting |
Session #1
- [Benchmarking] Erickson et al. TabArena: A Living Benchmark for Machine Learning on Tabular Data (NeurIPS’25) and Holzmüller et al. Better by Default: Strong Pre-Tuned MLPs and Boosted Trees on Tabular Data (NeurIPS’24)
- [TabPFN] Hollmann et al. TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second (ICLR’23) and Hollmann et al. Accurate predictions on small data with a tabular foundation model (Nature’25)
Session #2
- [Scale TabPFN] Qu et al. TabICL: A Tabular Foundation Model for In-Context Learning on Large Data (ICML’25) and Zeng et al. TabFlex: Scaling Tabular Learning to Millions with Linear Attention (ICML’25)
- [Other TFMs] Kim et al. CARTE: pretraining and transfer for tabular learning (ICML’24) and Kim et al. Table Foundation Models: on knowledge pre-training for tabular learning (arXiv’25)
Session #3
- [LLMs] Hegselmann et al. TabLLM: Few-shot Classification of Tabular Data with Large Language Models (AISTATS’23) and Gardner et al. Large Scale Transfer Learning for Tabular Data via Language Modeling (NeurIPS’24)
- [Agents] Guo et al. DS-Agent: Automated Data Science by Empowering Large Language Models with Case-Based Reasoning (ICML’24) and Nam et al. MLE-STAR: Machine Learning Engineering Agent via Search and Targeted Refinement (NeurIPS’25)
What will the seminar look like?
We will meet regularly throughout the semester. In the first few weeks, we will start with introductory lectures on ML for tabular data and on how to critically review and present research papers. After that, we will have several sessions with presentations, followed by discussions.
Other important information
Grading/Presentations: Grades will be based on your presentation, slides, active participation, and a short report. Further details will be discussed in the introductory sessions.