I'm a Research Scientist at Meta MRS, working on LLM post-training to improve recommendations across the Family of Apps (Reels, Instagram, Facebook, and Threads).

I received my PhD from the University of Southern California, advised by Prof. Laurent Itti and Prof. Barry Boehm. I led DeepUSC, supervising 9 MS students in ML research; the group published 4 papers at top venues and placed 4 students in PhD programs.

Research

I'm interested in scaling and generalization of large machine-learning systems, with a current focus on evaluation and post-training. I appreciate good engineering. Most of my published work is grounded in running experiments at scale.

Publications

GISTBench: Evaluating LLM User Understanding via Evidence-Based Interest Verification

Iordanis Fostiropoulos, Muhammad Rafay Azhar, Abdalaziz Sawwan, Boyu Fang, Yuchen Liu, Jiayi Liu, Hanchao Yu, Qi Guo, Jianyu Wang, Fei Liu, Xiangjun Fan

arXiv preprint, 2026

A benchmark for evaluating LLMs' ability to understand users from their interaction histories in recommendation systems. We propose two metric families (Interest Groundedness and Interest Specificity) and evaluate eight open-weight LLMs (7B to 120B), revealing performance bottlenecks in counting and attributing engagement signals across heterogeneous interactions.

Stellar: Systematic Evaluation of Human-Centric Personalized Text-to-Image Methods

Iordanis Fostiropoulos*, Panos Achlioptas*, Alexandros Benetatos*, Dimitris Skourtis* (* equal contribution)

arXiv preprint, 2023

Personalized text-to-image generation centered on human subjects. We introduce a 20,000-prompt dataset, an evaluation framework with metrics aligned to human judgment, and a baseline that achieves SOTA without per-subject fine-tuning.

Stream: A Generalized Continual Learning Benchmark and Baseline

Iordanis Fostiropoulos, Jiaye Zhu, Laurent Itti

arXiv preprint, 2023

A multi-modal General Continual Learning benchmark that constructs long task sequences with a controlled learning gap between tasks. We introduce αMetaSup, a Transformer-based novelty detector trained on a dummy stream, improving AUC by up to 10.5% over prior baselines.

Probing Reasoning of Language Models with Inductive In-Context Learning

Iordanis Fostiropoulos, Laurent Itti

IJCAI (oral), Workshop on Knowledge-Based Compositional Generalization, 2023

Previous work evaluates the reasoning of language models (LMs) with tests that can be inapplicable to a given LM. We propose an unbiased evaluation setting that uses complex regular expressions composed of Quasi-Natural Language, and evaluate an LM's inductive reasoning ability to generate Facts that abide by the Rules. When a Rule is injected into the training sequence, the LM implicitly learns to associate Facts with Rules; when probed with Inductive In-Context Learning, it generates probable Facts.

ABLATOR: Robust Horizontal-Scaling of Machine Learning Ablation Experiments

Iordanis Fostiropoulos, Laurent Itti

AutoML, 2023

Ablation experiments require many trials per ML-model component. We propose ABLATOR, a stateful experiment-design framework that scales a single experiment to thousands of trials while remaining fault-tolerant. We performed the largest ablation study of tabular Transformers to date, evaluating 2,337 models, and open-sourced the framework.

Trustworthy Model Evaluation on a Budget

Iordanis Fostiropoulos, Bowman Brown, Laurent Itti

ICLR, RTML Workshop, 2023

Errors in the ablation setup can lead to incorrect explanations of which method components contribute to performance. We quantify the selection bias of HPO strategies and show that only random sampling produces reliable conclusions about the top and mean performance of a method under a limited compute budget.

Batch Model Consolidation: A Multi-Task Model Consolidation Framework

Iordanis Fostiropoulos, Jiaye Zhu, Laurent Itti

CVPR, 2023

We incrementally learn new tasks for a base model using multiple learners in a distributed fashion, consolidating their knowledge at large incremental steps; larger consolidation steps reduce catastrophic forgetting compared to smaller ones. Simpler methods outperform the state of the art on our challenging Stream benchmark.

Lightweight Learner for Shared Knowledge Lifelong Learning

Yunhao Ge, Yuecheng Li, Di Wu, Ao Xu, Adam M. Jones, Amanda Sofie Rios, Iordanis Fostiropoulos, Shixian Wen, Po-Hsuan Huang, Zachary William Murdock, Gozde Sahin, Shuo Ni, Kiran Lekkala, Sumedh Anand Sontakke, Laurent Itti

TMLR, 2023

We propose the Shared Knowledge Lifelong Learning (SKILL) challenge: a decentralized population of lifelong-learning (LL) agents each sequentially learns different tasks, then shares and consolidates knowledge over a decentralized communication network so that all agents end up mastering all tasks.

Implicit Feature Decoupling with Depthwise Quantization

Iordanis Fostiropoulos, Barry Boehm

CVPR, 2022

A vector quantization method for latent features. Decomposing features in the latent space and training the auto-encoder end-to-end implicitly decouples them, improving image reconstruction.

Multimodal Phased Transformer for Sentiment Analysis

Iordanis Fostiropoulos*, Junyan Cheng*, Barry Boehm, Mohammad Soleymani (* equal contribution)

EMNLP, 2021

The quadratic complexity of self-attention limits Transformer deployment on low-resource devices. We propose the multimodal Sparse Phased Transformer (SPT) to reduce self-attention's complexity and memory footprint.

Learning Hyperbolic Representations of Topological Features

Panagiotis Kyriakis, Iordanis Fostiropoulos, Paul Bogdan

ICLR, 2021

We learn representations of persistence diagrams on the Poincaré ball. By placing infinite-persistence features infinitesimally close to the boundary, their distance to non-essential features approaches infinity, preserving relative importance.

Talks

Transformers And Beyond

Iordanis Fostiropoulos · April 4, 2023

A summary of Transformer models, given to 221 students at the University of Southern California as part of the Artificial Intelligence curriculum (CSCI 561).

Past projects

ABLATOR: a distributed PyTorch training framework for machine-learning experiments. Built during my PhD; six student interns contributed to it and went on to roles at FAANG companies.

Service

I review for CVPR, ICCV, NeurIPS, AAAI, AutoML, IJCAI, and AISTATS.