Batch Model Consolidation: A Multi-Task Model Consolidation Framework

Iordanis Fostiropoulos     Jiaye Zhu     Laurent Itti

University of Southern California

[arXiv]     [Code]     [Dataset]


In Continual Learning (CL), a model is required to learn a stream of tasks sequentially without significant performance degradation on previously learned tasks. Current approaches fail for a long sequence of tasks from diverse domains and difficulties. Many of the existing CL approaches are difficult to apply in practice due to excessive memory cost or training time, or are tightly coupled to a single device. With the intuition derived from the widely applied mini-batch training, we propose Batch Model Consolidation (BMC) to support more realistic CL under conditions where multiple agents are exposed to a range of tasks. During a regularization phase, BMC trains multiple expert models in parallel on a set of disjoint tasks. Each expert maintains weight similarity to a base model through a stability loss, and constructs a buffer from a fraction of the task's data. During the consolidation phase, we combine the learned knowledge on "batches" of expert models using a batched consolidation loss on memory data that aggregates all buffers. We thoroughly evaluate each component of our method in an ablation study and demonstrate its effectiveness on the standardized benchmark datasets Split-CIFAR-100 and Tiny-ImageNet, and on the Stream dataset, which is composed of 71 image classification tasks from diverse domains and difficulties. Our method outperforms the next best CL approach by 70% and is the only approach that can maintain performance at the end of 71 tasks.


Intuition: similar to mini-batch training, batched task training can reduce the local minima and improve the convexity of the loss landscape.


BMC optimizes multiple expert models from a single base model in parallel on different tasks, which enforces parameter isolation. Experts are regularized during training to reduce forgetting of the tasks learned by the base model. A new base model is consolidated by batched distillation from the experts.

A single incremental step of BMC
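
The incremental step described above can be illustrated with a toy sketch. This is not the authors' implementation: it assumes linear models, a squared L2 distance to the base weights as the stability loss, and a squared-error distillation loss on the aggregated buffers. The names `train_expert`, `consolidate`, `make_task`, and all hyperparameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_expert(base_w, X, y, lam=0.1, lr=0.05, steps=200, buffer_frac=0.2):
    """Fit a linear expert on one task, starting from the base weights.

    Loss = task MSE + lam * ||w - base_w||^2 (assumed form of the stability loss).
    Returns the expert weights and a buffer holding a fraction of the task inputs.
    """
    w = base_w.copy()
    n = len(X)
    for _ in range(steps):
        grad_task = 2.0 * X.T @ (X @ w - y) / n
        grad_stab = 2.0 * lam * (w - base_w)
        w -= lr * (grad_task + grad_stab)
    keep = rng.choice(n, size=max(1, int(buffer_frac * n)), replace=False)
    return w, X[keep]

def consolidate(base_w, experts, lr=0.05, steps=300):
    """Distill a batch of experts into a new base model.

    Each expert is the teacher on its own buffer; the squared error between
    student and teacher outputs is averaged over all aggregated buffers.
    """
    w = base_w.copy()
    total = sum(len(buf) for _, buf in experts)
    for _ in range(steps):
        grad = np.zeros_like(w)
        for expert_w, buf in experts:
            grad += 2.0 * buf.T @ (buf @ w - buf @ expert_w)
        w -= lr * grad / total
    return w

def mse(w, X, y):
    return float(np.mean((X @ w - y) ** 2))

def make_task(true_w, active, n=100):
    """Toy task whose inputs are non-zero only on the `active` features."""
    X = np.zeros((n, len(true_w)))
    X[:, active] = rng.standard_normal((n, len(active)))
    return X, X @ true_w

# Two disjoint toy tasks, each learned by its own expert from the same base.
tasks = [
    make_task(np.array([1.0, -2.0, 0.0, 0.0]), [0, 1]),
    make_task(np.array([0.0, 0.0, 3.0, 0.5]), [2, 3]),
]
base_w = np.zeros(4)
experts = [train_expert(base_w, X, y) for X, y in tasks]
new_base = consolidate(base_w, experts)

mse_before = [mse(base_w, X, y) for X, y in tasks]
mse_after = [mse(new_base, X, y) for X, y in tasks]
print("MSE per task before consolidation:", mse_before)
print("MSE per task after consolidation: ", mse_after)
```

In this toy setting the experts never see each other's data; the new base model only sees the buffered inputs and the experts' outputs on them.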

BMC supports distributed training where experts are trained locally on remote devices. Artifacts are sent back to the central device for consolidation training. The parallelism of this framework enables BMC to learn long task sequences efficiently.

Parallel multi-expert training framework
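
The data flow of the distributed setup can be sketched as follows. This is a hedged illustration only: local threads stand in for remote devices, the "training" is a placeholder arithmetic update, and `Artifact`, `train_remote_expert`, and `collect_artifacts` are hypothetical names that do not come from the BMC code base.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
import random

@dataclass
class Artifact:
    """What a device sends back: expert weights plus a small data buffer."""
    task_id: int
    weights: list
    buffer: list

def train_remote_expert(task_id, base_weights, task_data, buffer_frac=0.25):
    """Stand-in for expert training on a remote device (placeholder update)."""
    mean = sum(task_data) / len(task_data)
    weights = [w + mean for w in base_weights]  # not a real training step
    k = max(1, int(buffer_frac * len(task_data)))
    buffer = random.Random(task_id).sample(task_data, k)
    return Artifact(task_id, weights, buffer)

def collect_artifacts(base_weights, tasks, max_workers=4):
    """Run one expert per task in parallel and gather the artifacts centrally."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(train_remote_expert, tid, base_weights, data)
            for tid, data in tasks.items()
        ]
        return [f.result() for f in futures]

tasks = {
    0: [1.0, 2.0, 3.0, 4.0],
    1: [10.0, 20.0, 30.0, 40.0],
    2: [5.0] * 8,
}
artifacts = collect_artifacts([0.0, 0.0], tasks)

# The central device aggregates all buffers into one memory for consolidation.
memory = [x for a in artifacts for x in a.buffer]
print(len(artifacts), "artifacts,", len(memory), "buffered samples")
```

Only the artifacts travel back to the central device, so the full task data can stay on the device that trained the expert.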


AutoDS implements the logic for processing and managing a large sequence of datasets. It also provides a way to train on tasks from diverse domains by projecting all datasets to the same dimension, using features extracted from pre-trained models.

See the repository for dataset installation and usage.

Download the extracted features for Stream datasets here.
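
The idea of projecting heterogeneous datasets to a shared feature dimension can be sketched as below. This does not use the AutoDS API. The `FrozenEncoder` class is a hypothetical placeholder that uses a fixed random projection where a real pipeline would use a pre-trained model, and `FEATURE_DIM` and the dataset shapes are assumptions chosen for illustration.

```python
import numpy as np

FEATURE_DIM = 16  # assumed size of the shared feature space

class FrozenEncoder:
    """Placeholder for a pre-trained feature extractor; its weights are never updated."""

    def __init__(self, input_dim, seed):
        rng = np.random.default_rng(seed)
        self.proj = rng.standard_normal((input_dim, FEATURE_DIM)) / np.sqrt(input_dim)

    def __call__(self, batch):
        flat = batch.reshape(len(batch), -1)            # flatten each sample
        feats = flat @ self.proj                        # map to FEATURE_DIM
        norms = np.linalg.norm(feats, axis=1, keepdims=True)
        return feats / np.maximum(norms, 1e-12)         # unit-normalize rows

rng = np.random.default_rng(0)
datasets = {
    "small_images": rng.random((5, 8, 8, 3)),    # 5 samples of 8x8 RGB
    "large_images": rng.random((7, 32, 32, 3)),  # 7 samples of 32x32 RGB
}

features = {}
for seed, (name, data) in enumerate(datasets.items()):
    encoder = FrozenEncoder(input_dim=int(np.prod(data.shape[1:])), seed=seed)
    features[name] = encoder(data)

# Every dataset now has the same feature dimension, so one model can consume all of them.
stacked = np.concatenate(list(features.values()), axis=0)
print({name: f.shape for name, f in features.items()}, stacked.shape)
```

Because the features are extracted once by a frozen model, they can be cached and shared, which is consistent with the downloadable extracted features mentioned above.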

Class-Incremental Learning

We show on the Stream dataset with CLIP embeddings that our method outperforms all other baselines in the Class-Incremental Learning scenario. Our implementation of BMC as well as the baselines can be found here.

Experiment result on Stream


  title={Batch Model Consolidation: A Multi-Task Model Consolidation Framework},
  author={Fostiropoulos, Iordanis and Zhu, Jiaye and Itti, Laurent},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},