Understanding Sakana.ai's Evolutionary Model Merging: Paper Notes on "Evolutionary Optimization of Model Merging Recipes"

This is a summary of the paper "Evolutionary Optimization of Model Merging Recipes," which describes Sakana.ai's evolutionary model merging approach.

Introduction

The paper being summarized is:

"Evolutionary Optimization of Model Merging Recipes" (Akiba et al., Sakana AI), available on arxiv.org.

All figures shown in this article are cited from the paper above.

Let's dive in.

Note: This article was translated from my original post.

Evolutionary Optimization of Model Merging Recipes

Overview

  • Background
    • Model merging is gaining attention as a cost-effective approach to building models.
  • Challenge
    • However, current model merging relies heavily on human intuition, experience, and domain knowledge—essentially a "black art."
  • What they did
    • Applied evolutionary algorithms to model merging.
    • Combined approaches in both PS (parameter space/weights) and DFS (data flow space/layers).
    • Conducted experiments on two tasks: building a Japanese LLM with math reasoning ability (by merging a Japanese LLM with math models) and building a Japanese VLM.
  • Results
    • Cross-domain merging (Japanese LLM + math reasoning models) produced models that outperform existing models.
    • Achieved results surpassing existing models on VLM tasks involving Japanese cultural context.

Method

  • Merging is performed at the PS level (parameter space: the models' weights), at the DFS level (data flow space: the models' layers), or with a combination of both.
  • In each case the merging recipe is searched with an evolutionary algorithm rather than hand-tuned; minimal sketches of both levels follow below.
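
To make the PS side concrete, here is a minimal sketch, not the paper's implementation: the paper evolves the configuration of TIES-Merging with DARE using CMA-ES, whereas this sketch shrinks the search space to per-model mixing weights. `evaluate_fitness` is a hypothetical placeholder for a real objective such as MGSM-JA accuracy.

```python
# Minimal sketch of PS (parameter-space) merging driven by an evolutionary
# search. Assumptions: all models share one architecture, and
# `evaluate_fitness` is a hypothetical placeholder for a real objective
# (e.g., MGSM-JA accuracy). The paper itself evolves TIES-Merging/DARE
# configurations with CMA-ES; this sketch simplifies the search space to
# per-model mixing weights.
import copy

import cma    # pip install cma; CMA-ES, the optimizer used in the paper
import torch


def merge_parameter_space(models, weights):
    """Weighted average of the source models' parameters (simplified PS merge)."""
    states = [m.state_dict() for m in models]
    merged_state = {
        name: sum(w * s[name] for w, s in zip(weights, states))
        for name in states[0]
    }
    merged = copy.deepcopy(models[0])
    merged.load_state_dict(merged_state)
    return merged


def evaluate_fitness(model):
    """Hypothetical placeholder: return a benchmark score for the merged model."""
    raise NotImplementedError


def evolve_ps_merge(models, generations=20):
    """Evolve mixing weights with CMA-ES; softmax keeps them a convex combination."""
    es = cma.CMAEvolutionStrategy([0.0] * len(models), 0.5)
    for _ in range(generations):
        candidates = es.ask()
        losses = []
        for x in candidates:
            w = torch.softmax(torch.tensor(x), dim=0).tolist()
            losses.append(-evaluate_fitness(merge_parameter_space(models, w)))
        es.tell(candidates, losses)  # CMA-ES minimizes, hence the negated score
    best = torch.softmax(torch.tensor(es.result.xbest), dim=0).tolist()
    return merge_parameter_space(models, best)
```

Softmax over the raw CMA-ES variables is just one convenient way to keep the mixture normalized; the paper's actual search space (per-group sparsification and mixing parameters) is richer.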

Figure 1: Overview of evolutionary model merging
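
For the DFS side, the object being evolved is an inference path: which layers, taken from which source models and in what order, the hidden state flows through. The paper encodes this path as an indicator array and optimizes it (together with scaling parameters) evolutionarily; the sketch below substitutes a simple mutation-based loop, assumes each layer is callable on the hidden state alone, and uses a hypothetical `evaluate_path` fitness function.

```python
# Minimal sketch of DFS (data-flow-space) merging: evolve a path through the
# layers of several source models instead of averaging their weights.
# `all_layers` is the pooled list of layers across models; `evaluate_path` is
# a hypothetical fitness function (e.g., benchmark accuracy of the routed
# model). Each layer is assumed to be callable on the hidden state alone.
import random


def sample_path(n_layers, max_len):
    """Random candidate: a sequence of layer indices (repeats allowed)."""
    return [random.randrange(n_layers) for _ in range(random.randint(1, max_len))]


def mutate(path, n_layers, rate=0.1):
    """Randomly reassign individual steps of the path to explore nearby variants."""
    return [random.randrange(n_layers) if random.random() < rate else idx
            for idx in path]


def run_path(all_layers, path, hidden):
    """Forward pass that routes the hidden state through the chosen layers in order."""
    for idx in path:
        hidden = all_layers[idx](hidden)
    return hidden


def evolve_dfs(all_layers, evaluate_path, population=32, generations=50, max_len=64):
    """Toy (mu + lambda) evolutionary search over layer paths."""
    pop = [sample_path(len(all_layers), max_len) for _ in range(population)]
    for _ in range(generations):
        parents = sorted(pop, key=evaluate_path, reverse=True)[: population // 4]
        pop = parents + [mutate(p, len(all_layers))
                         for p in parents for _ in range(3)]
    return max(pop, key=evaluate_path)
```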

Results

LLM Tasks

Table 1: Performance comparison of LLM models

  • MGSM-JA: Benchmark measuring mathematical reasoning ability in Japanese
  • JP-LMEH: Benchmark measuring general Japanese language ability
  • Source models for merging
    • Model 1: Strong at Japanese but weak at math
    • Models 2 and 3: Strong at math but weak at Japanese → poor MGSM-JA scores
  • Merged models
    • Overall, the merged models outperform their source models
    • Model 4 in particular, despite being created by merging models from different domains, demonstrates high performance
    • PS merge is more effective than DFS merge
    • Combining PS and DFS merging (Model 6, where the PS-merged model joins the source models before the DFS merge) further improves results on MGSM-JA; see the sketch after this list
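
As a rough sketch of that combination, reusing `evolve_ps_merge` and `evolve_dfs` from the Method section: the PS-merged model joins the collection of source models, and DFS merging runs over that enlarged collection. `source_models`, `evaluate_path`, and the `m.model.layers` layout (Hugging Face-style decoder modules) are assumptions, not names from the paper.

```python
# Hybrid PS -> DFS sketch (assumed names; see the sketches in the Method section).
ps_model = evolve_ps_merge(source_models)          # PS merge first
collection = source_models + [ps_model]            # PS-merged model joins the pool
all_layers = [layer for m in collection
              for layer in m.model.layers]         # assumes HF-style decoder modules
best_path = evolve_dfs(all_layers, evaluate_path)  # evolve a path over the pooled layers
```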

VLM Tasks

Table 3: VLM performance evaluation

  • JA-VG-VQA-500: Benchmark evaluating general VQA (Visual Question Answering) performance in Japanese
  • JA-VLM-Bench-In-the-Wild: Benchmark evaluating complex VQA performance in the context of Japanese culture
  • Both benchmarks were newly created by the authors for this study
  • Merged models show higher performance than source models

Figure: Example output for image recognition tasks based on Japanese cultural context

Conclusion

That concludes my summary notes on the paper "Evolutionary Optimization of Model Merging Recipes."

