SpatialReward: Bridging the Perception Gap in Online RL for Image Editing via Explicit Spatial Reasoning

Yancheng Long1*, Yankai Yang1*, Hongyang Wei2, Wei Chen3, Tianke Zhang4, Haonan Fan4, Changyi Liu4, Kaiyu Jiang4, Jiankang Chen4, Kaiyu Tang4, Bin Wen4†✉, Fan Yang4, Tingting Gao4, Han Li4, Shuo Yang1✉
1Harbin Institute of Technology, Shenzhen 2Tsinghua University 3HKUST 4Kuaishou Technology
* Equal contribution † Project leader ✉ Corresponding authors

Abstract

Online Reinforcement Learning (RL) offers a promising avenue for complex image editing but is currently constrained by the scarcity of reliable and fine-grained reward signals. Existing evaluators frequently struggle with a critical perception gap we term “Attention Collapse,” where models neglect cross-image comparisons and fail to capture fine-grained details, resulting in inaccurate perception and miscalibrated scores. To address these limitations, we propose SpatialReward, a reward model that enforces precise verification via explicit spatial reasoning. By anchoring reasoning to predicted edit regions, SpatialReward grounds semantic judgments in pixel-level evidence, significantly enhancing evaluative accuracy. Trained on a curated 260k spatial-aware dataset, our model achieves state-of-the-art performance on MMRB2 and EditReward-Bench, and outperforms proprietary evaluators on our proposed MultiEditReward-Bench. Furthermore, SpatialReward serves as a robust signal in online RL, boosting OmniGen2 by +0.90 on GEdit-Bench—surpassing the leading discriminative model and doubling the gain of GPT-4o (+0.45). These results demonstrate that spatial reasoning is essential for unlocking effective alignment in image editing.

Overview

Existing reward models suffer from "Attention Collapse," failing to attend to the source image during evaluation. SpatialReward introduces a "Think-with-Boxes" mechanism, predicting edit regions to anchor reasoning and enforce cross-image verification.
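To make the idea concrete, here is a minimal sketch of a box-anchored reward signal. This is purely illustrative and not the paper's actual scoring function: it assumes the model has already predicted edit-region boxes, represents images as 2D pixel grids, and rewards changes that fall inside the predicted boxes while penalizing unintended changes outside them.

```python
def spatial_reward(source, edited, boxes):
    """Toy box-anchored reward (illustrative, not the paper's method).

    source, edited: 2D grids of pixel values with identical shape.
    boxes: list of (x0, y0, x1, y1) predicted edit regions,
           exclusive on the right and bottom edges.
    Returns in-box change ratio minus out-of-box change ratio, so
    edits localized to the predicted regions score highest.
    """
    h, w = len(source), len(source[0])

    # Rasterize the predicted edit regions into a boolean mask.
    in_box = [[False] * w for _ in range(h)]
    for x0, y0, x1, y1 in boxes:
        for y in range(y0, y1):
            for x in range(x0, x1):
                in_box[y][x] = True

    # Compare the two images pixel by pixel, split by the mask.
    in_total = in_changed = out_total = out_changed = 0
    for y in range(h):
        for x in range(w):
            changed = source[y][x] != edited[y][x]
            if in_box[y][x]:
                in_total += 1
                in_changed += changed
            else:
                out_total += 1
                out_changed += changed

    intended = in_changed / in_total if in_total else 0.0
    unintended = out_changed / out_total if out_total else 0.0
    return intended - unintended
```

In this toy form, an edit confined exactly to the predicted box scores 1.0, while stray changes outside the box pull the score down; the actual model grounds a learned semantic judgment, not a pixel diff, in those regions.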

[Figure: SpatialReward Overview]

MultiEditReward-Bench (MERBench)

We introduce MultiEditReward-Bench, a challenging benchmark focusing on multi-turn and complex editing scenarios.

[Figure: MERBench Analysis]

Benchmark Results

SpatialReward achieves state-of-the-art performance on MMRB2 and EditReward-Bench, and outperforms proprietary evaluators on our MultiEditReward-Bench.

[Figure: Performance Table]
[Figure: Category Breakdown]

RL Finetuning Results

Using SpatialReward as the reward signal for online RL significantly improves base editing models, lifting OmniGen2 by +0.90 on GEdit-Bench.
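The online-RL usage can be sketched as a simple sample-score-update loop. The sketch below assumes a GRPO-style scheme with group-normalized advantages; `sample_edits`, `reward_model`, and `update_policy` are hypothetical stand-ins, not OmniGen2's or SpatialReward's real APIs.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """Normalize rewards within one sampled group: a_i = (r_i - mu) / (sigma + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def rl_step(prompts, sample_edits, reward_model, update_policy, group_size=4):
    """One online-RL step (illustrative): sample a group of candidate
    edits per prompt, score them with the reward model, convert scores
    to group-relative advantages, and apply a policy update."""
    for prompt in prompts:
        candidates = [sample_edits(prompt) for _ in range(group_size)]
        rewards = [reward_model(prompt, c) for c in candidates]
        advantages = group_advantages(rewards)
        update_policy(prompt, candidates, advantages)
```

The group normalization makes the signal relative: candidates the reward model prefers over their siblings get positive advantages, so a fine-grained, well-calibrated evaluator directly sharpens the policy gradient.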

[Figure: Training Curves]
[Figure: RL Results]

Qualitative Results

[Figure: RL Training Cases]

Attention Visualization

Spatial supervision sharpens the model's attention distribution, concentrating it on the regions relevant to the edit.

[Figure: Attention Cases]

BibTeX

@article{long2026spatialreward,
  title={SpatialReward: Bridging the Perception Gap in Online RL for Image Editing via Explicit Spatial Reasoning},
  author={Long, Yancheng and Yang, Yankai and Wei, Hongyang and Chen, Wei and Zhang, Tianke and Fan, Haonan and Liu, Changyi and Jiang, Kaiyu and Chen, Jiankang and Tang, Kaiyu and Wen, Bin and Yang, Fan and Gao, Tingting and Li, Han and Yang, Shuo},
  journal={arXiv preprint arXiv:2602.07458},
  year={2026}
}