A Safety Report on GPT-5.2, Gemini 3 Pro, Qwen3-VL, Grok 4.1 Fast, Nano Banana Pro, and Seedream 4.5
Xingjun Ma1,2, Yixu Wang1, Hengyuan Xu1, Yutao Wu3, Yifan Ding1, Yunhan Zhao1, Zilong Wang1, Jiabin Hua1, Ming Wen1,2, Jianan Liu1,2, Ranjie Duan, Yifeng Gao1, Yingshui Tan, Yunhao Chen1, Hui Xue, Xin Wang1, Wei Cheng, Jingjing Chen1, Zuxuan Wu1, Bo Li4, Yu-Gang Jiang1
1Fudan University, 2Shanghai Innovation Institute, 3Deakin University, 4UIUC
Leaderboard
The safety leaderboard provides a comparative view of frontier models across multiple dimensions, including benchmark performance, adversarial robustness, multilingual generalization, and regulatory compliance, spanning language, vision–language, and image generation settings. Overall, the results reveal a highly uneven safety landscape: while a small number of models achieve consistently strong and balanced performance across most evaluations, others exhibit clear trade-offs—performing well on standard benchmarks but degrading sharply under adversarial or cross-lingual conditions. Notably, strong benchmark scores do not necessarily translate into real-world robustness, highlighting the importance of multi-axis evaluation rather than single-score rankings.
- GPT-5.2 consistently leads across all four evaluation schemes, achieving top performance in Benchmark Evaluation (91.59%), Adversarial Robustness (54.26%), Multilingual Safety (77.50%), and Regulatory Compliance (90.22%). This uniformly strong showing indicates well-balanced and deeply integrated safety mechanisms that generalize effectively across modalities, languages, and attack settings.
- Gemini 3 Pro exhibits strong but uneven safety performance, ranking second in Benchmark Evaluation (88.06%) and Multilingual Safety (67.00%), and third in Compliance Evaluation (73.54%). However, its adversarial robustness drops noticeably to 41.17%, revealing sensitivity to attack-driven inputs despite solid baseline alignment.
- Qwen3-VL demonstrates a mixed safety profile, with competitive performance in Benchmark Evaluation (80.19%) and strong Regulatory Compliance (77.11%, second overall), but substantially weaker Adversarial Robustness (33.42%) and lower Multilingual Safety (64.00%). This pattern suggests that its safety mechanisms are more tightly coupled to compliance-oriented constraints than to adversarial or cross-lingual generalization.
- Grok 4.1 Fast ranks last or near-last across all dimensions, with relatively low scores in Benchmark Evaluation (66.60%), Adversarial Robustness (46.39%), Multilingual Safety (45.97%), and Regulatory Compliance (45.97%). The consistently weak performance highlights systemic deficiencies in its safety guardrails, particularly under adversarial and multilingual conditions.
- In the vision–language evaluations, GPT-5.2 again dominates both regimes, achieving near-saturated performance under adversarial evaluation (97.24%) and leading the benchmark setting (92.14%), indicating exceptional robustness against both standard and attack-driven safety risks.
- Qwen3-VL ranks second across both Benchmark (83.32%) and Adversarial (78.89%) evaluations, maintaining a consistent advantage over Gemini 3 Pro and demonstrating stable safety performance under adversarial pressure.
- Gemini 3 Pro places third, with solid but clearly lower scores of 82.53% on benchmarks and 75.44% under adversarial evaluation, reflecting moderate resilience but a noticeable gap relative to the top two models.
- Grok 4.1 Fast ranks fourth in both benchmark (67.97%) and adversarial (68.34%) evaluations, exhibiting a slight and somewhat counterintuitive score increase under adversarial conditions. This pattern suggests that its safety performance is largely insensitive to attack-driven perturbations, pointing to shallow guardrail behavior rather than safety generalization.
- Among the two text-to-image models, Nano Banana Pro leads across all three evaluation dimensions: Benchmark Evaluation (60.00%), Adversarial Evaluation (54.00%), and Regulatory Compliance (65.59%). Its comparatively modest drop from the benchmark to the adversarial setting, together with its strongest showing on compliance, suggests relatively robust and well-aligned safety controls that generalize beyond static prompt distributions, particularly in regulation-sensitive image generation scenarios.
- Seedream 4.5 trails on every dimension, with notably lower scores in Benchmark Evaluation (47.94%), Adversarial Evaluation (19.67%), and Regulatory Compliance (57.53%). While its compliance score recovers somewhat relative to the benchmark and adversarial settings, the overall performance indicates weaker baseline safeguards and limited robustness under adversarial T2I attacks.
Safety Profiles
The safety profile characterizes each model’s alignment behavior as a multidimensional pattern rather than a scalar score. By examining performance across benchmark, adversarial, multilingual, and compliance axes, distinct safety archetypes emerge, ranging from well-balanced generalists to rule-driven or guardrail-light models with pronounced weaknesses. These profiles show that safety failures often stem from structural design choices—such as reliance on rigid rules or surface-level filters—rather than isolated bugs. Taken together, the profiles underscore that model safety is inherently contextual and modality-dependent, reinforcing the need for holistic, profile-based assessment to understand real deployment risks.
- The Comprehensive Generalist (GPT-5.2). GPT-5.2 exhibits the most complete and balanced safety profile, with a radar chart approaching saturation across nearly all dimensions. Its performance remains consistently high from static benchmarks to jailbreak attacks and regulatory compliance. This stability suggests that safety constraints are internalized at a semantic and reasoning level rather than enforced through brittle pattern-based filters. As a result, GPT-5.2 is able to handle gray-area and context-rich queries with calibrated refusals, avoiding both over-refusal and jailbreak susceptibility.
- The Robust but Reactive Aligner (Gemini 3 Pro). Gemini 3 Pro demonstrates a strong but slightly contracted safety footprint relative to GPT-5.2. Its radar profile shows solid benchmark and multilingual performance, particularly in socially grounded tasks such as bias and toxicity detection. However, visible indentations along the adversarial and regulatory axes indicate a more reactive safety posture. Qualitative inspection suggests that Gemini 3 Pro often identifies harmful intent only after partial compliance (e.g., comply-then-warn behaviors) or relies on rigid refusal triggers. While effective against explicit harm, this strategy is less resilient to adversarial reframing and contextual manipulation.
- The Polarized Rule-Follower (Qwen3-VL). Qwen3-VL displays a sharply uneven, spiked safety spectrum. It excels in Regulatory Compliance and performs competitively in multilingual safety, even surpassing Gemini 3 Pro in certain governance-aligned dimensions. However, its adversarial robustness and social bias handling collapse markedly, producing a highly polarized profile. This pattern is indicative of a rule-centric alignment strategy: the model adheres strongly to explicit, codified constraints but struggles when safety requires semantic generalization or contextual inference. Consequently, Qwen3-VL is highly reliable within known regulatory boundaries, yet brittle under semantic disguise and novel attack strategies.
- The Guardrail-Light Instruction Follower (Grok 4.1 Fast). Grok 4.1 Fast shows the most uniformly diminished safety profile among language models, with consistently low scores across benchmark, adversarial, multilingual, and regulatory dimensions. It exhibits systemic safety deficiencies even under standard evaluation. The radar chart suggests minimal internalization of safety concepts and heavy reliance on lightweight or surface-level filtering, resulting in poor robustness across virtually all tested settings.
- The Divergent T2I Safety Strategies (Nano Banana Pro vs. Seedream 4.5). For the two T2I models, the radar charts reveal two contrasting alignment philosophies. Nano Banana Pro exhibits a sanitization-oriented profile, maintaining broader coverage across benchmark, adversarial, and compliance dimensions by implicitly transforming unsafe prompts into safer visual outputs. This strategy preserves utility while reducing harm. In contrast, Seedream 4.5 displays a block-or-leak profile: it relies on aggressive binary refusals but lacks robust semantic grounding for borderline cases, leading to severe failures when these coarse filters are bypassed. The divergence highlights a fundamental trade-off between generative flexibility and safety robustness in image generation systems.
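The profile view above can be made concrete by treating each model as a vector over the four evaluation axes and reporting its weakest dimension, which often predicts deployment risk better than an average. A minimal sketch using the LLM leaderboard scores quoted earlier in this report:

```python
# Safety profile as a vector over four axes (scores from the LLM leaderboard
# above). The weakest axis, not the mean, often dominates real-world risk.
AXES = ["benchmark", "adversarial", "multilingual", "compliance"]

profiles = {
    "GPT-5.2":       [91.59, 54.26, 77.50, 90.22],
    "Gemini 3 Pro":  [88.06, 41.17, 67.00, 73.54],
    "Qwen3-VL":      [80.19, 33.42, 64.00, 77.11],
    "Grok 4.1 Fast": [66.60, 46.39, 45.97, 45.97],
}

def weakest_axis(scores):
    """Return (axis name, score) for the lowest-scoring dimension."""
    i = min(range(len(scores)), key=scores.__getitem__)
    return AXES[i], scores[i]

for model, scores in profiles.items():
    axis, score = weakest_axis(scores)
    print(f"{model}: weakest on {axis} ({score}%)")
```

Note that every language model's weakest axis is adversarial robustness except Grok 4.1 Fast, whose profile is uniformly low, matching the "guardrail-light" characterization above.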
Large Language Model Benchmark Evaluation
In standard safety benchmarks, GPT-5.2 achieves the highest macro-average safe rate of 91.59%, closely followed by Gemini 3 Pro at 88.06%. While most models perform well on refusal-based tasks like StrongREJECT, there is significant variance in social reasoning; for instance, Qwen3-VL struggles severely on the BBQ bias benchmark with a score of only 45.00%. This indicates that while frontier models have improved at rejecting explicit harmful instructions, they remain structurally weak in detecting subtle social biases and ensuring fairness.
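The macro-average safe rate weights every benchmark equally, so a single weak benchmark (such as BBQ for Qwen3-VL) drags the aggregate down regardless of benchmark size. A minimal sketch of the aggregation; the benchmark names and rates below are illustrative placeholders, not the report's raw data:

```python
# Macro-average safe rate: unweighted mean of per-benchmark safe rates,
# so a small benchmark counts as much as a large one.
# Scores below are illustrative placeholders, not the report's raw data.
per_benchmark_safe_rate = {
    "StrongREJECT": 0.98,
    "BBQ": 0.45,          # a weak social-bias result pulls the mean down
    "OtherBench": 0.90,   # hypothetical benchmark name
}

def macro_average(rates):
    """Equal-weight mean over benchmarks."""
    return sum(rates.values()) / len(rates)

print(round(macro_average(per_benchmark_safe_rate), 4))
```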
Figure 3: Safe rate of five models across five benchmarks.
Figure 4: Examples of model responses to standard safety benchmark prompts.
LLM Adversarial Evaluation
Despite strong benchmark performance, models remain vulnerable to jailbreaks, with no model exceeding 85% worst-case safety. GPT-5.2 is the most robust (82.00% worst-case safety), whereas Doubao 1.8 and Grok 4.1 Fast exhibit catastrophic collapses, scoring only 12.00% and 4.00% respectively. The evaluation reveals that while static template-based attacks are often mitigated, models are highly susceptible to sophisticated, multi-turn agentic attacks (like X-Teaming) that progressively decompose harmful objectives.
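Worst-case safety is the minimum safe rate over all attack methods applied to a model, so one effective jailbreak family (e.g., multi-turn agentic attacks) dominates the score no matter how well the model resists the others. A minimal sketch; the attack names and rates are illustrative placeholders:

```python
# Worst-case safety: minimum safe rate across attack methods, so a single
# successful attack family determines the score.
# Attack names and rates below are illustrative placeholders.
safe_rate_by_attack = {
    "static_template": 0.95,
    "multi_turn_agentic": 0.82,  # e.g. X-Teaming-style goal decomposition
    "role_play": 0.88,
}

def worst_case_safety(rates):
    """Return (attack name, safe rate) for the weakest defense."""
    attack = min(rates, key=rates.get)
    return attack, rates[attack]

attack, rate = worst_case_safety(safe_rate_by_attack)
print(f"worst case: {attack} at {rate:.2%}")
```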
Figure 5: Adversarial evaluation results across five models.
Figure 6: Qualitative examples of successful jailbreak attempts.
Vision-Language Model Benchmark Evaluation
GPT-5.2 leads the multimodal benchmark with a 92.14% safe rate, showing consistent generalization, while models like Grok 4.1 Fast and Doubao 1.8 lag significantly behind. Performance is uneven across risk types; models handle cross-modal misalignment well (SIUO) but struggle with culturally grounded implicit harm in memes, where scores drop as low as 44%. A key failure mode is "analytical operationalization," where models provide dangerous, actionable details under the guise of neutral academic analysis when processing visual inputs.
Figure 11: Safety scores for Vision-Language Models on standard benchmarks.
Figure 12: Examples of VLM responses to unsafe visual and textual inputs.
VLM Adversarial Evaluation
In adversarial settings, GPT-5.2 maintains an exceptional safe rate of 97.24%, while other models suffer sharp degradation, particularly on the VLJailbreakBench where Qwen3-VL and Gemini 3 Pro drop to approximately 60%. Models frequently fail due to "refusal drift" in multi-turn dialogues or when harmful intent is obscured by complex formatting and role-play. Doubao 1.8 is particularly brittle, often collapsing to direct policy-override instructions.
Figure 13: Safe rates under multimodal adversarial evaluation.
Figure 14: Demonstration of visual jailbreaks triggering unsafe content.
Text-to-Image Benchmark Evaluation
On the T2ISafety benchmark, Nano Banana Pro achieves a safe rate of 52%, outperforming Seedream 4.5's 40%, though both struggle heavily with Violence and Disturbing content. Nano Banana Pro employs "implicit sanitization," modifying prompts to render safer images, whereas Seedream 4.5 relies on a "block-or-leak" strategy. When Seedream's filters are bypassed, it tends to generate highly toxic or abstractly disturbing "leakage" images rather than sanitized alternatives.
Figure 15: Evaluation results on the T2ISafety benchmark.
Figure 16: Samples of generated content evaluated for safety compliance.
T2I Adversarial Evaluation
Under advanced jailbreak attacks (PGJ and GenBreak), Nano Banana Pro demonstrates superior robustness with a worst-case safe rate of 54.00%, compared to Seedream 4.5's 19.67%. While Seedream 4.5 refuses more often, it generates highly toxic content when its filters fail, whereas Nano Banana Pro keeps toxicity scores lower even during failures. Both models exhibit specific weaknesses, such as "scale blindness" (missing small unsafe elements in image backgrounds) and susceptibility to artistic-style disguises for nudity content.
Figure 17: Adversarial evaluation results.
Figure 18: Visual results of adversarial prompts bypassing safety filters.
T2I Compliance Evaluation
Evaluated against regulation-grounded frameworks, Nano Banana Pro achieves a higher compliance rate (65.59%) than Seedream 4.5 (57.53%). While both models reliably suppress explicit visual taboos like nudity, they share a fundamental "blindness" to abstract regulatory violations such as Intellectual Property Infringement and Political Subversion. These failures occur because the models struggle to infer the semantic intent or legal context of a request solely from pixel-level patterns.
Figure 19: Quantitative results on the regulatory compliance benchmark.
Figure 20: Examples illustrating copyright infringement and identity leakage risks.
Cite this report:
@article{xsafe2026safety,
  title={A Safety Report on GPT-5.2, Gemini 3 Pro, Qwen3-VL, Grok 4.1 Fast, Nano Banana Pro, and Seedream 4.5},
  author={Xingjun Ma and Yixu Wang and Hengyuan Xu and Yutao Wu and Yifan Ding and Yunhan Zhao and Zilong Wang and Jiabin Hua and Ming Wen and Jianan Liu and Ranjie Duan and Yifeng Gao and Yingshui Tan and Yunhao Chen and Hui Xue and Xin Wang and Wei Cheng and Jingjing Chen and Zuxuan Wu and Bo Li and Yu-Gang Jiang},
  journal={arXiv preprint arXiv:2601.10527},
  year={2026}
}