Tencent HY-WU: Dynamic LoRA for Precise Image Editing

Tencent introduces HY-WU, a Weight Unleashing framework that generates dynamic LoRA adapters to solve gradient conflicts in multi-task image editing.

by HowAIWorks Team
Tags: Tencent, HY-WU, LoRA, Dynamic Adapters, Image Editing, Computer Vision, Multimodal, Hunyuan, Generative AI

Introduction

In the world of generative AI, adapting base models to specific tasks often relies on LoRA (Low-Rank Adaptation). However, standard LoRA uses a "one-size-fits-all" approach, where a single set of trained weights is expected to handle every input. This works well for niche styles but begins to fail when a model is asked to perform contradictory tasks, such as "adding blur" vs. "removing blur."
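To make the "one-size-fits-all" point concrete: LoRA freezes the base weight matrix W and learns a single low-rank update BA that every input passes through. A minimal NumPy sketch, with purely illustrative dimensions and initialization (not any specific model's):

```python
import numpy as np

rng = np.random.default_rng(0)

d, r = 512, 8                       # hidden size and LoRA rank (illustrative)
W = rng.normal(size=(d, d))         # frozen base weight
A = rng.normal(size=(r, d)) * 0.01  # trainable down-projection
B = np.zeros((d, r))                # trainable up-projection, zero-initialized

def lora_forward(x):
    # Base path plus the shared low-rank update: y = x W^T + x (BA)^T
    return x @ W.T + x @ (B @ A).T

x = rng.normal(size=(1, d))
y = lora_forward(x)
# With B initialized to zero, the adapter starts as a no-op.
assert np.allclose(y, x @ W.T)
```

The key limitation is visible in the code: A and B are fixed after training, so every input, whatever the task, flows through the same update.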

Tencent has recently unveiled HY-WU (Weight Unleashing), a groundbreaking research series that moves away from static adaptation. Instead of using a fixed adapter, HY-WU employs a dedicated model-generator that synthesizes custom LoRA weights for every single input example during inference. This "dynamic adaptation" allows the model to avoid the internal conflicts that plague traditional shared adapters.

The Challenge: Gradient Conflicts

The primary problem HY-WU addresses is gradient conflict in multi-task learning. When practitioners try to train a single adapter to handle multiple editing tasks—like "aging a face" and "rejuvenating a face"—the training gradients pull the shared weights in opposite directions.

Tencent's researchers measured this phenomenon directly, finding that the cosine similarity between gradients for diverse tasks is consistently negative (averaging around -0.30). In simple terms, when tasks compete for the same weights, the final result is a compromise that satisfies neither task perfectly. A shared adapter is forced to find a middle ground, leading to lower-quality results compared to task-specific models.
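That measurement is just the cosine similarity between flattened task gradients. A toy check with made-up opposing gradients (not the paper's data) shows how contradictory tasks produce a negative score:

```python
import numpy as np

def grad_cosine(g1, g2):
    """Cosine similarity between two flattened task gradients."""
    g1, g2 = g1.ravel(), g2.ravel()
    return float(g1 @ g2 / (np.linalg.norm(g1) * np.linalg.norm(g2)))

# Toy opposing tasks: the second update direction directly contradicts the first.
g_blur = np.array([0.5, -0.2, 0.3])
g_sharpen = -0.6 * g_blur

print(grad_cosine(g_blur, g_sharpen))  # ~ -1.0: fully conflicting updates
```

A value near -0.30, as Tencent reports across diverse real tasks, means the average gradient step for one task actively undoes progress on another.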

HY-WU: Weight Unleashing Architecture

The core of HY-WU is an 8-billion-parameter model-generator. This generator acts as a "meta-model" that controls how the base model perceives and acts on an image.

Dynamic Parameter Generation

Unlike traditional SFT (Supervised Fine-Tuning) or static LoRAs, the HY-WU process happens entirely at inference time:

  1. Input Encoding: A SigLIP2 encoder produces a joint representation of the source image and the user's text prompt.
  2. Weight Synthesis: The 8B generator takes this input and predicts approximately 0.72 billion LoRA parameters.
  3. Injection: These freshly synthesized matrices are injected into the base model (Tencent's HY-Image-3.0-Instruct).
  4. Inference: The model performs the image edit using weights perfectly tailored to that specific request.

This approach is trained end-to-end on the downstream editing loss, meaning the generator learns to produce the best weights for any given scenario without needing pre-collected adapter checkpoints.
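The four-step pipeline above can be sketched with toy shapes. Everything here—the class names, the tiny dimensions, the single linear layer standing in for the 8B generator—is an illustrative assumption, not HY-WU's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes; the real system uses SigLIP2 features,
# an 8B generator, and roughly 0.72B generated LoRA parameters.
d_cond, d_model, rank = 64, 128, 4

def encode(image_feat, text_feat):
    # Step 1: joint image+text condition vector (stand-in for SigLIP2).
    return np.concatenate([image_feat, text_feat])

class WeightGenerator:
    # Step 2: map the condition to input-specific LoRA matrices A and B.
    def __init__(self):
        self.proj_a = rng.normal(size=(2 * d_cond, rank * d_model)) * 0.01
        self.proj_b = rng.normal(size=(2 * d_cond, d_model * rank)) * 0.01

    def __call__(self, cond):
        A = (cond @ self.proj_a).reshape(rank, d_model)
        B = (cond @ self.proj_b).reshape(d_model, rank)
        return A, B

def edit(x, W_base, A, B):
    # Steps 3-4: inject the synthesized update and run inference with
    # weights tailored to this specific request.
    return x @ (W_base + B @ A).T

cond = encode(rng.normal(size=d_cond), rng.normal(size=d_cond))
A, B = WeightGenerator()(cond)
out = edit(rng.normal(size=(1, d_model)), rng.normal(size=(d_model, d_model)), A, B)
print(out.shape)  # (1, 128)
```

The design point is that a different condition vector yields different A and B, so two contradictory prompts never have to share one compromise adapter.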

Performance and Results

To test the efficacy of Weight Unleashing, Tencent applied it to the complex problem of text-guided image editing, where conflicts are both frequent and visually obvious.

Human Evaluation (GSB)

In pairwise human evaluations (Good-Same-Bad), HY-WU demonstrated a significant lead over existing open-source and proprietary systems:

  • Dominating Open Source: HY-WU won by a 67–78% margin against open-source leaders such as Step1X, Qwen, LongCat, and FLUX.
  • Beating Closed Systems: The model outperformed industry heavyweights like Seedream 4.5 (55.6%) and GPT Image 1.5 (55.5%).
  • The Top Contenders: Currently, only Nano Banana 2 and Nano Banana Pro maintain a lead over HY-WU.

Tencent's ablation studies confirmed that this performance boost isn't just a byproduct of having more parameters. When the generator was forced to use "averaged" or shuffled conditions, performance collapsed to the level of the base model, proving that the conditional routing of weights is the secret sauce.

Future Roadmap: Functional Memory

HY-WU is just the first part of a broader research series into Functional Memory for generative models. Tencent has outlined several ambitious goals for the future of this technology:

  • Functional vs. Retrieval: Comparing model-based weight generation with retrieval-augmented approaches to see where each excels.
  • Online Learning: Developing protocols where models can learn new tasks on the fly without "catastrophic forgetting" of old ones.
  • Independent Scaling: Exploring how scaling the generator separately from the base model affects performance.
  • Beyond LoRA: Applying Weight Unleashing to other operators, video generation, and autonomous agent systems.

Conclusion

Tencent's HY-WU marks a shift from static AI models to dynamic, reactive systems that reshape their own "brains" for every task. By unleashing weights from the constraints of a single fixed point in parameter space, Tencent has opened a new path for high-fidelity, multi-task image editing.

Currently, the model is available as the HY-Image-3.0-Instruct stack. However, users should be aware that this precision comes with significant hardware requirements: running the full generator and base model requires a robust setup, typically 8x40 GB or 4x80 GB of VRAM.

Key Takeaways:

  • Dynamic Adapters: Weights are generated uniquely for each prompt/image pair.
  • No More Conflicts: Solves the problem of contradictory task gradients.
  • SOTA Performance: Outperforms most open-source and several top-tier proprietary editors.
  • Functional Memory: Represents the first step in a new paradigm of generative model research.

Interested in exploring more about image generation? Check out our AI Models guide or dive into the Glossary to learn more about Mixture-of-Experts and foundation models.

Frequently Asked Questions

What is HY-WU?
HY-WU (Weight Unleashing) is a framework that uses a dedicated model-generator to synthesize unique LoRA adapters for every image-text input pair at inference time.

How does HY-WU differ from standard LoRA?
Standard LoRA uses a fixed set of weights for all tasks, leading to gradient conflicts when tasks are contradictory (e.g., blurring vs. sharpening). HY-WU generates task-specific weights on the fly.

What hardware does HY-WU require?
Given the model-generator (8B) and base model (HY-Image-3.0), running the system requires significant VRAM, typically 8x40GB or 4x80GB configurations.

Does HY-WU fine-tune or optimize at inference time?
No, the model-generator produces the LoRA matrices directly during a single forward pass, without additional training or optimization.
