LongCat-Image

Introduction

We introduce LongCat-Image, a pioneering open-source, bilingual (Chinese-English) foundation model for image generation. It is designed to address core challenges prevalent in current leading models: multilingual text rendering, photorealism, deployment efficiency, and developer accessibility.

LongCat-Image Generation Examples

Key Features

  • 🌟 Exceptional Efficiency and Performance: With only 6B parameters, LongCat-Image surpasses numerous open-source models that are several times larger across multiple benchmarks, demonstrating the immense potential of efficient model design.
  • 🌟 Powerful Chinese Text Rendering: LongCat-Image demonstrates superior accuracy and stability in rendering common Chinese characters compared to existing SOTA open-source models and achieves industry-leading coverage of the Chinese dictionary.
  • 🌟 Remarkable Photorealism: Through an innovative data strategy and training framework, LongCat-Image achieves remarkable photorealism in generated images.

🎨 Showcase

LongCat-Image Generation Examples

Quick Start

Installation

pip install git+https://github.com/huggingface/diffusers

Run Text-to-Image Generation

Leveraging a stronger LLM for prompt refinement can further enhance image generation quality. Please refer to inference_t2i.py for detailed usage instructions.
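As a minimal illustration of the prompt-refinement idea, the sketch below uses a hypothetical `refine_prompt` helper as a stand-in for a real LLM call (e.g. via your preferred chat API): it expands a terse user prompt with lighting, composition, and texture details before it is passed to the pipeline. The helper and the appended details are illustrative assumptions, not part of the LongCat-Image API.

```python
# Illustrative only: a minimal prompt-refinement step. In practice you would
# replace the body of this function with a call to a stronger LLM; here it
# simply appends descriptive detail to a terse prompt (hypothetical helper).
def refine_prompt(user_prompt: str) -> str:
    """Expand a short prompt with lighting, composition, and texture cues."""
    details = (
        "soft natural lighting, medium shot, shallow depth of field, "
        "rich texture, photorealistic detail"
    )
    return f"{user_prompt}, {details}"

refined = refine_prompt("a young woman in a yellow knit sweater")
print(refined)
```

The refined string can then be passed as the `prompt` argument in the generation example below in place of the raw user input.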

๐Ÿ“ Special Handling for Text Rendering

For both Text-to-Image and Image Editing tasks involving text generation, you must enclose the target text within single or double quotation marks (both English '...' / "..." and Chinese ‘...’ / “...” styles are supported).

Reasoning: The model uses a specialized character-level encoding strategy for quoted content. Without explicit quotation marks this mechanism is never triggered, which severely degrades text rendering quality.

import torch
from diffusers import LongCatImagePipeline

if __name__ == '__main__':
    device = torch.device('cuda')

    pipe = LongCatImagePipeline.from_pretrained("meituan-longcat/LongCat-Image", torch_dtype=torch.bfloat16)
    # pipe.to(device, torch.bfloat16)  # Uncomment for high VRAM devices (Faster inference)
    pipe.enable_model_cpu_offload()  # Offload to CPU to save VRAM (Required ~17 GB); slower but prevents OOM

    # A young Asian woman in a yellow knit sweater with a white necklace, hands resting
    # on her knees, serene expression. Background: a rough brick wall in warm afternoon
    # sunlight, creating a quiet, cozy atmosphere. Medium-distance shot; soft light on
    # her face emphasizes her features and the texture of the jewelry.
    prompt = '一个年轻的亚裔女性，身穿黄色针织衫，搭配白色项链。她的双手放在膝盖上，表情恬静。背景是一堵粗糙的砖墙，午后的阳光温暖地洒在她身上，营造出一种宁静而温馨的氛围。镜头采用中距离视角，突出她的神态和服饰的细节。光线柔和地打在她的脸上，强调她的五官和饰品的质感，增加画面的层次感与亲和力。整个画面构图简洁，砖墙的纹理与阳光的光影效果相得益彰，突显出人物的优雅与从容。'
    
    image = pipe(
        prompt,
        height=768,
        width=1344,
        guidance_scale=4.0,
        num_inference_steps=50,
        num_images_per_prompt=1,
        generator=torch.Generator("cpu").manual_seed(43),
        enable_cfg_renorm=True,
        enable_prompt_rewrite=True
    ).images[0]

    image.save('./t2i_example.png')