BennyDaBall/qwen3-4b-Z-Image-Engineer · missing the system prompt?

5 days ago

mentioned in readme, but not present in repo.

Looks like fun, will compare it to JoZiMagic (Josiefied Qwen3) and Stock encoder

Owner 3 days ago

Whoops! Thank you, good catch. I meant to upload in a separate .json. In my testing I have found that simply using my system prompt with the normal 2507 instruct model gives some good results, too. It's not necessarily about getting "better" quality image outputs using this as an encoder, more so varied outputs. It writes prompts pretty well and is the primary reason I made it...but if you look at the system prompt example I have to reinforce it to use certain words and phrasing - I found that without it Qwen's tiny brain can't capture camera type aesthetics without using "shot on" or "shot with" - phrases like "captured by" aren't specific enough and the model will hallucinate a camera into the image. I could have fixed that in the dataset generation phase by being more specific with my system prompts but I was battling API refusals for certain styles and I got distracted fixing that. 😄

CynicalSpore

1 day ago

Cool your sysprompt infuses each anime depiction with unwanted Photographical Terms so a big no go for me it's each prompt like that for instance :

This is achieved using a Nikon Z9 camera with a 24-70mm f/2.8 lens set to f/2.8 for shallow depth-of-field, resulting in a clean and sharp image. The scene is rendered with a Fuji Pro 400H color grade, producing soft pastels and subtle gradients, while maintaining high resolution. This shot is a medium-full, focused on the subjects,

so absolute unusable in other situations aside from Photographic depictions.
but just my two cents to it.

scruffynerf

1 day ago

Put text into the system prompt, tell it NOT to use camera/photographic terms. Tell it you are doing anime. It works fine.

If he does retrain it, 'removing the camera' is important, because it's being trained for Zimage, which doesn't do that well with that prompt... but it's work-aroundable.

CynicalSpore

1 day ago

•

edited 1 day ago

looks like you got the camera prompts engraved in the dataset so a more open light setting would fix that by not mention any kind of relationship to a camera model at all use light's like

Top-left soft diffuse
Top-right soft diffuse
Top-center spotlight
Bottom-left rim light
Bottom-right rim light
Back silhouette light
Front flat fill
High-contrast top-down
Low-contrast top-down
Side rim left
Side rim right
Two-point cross lighting
Three-point classic (key left)
Three-point classic (key right)
Butterfly (butterfly shadow under nose)
Short lighting (face mostly in shadow)
Broad lighting (face mostly lit)
Up-light dramatic
Underfill soft
Warm key (implied warmer tone)
Cool key (implied cooler tone)
Hard single-source
Soft multi-source
Windowpane soft (large rectangular softbox)
Overhead softbox broad
Overhead hard spot
Low-angle hard shadow
Rim halo backlight
High rim with soft feather
Split lighting (half face lit)
Rembrandt (triangle of light)
Classic portrait loop shadow
Strong cast shadow from blinds
Dappled leaf light
Speckled light (through mesh)
Lantern top glow
Candle flicker edge light
Torchlight low warm
Neon edge hard lines
Single shaft through doorway
Spotlight vignette center
Spot off-center dramatic
Broad soft vignette
Foggy diffuse backlight
Hazy high-key fill
Low-key with deep blacks
High-key bright even light
Soft ambient overcast
Direct sun hard shadows
Late-afternoon long shadows
Golden-hour side light
Blue-hour cool soft
Silvery moonlight soft
Strobe freeze hard edges
Soft bounce from floor
Bounce from ceiling warm
Bounce from wall cool
Under-table glow low fill
Reflector fill from left
Reflector fill from right
Rim plus soft front fill
Crossed backlights (X shaped)
Split plus rim accent
Edge backlight halo above head
Hairlight narrow strip
Knife-edge side light
Painted shadow shapes (gobo)
Patterned window shadow (curtains)
Patterned lattice shadow
Checkerboard light pattern
Strong ambient with single bright key
Muted ambient with sharp key
Hard shadow silhouette foreground object
Shadow-cast storytelling silhouette
Spot-on-ground pool of light
Top rim fade to dark base
Bottom fill lifting shadows slightly
Colored-gel implied tone (monochrome value shift)
High-contrast chiaroscuro
Soft chiaroscuro gradient
Low, wide soft wrap light
Narrow focused tube light
Vertical strip light left
Vertical strip light right
Diagonal top-left beam
Diagonal top-right beam
Hard edge from doorway left
Hard edge from doorway right
Back-to-front gradient (bright back)
Front-to-back gradient (bright front)
Headlight (face-centered oval)
Ground-reflected rim (specular)
Silhouette with rim separation
Translucent diffuser soft glow
Soft cloud-filtered sun
Painted spotlight edge feathered
Hard circular spotlight with crisp rim
Floodlight broad even wash
Tunnel light (dark edges)
V-shaped slash light across subject
Band of light across torso
Top slice light across eyes only
Under-shelf glow (subtle up light)
Over-the-shoulder back key
Inside-frame glow (frame lit)
Center-lit vignette with dark corners
Mirror-reflected soft fill
Strong shadow cast behind subject
Double shadow (two light sources offset)
Triple shadow (three colored-value sources)
Hard rim with soft central fill
Soft rim with hard center key
Ambient café lamp pool
Streetlamp single-source cast
Firelight warm flicker low contrast
Incandescent lamp top warm glow
Fluorescent broad flat light
Halogen harsh warm single spot
Studio umbrella soft fill
Softbox beveled soft wrap
Beauty dish focused soft center
Broad cinematic top key
Low cinematic kicker from side
Moody single-beam through smoke
High-contrast theatrical downlight
Soft theater footlight lift
Subtle rim on hands only
Accent on object pocket only
Shadow play from stair slats
Spotlight halo with feathered edge
Light through frosted glass soft pattern
Distant light silhouetting horizon
Near-side glow emphasizing texture
Far-side glow low intensity
Glancing low-angle sheen highlights
Matte soft even tone lighting
Specular hard highlights on metal
Satin soft highlights on fabric
Textured side light to reveal grain
Low diffuse wrap around small subject
High-contrast face-back split
Soft ambient with sharp shadow accent
Isolated head-in-pool dramatic
Strong key with dim overall ambient
Strong ambient with subtle highlights
Rim-only (subject mostly dark)
Fill-only (no distinct key)
Directional wash from upper-left
Directional wash from upper-right
Subsurface implied glow (soft internal light)

it would be much better ask Chat GPT or similar ones to get a clean light setup with much more Prompt options to take for the model without being forced to use terms like Nikon XYZ stuff

edit: aside from that you made a decent Camera simulation prompter for sure. 😁👍😉

CynicalSpore

1 day ago

•

edited 1 day ago

Put text into the system prompt, tell it NOT to use camera/photographic terms. Tell it you are doing anime. It works fine.

If he does retrain it, 'removing the camera' is important, because it's being trained for Zimage, which doesn't do that well with that prompt... but it's work-aroundable.

sure "DO NOT" works very pleasant

sets some nice Flags in your context as well thanks but no.

scruffynerf

1 day ago

sure "DO NOT" works very pleasant

"avoid" works fine. I use it. It's an LLM, not an image prompt. It handles negatives just fine.

I was going to suggest JoZiMagic for Anime generation, but given your attitude, please DO NOT use it, as you are the worst sort of user. Have a nice day.

scruffynerf changed discussion status to closed 1 day ago

CynicalSpore

1 day ago

but given your attitude....

🤣 do yourself a favor and cry outside