The shape

What we're building, at a glance.

Eight nodes that turn flat product flatlays into a consistent, on-model photoshoot — same canvas produces both stills and motion.

01 · Describe: Flatlay → text

02 · Model: Generate + describe

03 · Settings: Camera + light

04 · Combine: Outfit on model

05 · Poses: Per-garment session

06 · Render: ChatGPT Image 2

07 · Export: Hero frames

08 · Motion: Kling 2

The trick that makes this consistent: every visual element — products, model, scene — is converted to a precise text description first. The renderer reads the same text spec for every frame, which is why the model and outfits stay locked across an entire collection.

Step 01 · Describe products

Turn flatlays into structured descriptions.

Drop each product flatlay into an image-to-text node. The job here isn't generation — it's extraction. You want a precise, shoppable description of every garment: silhouette, fabric, color, finish, hardware, fit. This text becomes the canonical reference the rest of the workflow uses.

Input: flatlay-{sku}.png

01 · Image-to-text: Vision describer (GPT-4o or Claude Sonnet)

Output: product-spec.txt

Use a strict description schema — silhouette, fabric, color, fit, hardware
One flatlay per garment; don't combine items at this stage
Save the text spec next to the flatlay — you'll reuse it across every shot of that garment
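One way to enforce a strict schema is to give every garment the same named fields and render them in a fixed order. The sketch below is a minimal Python illustration, not part of the canvas itself: the `ProductSpec` class, its field names, and the example garment values are assumptions based on the attribute list above.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ProductSpec:
    """Hypothetical schema: one instance per flatlay, one field per attribute."""
    sku: str
    silhouette: str
    fabric: str
    color: str
    finish: str
    hardware: str
    fit: str

    def to_text(self) -> str:
        # Render fields in declaration order so every spec reads the same way.
        lines = []
        for f in fields(self):
            value = getattr(self, f.name).strip()
            if not value:
                raise ValueError(f"{f.name} is empty for SKU {self.sku}")
            lines.append(f"{f.name.capitalize()}: {value}")
        return "\n".join(lines)


# Illustrative values only; real text would come from the image-to-text node.
jacket = ProductSpec(
    sku="JKT-001",
    silhouette="cropped trucker jacket",
    fabric="suede",
    color="cognac",
    finish="matte, lightly brushed",
    hardware="antique brass snap buttons",
    fit="regular, hits at the hip",
)
product_spec_text = jacket.to_text()
```

Because an empty field raises an error, a describer pass that skips an attribute is caught before the spec is saved next to the flatlay.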

Step 02 · Generate & describe the model

Lock the model identity.

Generate the model once, then describe them in detail with another image-to-text pass. This text spec is what keeps the model identical across hundreds of frames: face, body, hair, skin, posture. Without this step, the model drifts with every render.

02a · Generate: Model render

02b · Describe: Image-to-text (face · body · hair · skin)

Output: model-spec.txt

Generate the model standing neutrally, full body, plain background
Describe in granular detail — eye color, jaw shape, height proportions, hair texture
Lock this spec. Every future render references the same text — that's the consistency trick
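"Locking" the spec can be made checkable by fingerprinting the text once and comparing before each render. This is a minimal sketch under assumptions: the function names, the 12-character hash length, and the sample spec text are all hypothetical, and nothing here is a feature of the canvas tool.

```python
import hashlib


def spec_fingerprint(spec_text: str) -> str:
    """Return a short hash of the spec, ignoring trailing whitespace per line."""
    normalized = "\n".join(line.rstrip() for line in spec_text.strip().splitlines())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:12]


# Illustrative spec text; the real one comes from the describe pass.
model_spec = """Face: oval, soft jawline, grey-green eyes
Hair: dark blonde, shoulder length, fine texture
Body: tall, slim build, relaxed upright posture
Skin: fair, neutral undertone"""

# Computed once, when the spec is approved.
LOCKED = spec_fingerprint(model_spec)


def assert_locked(spec_text: str, locked: str = LOCKED) -> None:
    # Fail loudly if someone edited the model spec between renders.
    if spec_fingerprint(spec_text) != locked:
        raise ValueError("model spec changed since it was locked")
```

Calling `assert_locked(model_spec)` before each batch catches accidental edits to the text that every frame depends on.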

Step 03 · Photo settings

Describe the shoot itself.

This is your art direction layer: written once, applied to every frame. Camera angle, lens, light direction, color temperature, mood. Treat it like a brief for a director of photography (DOP).

Camera:        Sony A7R IV, 50mm prime, f/2.8
Framing:       Full body, slight three-quarter angle, eye level
Light:         Soft natural daylight from camera left
               Warm afternoon, ~4500K, gentle falloff
Setting:       Sunlit Stockholm apartment, white plaster walls
               Wooden floor, minimal styling, calm and neutral
Mood:          Editorial, restrained, premium
Post:          Slight film grain, neutral grade, no heavy contrast

Camera + lens — drives perspective, compression, depth of field
Light — direction, quality, color temperature
Setting — concrete place, never "stylish location"
Mood — three adjectives, not seven
Post — grade, grain, contrast — describe the finish, not the filter
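The checklist above can be turned into a small lint pass over the brief before it is reused on every frame. The sketch below is an assumption-laden illustration: the dictionary keys mirror the example brief, while `check_settings`, `render_brief`, and the three-adjective limit as a hard rule are hypothetical choices.

```python
PHOTO_SETTINGS = {
    "Camera": "Sony A7R IV, 50mm prime, f/2.8",
    "Framing": "Full body, slight three-quarter angle, eye level",
    "Light": "Soft natural daylight from camera left",
    "Setting": "Sunlit Stockholm apartment, white plaster walls",
    "Mood": "Editorial, restrained, premium",
    "Post": "Slight film grain, neutral grade, no heavy contrast",
}

# Keys taken from the checklist; "Framing" is treated as optional here.
REQUIRED = ("Camera", "Light", "Setting", "Mood", "Post")


def check_settings(settings: dict) -> list:
    """Return a list of human-readable problems; an empty list means the brief passes."""
    problems = []
    for key in REQUIRED:
        if not settings.get(key, "").strip():
            problems.append(f"missing {key}")
    adjectives = [w for w in settings.get("Mood", "").split(",") if w.strip()]
    if len(adjectives) > 3:
        problems.append(f"Mood has {len(adjectives)} adjectives; keep it to three")
    return problems


def render_brief(settings: dict) -> str:
    # One "Key: value" line per setting, in insertion order.
    return "\n".join(f"{key}: {value}" for key, value in settings.items())
```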

Step 04 · Combine outfit on model

Put the full outfit on the model.

Now the three text specs come together — model, outfit (multiple garments), photo settings — and the renderer produces the first composite: the model wearing the styled outfit in the described scene. This is the canonical "look" frame.

Inputs: model + outfit + settings

04 · Combine: Outfit composite (multi-spec render)

Output: look-01.png

Treat this output as the style master. Every per-garment shot you generate next will reference back to it visually — that's how the look stays internally consistent across the catalog.

Look master · cognac suede jacket + grey trousers, full body · the canonical reference for every subsequent frame in the session

Generate 3–4 candidates of the look frame; pick the strongest
Lock the seed and the look-frame reference once you have a winner
If a garment drifts from its flatlay, regenerate with stronger weight on that product spec
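To make the combine step concrete, here is a minimal sketch of assembling the three text specs into one prompt. The section headings (`MODEL`, `OUTFIT`, `PHOTO SETTINGS`) and the function name are assumptions for illustration; the actual render node may expect a different layout.

```python
def build_look_prompt(model_spec: str, garment_specs: list, settings_brief: str) -> str:
    """Join the model spec, one or more garment specs, and the photo brief."""
    if not garment_specs:
        raise ValueError("an outfit needs at least one garment spec")
    sections = ["MODEL", model_spec.strip(), "", "OUTFIT"]
    for index, spec in enumerate(garment_specs, start=1):
        # Number each garment so a drifting item can be identified on review.
        sections.append(f"Garment {index}:\n{spec.strip()}")
    sections += ["", "PHOTO SETTINGS", settings_brief.strip()]
    return "\n".join(sections)


# Placeholder texts; in practice these are the saved spec files.
look_prompt = build_look_prompt(
    model_spec="Face: oval, grey-green eyes\nHair: dark blonde",
    garment_specs=[
        "Silhouette: trucker jacket\nFabric: suede\nColor: cognac",
        "Silhouette: straight-leg trousers\nFabric: wool\nColor: grey",
    ],
    settings_brief="Camera: 50mm prime\nMood: Editorial, restrained, premium",
)
```

Keeping the order fixed means the only thing that differs between two looks is the garment list.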

Step 05 · Per-garment poses

One photo session per garment.

For each garment in the outfit, you now run a dedicated session — same model, same scene, but a list of pose prompts that show the product from the angles e-commerce and editorial actually need.

Front, hero pose

Model standing centered, calm posture, full garment visible. The main shot for the product detail page (PDP).

Three-quarter

Slight body rotation, hand placement that shows fit and detail. The "lifestyle" frame.

Detail / crop

Tighter framing on the garment — fabric, stitching, hardware. Used in carousels and paid creative.

Movement

Walking, looking off-frame, hand in pocket. Adds editorial range to the same look.

Write each pose as one short paragraph — body, gaze, hands, framing
4–6 poses per garment is the sweet spot for a full PDP set + paid creative
Keep the model and scene specs locked — only the pose prompt changes between frames
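The "only the pose prompt changes" rule can be expressed as a loop over pose texts with a fixed base prompt. This is a hedged sketch: the pose keys, the `POSE` heading, and `build_session` are hypothetical names, and the pose wording is condensed from the four examples above.

```python
POSES = {
    "front-hero": "Model standing centered, calm posture, full garment visible.",
    "three-quarter": "Slight body rotation, one hand placed to show fit and detail.",
    "detail-crop": "Tighter framing on the garment: fabric, stitching, hardware.",
    "movement": "Walking, looking off-frame, hand in pocket.",
}


def build_session(base_prompt: str, poses: dict) -> list:
    """Return one job per pose; the base prompt is identical in every job."""
    return [
        {"pose": name, "prompt": f"{base_prompt}\n\nPOSE\n{text}"}
        for name, text in poses.items()
    ]


# Placeholder for the locked model + outfit + settings text.
base_prompt = "MODEL\n...\nOUTFIT\n...\nPHOTO SETTINGS\n..."
frames = build_session(base_prompt, POSES)
```

Each entry in `frames` differs from the others only after the `POSE` heading, which makes drift between frames easier to attribute to the pose text rather than to the specs.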

Pose 01 · front, hero · per-garment pose session: same model, same scene, same outfit; only the pose prompt changes between frames

Step 06 · Render

Generate with ChatGPT Image 2.

The pose prompts feed into the render node. We use ChatGPT Image 2 for fashion stills — it preserves fabric, hands, and product detail better than the alternatives at this price point, and it follows long structured prompts well, which is exactly what this workflow produces.

Inputs: all specs + pose

06 · Render: ChatGPT Image 2 (2:3 · 1024×1536)

Output: frame-{n}.png

Run the render in batch: one job covers all the poses for a single garment. Cost lands around $0.40–0.80 per frame, so a full 4–6-pose session is roughly $2–4 per garment at one candidate per pose.

Render 1–2 candidates per pose; keep the strongest
Lock seeds once you have a winning frame for that look
Hands and logos are still where you spend cleanup time — flag them on review
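The per-garment budget is simple multiplication, and it is worth running the numbers before choosing how many candidates to render. The sketch below uses the per-frame range quoted above; the function name and the rounding to cents are assumptions.

```python
def session_cost(poses: int, candidates_per_pose: int, cost_per_frame: float) -> float:
    """Total render cost for one garment session, rounded to cents."""
    return round(poses * candidates_per_pose * cost_per_frame, 2)


# One candidate per pose, using the quoted $0.40-0.80 per-frame range.
low = session_cost(poses=4, candidates_per_pose=1, cost_per_frame=0.40)
high = session_cost(poses=6, candidates_per_pose=1, cost_per_frame=0.80)

# Two candidates per pose doubles the frame count, and the bill with it.
high_two_candidates = session_cost(poses=6, candidates_per_pose=2, cost_per_frame=0.80)

print(f"1 candidate:  ${low:.2f} to ${high:.2f} per garment")
print(f"2 candidates: up to ${high_two_candidates:.2f} per garment")
```

With these inputs the single-candidate range is $1.60 to $4.80, and rendering two candidates for all six poses at the top rate reaches $9.60.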

Step 07 · Export

Export the hero frames.

Save out a high-resolution PNG master and a smaller JPEG variant per pose. The PNG is your archive. The JPEG is what feeds the CMS, the paid creative pipeline, and the motion step that follows.

07 · Export: frames/*.png + .jpg (2048×3072 · sRGB)

Naming convention. Use {brand}_{sku}_{look}_{pose}.png from day one. You'll thank yourself when the catalog hits 500 frames.
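A small helper keeps the naming convention from drifting as more people export frames. This is a sketch under assumptions: the `look01`/`pose02` zero-padding, the lowercase slug rule, and the example brand and SKU are illustrative choices, not part of the stated convention.

```python
import re


def slug(value: str) -> str:
    """Lowercase and hyphenate, so underscores stay reserved as field separators."""
    cleaned = re.sub(r"[^a-z0-9]+", "-", value.lower()).strip("-")
    if not cleaned:
        raise ValueError(f"cannot build a filename part from {value!r}")
    return cleaned


def frame_name(brand: str, sku: str, look: int, pose: int, ext: str = "png") -> str:
    # Follows {brand}_{sku}_{look}_{pose}.{ext}; numbers are zero-padded so files sort correctly.
    parts = [slug(brand), slug(sku), f"look{look:02d}", f"pose{pose:02d}"]
    return "_".join(parts) + f".{ext}"


# Hypothetical brand and SKU, for illustration only.
master = frame_name("Acme Studio", "JKT-001", look=1, pose=2)
web = frame_name("Acme Studio", "JKT-001", look=1, pose=2, ext="jpg")
```

Here `master` is `acme-studio_jkt-001_look01_pose02.png`, and the JPEG variant shares the same stem so the two files stay paired.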

Exported hero frames · Lazio black suede look · poses 01 and 02 · ready for PDP, paid creative and editorial use

Step 08 · Motion (optional)

Make the still breathe.

Connect any exported hero frame to a video-generation node — we use Kling 2 for fashion motion. The result is a 3–6 second loop where the model takes a breath, turns slightly, the fabric moves in the wind. Same canvas, one extra branch.

From step 07: hero-frame.png

08 · Motion: Kling 2 · img2video (5s · 9:16 · 24fps)

Output: hero-motion.mp4

Cost runs roughly $1.50–2.50 per clip. Use it for PDP hero loops, IG/TikTok organic, paid social — same source frame, three formats, one workflow.

Motion · Kling 2 generated from a single hero frame

Keep the motion subtle — a small movement reads more "premium" than dramatic action
Generate at 9:16 by default, then crop down to 1:1 and 16:9 from the same render
Loop trimming: cut at the moment of stillness, not mid-motion
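Cropping one 9:16 render down to the other formats comes down to computing a centered crop box per aspect ratio. The sketch below assumes a 1080×1920 clip, which is a hypothetical resolution (the source only states 9:16), and `center_crop_box` is an illustrative helper rather than a tool feature.

```python
def center_crop_box(width: int, height: int, ratio_w: int, ratio_h: int) -> tuple:
    """Return (left, top, right, bottom) of the largest centered crop with the given ratio."""
    # Compare aspect ratios by cross-multiplying to stay in integer math.
    if width * ratio_h > height * ratio_w:
        # Source is wider than the target: keep full height, trim the sides.
        crop_h = height
        crop_w = height * ratio_w // ratio_h
    else:
        # Source is taller than (or equal to) the target: keep full width, trim top and bottom.
        crop_w = width
        crop_h = width * ratio_h // ratio_w
    left = (width - crop_w) // 2
    top = (height - crop_h) // 2
    return (left, top, left + crop_w, top + crop_h)


# Assumed source resolution for a 9:16 clip.
SOURCE_W, SOURCE_H = 1080, 1920

square_box = center_crop_box(SOURCE_W, SOURCE_H, 1, 1)
landscape_box = center_crop_box(SOURCE_W, SOURCE_H, 16, 9)
```

At this resolution the 1:1 crop keeps a 1080×1080 window and the 16:9 crop keeps only 1080×607, roughly a third of the frame height, so the subject needs to sit near the vertical center for the landscape version to work.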

That's the workflow

From flat product to campaign frame.

This is the simplest version of the workflow we run for fashion clients. From here you add: brand-specific calibration, batch generation across a full collection, automated brief-to-render hand-offs, and the analytics layer that tracks which renders convert. But the core canvas stays the same.

If you want help wiring this up for your brand — or a fully calibrated workflow built and handed over — we do that.

sebastian@kirimedia.co →

Set up a simple photoshoot in Weavy.