Fashion shoots are getting 90% cheaper. Most of the tools come from China.
What actually changed
For most of 2024 the conversation around AI fashion imagery was hand-wavy. Models had artifacts, fabric draped wrong, hands had six fingers. Catalog teams looked at the output and went back to the studio.
That’s over. In the last twelve months three things converged: image models stopped failing on hands and fabric, video models started preserving garment edges through movement, and the frontier engines doing it became cheaply accessible by API — Kling, Seedance, Hailuo, Vidu out of China, plus GPT Image 2.0 (OpenAI), Nano Banana (Google) and Flux Kontext (Black Forest Labs) on the stills side. The cost floor for usable product imagery dropped by roughly an order of magnitude in a year.
ASOS reported a 23% cut in photography production costs in 2024. ByteDance internal data shows a four-person food brand in China made a 90-second spot for ¥4,000–5,000 (~$700) in five hours that hit five billion views. Shein has been openly using AI-generated models in its catalog for a year. None of this is experimental anymore. It’s operational, and it’s cheap.
The catch: no single engine produces a coherent brand catalog on its own. Nano Banana is best-in-class for character-consistent editing of an existing photo, but it doesn’t synthesize from scratch as cleanly. GPT Image 2.0 nails prompt adherence but can drift on brand consistency across a series. Kling moves fabric beautifully but its native UX is generic. Seedance ships polish but lives inside ByteDance’s distribution stack. Flux Kontext does precise masked edits but doesn’t do motion. Each engine wins one phase of the workflow. Whoever composes them wins the catalog.
The image engines we wire
For stills — model on white, lifestyle, on-model try-on, background swap, market translation — three engines each win a different phase of the pipeline, with a fourth riding along for cross-format consistency. We wire them as branches inside the Weavy graph and pick automatically based on the input and the goal.
Nano Banana — Google Gemini 2.5 Flash Image
Best-in-class for editing an existing photo while keeping a person, garment or product visually consistent across a series. Where it shines: starting from one good hero shot and producing 20 brand-consistent variations — different backgrounds, angles, lifestyle context — without the subject drifting. That character consistency is what makes it the spine of any catalog refresh that starts from real photography.
GPT Image 2.0 — OpenAI
Strongest on prompt adherence and from-scratch synthesis. Use it when there’s no source photo and you need to generate a product visualization from a brief — a category page hero, a lifestyle scene, a campaign concept. Less consistent than Nano Banana across a multi-shot series, but far better at translating a directed prompt into a single intentional image.
Flux Kontext — Black Forest Labs
Precision masked editing. The use case: replace a background, swap a model’s outfit, change a product colorway, retouch a specific region — without disturbing the rest of the image. Pairs naturally with mask-extraction nodes in Weavy. The fidelity inside the masked region is the highest of the three for surgical edits.
Seedance image — ByteDance
ByteDance’s image companion to the Seedance video model. Good for production-grade synthesis at volume when you need a stylistic match with a Seedance-generated motion clip — image and video stay coherent. Less useful as a standalone stills engine; sits in the pipeline for cross-format consistency.
The video stack — where Chinese tools lead outright
On AI fashion video (lookbooks, social spots, animated try-on) the gap with Western tooling is the widest. Four names every production team we talk to is using.
Kling AI — Kuaishou
Currently the best engine in any market for fabric movement and texture preservation through motion. The go-to for cinematic fashion content where edges, logos and folds need to stay coherent through a camera move. Built-in virtual try-on workflow that generates short on-model clips from a still product image. Kling 2.6 handles wool, denim and silk convincingly; very fine knits still struggle.
Seedance 2.0 — ByteDance
ByteDance’s flagship for production-oriented video. The metric that matters here is “usability per generation” — Seedance reports ~90%, against an industry norm closer to 20% a year ago. Studios queue six hours and work at 3am for off-peak capacity, which tells you everything about demand. Distribution lane straight into Douyin, TikTok and CapCut, the latter with ~800M users — that’s a delivery advantage Western tools can’t match.
Hailuo AI — MiniMax
Faster, looser, mood-driven. Use it when you need brand or lifestyle content with personality and don’t need the precise garment fidelity that Kling offers. Good fit for top-of-funnel social and Story-format campaigns.
Vidu — Shengshu, Alibaba-backed
Strong on emotional storytelling and multi-character scenes. Vidu Q3 added real-time in-playback editing at ~2 second latency in 1080p, which changes the iteration loop for short-form ad creative. The PixVerse / Vidu pair (both Alibaba-funded) recently raised $300M and $290M respectively, so the roadmap is well-resourced.
Where AI is good — and where it isn’t
We’ve tested most of these on real client SKUs across denim, knitwear, outerwear, lingerie and jewelry. The pattern is consistent.
AI is production-ready for:
- Standard apparel on a model — t-shirts, dresses, jackets, trousers
- Background swaps and lifestyle relocation
- Translation of existing campaigns into new markets
- Animated try-on for social
- Cinematic lookbook video on Kling
- A/B test variants at scale (50+ creative permutations in a day)
AI still loses on:
- Sheer fabric, sequins, lace, fine knit detail
- Anything where the buyer will zoom to inspect stitching
- Editorial brand campaigns where the photographer is the asset
- Jewelry and watches at close range
- Authentic skin texture for premium beauty positioning
- Anything that depends on a specific named talent or muse
The honest read: AI handles the bottom 60–80% of any fashion brand’s imagery volume — refreshes, secondary angles, market translations, social variants. Hero shots and editorial keep using a real photographer. The economics flip when you stop framing this as “AI vs. shoots” and start framing it as “AI for volume, shoots for hero.” That’s how every brand we work with that’s done it well has structured the workflow.
How we build it: Weavy as the orchestration layer
We don’t ship clients a list of SaaS tools to subscribe to. We compose the right engines — for that brand, for that catalog, for the verticals it sells in — into a custom pipeline inside Weavy, a node-based AI canvas that lets us wire any model to any other.
A typical brand pipeline looks like this:
Inputs — the client’s product photo, a brand reference set (existing campaign imagery, color palette, model preferences), and asset metadata. Brand consistency is enforced via reference images and, where the brand has enough imagery, a custom LoRA fine-tuned on their existing catalog.
Routing by job-to-be-done — variations from an existing hero shot route to Nano Banana for character-consistent editing. From-scratch synthesis (a new lifestyle scene, a category hero, a concept image) routes to GPT Image 2.0 for prompt adherence. Background swaps, colorway changes and surgical retouches route to a Flux Kontext branch with mask extraction upstream. Each route is a separate node graph, picked automatically based on the input. A minimal code sketch of this dispatch follows the walkthrough.
Generation, edit, enhance — engine output → mask refinement → relight to match brand lighting → upscale to PDP/print resolution. A Compare node sits at the branch points so we can A/B candidate engines on a sample asset before locking the route.
Motion branch — for any asset flagged for video, the same image flows into a Kling node for product detail, a Hailuo node for mood/lifestyle social, or a Seedance node for full ad creative depending on the format brief.
Output as a Design App — once the pipeline is dialed in, we publish it as a Design App: the client’s team uploads a new product photo, picks the format (PDP square, IG vertical, ad creative), and gets brand-consistent output without touching the underlying graph.
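To make the routing concrete, here is a minimal sketch of the job-to-be-done dispatch in Python. The engine names are the real ones; everything else — the `Job` fields, the function, the decision order — is an illustrative assumption, since in Weavy this logic lives in a node graph rather than code.

```python
from dataclasses import dataclass
from enum import Enum, auto

class Engine(Enum):
    NANO_BANANA = auto()   # character-consistent edits of an existing photo
    GPT_IMAGE_2 = auto()   # from-scratch synthesis from a brief
    FLUX_KONTEXT = auto()  # precision masked edits

@dataclass
class Job:
    has_source_photo: bool  # did the client supply a hero shot?
    has_mask_region: bool   # is the edit confined to a masked area?
    series_size: int        # how many brand-consistent variants are needed

def route(job: Job) -> Engine:
    """Mirror the branch logic described above. Illustrative only:
    in Weavy this is a node graph, not a function."""
    if job.has_source_photo and job.has_mask_region:
        # Background swap, colorway change, surgical retouch:
        # highest fidelity inside the masked region.
        return Engine.FLUX_KONTEXT
    if job.has_source_photo and job.series_size > 1:
        # One hero shot to N brand-consistent variations,
        # without the subject drifting across the series.
        return Engine.NANO_BANANA
    # No usable source photo: synthesize from the brief.
    return Engine.GPT_IMAGE_2
```

Read it as the decision tree, not the implementation: one hero shot plus a 20-variant brief routes to Nano Banana, the same SKU with only a background swap routes to Flux Kontext, and a brief with no source photo at all routes to GPT Image 2.0.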
The agency value isn’t access to the engines — anyone can subscribe to Kling, run GPT Image 2.0 via API, or sign up for a SaaS that bundles a couple of them. The value is the pipeline: the routing logic, the brand-consistency layer, the comparison harness, and the Design App that lets the client’s team run it without learning Weavy. We build it once per brand, hand over the keys, and tune as new engines ship.
This isn’t fashion-only. We’ve built variants of this pipeline for fashion (on-model catalog, lookbook video), for construction (product-in-context renders, before/after generation), and for interior (room scenes, swatch-to-room, lifestyle staging). The architecture is the same — what changes is which engine wins which phase. Any vertical that ships a physical product can run on it.
For the unchanged hero shot — once per quarter, real photographer, hand-picked final images — we still recommend a real studio day. That’s the brand asset. The Weavy pipeline handles everything downstream of it.
The actual numbers
| Path | Cost per quarterly catalog refresh | What you get |
|---|---|---|
| Traditional | $5,400 | Photographer ($1,500), studio + lighting ($800), models ($1,200), hair/makeup ($600), retouching ($800), agency coordination ($500) |
| AI-led pipeline | $700 | Engine API spend across Weavy nodes ($120 per refresh at typical volume), one hero shoot amortized across the quarter ($300), pipeline maintenance ($280) |
That delta — $4,700 per quarter, ~$19,000 per year — is real cash that goes into either media spend or product margin. For a brand running on a 30% blended margin, that’s roughly $63,000 in additional GMV needed to replicate the same bottom-line impact through the topline. AI imagery is quietly one of the highest-ROI capability shifts available to a DTC fashion brand right now.
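For anyone who wants to check the arithmetic, here it is spelled out, using the table’s figures and the 30% blended margin assumed above:

```python
# The margin math behind the numbers above, using the table's figures.
traditional = 5_400           # $ per quarterly catalog refresh
ai_led = 700                  # $ per quarterly catalog refresh
blended_margin = 0.30         # 30% blended margin assumed in the text

delta_per_quarter = traditional - ai_led            # $4,700
delta_per_year = delta_per_quarter * 4              # $18,800 (~$19,000)

# Topline GMV needed to add the same amount to the bottom line:
gmv_equivalent = delta_per_year / blended_margin    # ~$62,667 (~$63,000)
```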
The one-time setup of the Weavy pipeline is a separate engagement (we scope it per brand based on SKU complexity and catalog volume), but it’s a fixed cost, not a recurring one. After handover, the client’s team runs the Design App on their own.
IP, model rights, disclosure
Two real concerns, one fake one.
Real: model likeness rights. If you upload a real person’s image (yourself, a founder, an ambassador) to generate try-on photos, you’re covered as long as you have that person’s consent, which in practice means a signed likeness release. If you use a stock AI model from one of the engines — Kling’s library, an off-the-shelf SaaS roster — you’re using synthetic identities the vendor licenses to you. Read the terms; most are generous for commercial use, but some prohibit presenting the synthetic model as a real person.
Real: disclosure to consumers. Several markets are moving toward AI-generated imagery disclosure — France already requires it for retouched fashion ads, and the EU AI Act’s general-purpose disclosure rules hit some categories in 2026. Best practice: tag AI-generated imagery in alt text and on the PDP where it appears. Consumers who care can find it; the rest don’t notice.
Fake: copyright on generated images. US Copyright Office guidance has consistently held that purely AI-generated output isn’t copyrightable, while composite images where a human contributes meaningful creative authorship (edits, selection, composition; prompts alone generally don’t qualify) can be. For ecommerce catalog imagery, copyright protection is rarely the moat anyway; being first to market with on-brand consistency is. Don’t optimize for copyright; optimize for conversion.
If we’re a fit — fashion, construction, interior, or any other category that ships a physical product — send a short note about your brand, your markets and your existing catalog volume. We’ll come back with a scoped plan and a timeline.