AI Image Composer in 1 Day: React 19 + Gemini Multimodal — Case Study

The problem

Hours of manual AI prompting per artwork

The creative studio's product: personalized AI paintings where the client's face is blended into an artistic scene — oil painting style, 3D render style, fantasy illustration. Each artwork requires "Identity Replacement" — taking the client's photo and integrating their likeness into a pre-designed artistic composition while maintaining the scene's style, lighting, and composition.

Before the tool, the workflow was: load the client photo into an AI image tool, manually craft a detailed prompt describing the identity replacement, generate 3-5 variations, select the best, refine with another round of prompting if the likeness was off, repeat until the output was client-ready. On a good day, 30-45 minutes per artwork. On a bad day, 2+ hours.

The studio had a growing order queue and a bottleneck: one person who knew how to prompt AI image tools correctly. If they were unavailable, orders piled up. The team couldn't scale without removing this single-point dependency.

Why the existing workflow was hard to automate

The core technical challenge is "Identity Replacement with Style Transfer" — preserve the client's face geometry and likeness while applying the scene's artistic style to the entire image. Too much style transfer loses the identity. Too little produces an out-of-place realistic face in an artistic painting.

Generic image generation tools leave this to manual prompting because they're built for creative exploration, not repeatable production. The solution needed to encode the right prompting strategy into a UI so that anyone on the team could produce consistent results without prompting expertise.

The 1-day build

How it went from requirement to production

Morning — Research (2h)

Stress-tested Gemini's gemini-2.0-flash-exp-image-generation in Google AI Studio. Focused on identity replacement with style transfer across 3 art styles: oil painting, 3D render, fantasy illustration. Found the prompt structure that produced consistent results before writing any code.

Late morning — Architecture (1h)

Decided on stack: React 19 for UI (fast, component-based, good mobile support), Canvas API for client-side preprocessing, Node.js backend for API calls, Railway for deployment. No overengineering — the simplest thing that could work in production.

Afternoon — Build (4h)

Frontend: photo upload UI, style template selector, result display, download button. Backend: image preprocessing endpoint, Gemini API call with the calibrated prompt, error handling. Canvas API compression for iPhone photos (15MB+ originals needed to be under 2MB to avoid API timeouts).
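A minimal sketch of the backend's Gemini call, assuming Node.js 18+ (for the global `fetch`) and Gemini's public REST `generateContent` endpoint with the model named above. This is not the tool's actual code: `buildGenerateRequest`, `extractImage`, and `generateArtwork` are hypothetical names, and the calibrated prompt is passed in as a plain string.

```typescript
// Minimal sketch (assumption): calling Gemini's REST generateContent endpoint
// from Node.js 18+, which provides a global fetch. Helper names are hypothetical.

const MODEL = "gemini-2.0-flash-exp-image-generation";
const ENDPOINT = `https://generativelanguage.googleapis.com/v1beta/models/${MODEL}:generateContent`;

interface InlineImage {
  mimeType: string;
  data: string; // base64-encoded bytes
}

interface GeminiPart {
  text?: string;
  inlineData?: InlineImage;
}

interface GeminiResponse {
  candidates?: { content?: { parts?: GeminiPart[] } }[];
}

// Build the request body: the calibrated prompt plus the client photo.
function buildGenerateRequest(prompt: string, photo: InlineImage) {
  return {
    contents: [{ role: "user", parts: [{ text: prompt }, { inlineData: photo }] }],
    // Ask for both text and image output from the image-generation model.
    generationConfig: { responseModalities: ["TEXT", "IMAGE"] },
  };
}

// Return the first image part of the first candidate, or null if the model
// answered with text only (for example, a refusal).
function extractImage(response: GeminiResponse): InlineImage | null {
  const parts = response.candidates?.[0]?.content?.parts ?? [];
  const imagePart = parts.find((p) => p.inlineData !== undefined);
  return imagePart?.inlineData ?? null;
}

// Network call: the API key is supplied per request by the caller.
async function generateArtwork(apiKey: string, prompt: string, photo: InlineImage): Promise<InlineImage> {
  const res = await fetch(ENDPOINT, {
    method: "POST",
    headers: { "Content-Type": "application/json", "x-goog-api-key": apiKey },
    body: JSON.stringify(buildGenerateRequest(prompt, photo)),
  });
  if (!res.ok) {
    throw new Error(`Gemini API error ${res.status}: ${await res.text()}`);
  }
  const image = extractImage((await res.json()) as GeminiResponse);
  if (image === null) {
    throw new Error("Gemini returned no image part");
  }
  return image;
}
```

Keeping request building and response parsing as pure functions makes them testable without network access; only `generateArtwork` touches the API.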

Evening — Deploy and handoff (1h)

Railway deployment with environment variable configuration. Tested on 5 real client photos across 3 styles. Passed to the client with a 10-minute walkthrough. Tool was in use by the team the next morning.

Technical details

Under the hood

Canvas API preprocessing

iPhone photos are 12-15MB, and sending them raw to the Gemini API causes timeouts on slow mobile connections. The Canvas API resizes and compresses each photo client-side before upload: the longer side is capped at 2048px and the image is re-encoded with canvas.toBlob(callback, 'image/jpeg', 0.85). Result: 200-800KB files, produced in under 200ms on any phone, with no visible quality loss at artwork generation scale.
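The preprocessing step can be sketched as follows. It assumes a browser with `createImageBitmap` and uses the 2048px cap and 0.85 JPEG quality from the text; `fitWithin` and `compressPhoto` are hypothetical names, not the tool's code.

```typescript
// Minimal browser-side sketch (assumption): downscale so the longer side is at
// most 2048px, then re-encode as JPEG at quality 0.85. Names are hypothetical.

const MAX_SIDE = 2048;
const JPEG_QUALITY = 0.85;

// Pure helper: scale (width, height) to fit within maxSide, preserving the
// aspect ratio and never upscaling.
function fitWithin(width: number, height: number, maxSide: number): { width: number; height: number } {
  const scale = Math.min(1, maxSide / Math.max(width, height));
  return { width: Math.round(width * scale), height: Math.round(height * scale) };
}

// Browser-only: decode the file, draw it onto a resized canvas, re-encode.
async function compressPhoto(file: File): Promise<Blob> {
  const bitmap = await createImageBitmap(file);
  const { width, height } = fitWithin(bitmap.width, bitmap.height, MAX_SIDE);
  const canvas = document.createElement("canvas");
  canvas.width = width;
  canvas.height = height;
  const ctx = canvas.getContext("2d");
  if (ctx === null) throw new Error("2D canvas context unavailable");
  ctx.drawImage(bitmap, 0, 0, width, height);
  bitmap.close(); // free the decoded full-size image
  // canvas.toBlob is callback-based; wrap it in a Promise.
  return new Promise<Blob>((resolve, reject) => {
    canvas.toBlob(
      (blob) => (blob ? resolve(blob) : reject(new Error("JPEG encoding failed"))),
      "image/jpeg",
      JPEG_QUALITY,
    );
  });
}
```

The pure `fitWithin` helper keeps the aspect ratio and never upscales, so a 4032×3024 photo becomes 2048×1536 while a small image passes through at its original size.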

Prompt engineering for identity replacement

The system prompt is the core IP of the tool. It instructs Gemini to: preserve facial geometry and distinctive features of the subject, apply the artistic style only to background and non-face elements, maintain the scene's lighting direction as it falls on the subject's face, avoid the "uncanny valley" by not over-smoothing skin textures. The prompt was calibrated over ~30 test generations across different face types and art styles.
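The actual prompt is not published, so the following only illustrates a possible structure: a fixed list of identity rules (paraphrasing the four listed above) combined with a per-style description. The style descriptions, the wording, and `buildIdentityPrompt` are hypothetical.

```typescript
// Illustrative sketch (assumption): NOT the tool's actual prompt. The rules
// paraphrase the article; style descriptions and names are hypothetical.

type ArtStyle = "oil-painting" | "3d-render" | "fantasy-illustration";

const STYLE_DESCRIPTIONS: Record<ArtStyle, string> = {
  "oil-painting": "a classical oil painting with visible brushwork",
  "3d-render": "a polished 3D render with soft lighting",
  "fantasy-illustration": "a detailed fantasy illustration",
};

// Fixed, calibrated rules that apply to every order regardless of style.
const IDENTITY_RULES: readonly string[] = [
  "Preserve the subject's facial geometry and distinctive features.",
  "Apply the artistic style only to the background and non-face elements.",
  "Keep the scene's lighting direction consistent as it falls on the subject's face.",
  "Do not over-smooth skin textures.",
];

// Combine the per-order choices (style, scene) with the fixed rules.
function buildIdentityPrompt(style: ArtStyle, sceneDescription: string): string {
  const rules = IDENTITY_RULES.map((rule, i) => `${i + 1}. ${rule}`).join("\n");
  return [
    "Place the person from the attached photo into the scene.",
    `Scene: ${sceneDescription}`,
    `Style: ${STYLE_DESCRIPTIONS[style]}.`,
    "Rules:",
    rules,
  ].join("\n");
}
```

Keeping the calibrated rules in a constant means team members only choose a style and scene; the part that took ~30 test generations to tune stays the same for every order.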

Multi-office API key injection

The studio runs two offices (in Poland and Ukraine) with separate billing. Each office authenticates with its own Gemini API key, stored in the browser's local storage. The backend receives the key with each request and uses it for the Gemini call. The result is simple, audit-free cost allocation with zero infrastructure.
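A minimal sketch of per-request key forwarding, assuming a custom HTTP header. The header name `x-office-api-key`, the localStorage key, and both function names are hypothetical; the article only states that each office's key lives in local storage and is sent with each request.

```typescript
// Minimal sketch (assumption): header name and storage key are hypothetical.

const API_KEY_HEADER = "x-office-api-key"; // hypothetical header name
const STORAGE_KEY = "geminiApiKey"; // hypothetical localStorage key

// Browser side: attach this office's key to every backend request.
function officeHeaders(): Record<string, string> {
  const key = window.localStorage.getItem(STORAGE_KEY);
  if (!key) throw new Error("No API key configured for this office");
  return { [API_KEY_HEADER]: key };
}

// Server side: Node lowercases incoming header names. Anything other than a
// non-empty string is rejected instead of being passed on to Gemini.
function resolveApiKey(headers: Record<string, string | string[] | undefined>): string {
  const value = headers[API_KEY_HEADER];
  if (typeof value !== "string" || value.trim() === "") {
    throw new Error(`Missing or malformed ${API_KEY_HEADER} header`);
  }
  return value.trim();
}
```

Because the backend only forwards the key, it needs no per-office configuration. The trade-off is that each key sits in browser storage, readable by any script running on the page's origin.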

What made it work

Mobile-first by constraint

The studio's team works on phones during client visits — showing style options and capturing client photos in real time. A desktop-first web app would have been unused.

Mobile-first decisions that mattered:

Single-column layout with large tap targets — no small buttons
Camera input via <input type="file" accept="image/*" capture="user"> — opens the camera directly on mobile
Loading state with animation — generation takes 8-15 seconds; users need feedback that something is happening
Result auto-downloads on mobile or shows full-screen for easy sharing to client chat
No login required — any team member can use it without account setup
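For the download step in the list above, a minimal browser-side sketch, assuming the backend returns the generated image as base64 (the form in which Gemini returns inline image data). `base64ToBytes` and `downloadImage` are hypothetical names.

```typescript
// Minimal sketch (assumption): turn a base64 image into a file the user can
// save. Function names are hypothetical.

// Decode base64 into raw bytes (atob exists in browsers and Node 16+).
function base64ToBytes(base64: string) {
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

// Browser-only: trigger a download through a temporary <a download> link.
function downloadImage(base64: string, mimeType: string, filename: string): void {
  const blob = new Blob([base64ToBytes(base64)], { type: mimeType });
  const url = URL.createObjectURL(blob);
  const link = document.createElement("a");
  link.href = url;
  link.download = filename;
  document.body.appendChild(link);
  link.click();
  link.remove();
  // Release the object URL after the click has been dispatched.
  setTimeout(() => URL.revokeObjectURL(url), 0);
}
```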

The "natural language corrections" feature

Gemini handles follow-up instructions naturally: "change the background from forest to cityscape", "make the style more painterly", "adjust the lighting to look like sunset". Users can request changes via a text field below the result. This replaced the need for a complex template management system — the model handles variation requests conversationally.
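One way to implement conversational corrections is to treat each correction as another turn in the request history: the previous result goes in as a `model` turn and the correction text as a `user` turn. This is a sketch under that assumption, not necessarily how the tool does it; the types mirror the shape of Gemini's REST `contents` field, and `appendCorrection` is a hypothetical name.

```typescript
// Minimal sketch (assumption): corrections as extra conversation turns.
// Types mirror Gemini's REST "contents" shape; names are hypothetical.

interface InlineImage {
  mimeType: string;
  data: string; // base64
}

interface ChatPart {
  text?: string;
  inlineData?: InlineImage;
}

interface ChatTurn {
  role: "user" | "model";
  parts: ChatPart[];
}

// Append the model's last image and the user's correction to the history.
// Returns a new array; the input history is not mutated.
function appendCorrection(history: ChatTurn[], lastResult: InlineImage, correction: string): ChatTurn[] {
  const text = correction.trim();
  if (text === "") throw new Error("Correction text is empty");
  return [
    ...history,
    { role: "model", parts: [{ inlineData: lastResult }] },
    { role: "user", parts: [{ text }] },
  ];
}
```

Sending the whole history with each request lets a correction like "make the style more painterly" refer to the previous result without the user restating the original order.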

Results and lessons

What happened after deployment

The tool went live the day after the build. Results from the first month:

The team uses it for 85-90% of orders (the standard ones); the remaining 10-15% are complex requests that still require custom prompting
Order processing time: from 30 minutes to 2+ hours down to 30 seconds per artwork for standard requests
Single-point-of-failure eliminated — any team member can process orders
Client satisfaction improved — faster turnaround, more variation options shown during the visit

What I'd do differently with more time

With a 1-week build window instead of 1 day, I would add: order history and result storage (currently results are downloaded and managed manually), batch processing for multiple style variations in one request, and a proper admin panel for managing style templates. The current version handles 90% of use cases without these — but they're the natural next improvements.

When 1 day is the right scope

Not every automation needs to be a full product. The studio's bottleneck was specific, well-understood, and repeatable. A focused 1-day tool that solves one problem well is often more valuable than a 2-week project with features nobody uses. The test: can the target users operate it without training? If yes, ship it.

For your business

Where this pattern applies

The AI Image Composer pattern — "automate repetitive AI prompting into a one-click tool" — applies wherever a business has:

A repeatable visual task currently done manually with AI tools (Midjourney, DALL-E, Stable Diffusion, Gemini)
Non-technical staff who need to produce consistent results without prompting expertise
A production volume where manual prompting doesn't scale

Examples relevant to German businesses:

E-commerce product photography — automated background replacement and scene generation for product photos
Marketing agencies — templated ad creative generation from brand guidelines and client assets
Real estate — automated property photo enhancement and virtual staging
Print and personalization businesses — customized greeting cards, invitations, certificates at scale

Tech stack

Tools used

React 19, TypeScript, Gemini AI, Canvas API, Node.js, Railway, VS Code, Claude Code

Do you have repetitive visual workflows in your business?

I build AI automation tools that turn expert manual workflows into one-click tools for your team. Creative automation, product photography, visual content pipelines. Germany and EU focus. Let's talk.


