Hours of manual AI prompting per artwork
The creative studio's product: personalized AI paintings where the client's face is blended into an artistic scene (oil painting style, 3D render style, or fantasy illustration). Each artwork requires "Identity Replacement": taking the client's photo and integrating their likeness into a pre-designed artistic composition while maintaining the scene's style, lighting, and composition.
Before the tool, the workflow was: load the client photo into an AI image tool, manually craft a detailed prompt describing the identity replacement, generate 3-5 variations, select the best, refine with another round of prompting if the likeness was off, repeat until the output was client-ready. On a good day, 30-45 minutes per artwork. On a bad day, 2+ hours.
The studio had a growing order queue and a bottleneck: one person who knew how to prompt AI image tools correctly. If they were unavailable, orders piled up. The team couldn't scale without removing this single-point dependency.
Why the existing workflow was hard to automate
The core technical challenge is "Identity Replacement with Style Transfer": preserve the client's face geometry and likeness while applying the scene's artistic style to the entire image. Too much style transfer loses the identity; too little produces an out-of-place realistic face in an artistic painting.
Generic image generation tools leave this to manual prompting because they're built for creative exploration, not repeatable production. The solution needed to encode the right prompting strategy into a UI, so that anyone on the team could produce consistent results without prompting expertise.
How it went from requirement to production
Prototyping started with gemini-2.0-flash-exp-image-generation in Google AI Studio, focusing on identity replacement with style transfer across three art styles: oil painting, 3D render, and fantasy illustration. The goal was to find a prompt structure that produced consistent results before writing any code.
Under the hood
Canvas API preprocessing
iPhone photos are 12-15MB. Sending them raw to the Gemini API causes timeouts on slow mobile connections, so the Canvas API resizes and compresses each photo client-side before upload: canvas.toBlob(callback, 'image/jpeg', 0.85), with target dimensions capped at 2048px. Result: 200-800KB files, preprocessing in under 200ms on any phone, and no visible quality loss at artwork generation scale.
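Here is a minimal TypeScript sketch of that preprocessing step. Only the toBlob call, the JPEG quality of 0.85 and the 2048px cap come from the description above. The helper names fitWithin and compressPhoto are hypothetical. The sketch also assumes two things: the cap applies to the longer edge, and createImageBitmap is available in the target browsers.

```typescript
// Sizing rule and JPEG settings from the write-up; helper names are hypothetical.
const MAX_EDGE_PX = 2048;
const JPEG_QUALITY = 0.85;

// Scale (width, height) down so the longer edge is at most maxEdge.
// Keeps the aspect ratio and never upscales smaller images.
function fitWithin(
  width: number,
  height: number,
  maxEdge: number = MAX_EDGE_PX,
): { width: number; height: number } {
  const scale = Math.min(1, maxEdge / Math.max(width, height));
  return { width: Math.round(width * scale), height: Math.round(height * scale) };
}

// Browser-only: decode the photo, redraw it at the capped size, re-encode as JPEG.
async function compressPhoto(file: Blob): Promise<Blob> {
  const bitmap = await createImageBitmap(file);
  const { width, height } = fitWithin(bitmap.width, bitmap.height);
  const canvas = document.createElement("canvas");
  canvas.width = width;
  canvas.height = height;
  const ctx = canvas.getContext("2d");
  if (!ctx) throw new Error("2D canvas context unavailable");
  ctx.drawImage(bitmap, 0, 0, width, height);
  bitmap.close(); // release the decoded full-size image
  // toBlob is callback-based; wrap it so callers can await the compressed file.
  return new Promise<Blob>((resolve, reject) => {
    canvas.toBlob(
      (blob) => (blob ? resolve(blob) : reject(new Error("JPEG encoding failed"))),
      "image/jpeg",
      JPEG_QUALITY,
    );
  });
}
```

fitWithin makes no DOM calls, so the sizing rule can be checked outside a browser. compressPhoto would run on the File from the camera input before upload.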
Prompt engineering for identity replacement
The system prompt is the core IP of the tool. It instructs Gemini to: preserve facial geometry and distinctive features of the subject, apply the artistic style only to background and non-face elements, maintain the scene's lighting direction as it falls on the subject's face, avoid the "uncanny valley" by not over-smoothing skin textures. The prompt was calibrated over ~30 test generations across different face types and art styles.
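The studio's calibrated prompt isn't published. What follows is only an illustration of how the four constraints above could be encoded once and reused for every style. The style descriptors, the wording of each rule, and the names ArtStyle and buildIdentityPrompt are hypothetical.

```typescript
// Hypothetical illustration: not the studio's calibrated prompt.
type ArtStyle = "oil-painting" | "3d-render" | "fantasy-illustration";

// One descriptor per supported style (wording is illustrative).
const STYLE_DESCRIPTIONS: Record<ArtStyle, string> = {
  "oil-painting": "a classical oil painting with visible brushwork",
  "3d-render": "a polished 3D render with soft, even lighting",
  "fantasy-illustration": "a detailed fantasy illustration",
};

// The four constraints described above, paraphrased as prompt rules.
const IDENTITY_RULES: readonly string[] = [
  "Preserve the subject's facial geometry and distinctive features.",
  "Apply the artistic style only to the background and non-face elements.",
  "Keep the scene's lighting direction as it falls on the subject's face.",
  "Do not over-smooth skin texture; keep natural detail to avoid an uncanny look.",
];

// Assemble the full instruction for one style: task, style, then numbered rules.
function buildIdentityPrompt(style: ArtStyle): string {
  const rules = IDENTITY_RULES.map((rule, i) => `${i + 1}. ${rule}`).join("\n");
  return [
    "Replace the person in the scene image with the person from the client photo.",
    `Render the result as ${STYLE_DESCRIPTIONS[style]}.`,
    "Follow these rules:",
    rules,
  ].join("\n");
}
```

The rules live in one list, so a calibration change reaches every style at once. That is what lets a simple style picker give non-experts consistent prompts.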
Multi-office API key injection
The studio runs two offices (Poland and Ukraine) with separate billing. Each office authenticates with its own API key stored in the browser's local storage. The backend receives the key with each request and uses it for the Gemini call. Cost allocation is therefore simple and needs no extra infrastructure or internal usage tracking: each office's Gemini usage is billed to its own key.
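Below is a minimal sketch of the per-request key flow, assuming a Node 18+ backend with global fetch. The Gemini REST endpoint and its x-goog-api-key header are real. The localStorage entry name, the x-office-api-key header and the helper names are hypothetical.

```typescript
// Hypothetical names: STORAGE_KEY, KEY_HEADER and all helpers are illustrative.
const STORAGE_KEY = "officeApiKey";
const KEY_HEADER = "x-office-api-key";
const GEMINI_URL =
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash-exp-image-generation:generateContent";

// Browser side: attach this office's key to every request sent to the studio backend.
function officeHeaders(storage: { getItem(key: string): string | null }): Record<string, string> {
  const key = storage.getItem(STORAGE_KEY);
  if (!key) throw new Error("No API key configured for this office");
  return { "Content-Type": "application/json", [KEY_HEADER]: key };
}

// Backend side: read the key from the incoming (lower-cased) request headers.
function keyFromHeaders(headers: Record<string, string | string[] | undefined>): string {
  const value = headers[KEY_HEADER];
  const key = Array.isArray(value) ? value[0] : value;
  if (!key) throw new Error(`Missing ${KEY_HEADER} header`);
  return key;
}

// Backend side: forward the generation request to Gemini, billed to the caller's key.
async function forwardToGemini(apiKey: string, body: unknown): Promise<unknown> {
  const res = await fetch(GEMINI_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json", "x-goog-api-key": apiKey },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`Gemini API error ${res.status}: ${await res.text()}`);
  return res.json();
}
```

In this sketch the backend never stores a key. Adding another office would only mean entering a new key on that office's devices.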
Mobile-first by constraint
The studio's team works on phones during client visits, showing style options and capturing client photos in real time. A desktop-first web app would have gone unused.
Mobile-first decisions that mattered:
- Single-column layout with large tap targets, no small buttons
- Camera input via <input type="file" accept="image/*" capture="user">, which opens the camera directly on mobile
- Loading state with animation: generation takes 8-15 seconds, so users need feedback that something is happening
- Result auto-downloads on mobile or shows full-screen for easy sharing to client chat
- No login required: any team member can use it without account setup
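In the generateContent REST response, the generated image arrives as base64 inlineData inside the candidate's parts. The sketch below extracts it and triggers a download in the browser. The helper names are hypothetical, and the full-screen preview path is not shown.

```typescript
// Response shape follows the Gemini REST generateContent output (camelCase JSON).
interface InlineData { mimeType: string; data: string }
interface Part { text?: string; inlineData?: InlineData }
interface GenerateContentResponse { candidates?: { content?: { parts?: Part[] } }[] }

// Return the first image part of the first candidate, if the model produced one.
function firstImagePart(res: GenerateContentResponse): InlineData | undefined {
  const parts = res.candidates?.[0]?.content?.parts ?? [];
  return parts.find((p) => p.inlineData?.mimeType.startsWith("image/"))?.inlineData;
}

// Decode base64 into raw bytes (atob yields one char per byte).
function base64ToBytes(b64: string) {
  const binary = atob(b64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
  return bytes;
}

// Browser-only: wrap the bytes in a Blob and click a temporary download link.
function downloadResult(image: InlineData, filename: string): void {
  const blob = new Blob([base64ToBytes(image.data)], { type: image.mimeType });
  const url = URL.createObjectURL(blob);
  const a = document.createElement("a");
  a.href = url;
  a.download = filename;
  a.click();
  setTimeout(() => URL.revokeObjectURL(url), 1000); // free the object URL afterwards
}
```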
The "natural language corrections" feature
Gemini handles follow-up instructions naturally: "change the background from forest to cityscape", "make the style more painterly", "adjust the lighting to look like sunset". Users can request changes via a text field below the result. This replaced the need for a complex template management system; the model handles variation requests conversationally.
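Follow-up corrections map naturally onto a multi-turn generateContent request. The earlier turns, including the model's returned image, are sent back together with the new instruction. The sketch below builds that history. It uses hypothetical helper names and assumes the request asks for both TEXT and IMAGE response modalities.

```typescript
// Shapes follow the Gemini REST generateContent request; helper names are hypothetical.
type Part = { text: string } | { inlineData: { mimeType: string; data: string } };
interface Content { role: "user" | "model"; parts: Part[] }

// Append the model's last reply and the user's correction without mutating history.
function withCorrection(history: readonly Content[], modelReply: Part[], correction: string): Content[] {
  return [
    ...history,
    { role: "model", parts: modelReply },
    { role: "user", parts: [{ text: correction }] },
  ];
}

// Request body asking for both text and image output.
function buildRequestBody(contents: Content[]) {
  return { contents, generationConfig: { responseModalities: ["TEXT", "IMAGE"] } };
}
```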
What happened after deployment
The tool went live the day after the build. Results from the first month:
- 85-90% of orders go through the tool; the remaining 10-15% are complex requests requiring custom prompting
- Order processing time: from hours down to 30 seconds per artwork for standard requests
- Single point of failure eliminated: any team member can process orders
- Client satisfaction improved: faster turnaround and more variation options shown during the visit
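As a back-of-the-envelope check, compare the manual timings from the start of this write-up with the 30 seconds per artwork using the tool. Manual work took 30-45 minutes on a good day and 2+ hours on a bad one. The comparison covers standard requests only, not the complex 10-15%:

```latex
\begin{aligned}
\text{good day: } & \frac{30\text{--}45\ \text{min}}{30\ \text{s}} = \frac{1800\text{--}2700\ \text{s}}{30\ \text{s}} = 60\text{--}90\times \\
\text{bad day: } & \frac{\geq 2\ \text{h}}{30\ \text{s}} = \frac{\geq 7200\ \text{s}}{30\ \text{s}} \geq 240\times
\end{aligned}
```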
What I'd do differently with more time
With a 1-week build window instead of 1 day, I would add: order history and result storage (currently results are downloaded and managed manually), batch processing for multiple style variations in one request, and a proper admin panel for managing style templates. The current version handles 90% of use cases without these, but they're the natural next improvements.
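Here is one way the batch idea could work, sketched under the assumption that each style is still a separate Gemini call. The calls run in parallel and the outcomes are collected per style, so one failed style doesn't discard the others. generateAllStyles and the injected generate function are hypothetical.

```typescript
// Hypothetical batch helper: one generation call per style, run in parallel.
type ArtStyle = "oil-painting" | "3d-render" | "fantasy-illustration";

interface StyleOutcome<T> {
  style: ArtStyle;
  result?: T; // present when the call succeeded
  error?: string; // present when the call failed
}

async function generateAllStyles<T>(
  styles: readonly ArtStyle[],
  generate: (style: ArtStyle) => Promise<T>,
): Promise<StyleOutcome<T>[]> {
  // allSettled waits for every call and never rejects, so partial results survive.
  const settled = await Promise.allSettled(styles.map((style) => generate(style)));
  return settled.map((outcome, i) =>
    outcome.status === "fulfilled"
      ? { style: styles[i], result: outcome.value }
      : { style: styles[i], error: String(outcome.reason) },
  );
}
```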
When 1 day is the right scope
Not every automation needs to be a full product. The studio's bottleneck was specific, well-understood, and repeatable. A focused 1-day tool that solves one problem well is often more valuable than a 2-week project with features nobody uses. The test: can the target users operate it without training? If yes, ship it.
Where this pattern applies
The AI Image Composer pattern ("automate repetitive AI prompting into a one-click tool") applies wherever a business has:
- A repeatable visual task currently done manually with AI tools (Midjourney, DALL-E, Stable Diffusion, Gemini)
- Non-technical staff who need to produce consistent results without prompting expertise
- A production volume where manual prompting doesn't scale
Examples relevant to German businesses:
- E-commerce product photography: automated background replacement and scene generation for product photos (Produktfotografie automatisieren)
- Marketing agencies: templated ad creative generation from brand guidelines and client assets
- Real estate: automated property photo enhancement and virtual staging (virtuelle Möblierung)
- Print and personalization businesses β customized greeting cards, invitations, certificates at scale
I build AI automation tools that turn expert manual workflows into one-click tools for your team. Creative automation, product photography, visual content pipelines. Germany and EU focus. Let's talk.