the weirdest thing that worked for me building with claude: i drew coordinates directly onto my template images, and claude can see everything

A developer built a zine-making app by designing templates in Figma and marking editable zones with colored rectangles on a separate annotation layer, then feeding both images to Claude to extract coordinates and generate layout definitions. By the developer's estimate, this condensed weeks of layout engine work into an afternoon, and new templates can now be added without code changes. The developer also found that giving Claude design constraints settled in advance (color palette, typography, and visual hierarchy) produced more faithful implementations than asking it to originate the design.

Detailed Analysis

A developer building a 90s/Y2K-aesthetic zine-making app called Zinecore discovered an unexpectedly efficient workflow for handling complex, visually designed UI templates: drawing colored annotation rectangles directly onto template images and feeding those annotated images to Claude for coordinate extraction and editable-zone definition. Rather than programmatically constructing each template layout in code — an approach that would have constrained design to what could be hand-built and produced generic, grid-like results — the developer designed templates in Figma, exported them as flat PNGs, and overlaid a separate annotation layer marking photo slots in red and text zones in blue. Claude interpreted both images, extracted spatial coordinates, and generated the interactive tap target definitions. The developer estimates this reduced what would have been weeks of custom layout engine development to a single afternoon.
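To make "tap target definitions" concrete, here is a minimal sketch of what such a definition and its hit test could look like. The article does not show Zinecore's actual format, so the `Zone` class, the `hit_test` function, the field names, and every coordinate below are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Zone:
    """One editable region, in pixel coordinates of the exported PNG (assumed format)."""
    zone_id: str
    kind: str    # "photo" for a red annotation box, "text" for a blue one
    x: int       # left edge
    y: int       # top edge
    width: int
    height: int

    def contains(self, px: float, py: float) -> bool:
        # Half-open bounds, so two adjacent zones never both claim an edge pixel.
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


def hit_test(zones: list[Zone], px: float, py: float) -> Optional[Zone]:
    """Return the topmost zone under a tap, or None if the tap misses every zone.

    Later list entries are treated as drawn on top, so the search runs in reverse.
    """
    for zone in reversed(zones):
        if zone.contains(px, py):
            return zone
    return None


# Hypothetical values for a 1080x1350 template; not taken from Zinecore.
COVER_ZONES = [
    Zone("hero_photo", "photo", x=90, y=120, width=900, height=700),
    Zone("title", "text", x=90, y=860, width=900, height=160),
    Zone("sticker_photo", "photo", x=760, y=600, width=260, height=260),
]

tapped = hit_test(COVER_ZONES, 800, 700)
print(tapped.zone_id if tapped else "miss")  # prints "sticker_photo"
```

The tap at (800, 700) falls inside both the hero photo and the overlapping sticker slot; searching in reverse picks the sticker because it was listed later. Whether overlapping zones should resolve this way is a design choice the source does not address.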

The practical payoff extended beyond initial build time. Because the coordinate-extraction logic lives in Claude rather than in handwritten code, adding a new template requires only designing it and drawing annotation boxes — no code changes are needed. This transforms what is typically a deeply technical, developer-gated process into a design-tool workflow that scales with creative output rather than engineering capacity. The approach exploits Claude's multimodal vision capabilities not as a novelty but as a genuine architectural substitution, replacing a class of bespoke infrastructure entirely.
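One way the "no code changes" property could be realized is to treat each template as a data file that the app loads at runtime. The sketch below assumes the extracted definitions are stored as JSON; the schema, the key names, and the `load_template` helper are hypothetical, and the bounds checks are an added safeguard against imprecise extracted coordinates rather than something the article describes.

```python
import json

REQUIRED_KEYS = {"id", "kind", "x", "y", "width", "height"}
VALID_KINDS = {"photo", "text"}


def load_template(raw_json: str) -> dict:
    """Parse one template definition and reject zones that cannot be valid."""
    template = json.loads(raw_json)
    canvas_w = template["canvas"]["width"]
    canvas_h = template["canvas"]["height"]
    for zone in template["zones"]:
        missing = REQUIRED_KEYS - set(zone)
        if missing:
            raise ValueError(f"zone is missing keys: {sorted(missing)}")
        if zone["kind"] not in VALID_KINDS:
            raise ValueError(f"zone {zone['id']}: unknown kind {zone['kind']!r}")
        if zone["width"] <= 0 or zone["height"] <= 0:
            raise ValueError(f"zone {zone['id']}: width and height must be positive")
        right = zone["x"] + zone["width"]
        bottom = zone["y"] + zone["height"]
        if zone["x"] < 0 or zone["y"] < 0 or right > canvas_w or bottom > canvas_h:
            raise ValueError(f"zone {zone['id']}: extends past the canvas")
    return template


# Hypothetical template file contents; adding a template means adding a file like this.
EXAMPLE_TEMPLATE_JSON = """
{
  "name": "collage_cover",
  "canvas": {"width": 1080, "height": 1350},
  "zones": [
    {"id": "hero_photo", "kind": "photo", "x": 90, "y": 120, "width": 900, "height": 700},
    {"id": "title", "kind": "text", "x": 90, "y": 860, "width": 900, "height": 160}
  ]
}
"""

template = load_template(EXAMPLE_TEMPLATE_JSON)
print(template["name"], len(template["zones"]))  # prints "collage_cover 2"
```

Under this arrangement the loader is written once, and each new design contributes only an image and a definition file.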

The developer frames a broader methodology around a clear division of cognitive labor: all design decisions — color palettes, type hierarchy, spacing references, annotated screenshots of inspirational apps — are resolved on paper before Claude is ever involved. The explicit argument is that Claude is highly capable at faithful implementation of a specific, well-constrained vision, but substantially weaker at originating that vision. Vague prompts like "make it look 90s" produce generic outputs requiring days of iterative correction, while precise inputs — hex codes pulled from reference photos, named apps whose spacing is being mimicked, visual problems described in terms of weight and rhythm rather than subjective dissatisfaction — produce accurate implementations quickly.
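The "decide first, then delegate" discipline can be sketched as a small structure that holds the settled decisions and renders them into prompt text. This is only an illustration of the idea: the `DesignBrief` class, the `render_brief` function, the hex codes, the font descriptions, and the reference app name are all invented for the example and do not come from the developer's project.

```python
from dataclasses import dataclass, field


@dataclass
class DesignBrief:
    """Design decisions made before the model is involved (hypothetical structure)."""
    palette: dict[str, str]          # role -> hex code pulled from reference photos
    type_hierarchy: dict[str, str]   # role -> font description
    spacing_reference: str           # named app whose spacing is being mimicked
    observations: list[str] = field(default_factory=list)


def render_brief(brief: DesignBrief) -> str:
    """Turn the brief into plain prompt text with explicit, checkable constraints."""
    lines = [
        "Implement this design exactly as specified. Do not introduce new colors or fonts.",
        "",
        "Palette:",
    ]
    lines += [f"- {role}: {hex_code}" for role, hex_code in brief.palette.items()]
    lines.append("")
    lines.append("Type hierarchy:")
    lines += [f"- {role}: {spec}" for role, spec in brief.type_hierarchy.items()]
    lines.append("")
    lines.append(f"Spacing reference: {brief.spacing_reference}")
    if brief.observations:
        lines.append("")
        lines.append("Known visual problems:")
        lines += [f"- {note}" for note in brief.observations]
    return "\n".join(lines)


# All values below are placeholders chosen for the example.
brief = DesignBrief(
    palette={"background": "#FFF4D6", "accent": "#FF3EA5", "ink": "#1B1B3A"},
    type_hierarchy={"title": "heavy condensed sans, 40px", "body": "monospace, 14px"},
    spacing_reference="ExampleNotesApp list screens (hypothetical app name)",
    observations=["Title reads lighter than the hero photo; raise its weight, not its size."],
)
prompt_text = render_brief(brief)
print(prompt_text)
```

The observation entry follows the article's advice to describe a visual problem in terms of weight and rhythm instead of general dissatisfaction.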

This workflow reflects a maturing pattern in how skilled developers are learning to use large language models in creative production contexts. The failure mode the developer explicitly warns against — delegating creative direction to the model and then attempting to nudge it toward a vision — is a common one, and the solution offered is essentially a discipline of constraint-setting: do the judgment work first, then use the model's execution capacity. The annotated-image technique is a concrete instantiation of this philosophy applied to a problem (spatial layout definition) that would not intuitively seem amenable to a vision-model solution.

The Zinecore case also illustrates how multimodal capabilities are beginning to reshape not just what AI tools can do, but what human developers need to build at all. By offloading coordinate parsing and zone-definition generation to Claude's image understanding, the developer eliminated an entire category of bespoke tooling from the project. As models become more reliable at interpreting structured visual annotations, this pattern — using human-drawn markup as a lightweight communication protocol between designer intent and machine implementation — is likely to appear across a wider range of development contexts, particularly in domains like mobile UI, data visualization, and document layout where spatial relationships are primary.
