
How to teach Claude to build levels for my 2D game?

Reddit · Websl · May 10, 2026
A developer using Claude Code to build a 2D Phaser game encounters limitations in the AI's level generation, which produces only linear, repetitive designs despite attempts to improve it with reference examples. The developer seeks methods to teach Claude to understand level design concepts like multiple paths with risk-reward tradeoffs and to validate whether generated levels are playable. Current constraints include Claude's inability to interact quickly enough with the browser-based game to test levels through automated means.

Detailed Analysis

A developer working with Claude Code on a browser-based 2D platformer built with Phaser.js has surfaced a nuanced set of limitations and opportunities in AI-assisted game development. The developer has successfully used Claude to create a level-generation script that produces functional JSON-encoded levels, complete with physics calculations such as player jump arc modeling, gap placement, and basic enemy positioning. However, the generated levels lack design sophistication: they tend to be linear and repetitive, and they fail to capture higher-order design concepts such as branching paths with meaningful tradeoffs (e.g., "slow but safe" vs. "short but risky" routes). The developer's attempts to provide hand-crafted reference levels as context have not meaningfully transferred design intent into the generation script, pointing to a gap between what the generator's procedural code encodes and the design heuristics implicit in the hand-made levels.
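The jump arc modeling mentioned here can be illustrated with basic projectile math. The following is a minimal sketch, not the developer's actual script: the function names, the default physics constants, and the y-up coordinate convention are all assumptions made for illustration, and it ignores drag, variable jump height, and the engine's discrete time step.

```python
import math

def jump_reach(jump_speed, gravity, run_speed, rise=0.0):
    """Max horizontal distance of a jump landing `rise` px above takeoff.

    Negative `rise` means the landing spot is lower. Returns None when the
    jump apex is below the landing height. All names are hypothetical.
    """
    # Landing time is the later root of: jump_speed*t - 0.5*gravity*t^2 = rise
    disc = jump_speed ** 2 - 2.0 * gravity * rise
    if disc < 0:
        return None
    air_time = (jump_speed + math.sqrt(disc)) / gravity
    return run_speed * air_time

def gap_is_jumpable(gap, rise, jump_speed=400.0, gravity=800.0,
                    run_speed=200.0, margin=0.9):
    """True if a gap of `gap` px can be cleared with a safety margin.

    The default constants are placeholders, not values from the game.
    """
    reach = jump_reach(jump_speed, gravity, run_speed, rise)
    return reach is not None and gap <= margin * reach
```

A generator could call `gap_is_jumpable` before placing each platform. The `margin` parameter leaves slack so that a gap does not demand a frame-perfect jump.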

The core challenge the developer faces is one of knowledge representation and feedback loop closure. Claude, operating through Claude Code, can generate and manipulate structured data like JSON level files and can reason about physics constraints when given explicit functions to call. What it currently lacks in this workflow is a reliable way to evaluate the qualitative player experience of a generated level: whether a path is genuinely "hard but rewarding" or simply "hard and punishing," for instance. The developer notes that Playwright-based browser automation is too slow for real-time game input, which cuts off a natural avenue for Claude to test levels dynamically through play. Without a simulation or validation layer that can assess beatability, path variety, and pacing, the generation loop stays open: nothing measures the output and feeds the result back, so levels come out technically valid but creatively shallow.
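Path variety is one of the qualities that can be measured without playing the game. The sketch below assumes the level has already been reduced to a directed graph of reachable nodes plus a per-node risk score. Both the graph and the scores are hypothetical inputs, and the "length versus risk" scoring is an illustrative stand-in for a real difficulty model. It lists every loop-free route and keeps only those that are not beaten on both length and risk by another route.

```python
def simple_routes(graph, start, goal, path=None):
    """Yield every loop-free route from start to goal (depth-first)."""
    path = (path or []) + [start]
    if start == goal:
        yield path
        return
    for nxt in graph.get(start, []):
        if nxt not in path:
            yield from simple_routes(graph, nxt, goal, path)

def route_cost(route, risk):
    """Score a route as (number of hops, summed risk of visited nodes)."""
    return (len(route) - 1, sum(risk.get(node, 0) for node in route))

def tradeoff_routes(graph, risk, start, goal):
    """Return the routes not dominated on both length and risk."""
    scored = [(route_cost(r, risk), r)
              for r in simple_routes(graph, start, goal)]
    front = []
    for cost, route in scored:
        dominated = any(
            other[0] <= cost[0] and other[1] <= cost[1] and other != cost
            for other, _ in scored
        )
        if not dominated:
            front.append((cost, route))
    return front

# Hypothetical level: a long safe route and a short route over spikes.
branching = {"start": ["a", "spikes"], "a": ["b"], "b": ["goal"],
             "spikes": ["goal"]}
risk = {"spikes": 3}
```

A level with two or more surviving routes offers at least one "slow but safe" versus "short but risky" choice under this scoring. A strictly linear level always yields exactly one, which gives the generator a concrete signal to reject or regenerate.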

The situation reflects a broader tension in applying large language models to creative and procedural domains: LLMs excel at pattern completion and structural generation but struggle to internalize implicit design grammars without explicit scaffolding. The developer's instinct to provide hand-crafted reference levels is directionally correct but highlights the need for that reference data to be structured in a way Claude can meaningfully generalize from — for example, annotated level files that explicitly tag design intentions (path type, difficulty curve, intended player emotion) rather than raw JSON. Additionally, building a headless simulation layer — a script that runs the level's physics logic without a rendering engine — could give Claude a programmatic way to validate beatability by simulating player movement through a level's geometry, effectively closing the feedback loop that the browser's real-time input requirement currently blocks.
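Both suggestions in this paragraph can be sketched together: an annotated level and a headless beatability check. The version below is a rough approximation under stated assumptions. The level schema (`intent`, `tag`, and the platform fields), the physics constants, and the y-up coordinates are all invented for illustration, and reachability is estimated from jump distance alone, ignoring ceilings, enemies, and obstacles that block the arc.

```python
import math
from collections import deque

# Placeholder physics constants (assumed, not taken from the game).
JUMP_SPEED, GRAVITY, RUN_SPEED = 400.0, 800.0, 200.0

# Hypothetical annotated level: tags record design intent next to geometry.
LEVEL = {
    "intent": {"difficulty": "medium", "paths": ["safe-low", "risky-high"]},
    "platforms": [
        {"id": "start", "x": 0, "y": 0, "w": 100, "tag": "spawn"},
        {"id": "low", "x": 200, "y": 0, "w": 100, "tag": "safe"},
        {"id": "high", "x": 180, "y": 90, "w": 60, "tag": "risky"},
        {"id": "goal", "x": 400, "y": 0, "w": 100, "tag": "goal"},
    ],
}

def reach(rise):
    """Horizontal jump distance when landing `rise` px higher, or None."""
    disc = JUMP_SPEED ** 2 - 2.0 * GRAVITY * rise
    if disc < 0:
        return None
    return RUN_SPEED * (JUMP_SPEED + math.sqrt(disc)) / GRAVITY

def can_jump(a, b):
    """Approximate test: is platform b within jump range of platform a?"""
    # Horizontal distance between nearest edges (0 if they overlap).
    gap = max(b["x"] - (a["x"] + a["w"]), a["x"] - (b["x"] + b["w"]), 0)
    distance = reach(b["y"] - a["y"])
    return distance is not None and gap <= distance

def is_beatable(level):
    """Breadth-first search from the spawn platform to the goal platform."""
    plats = {p["id"]: p for p in level["platforms"]}
    start = next(p["id"] for p in level["platforms"] if p["tag"] == "spawn")
    goal = next(p["id"] for p in level["platforms"] if p["tag"] == "goal")
    seen, queue = {start}, deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            return True
        for pid, plat in plats.items():
            if pid not in seen and can_jump(plats[current], plat):
                seen.add(pid)
                queue.append(pid)
    return False
```

Because the check runs in milliseconds with no browser, Claude could run it after every generation attempt and regenerate any level that fails. A frame-accurate validator would need to step the game's real physics, which this sketch does not attempt.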

This use case sits at the intersection of two accelerating trends: AI-assisted software development and AI-driven procedural content generation (PCG). While tools like Claude Code have made AI a productive coding collaborator, game level design represents a domain where the gap between syntactic correctness and semantic quality is especially pronounced. The field of PCG has long grappled with this — algorithmically generated content is easy to make valid and hard to make interesting. What makes this developer's situation notable is the attempt to use a conversational AI not just as a code generator but as a design collaborator, which requires the model to reason about player psychology, pacing, and emergent challenge — capabilities that remain at the frontier of what current LLM-based tooling can reliably deliver without significant prompt engineering, structured context, and custom evaluation infrastructure.
