Detailed Analysis
A hands-on comparison tested three leading AI models, Claude Sonnet 4.6, ChatGPT 5.5 (via Codex), and Gemini 3.1 Pro (via Gemini CLI), on their ability to replicate a kitchen cabinet design in a parametric 3D modeling workflow. The results differed notably in speed, functionality, and practical usefulness. The author, a designer working on a kitchen project, challenged each model to reconstruct a cabinet file from scratch without copying the source, so that each had to demonstrate genuine modeling capability, including panel dimensions, accessories, and user-defined parameters.
Claude Sonnet 4.6 distinguished itself in raw speed and immediate feature completeness, producing accessories and user parameters on the very first request, something neither competitor matched in round one. However, the parameters it generated were non-functional, meaning changes to their values did not propagate through the model as they would in a true parametric design environment. The limitation was compounded by Claude's free-tier usage cap, which cut the session short after a single attempt and prevented further refinement.
ChatGPT 5.5, running through Codex, abandoned the parametric approach entirely and fell back on direct modeling, which produced reasonable results including wood material texturing on the first try but rendered the output largely incompatible with professional parametric workflows. It did earn an informal point for embedding its own name on the front panel of the cabinet.
Gemini 3.1 Pro, despite suffering significant technical access issues that cost nearly 45 minutes of troubleshooting, ultimately produced the most workflow-compatible result: functional user parameters that correctly updated the model on change, though it required a second prompt to add the accessories.
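The difference between functional and non-functional parameters can be illustrated with a minimal Python sketch. It is not tied to the CAD tool used in the test, which the summary does not name; the class `CabinetParams`, the function `panel_sizes`, and all dimensions (in millimetres) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class CabinetParams:
    """Hypothetical user parameters, in millimetres (illustrative values)."""
    width: float = 600.0
    height: float = 720.0
    depth: float = 560.0
    thickness: float = 18.0


def panel_sizes(p: CabinetParams) -> dict:
    """Derive each panel size from the parameters (the 'functional' case)."""
    inner_width = p.width - 2 * p.thickness  # space between the two sides
    return {
        "side": (p.depth, p.height),
        "bottom": (inner_width, p.depth),
        "top": (inner_width, p.depth),
    }


# Functional parameters: geometry is recomputed whenever a value changes.
params = CabinetParams()
before = panel_sizes(params)
params.width = 800.0
after = panel_sizes(params)

# Non-functional parameters: the sizes were written out as literals once,
# so editing params.width afterwards has no effect on them.
baked = {"side": (560.0, 720.0), "bottom": (564.0, 560.0), "top": (564.0, 560.0)}

print(before["bottom"], after["bottom"], baked["bottom"])
```

In the functional case the bottom panel widens along with the cabinet, while the baked dictionary keeps its original numbers. The second behaviour is roughly what the summary describes when it says parameter changes did not propagate through the model.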
All three models shared a common and meaningful failure: while they correctly placed each panel in its proper dimensional position, none of them accurately modeled the joinery — specifically the grooves that side panels require to receive a back panel, and the corresponding cuts needed for proper fit. This is a technically precise detail that distinguishes surface-level geometry from construction-ready modeling, and its universal absence across all three systems points to a significant gap between AI-assisted dimensional replication and genuine understanding of fabrication logic.
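To make the missing joinery concrete, the following Python sketch computes the groove a side panel would need and the back panel size that fits it. This is an assumption-laden illustration, not the author's method: the names `Groove`, `back_panel_groove`, and `back_panel_width`, the half-thickness groove depth, the 10 mm inset, and the 1 mm clearance are all hypothetical choices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Groove:
    """A groove cut into the inner face of a side panel (millimetres)."""
    width: float  # opening that receives the back panel
    depth: float  # how far the groove is cut into the side panel
    inset: float  # distance from the rear edge of the side panel


def back_panel_groove(back_thickness: float, side_thickness: float,
                      inset: float = 10.0, clearance: float = 1.0) -> Groove:
    """Size a groove for the back panel.

    Assumptions (illustrative only): the groove is cut to half the side
    panel's thickness and is wider than the back panel by `clearance`.
    """
    return Groove(width=back_thickness + clearance,
                  depth=side_thickness / 2,
                  inset=inset)


def back_panel_width(cabinet_width: float, side_thickness: float,
                     groove: Groove, clearance: float = 1.0) -> float:
    """Back panel spans the inner width plus both groove depths, less play."""
    inner_width = cabinet_width - 2 * side_thickness
    return inner_width + 2 * groove.depth - clearance


groove = back_panel_groove(back_thickness=6.0, side_thickness=18.0)
width = back_panel_width(cabinet_width=600.0, side_thickness=18.0, groove=groove)
print(groove, width)
```

The point of the sketch is that the back panel is wider than the visible interior, because it extends into a cut on each side. A model that only places panels at their outer dimensions never produces that cut or the matching panel size.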
The results reflect a broader pattern in current AI capability: these systems perform well as accelerators for structured, repeatable subtasks within a larger human-directed workflow, but fall short when tasks require embedded domain knowledge about how physical components are actually constructed and assembled. The author's framing, that AI works best as a "dynamic add-on" rather than a standalone modeler, captures an insight increasingly echoed across professional fields integrating generative AI. The planned follow-up test, which will use Claude Opus (described as the superior model tier) in a more refined workflow, suggests the author views current limitations as partly a function of model selection and prompt engineering rather than a fundamental ceiling on what AI modeling assistance can achieve.