I got tired of AI gaslighting across Claude and GPT. Pivoted the whole startup to fix it.

A startup called Serno pivoted to tackle a common frustration: users cross-checking answers across different AI models because they could not tell which one was hallucinating. The team built a canvas mode in which multiple AI models investigate different angles of a complex question and debate each other while the user observes and steers the discussion. The platform keeps a standard chat interface for routine queries and reserves canvas mode for high-stakes questions where accuracy matters.

Detailed Analysis

Serno.ai, a startup founded around a multi-model AI experiment called Roundtable, has pivoted its core product after discovering that users were not playing with the tool recreationally but were running substantive, high-stakes questions through it. The founder's original prototype placed two AI systems in a head-to-head debate within a single chat interface. The unexpected depth of engagement, with people treating it as a serious research and decision-support tool, revealed a genuine and widespread frustration: users were manually hopping between Claude, GPT, and other models in separate browser tabs, trying to triangulate which model was hallucinating and which was giving the most reliable answer. That fragmented, tab-switching workflow exposed a real gap in how AI tools are currently delivered to end users.

The central problem the founder identifies is what they call "AI gaslighting": the tendency of individual large language models to deliver confident, polished, essay-style responses that mask underlying assumptions, knowledge gaps, or outright fabrications. This is not a new criticism of LLMs, but the framing is pointed: a single model generating a "polite essay" is structurally incapable of surfacing its own blind spots, and the conversational chat UI compounds the problem by encouraging passive acceptance of whatever answer appears. Serno's response is a "canvas mode" that deploys multiple models to investigate distinct angles of a question simultaneously, then stages a structured debate between them, presenting the user with a map of disagreement, uncertainty, and alternative directions rather than a single authoritative output.
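
The canvas-mode flow (fan out across angles, debate, then surface disagreement) can be sketched in a few dozen lines. This is a minimal, hypothetical sketch: Serno has not published its implementation, and every name here (`make_stub`, `investigate`, `debate_round`, `disagreement_map`, `contested_angles`) is illustrative. Real calls to Claude, GPT, or other models are replaced by deterministic stub functions so the example runs offline, and the sketch assumes every model examines every angle so that disagreement can be measured per angle.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical model interface: prompt in, short answer out.
# A real system would wrap provider SDK calls behind this signature.
ModelFn = Callable[[str], str]


@dataclass
class Finding:
    model: str
    angle: str
    answer: str


def make_stub(opinions: Dict[str, str]) -> ModelFn:
    """Offline stand-in for an LLM: answers according to the angle named in the prompt."""
    def answer(prompt: str) -> str:
        for angle, opinion in opinions.items():
            if f"Focus on: {angle}" in prompt:
                return opinion
        return "unsure"
    return answer


def investigate(question: str, angles: List[str], models: Dict[str, ModelFn]) -> List[Finding]:
    """Fan-out step: each model examines each angle independently."""
    return [
        Finding(name, angle, fn(f"{question}\nFocus on: {angle}"))
        for angle in angles
        for name, fn in models.items()
    ]


def debate_round(question: str, findings: List[Finding], models: Dict[str, ModelFn]) -> List[Finding]:
    """Debate step: each model sees rival answers on the same angle and may revise its own."""
    revised = []
    for f in findings:
        rivals = [g.answer for g in findings if g.angle == f.angle and g.model != f.model]
        prompt = (
            f"{question}\nFocus on: {f.angle}\n"
            f"Your answer: {f.answer}\nOther answers: {rivals}\n"
            "Defend or revise your answer."
        )
        revised.append(Finding(f.model, f.angle, models[f.model](prompt)))
    return revised


def disagreement_map(findings: List[Finding]) -> Dict[str, Dict[str, List[str]]]:
    """Group findings per angle as {answer: [models that gave it]}."""
    dmap: Dict[str, Dict[str, List[str]]] = {}
    for f in findings:
        dmap.setdefault(f.angle, {}).setdefault(f.answer, []).append(f.model)
    return dmap


def contested_angles(dmap: Dict[str, Dict[str, List[str]]]) -> List[str]:
    """Angles where the models still give more than one distinct answer after the debate."""
    return [angle for angle, answers in dmap.items() if len(answers) > 1]


if __name__ == "__main__":
    models = {
        "model_a": make_stub({"legal exposure": "low risk", "market size": "small"}),
        "model_b": make_stub({"legal exposure": "low risk", "market size": "large"}),
    }
    question = "Should we launch in the EU?"
    angles = ["legal exposure", "market size"]
    findings = debate_round(question, investigate(question, angles, models), models)
    dmap = disagreement_map(findings)
    print(contested_angles(dmap))  # ['market size']
```

With these stubs no model changes its mind during the debate round, so the split on `market size` survives into the map while `legal exposure` shows consensus. With real models, the debate prompt is where revisions, or entrenched disagreement, would appear, and the contested angles are what a canvas-style UI would put in front of the user.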

The architectural decision is significant from a product-design standpoint. Rather than treating the chat box as the universal interface for AI, Serno splits its product in two: standard chat remains for low-stakes, everyday queries, while canvas mode is reserved for questions where epistemic rigor matters. The split acknowledges something most major AI product teams have arguably been slow to act on: the conversational metaphor, borrowed from messaging apps, is poorly suited to complex analytical tasks. Multi-model debate treats divergence as a feature rather than a bug, externalizing the uncertainty that any single model would otherwise smooth over.
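
The chat/canvas split can be pictured as a routing decision made before any model is called. The sketch below is hypothetical: the article does not say whether Serno picks the mode automatically or leaves it to the user, so it honors an explicit user choice first and otherwise falls back to a crude keyword heuristic. `Mode`, `choose_mode`, and `HIGH_STAKES_TERMS` are invented names, and the term list is not Serno's.

```python
from enum import Enum
from typing import Optional, Tuple


class Mode(Enum):
    CHAT = "chat"      # one model, conversational answer
    CANVAS = "canvas"  # several models investigate and debate


# Invented signals that a question is high-stakes; a real router might use a classifier instead.
HIGH_STAKES_TERMS: Tuple[str, ...] = ("medical", "diagnosis", "legal", "contract", "invest", "tax")


def choose_mode(question: str, user_choice: Optional[Mode] = None) -> Mode:
    """Route a question to chat or canvas mode."""
    if user_choice is not None:
        # An explicit user choice always wins over the heuristic.
        return user_choice
    text = question.lower()
    if any(term in text for term in HIGH_STAKES_TERMS):
        return Mode.CANVAS
    return Mode.CHAT


if __name__ == "__main__":
    print(choose_mode("What's a quick pasta recipe?"))          # Mode.CHAT
    print(choose_mode("Is this contract clause enforceable?"))  # Mode.CANVAS
```

Substring matching is deliberately crude (it would flag "syntax" because it contains "tax"); the point is only that routing happens up front, so routine queries stay in the cheap single-model path and only flagged or user-selected questions trigger a multi-model debate.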

Notably, the founder discloses that Claude handles most of the model work inside Serno and was also the primary tool used to build the product itself. This dual role, with Claude as both development assistant and production backbone, reflects a broader pattern in the current AI startup ecosystem, where Anthropic's model has become a foundational layer for third-party builders, particularly those focused on reasoning-heavy or research-oriented applications. Claude's reputation for strong performance on complex analytical tasks and its long context window have made it a popular choice for products where answer quality, not just fluency, is the core value proposition.

Serno sits within the emerging category of "AI orchestration" products: tools that treat individual LLMs not as endpoints but as components in a larger epistemic pipeline. As capabilities across frontier providers converge, competitive differentiation increasingly lies in how models are set against each other, how their outputs are synthesized, and how uncertainty is communicated to users. Serno's canvas mode is an early-stage articulation of this philosophy, positioning the startup in a space likely to grow substantially as users become more aware of the limits of single-model outputs and begin demanding workflows that make disagreement and doubt visible rather than hidden.
