Based on my experience: Codex vs Claude vs Gemini

I got Gemini with something that I forgot. I got the Claude $100 plan from a client (to finish his work faster), but I had already purchased the ChatGPT $100 plan. So: Gemini (3.1 Pro & 3.5 Flash): It's far better than Claude & Codex (except for coding), if you provide full

Detailed Analysis

A developer's hands-on account compares three major AI platforms: Google Gemini, OpenAI's Codex, and Anthropic's Claude. The author, who had premium subscriptions to both Claude (the $100/month plan, paid for by a client) and ChatGPT/Codex (also $100/month, purchased personally), evaluates each platform on coding capability, information retrieval accuracy, context handling, and usage efficiency. The verdict is fragmented: no single platform dominates across all dimensions, and each has meaningful strengths and frustrating weaknesses that depend heavily on the task at hand.

Gemini's performance is characterized as strong for general information retrieval and research tasks, particularly when the user provides full contextual input, but the platform falls short for coding workflows. The author's specific complaint about Gemini is a lack of contextual inference — the model requires exhaustive, explicit specification for coding tasks rather than being able to extrapolate from reference examples. This stands in contrast to Claude and Codex, both of which the author credits with the ability to replicate the style and structure of existing pages or files with minimal prompting. Gemini's Flash variants are dismissed as fast but low-quality, while Pro models are deemed functional though still inferior for development work.

Claude's evaluation centers on significant version-to-version inconsistency across Opus 4.5, 4.6, and 4.7. The author describes Opus 4.5 as unreliable — producing unpredictable quality swings serious enough to warrant mandatory file backups before use — while Opus 4.6 earned enough trust for direct production deployment. Opus 4.7, however, is described as having degraded noticeably over time, with quality declining an estimated 30–40% and usage limit consumption accelerating, ultimately placing it back in the unreliable category alongside 4.5. This perceived degradation in Claude's newer models echoes a pattern frequently reported in AI user communities, where model updates intended to improve behavior can inadvertently shift performance in ways that frustrate established workflows.

Codex, built on GPT-5.3 and GPT-5.5, earns praise for coding proficiency and image generation but is criticized for poor factual reliability, particularly on time-sensitive queries involving recent regulatory or governmental changes. The author's experience highlights a persistent weakness in large language models: confident but inaccurate answers about recent real-world events, even when web search tools are available. A further operational concern with Codex is throttling: the author observed slowdowns and lower output quality after consuming roughly 50% of the weekly usage limit, a decline serious enough to prompt cancelling the subscription. In the overall ranking, the author places Claude ahead of Codex for frontend development and UI/UX quality, while crediting Codex with an edge in backend tasks following the GPT-5.5 update.

The broader significance of this comparative account lies in what it reveals about the current state of the premium AI assistant market. Despite substantial investment ($100 plans for both Claude and ChatGPT), no single tool delivers consistent, reliable performance across the full spectrum of professional development tasks. The observation that both Claude and Codex are "making mistakes & not reliable as previous months" points to a systemic challenge in the industry: as these models undergo rapid iterative updates, users who have calibrated their workflows to specific model behaviors find themselves repeatedly recalibrating. Because capability is split across frontend, backend, research, and context management, sophisticated users are likely to keep working across multiple platforms rather than consolidating around a single tool, which complicates the path to sustainable subscription economics for any one provider.
