Claude wrote me a native DOS/Win Benchmark tool in C

Reddit · mwdmeyer · April 25, 2026
A DOS/Windows 95-98 benchmark tool has been created to test CPU performance on the kind of workload found in 2D RTS games like StarCraft, and it also runs on systems up to Windows 11. The application includes both non-MMX and MMX tests, and an accompanying website lets users submit their benchmark scores for comparison. The creator has tested the tool on various processors, including Pentium and K6-3 systems, and invites the community to run benchmarks on legacy hardware such as Cyrix processors.

Detailed Analysis

A hobbyist developer has released a retro-focused CPU benchmark utility called STDOS, built collaboratively with Anthropic's Claude AI assistant using Claude Code over approximately 15–20 hours of development time. The tool targets DOS and Windows 9x environments but maintains compatibility through Windows 11, and is specifically designed to measure the kind of integer-based, 2D computation workloads characteristic of classic real-time strategy games like StarCraft 1. The project includes both an MMX and a non-MMX test path, reflecting the developer's interest in comparing period-accurate processors such as the Cyrix 6x86, Pentium MMX, and AMD K6-3. A companion website at starrts.vogonswiki.com allows users to submit benchmark scores online, functioning similarly to modern crowd-sourced performance databases like Geekbench.
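To make the MMX versus non-MMX distinction concrete, the following is a minimal sketch of the same integer workload written two ways. It is not STDOS's actual code, which is not shown in the post. It assumes a C99 compiler for `<stdint.h>`, and every function name is hypothetical. The packed path emulates in portable C what a packed-byte add such as MMX's `PADDB` does in hardware, handling several 8-bit lanes per operation without letting carries spill between lanes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scalar path: add two rows of 8-bit pixels one byte at a time.
   Each sum wraps modulo 256. */
static void add_rows_scalar(uint8_t *dst, const uint8_t *a,
                            const uint8_t *b, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        dst[i] = (uint8_t)(a[i] + b[i]);
}

/* Add four independent byte lanes packed into one 32-bit word.
   The low 7 bits of each lane are added normally; the top bit of
   each lane is combined with XOR so no carry crosses into the
   neighbouring lane. */
static uint32_t add_packed4(uint32_t x, uint32_t y)
{
    uint32_t low7 = (x & 0x7F7F7F7Fu) + (y & 0x7F7F7F7Fu);
    return low7 ^ ((x ^ y) & 0x80808080u);
}

/* Packed path: process four pixels per step, then finish any
   leftover bytes with the scalar rule. memcpy avoids unaligned
   access and keeps the code independent of byte order. */
static void add_rows_packed(uint8_t *dst, const uint8_t *a,
                            const uint8_t *b, size_t n)
{
    size_t i;
    for (i = 0; i + 4 <= n; i += 4) {
        uint32_t x, y, r;
        memcpy(&x, a + i, 4);
        memcpy(&y, b + i, 4);
        r = add_packed4(x, y);
        memcpy(dst + i, &r, 4);
    }
    for (; i < n; i++)
        dst[i] = (uint8_t)(a[i] + b[i]);
}
```

Both functions produce identical output for the same inputs, which is what makes paired test paths comparable: the work is the same and only the instruction mix differs. A real MMX path would process eight bytes per instruction through compiler intrinsics or inline assembly rather than this emulation.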

The project illustrates a practical application of Claude Code's agentic coding capabilities: producing functional, low-level C code that targets legacy system constraints, including real-mode DOS compatibility and MMX instruction set extensions. Writing C for DOS and Win9x environments demands familiarity with older compiler toolchains such as Turbo C or Borland C++, memory segmentation models, and hardware-adjacent programming patterns that differ substantially from modern development targets. The developer credits Claude as the primary code author, with the roughly 15–20 hours of human time spent largely on direction, testing, and integration, which suggests the AI was handling non-trivial implementation details rather than merely boilerplate scaffolding.
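One constraint of this kind is that timing code has to work on compilers decades apart. The sketch below shows one way a benchmark loop could be timed using only the standard `clock()` function, written in C89 style so that it would in principle build on both period and modern toolchains. It is an illustration under stated assumptions, not the tool's real implementation: the workload, the function names, and the scoring rule (iterations per second) are all hypothetical.

```c
#include <time.h>

/* Hypothetical workload signature: one call does one unit of work. */
typedef void (*workload_fn)(void);

/* Written through a volatile so the compiler cannot discard the loop. */
static volatile unsigned long sink;

/* Stand-in integer workload; a real benchmark would do 2D game-style
   work such as blitting or pathfinding here. */
static void sample_workload(void)
{
    unsigned long acc = 0;
    unsigned int i;
    for (i = 0; i < 1000u; i++)
        acc += (unsigned long)i * 3u;
    sink = acc;
}

/* Run the workload a fixed number of times and return elapsed seconds
   as reported by clock(). */
static double time_workload(workload_fn fn, unsigned long iterations)
{
    unsigned long i;
    clock_t start = clock();
    for (i = 0; i < iterations; i++)
        fn();
    return (double)(clock() - start) / (double)CLOCKS_PER_SEC;
}

/* Convert a timing into a score. Returns 0 when the run was too short
   for the timer to resolve, instead of dividing by zero. */
static unsigned long iterations_per_second(unsigned long iterations,
                                           double seconds)
{
    if (seconds <= 0.0)
        return 0;
    return (unsigned long)((double)iterations / seconds);
}
```

A caller would pass something like `time_workload(sample_workload, 100000UL)` and feed the result to `iterations_per_second`. Because the resolution of `clock()` can be coarse on older platforms, the iteration count needs to be large enough that a run lasts well beyond a single timer tick.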

This use case connects to a broader pattern in which AI coding assistants are increasingly being applied to niche, technically constrained domains rather than only mainstream software stacks. Anthropic has highlighted Claude's performance on real-world software engineering benchmarks like SWE-bench and Terminal-Bench 2.0, where it demonstrates agentic capability in autonomous, multi-step coding tasks. Retro computing, with its strict hardware targets, compiler quirks, and absence of modern abstractions, is a demanding test of a model's ability to reason about legacy technical constraints, which makes this project a compelling, if informal, demonstration of Claude Code's generalization beyond contemporary development environments.

The community dimension of the project is also notable. By building a score-submission website alongside the benchmark binary itself, the developer is effectively crowd-sourcing a historical performance dataset for processors that modern benchmarking infrastructure has long since stopped supporting. Cyrix processors in particular occupy a contested place in computing history (they are often maligned for performance inconsistencies relative to Intel counterparts at equivalent clock speeds), and comparative data gathered on original hardware running period-appropriate workloads would carry genuine value for retrocomputing researchers and enthusiasts. The developer's stated willingness to open-source both the benchmark and the website if community interest warrants it further aligns the project with the collaborative ethos of retro hardware preservation communities.

Taken together, the project reflects how AI-assisted development is lowering the barrier to entry for technically ambitious hobbyist software, particularly in domains where the required knowledge is specialized and documentation is sparse. Building a cross-era benchmark in C that compiles for DOS, executes correctly on 1990s hardware, and also submits data to a modern web backend is a multi-layered engineering task. Claude Code's role in compressing that effort into a 15–20 hour personal project suggests that AI pair-programming tools are becoming useful not just for enterprise developers working in mainstream languages and frameworks, but also for individuals pursuing technically deep, historically motivated software goals at the fringes of the computing ecosystem.
