I think I know why DeepSeek is so good

Detailed Analysis

DeepSeek, the Chinese AI laboratory that drew international attention in early 2025 for producing frontier-level models at a reportedly low cost, appears to have left a telling behavioral artifact: when prompted in certain ways, its model identified itself as "Claude, made by Anthropic." This slip, captured in a screenshot that was shared on Reddit and widely circulated, strongly suggests that DeepSeek's models were trained on outputs generated by Claude. The AI industry calls this practice model distillation; when it is done without the provider's authorization, it is often described as "model stealing" or unauthorized knowledge distillation.

The significance of this finding lies in what it implies about how DeepSeek achieved such strong performance. Training a model on the outputs of a more capable proprietary model, in effect using that model's responses as synthetic training data, can sharply accelerate capability development and reduce the need for expensive human-annotated datasets. If DeepSeek leveraged Claude's outputs at scale, that would help explain how the lab produced models competitive with those from OpenAI and Anthropic at the low training cost it reported. The leaked "Claude, made by Anthropic" response is a classic fingerprint of this technique: the student model absorbs not only the teacher model's knowledge and reasoning patterns but also its identity markers, such as the name it uses for itself.
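In its simplest sequence-level form (one common formulation, not necessarily what DeepSeek did), distillation from API outputs is ordinary supervised fine-tuning: a student model with parameters θ is trained to maximize the likelihood of the teacher's response y to each prompt x. The symbols θ, x, y and the dataset 𝒟 of collected prompt/response pairs are notation introduced here for illustration.

```latex
\mathcal{L}(\theta) = -\sum_{(x,\,y)\in\mathcal{D}} \sum_{t=1}^{|y|} \log p_\theta\!\left(y_t \mid x,\, y_{<t}\right),
\qquad y \sim p_{\text{teacher}}(\cdot \mid x)
```

Because every token of y is a training target, a teacher reply such as "I am Claude, made by Anthropic" is learned exactly like any other content, which is how identity markers can carry over to the student.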

Anthropic's terms of service, like those of most major AI providers, explicitly prohibit using Claude's outputs to train competing AI models. That makes the implied practice not just a technical curiosity but a potential legal and ethical violation. Anthropic has not commented extensively in public on the DeepSeek situation specifically, but the company has been vocal about protecting its intellectual property and the integrity of its models. Around the same time, OpenAI raised similar concerns, alleging that DeepSeek may also have used OpenAI's API outputs for training.

This incident ties into a broader, growing tension in the AI industry over data provenance, synthetic data, and the limits of permissible training practices. As frontier models become more powerful, their outputs become extremely valuable as a training signal, which gives competitors a strong incentive to harvest them. The "identity leak" phenomenon, in which a distilled model reveals its true teacher, has emerged as an important forensic tool for detecting unauthorized distillation, and researchers have begun developing more systematic methods for watermarking model outputs so that such attribution is more reliable and legally actionable.
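The identity-leak check can be illustrated with a deliberately crude sketch: send a model many self-identification prompts, then tally how many replies mention another vendor's model or company. The marker list, the function names, and the sample replies below are hypothetical, and plain string matching stands in for the far more systematic attribution methods, such as watermarking, that researchers are developing.

```python
import re
from collections import Counter

# Hypothetical markers: names a distilled student might echo from its teacher.
TEACHER_MARKERS = {
    "Claude": re.compile(r"\bClaude\b"),
    "Anthropic": re.compile(r"\bAnthropic\b"),
    "ChatGPT": re.compile(r"\bChatGPT\b"),
    "OpenAI": re.compile(r"\bOpenAI\b"),
}


def identity_leak_counts(responses):
    """Return a Counter of how many responses mention each marker at least once."""
    counts = Counter()
    for text in responses:
        for name, pattern in TEACHER_MARKERS.items():
            if pattern.search(text):
                counts[name] += 1
    return counts


def leak_rate(responses, marker):
    """Fraction of responses that mention `marker`; 0.0 for an empty list."""
    if not responses:
        return 0.0
    return identity_leak_counts(responses)[marker] / len(responses)


if __name__ == "__main__":
    # Made-up replies to a "Who are you?" prompt, for illustration only.
    replies = [
        "I am Claude, made by Anthropic.",
        "I am a helpful AI assistant.",
        "As an AI language model, I can help with that.",
    ]
    print(identity_leak_counts(replies))
    print(f"Claude leak rate: {leak_rate(replies, 'Claude'):.2f}")
```

A mention on its own proves little, since a model may name these products simply because they appear in ordinary web text; what is suggestive is a model consistently naming another vendor's product as itself.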

The episode highlights a structural challenge for leading AI labs: deploying capable models through APIs or consumer products exposes their outputs to harvesting at scale. DeepSeek's apparent success, whether it came from distillation, independent research, or some combination of the two, showed that the gap between frontier Western labs and well-resourced competitors can close faster than anticipated, reshaping geopolitical and commercial calculations around AI leadership. For Anthropic specifically, the incident suggests that Claude's capabilities may be inadvertently accelerating competitors' development, which raises difficult questions about deployment strategy, API access controls, and the enforceability of terms of service against international actors.
