How I use Claude for enterprise coding WITHOUT sending raw code: Exploiting Claude's multilingual latent space with "Kanji" AST obfuscation.

Reddit · Other_Train9419 · April 29, 2026
A developer created Verantyx, a local IDE that keeps raw proprietary code out of requests to Claude's API by converting Abstract Syntax Trees into Kanji-obfuscated structural puzzles. According to the author, Claude's multilingual semantic understanding of Kanji lets it solve these puzzles, retaining abstract structural context without ever seeing the raw business logic. The author reports that the approach usually works but occasionally fails when Claude violates structural constraints, which prevents reverse-compilation back to the original code.

Detailed Analysis

A developer posting to r/ClaudeAI has detailed a custom middleware architecture called "Verantyx" that attempts to solve a genuine tension in enterprise AI adoption: the compliance risk of transmitting proprietary source code to external APIs. The proposed technique, dubbed "Kanji Topology" or JCross IR, involves intercepting code at the Abstract Syntax Tree level and remapping identifiers — function names, variable names, domain-specific business logic strings — to Japanese Kanji characters chosen for their semantic proximity to the original token's conceptual role. For example, a function named `calculateQ3Revenue()` becomes `_JCross_算_ext_04()`, where 算 (meaning "calculate" or "arithmetic") is intended to preserve structural meaning without exposing the English-language proprietary string. Claude then operates on this obfuscated intermediate representation, returns a patch, and a local reverse-compilation dictionary maps the result back to the original codebase. The author describes a real failure mode: Claude occasionally ignores structural constraints and hallucinates a new AST topology, breaking the reverse-compilation mapping entirely.
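The round trip described above can be illustrated with a minimal Python sketch. It is not Verantyx's implementation, which the post does not publish: the concept table `KANJI_BY_PREFIX`, the fallback glyph 名, and the helper names `obfuscate` and `rename` are all hypothetical, and the alias format is only modeled on the `_JCross_算_ext_04` example. The sketch renames functions, parameters, and assigned variables, and leaves string literals, attributes, and keyword arguments untouched.

```python
import ast

# Hypothetical concept table; the post only documents 算 for "calculate".
KANJI_BY_PREFIX = {"calculate": "算"}
FALLBACK_KANJI = "名"  # assumption: generic glyph for identifiers with no match


def _collect_names(tree):
    """Names the source itself defines: functions, parameters, assigned variables."""
    names = []
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            names.append(node.name)
        elif isinstance(node, ast.arg):
            names.append(node.arg)
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            names.append(node.id)
    return list(dict.fromkeys(names))  # de-duplicate, keep first-seen order


def _build_mapping(names):
    """Assign each name an alias shaped like the post's example."""
    forward = {}
    for index, name in enumerate(names, start=1):
        kanji = next(
            (k for prefix, k in KANJI_BY_PREFIX.items()
             if name.lower().startswith(prefix)),
            FALLBACK_KANJI,
        )
        forward[name] = f"_JCross_{kanji}_ext_{index:02d}"
    return forward


class _Renamer(ast.NodeTransformer):
    """Apply a name -> name mapping; names outside the mapping pass through."""

    def __init__(self, mapping):
        self.mapping = mapping

    def visit_FunctionDef(self, node):
        node.name = self.mapping.get(node.name, node.name)
        self.generic_visit(node)
        return node

    def visit_arg(self, node):
        node.arg = self.mapping.get(node.arg, node.arg)
        return node

    def visit_Name(self, node):
        node.id = self.mapping.get(node.id, node.id)
        return node


def rename(source, mapping):
    """Parse source, rename identifiers per mapping, and emit code again."""
    return ast.unparse(_Renamer(mapping).visit(ast.parse(source)))


def obfuscate(source):
    """Return (obfuscated_source, reverse_map); reverse_map never leaves the machine."""
    forward = _build_mapping(_collect_names(ast.parse(source)))
    reverse = {alias: name for name, alias in forward.items()}
    return rename(source, forward), reverse
```

Under these assumptions, `obfuscate("def calculateQ3Revenue(orders):\n    total = sum(orders)\n    return total\n")` yields a function named `_JCross_算_ext_01` whose parameter and local variable are also aliased, while the builtin `sum` is left alone. Calling `rename(patched, reverse)` on the model's reply restores the original names, provided the reply still uses only aliases present in the dictionary. Requires Python 3.9 or later for `ast.unparse`.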

The core technical claim is that Claude's multilingual training gives it a meaningful latent representation of Kanji semantics that can carry structural coding intent. That claim is plausible in principle but unverified in practice. No established or documented methodology for this specific approach appears in Anthropic's own Claude Code documentation or in the broader practitioner literature. Claude is demonstrably multilingual, and large transformer models trained on diverse corpora are widely understood to encode semantic relationships across languages in a shared latent space. However, leveraging that property as a deliberate obfuscation-and-semantic-preservation mechanism for AST-level code transformation is a novel, experimental, and largely speculative application. The author's own admission that hallucination causes reverse-compilation failures is a significant reliability signal: Claude was not designed to operate on custom intermediate representations with structural fidelity guarantees, and the system's fragility under those conditions reflects that design boundary.

The compliance problem the author is attempting to solve is genuine and pressing across enterprise software development. Organizations subject to frameworks and regulations such as SOC 2 or HIPAA, or to internal IP protection mandates, face real friction when adopting frontier AI models, because invoking a cloud API with raw source code containing business logic, hardcoded credentials, or customer-referencing identifiers can constitute a data exposure event. Anthropic's own Claude Code tooling addresses this through a different architectural philosophy: directory isolation, permission-gated file access, Git-based checkpointing, and on-premises or sandboxed execution environments. These approaches limit exposure not by obfuscating code before transmission but by restricting what Claude can access and act upon in the first place. The Verantyx approach takes an orthogonal path (transmit, but transform), which introduces its own failure modes and requires maintaining a local translation dictionary whose integrity is critical to the entire workflow.

The experiment sits within a broader industry pattern of developers building middleware layers to mediate between compliance requirements and the raw capability of frontier AI APIs. Similar patterns have emerged with retrieval-augmented generation pipelines that anonymize PII before embedding, tokenization-level redaction proxies, and prompt firewalls that strip sensitive context before API calls. The Kanji AST approach is more structurally sophisticated than simple string redaction — it attempts to preserve semantic signal across the obfuscation boundary rather than simply removing context — but it also introduces a harder engineering problem: the semantic fidelity of the mapping must be high enough that Claude can reason correctly, yet the surface-level representation must be opaque enough to satisfy compliance requirements. That is a narrow target, and the hallucination failure mode the author documents suggests the system has not yet reliably hit it. Whether cross-lingual semantic compression becomes a recognized pattern in DevSecOps, as the author predicts, will depend on whether the reliability gap can be closed — likely requiring tighter structural prompting, constrained decoding, or model fine-tuning on the custom IR format, none of which are trivial engineering challenges.
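Short of those remedies, a cheap local guard can at least reject a reply before reverse-compilation is attempted. The sketch below is an assumption about how such a check might look, not something the post describes: `validate_patch`, `ALIAS_PATTERN`, and the shape of `reverse_map` are hypothetical. It only verifies that the reply parses and that every alias-shaped identifier exists in the local dictionary; it does not prove the AST topology is unchanged.

```python
import ast
import re

# Assumed alias shape, modeled on the post's `_JCross_算_ext_04` example.
ALIAS_PATTERN = re.compile(r"^_JCross_.+_ext_\d+$")


def validate_patch(patched_source, reverse_map):
    """Return a list of problems; an empty list means reverse-mapping can proceed."""
    try:
        tree = ast.parse(patched_source)
    except SyntaxError as exc:
        return [f"patch does not parse: {exc.msg}"]

    problems = []
    reported = set()
    for node in ast.walk(tree):
        # Pull the identifier out of the node kinds that carry one.
        if isinstance(node, ast.FunctionDef):
            name = node.name
        elif isinstance(node, ast.Name):
            name = node.id
        elif isinstance(node, ast.arg):
            name = node.arg
        else:
            continue
        # An alias-shaped name missing from the dictionary was invented by the model.
        is_unknown = ALIAS_PATTERN.match(name) and name not in reverse_map
        if is_unknown and name not in reported:
            reported.add(name)
            problems.append(f"unknown alias: {name}")
    return problems
```

A reply that introduces, say, `_JCross_算_ext_09` when the dictionary holds no such entry would be flagged and could be retried or discarded instead of silently corrupting the restored code.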