01
No-crate implementation posture
The runtime currently targets Rust compiled to WASM with no third-party crates. The strongest supply-chain control is the smallest possible dependency graph. Unsafe code is confined to the unavoidable WASM memory boundary, and every unsafe access must be paired with validation.
- No ML framework
- No tokenizer dependency
- No general-purpose web framework
- Isolated unsafe boundary
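A minimal sketch of what an isolated, validated unsafe boundary could look like, using only `core`. The names `read_prompt`, `BoundaryError`, and the 1 MiB `MAX_INPUT_BYTES` limit are assumptions made for illustration and are not taken from the runtime itself.

```rust
/// Hypothetical upper bound on a single host-to-guest buffer.
pub const MAX_INPUT_BYTES: usize = 1 << 20;

#[derive(Debug, PartialEq)]
pub enum BoundaryError {
    NullPointer,
    TooLarge,
    InvalidUtf8,
}

/// Turns a raw (pointer, length) pair handed over by the host into a
/// checked `&str`. This is the only place the sketch touches raw memory.
///
/// # Safety
/// The caller must guarantee that `ptr` points to `len` readable,
/// initialized bytes that stay valid and unmodified for lifetime `'a`.
pub unsafe fn read_prompt<'a>(ptr: *const u8, len: usize) -> Result<&'a str, BoundaryError> {
    // Validate everything that can be validated before any raw access.
    if ptr.is_null() {
        return Err(BoundaryError::NullPointer);
    }
    if len > MAX_INPUT_BYTES {
        return Err(BoundaryError::TooLarge);
    }
    // The single unsafe operation: view the host bytes as a slice.
    let bytes = unsafe { core::slice::from_raw_parts(ptr, len) };
    // Safe validation of the contents; no unchecked UTF-8 conversion.
    core::str::from_utf8(bytes).map_err(|_| BoundaryError::InvalidUtf8)
}
```

Everything downstream of a function like this works with a safe `&str`, so the pointer handling can be audited in one place.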
02
Generation loop
Generation uses preallocated runtime structures: model-owned scratch arenas, reusable logits, fixed-buffer sampling, bounded decoding, and request state that is reset between requests. This keeps the hot path stable and makes failure states testable.
- ForwardScratch
- Reusable logits
- Fixed top-k candidate cap
- 64 KiB output cap
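The items above can be sketched as a bounded decode loop. This is an illustration under stated assumptions, not the runtime's code: only the name `ForwardScratch` and the 64 KiB cap come from the text. The byte-level vocabulary (`VOCAB`, `EOS`), the `TOP_K_CAP` value of 8, the `forward` callback standing in for the model, and the greedy pick are all hypothetical.

```rust
pub const TOP_K_CAP: usize = 8;                 // hypothetical candidate cap
pub const MAX_OUTPUT_BYTES: usize = 64 * 1024;  // 64 KiB output cap
pub const VOCAB: usize = 257;                   // assumed: 256 byte tokens + EOS
pub const EOS: u32 = 256;

/// Scratch owned by the model and reused for every step and request.
pub struct ForwardScratch {
    logits: [f32; VOCAB],
    candidates: [(u32, f32); TOP_K_CAP],
}

impl ForwardScratch {
    pub fn new() -> Self {
        Self { logits: [0.0; VOCAB], candidates: [(0, 0.0); TOP_K_CAP] }
    }
}

#[derive(Debug, PartialEq)]
pub enum StopReason { EndOfSequence, TokenLimit, OutputCap }

/// Insertion-sorts the highest logits into the fixed buffer (descending)
/// and returns how many slots are filled. No allocation takes place.
pub fn top_k(logits: &[f32], out: &mut [(u32, f32); TOP_K_CAP]) -> usize {
    let mut n = 0;
    for (id, &logit) in logits.iter().enumerate() {
        let mut pos = n;
        while pos > 0 && out[pos - 1].1 < logit { pos -= 1; }
        if pos >= TOP_K_CAP { continue; } // not better than any kept entry
        let mut i = if n < TOP_K_CAP { n } else { TOP_K_CAP - 1 };
        while i > pos { out[i] = out[i - 1]; i -= 1; }
        out[pos] = (id as u32, logit);
        if n < TOP_K_CAP { n += 1; }
    }
    n
}

/// Decodes until EOS, the token limit, or the output cap is reached.
/// `forward(step, logits)` is a stand-in for the model's forward pass.
pub fn generate<F>(scratch: &mut ForwardScratch, out: &mut Vec<u8>,
                   max_tokens: usize, mut forward: F) -> StopReason
where F: FnMut(usize, &mut [f32]) {
    out.clear(); // fresh request state; the buffer's capacity is kept
    for step in 0..max_tokens {
        forward(step, &mut scratch.logits[..]);
        let n = top_k(&scratch.logits[..], &mut scratch.candidates);
        debug_assert!(n > 0);
        // Greedy pick keeps the sketch deterministic; a real sampler
        // would draw from candidates[..n] instead.
        let token = scratch.candidates[0].0;
        if token == EOS { return StopReason::EndOfSequence; }
        if out.len() >= MAX_OUTPUT_BYTES { return StopReason::OutputCap; }
        out.push(token as u8); // token < 256 under the assumed vocabulary
    }
    StopReason::TokenLimit
}
```

Returning an explicit `StopReason` is one way to make each failure state testable: a test can drive the loop with a fake `forward` callback and assert on which limit fired. Creating the output buffer with `Vec::with_capacity(MAX_OUTPUT_BYTES)` means the loop itself never reallocates.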
03
Diagnostics
The runtime exposes information the UI can display: model load state, quantization mode, prompt token count, generated token count, KV length, scratch usage, adapter apply count, and assembly checksums. These are part of correctness, not decorative telemetry.
- Local-only diagnostics
- No external analytics
- Escaped JSON output
- Reset and free-model state handling
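As a rough illustration of escaped JSON output without a serialization crate, the sketch below hand-writes a diagnostics snapshot. The struct, its field names, and the exact split between `reset` and `free_model` are assumptions made for this example; the text names the reported quantities but not their representation.

```rust
use std::fmt::Write;

/// Appends `input` to `out` with JSON string escaping applied.
pub fn escape_json(input: &str, out: &mut String) {
    for c in input.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters must be \u-escaped in JSON.
            c if (c as u32) < 0x20 => {
                let _ = write!(out, "\\u{:04x}", c as u32);
            }
            c => out.push(c),
        }
    }
}

/// Hypothetical snapshot of the values the UI can display.
#[derive(Default)]
pub struct Diagnostics {
    pub model_loaded: bool,
    pub quantization: String,
    pub prompt_tokens: u32,
    pub generated_tokens: u32,
    pub kv_len: u32,
    pub scratch_bytes: u32,
    pub adapter_applies: u32,
    pub assembly_checksum: u32,
}

impl Diagnostics {
    /// Serializes the snapshot; the only free-form string is escaped.
    pub fn to_json(&self) -> String {
        let mut s = String::from("{\"model_loaded\":");
        s.push_str(if self.model_loaded { "true" } else { "false" });
        s.push_str(",\"quantization\":\"");
        escape_json(&self.quantization, &mut s);
        s.push('"');
        let _ = write!(
            s,
            ",\"prompt_tokens\":{},\"generated_tokens\":{},\"kv_len\":{},\
             \"scratch_bytes\":{},\"adapter_applies\":{},\"assembly_checksum\":{}}}",
            self.prompt_tokens, self.generated_tokens, self.kv_len,
            self.scratch_bytes, self.adapter_applies, self.assembly_checksum
        );
        s
    }

    /// Assumed semantics: clear per-request counters, keep model state.
    pub fn reset(&mut self) {
        self.prompt_tokens = 0;
        self.generated_tokens = 0;
        self.kv_len = 0;
        self.scratch_bytes = 0;
    }

    /// Assumed semantics: freeing the model clears every field.
    pub fn free_model(&mut self) {
        *self = Diagnostics::default();
    }
}
```

Treating reset and free as explicit state transitions lets a test check that no counter from a previous request or a previous model leaks into the next snapshot.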