
Architecture Specification
Quartz Prism-1
The 26B local model built for long-form video. Four compute axes. Runs on your GPU.
Scroll
Axis 01 · Sequence
Block Diffusion & Drafting
Controls which tokens generate in parallel using Manifold Block Diffusion and factor-level drafting. Designed for structure-consistent decoding.
Axis 02 · Depth
MoD & Spectrum Gating
Per-token layer budget using LayerSkip, CLaSp, and Attention Spectrum Uncertainty Gating for maximal reasoning efficiency.
Axis 03 · Relation
Tensor Product Attention
Replaces bilinear global attention to open a third relational axis. Uses Clifford Geometric Attention for directional reasoning-heavy layers.
Axis 04 · Hierarchy
Trivector Encoding
Controls granularity and structural hierarchy. Emerges from Clifford machinery to maintain deep narrative context over 256K tokens.
26B
Total Parameters
3.8B active during inference via TD-MoE compression.
256K
Context Target
Gated DeltaNet linear-memory for extreme long-range support.
MoE
Symphony Specialists
Code, reasoning, and video-oriented specialist expert forks.
Deployment Profile
Hardware
Format
VRAM
Speed
A100 40G
BF16
24–26 GB
3,000–5,000 t/s
2× RTX 4090
Q8_0
13–14 GB
2,500–4,500 t/s
RTX 4090 24G
Q4KM
7.5–8.5 GB
1,200–1,800 t/s
RTX 3060 12G
IQ3XS
5–6 GB
220–480 t/s
