| # OBLITERATUS Pipeline Efficiency Audit | |
| **Date:** 2026-03-03 | |
| **Scope:** All obliteration methods in `abliterate.py` (5,076 lines), `bayesian_optimizer.py`, `informed_pipeline.py`, and 4 ablation strategies. | |
| --- | |
| ## Executive Summary | |
| The 6-stage pipeline (SUMMON → PROBE → DISTILL → EXCISE → VERIFY → REBIRTH) is architecturally sound with good separation of concerns. Memory hygiene between stages is correct. The rank-1 projection math is efficient. Quantization handling is robust. | |
| **8 concrete efficiency issues found.** Estimated cumulative impact: **~40-60% wall-clock reduction** on typical runs (8B model, advanced/surgical methods). Ordered by ROI (ease × impact). | |
| --- | |
| ## HIGH PRIORITY (Fix This Week) | |
| ### 1. PROBE runs 1,536 prompts with zero batching | |
| **Location:** `abliterate.py:1074-1088` | |
| **Impact:** Largest single wall-clock bottleneck (~77s on 8B model, reducible to ~10s) | |
| The activation collection loop processes each prompt individually, with a full forward pass and a GC cycle per prompt. With 512 harmful + 512 harmless + 512 jailbreak prompts, that is 1,536 serial forward passes. | |
| The `_free_gpu_memory()` call at line 1086 is **inside the per-prompt loop**, adding ~20ms × 1,536 ≈ 30s of pure garbage-collection overhead. | |
| ```python | |
| # CURRENT (serial) | |
| for i, prompt in enumerate(prompts): | |
|     inputs = tokenizer(prompt, return_tensors="pt", ...) | |
|     model(**inputs) | |
|     del inputs | |
|     self._free_gpu_memory()  # <-- ~30s wasted in total | |
| ``` | |
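| **Fix:** Batch prompts (batch_size=8-16). The hooks already handle the batch dimension via `hidden[:, -1, :]`, provided the tokenizer pads on the left so that position -1 is the last real token in every row; with right padding, index the last non-pad position from the attention mask instead. Move `_free_gpu_memory()` to run every N batches, not every prompt. | |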
| **Speedup:** ~7-8x on PROBE stage. | |
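The loop structure described in the fix can be sketched as follows. This is a minimal illustration, not the code in `abliterate.py`: `run_batch` and `free_memory` are hypothetical callables standing in for "tokenize with padding and run one forward pass" and `self._free_gpu_memory()`, and the `cleanup_every` default is an assumed value.

```python
from typing import Callable, Iterator, List, Sequence


def iter_batches(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `batch_size` items."""
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def probe_batched(
    prompts: Sequence[str],
    run_batch: Callable[[List[str]], None],  # hypothetical: tokenize + one forward pass
    free_memory: Callable[[], None],         # hypothetical stand-in for _free_gpu_memory()
    batch_size: int = 16,
    cleanup_every: int = 8,                  # assumed value, tune per GPU
) -> int:
    """Run all prompts in batches and return the number of forward passes."""
    n_batches = 0
    for batch in iter_batches(prompts, batch_size):
        run_batch(batch)
        n_batches += 1
        # Clean up every N batches instead of after every prompt.
        if n_batches % cleanup_every == 0:
            free_memory()
    free_memory()  # one final cleanup at the stage boundary
    return n_batches
```

With 1,536 prompts, `batch_size=16`, and `cleanup_every=8`, this performs 96 forward passes and 13 cleanup calls, compared with 1,536 of each in the serial loop.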
| --- | |
| ### 2. VERIFY generates 30 completions sequentially — no batching | |
| **Location:** `abliterate.py:4622-4670` | |
| **Impact:** Second-largest wall-clock cost (~57s on 8B model, reducible to ~15s) | |
| Each of the 30 refusal-test prompts gets an independent `model.generate(max_new_tokens=128)` call. At ~15ms/token on an 8B model, that's 30 × 128 × 15ms ≈ 57s. | |
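| **Fix:** Batch the generation calls (batch_size=4-8). `model.generate()` supports batched inputs natively, and the tokenizer already handles padding; decoder-only models need left padding (`padding_side="left"`) so that generation continues from real tokens rather than pad tokens. | |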
| **Speedup:** ~4x on VERIFY stage. | |
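The timing figures above follow from a simple decode-step count, which can be reproduced with a few lines of arithmetic. The helper below is a rough cost model, not a measurement: it assumes a decode step costs about the same for a small batch as for a single sequence (optimistic, but reasonable while decoding is memory-bound) and it ignores prefill time. The function name is hypothetical.

```python
import math


def estimate_generate_seconds(
    n_prompts: int,
    max_new_tokens: int,
    ms_per_step: float,
    batch_size: int = 1,
) -> float:
    """Estimate total decode time: one step per new token per batch.

    Assumes per-step latency is independent of batch size and ignores prefill.
    """
    n_batches = math.ceil(n_prompts / batch_size)
    return n_batches * max_new_tokens * ms_per_step / 1000.0


# Figures from the audit: 30 prompts, 128 new tokens, ~15 ms per token.
serial_s = estimate_generate_seconds(30, 128, 15)
batched_s = estimate_generate_seconds(30, 128, 15, batch_size=4)
speedup = serial_s / batched_s
```

Under these assumptions the serial estimate is about 57.6s and the `batch_size=4` estimate about 15.4s, in line with the ~4x figure. Real speedups will be lower if per-step latency grows with batch size or if padded sequences generate to different lengths.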
| --- | |
| ### 3. SAE training is forced to CPU with no early stopping | |
| **Location:** `abliterate.py:1579-1583` | |
| **Impact:** Moderate — adds ~20-40s per run when SAE features are enabled (surgical, nuclear methods) | |
| SAE training runs 30 fixed epochs per strong layer on CPU. With 15-20 strong layers, that's 450-600 CPU training epochs. No convergence check, no early stopping. | |
| The `device="cpu"` setting is overly conservative: the memory-aware cap at lines 1570-1578 already validates GPU headroom, and a typical SAE encoder (expansion=2, hidden_dim=4096) is only ~128MB in fp32. | |
| **Fix:** | |
| 1. Add early stopping when reconstruction loss plateaus (< 0.1% improvement over 3 epochs) | |
| 2. Use GPU when `free_mb > sae_mem_mb + 1024` (1GB headroom) | |
| 3. Reduce default epochs from 30 to 15 with convergence guard | |
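One way to read the convergence guard in items 1 and 3 is a patience-based check: stop once the loss has failed to beat the best value by at least 0.1% for 3 consecutive epochs, with a hard cap of 15 epochs. The sketch below implements that reading. `PlateauStopper` and `train_with_early_stopping` are hypothetical names, and the list of per-epoch losses stands in for a real SAE training loop.

```python
from typing import Iterable


class PlateauStopper:
    """Signal a stop after `patience` consecutive epochs without enough improvement."""

    def __init__(self, min_rel_improvement: float = 1e-3, patience: int = 3):
        self.min_rel_improvement = min_rel_improvement
        self.patience = patience
        self.best = float("inf")
        self.stale_epochs = 0

    def should_stop(self, loss: float) -> bool:
        # Progress means beating the best loss by the relative margin (0.1% by default).
        if loss < self.best * (1.0 - self.min_rel_improvement):
            self.best = loss
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
        return self.stale_epochs >= self.patience


def train_with_early_stopping(epoch_losses: Iterable[float], max_epochs: int = 15) -> int:
    """Consume per-epoch losses (stand-in for training) and return epochs run."""
    stopper = PlateauStopper()
    epochs_run = 0
    for loss in epoch_losses:
        if epochs_run >= max_epochs:
            break
        epochs_run += 1
        if stopper.should_stop(loss):
            break
    return epochs_run
```

For a loss curve that flattens after three epochs, such as `[1.0, 0.5, 0.4, 0.4, 0.4, 0.4, 0.3]`, the loop stops after six epochs; a curve that keeps improving runs to the 15-epoch cap.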
| --- | |
| ## MEDIUM PRIORITY (Fix This Sprint) | |
| ### 4. `_distill_inner()` is a degraded copy of `_distill()` — drops half the SOTA techniques | |
| **Location:** `abliterate.py:2958-3055` vs `1102-1750` | |
| **Impact:** Quality regression on refinement passes 2+, not pure compute waste | |
| The iterative refinement path calls `_distill_inner()` which is a simplified ~100-line copy that skips: Wasserstein-optimal extraction, layer-adaptive strength, float layer interpolation, SAE features, EGA, CoT-aware orthogonalization, and RDO refinement. | |
| This means "true iterative refinement" actually produces **worse directions on later passes** because it drops the analysis-guided enhancements. | |
| **Fix:** Extract shared SVD/direction logic into `_extract_directions(full_features=True/False)` and call from both paths. At minimum, keep whitened SVD and jailbreak-contrastive blending in the inner path. | |
| --- | |
| ### 5. Bayesian optimizer clones ALL weight tensors — ~7GB memory overhead | |
| **Location:** `bayesian_optimizer.py:300-341` | |
| **Impact:** Memory pressure on GPU-constrained setups; 50× full-restore cycles | |
| The optimizer saves a complete clone of every weight tensor across all strong layers. For a 7B model with 32 layers, that's ~7GB of clones sitting in memory during all 50 trials. | |
| After each trial, `_restore_all()` copies all clones back — 50 trials × full-model memcpy. | |
| **Fix (easy):** Only clone weights in `_strong_layers` (already partially done, but `named_parameters()` crawl still catches everything). Drop the `seen_data_ptrs` set once the loop is tightened. | |
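| **Fix (better):** Store the rank-1 factors of the projection delta `Δ = scale * d @ (d^T @ W)`, namely `d` and `coeff = d^T @ W`, per direction per layer instead of cloning the full weight. Rollback is `W += scale * d @ coeff`, exact up to floating-point rounding (which can accumulate over 50 trials in half precision). This reduces storage from O(hidden_dim²) to O(hidden_dim) per direction per layer; materializing `Δ` itself would still be O(hidden_dim²). | |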
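The rollback idea can be checked on a toy matrix. The sketch below uses plain Python lists in place of tensors so the arithmetic is easy to follow; in the pipeline the same steps would be tensor operations. `project_out` and `rollback` are hypothetical helper names, and `d` is assumed to be unit-norm.

```python
from typing import List

Matrix = List[List[float]]


def project_out(W: Matrix, d: List[float], scale: float) -> List[float]:
    """Apply W -= scale * d (d^T W) in place and return coeff = d^T W.

    Only `coeff` (one value per column) needs to be kept for rollback,
    since `d` and `scale` are already known to the caller.
    """
    rows, cols = len(W), len(W[0])
    coeff = [sum(d[i] * W[i][j] for i in range(rows)) for j in range(cols)]
    for i in range(rows):
        for j in range(cols):
            W[i][j] -= scale * d[i] * coeff[j]
    return coeff


def rollback(W: Matrix, d: List[float], coeff: List[float], scale: float) -> None:
    """Undo project_out by adding the rank-1 delta back in place."""
    for i in range(len(W)):
        for j in range(len(W[0])):
            W[i][j] += scale * d[i] * coeff[j]


W = [[1.0, 2.0], [3.0, 4.0]]
d = [1.0, 0.0]  # unit-norm direction
coeff = project_out(W, d, scale=1.0)
projected = [row[:] for row in W]
rollback(W, d, coeff, scale=1.0)
```

When several directions are applied to the same matrix, each `coeff` is computed against the weight as it stood at that step, so adding all stored deltas back recovers the original regardless of order.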
| --- | |
| ### 6. Norm computation in `_project_out_advanced()` traverses the full matrix twice | |
| **Location:** `abliterate.py:3477-3486` | |
| **Impact:** ~4,800 unnecessary full-matrix norm computations per run (8-direction surgical) | |
| When `norm_preserve=True`, the code computes `W.norm()` before projection and `W.norm()` after projection. Each norm traverses the full weight matrix (16M elements for 4096×4096). | |
| With 8 directions × 30 layers × 10 weight matrices = 2,400 projections → 4,800 norm calls → 77 billion unnecessary FLOPs. | |
| **Fix:** After rank-1 update `W' = W - scale * d @ (d^T @ W)`, the new norm satisfies: | |
| `||W'||² = ||W||² - 2·scale·||d^T @ W||² + scale²·||d^T @ W||²·||d||²` | |
| Since `||d|| = 1`: `||W'||² = ||W||² - scale·(2 - scale)·||coeff||²` | |
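| This replaces the post-projection 16M-element norm with a single `coeff.pow(2).sum()` call (~4K FLOPs); `||W||²` only needs to be computed once per matrix and can then be updated analytically after each direction. | |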
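The identity is easy to sanity-check numerically before wiring it into `_project_out_advanced()`. The sketch below compares the analytic squared Frobenius norm against a direct recomputation on a small example, using plain lists in place of tensors. The helper names are hypothetical, and the check assumes `||d|| = 1`, as in the derivation.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def frob_sq(M: Matrix) -> float:
    """Squared Frobenius norm."""
    return sum(x * x for row in M for x in row)


def projected_norm_sq(W: Matrix, d: List[float], scale: float) -> Tuple[float, float]:
    """Return (analytic, direct) values of ||W'||^2 for W' = W - scale * d (d^T W).

    Assumes d is unit-norm.
    """
    rows, cols = len(W), len(W[0])
    coeff = [sum(d[i] * W[i][j] for i in range(rows)) for j in range(cols)]
    coeff_sq = sum(c * c for c in coeff)
    # Analytic update: no second pass over the full matrix.
    analytic = frob_sq(W) - scale * (2.0 - scale) * coeff_sq
    # Direct recomputation, for comparison only.
    W_new = [[W[i][j] - scale * d[i] * coeff[j] for j in range(cols)] for i in range(rows)]
    return analytic, frob_sq(W_new)


analytic, direct = projected_norm_sq([[1.0, 2.0], [3.0, 4.0]], [0.6, 0.8], 0.5)
```

Here both values agree to rounding error. A test of this shape, run against the real tensors in low precision, would also show how much drift the analytic update introduces compared with recomputing the norm.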
| --- | |
| ## LOW PRIORITY (Backlog) | |
| ### 7. Gram-Schmidt appears 3 times as O(n²) nested loops | |
| **Location:** `abliterate.py:1168-1173`, `1361-1367`, `3038-3044` | |
| **Impact:** Minimal compute but code quality issue | |
| Three separate implementations of the same Gram-Schmidt orthogonalization with nested Python loops. With n_directions=8, it's 28 dot products per call — trivial compute but (a) DRY violation, (b) numerically inferior to `torch.linalg.qr()`. | |
| **Fix:** Extract to `_orthogonalize_subspace(sub: Tensor) -> Tensor` using QR decomposition. Single call site, single test, better numerics. | |
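A possible shape for the shared helper is sketched below. It assumes `sub` holds one direction per row with shape `(n_directions, hidden_dim)`, that the rows are linearly independent, and that matching Gram-Schmidt's orientation matters to callers; none of this is confirmed by the source. The function is written as a free function here rather than a method.

```python
import torch


def orthogonalize_subspace(sub: torch.Tensor) -> torch.Tensor:
    """Orthonormalize the rows of `sub` (n_directions, hidden_dim) via reduced QR.

    Assumes linearly independent rows and n_directions <= hidden_dim.
    """
    # QR orthonormalizes columns, so factor the transpose: sub.T = Q @ R.
    # Cast to float32 because QR is not available for half precision on all devices.
    q, r = torch.linalg.qr(sub.T.float(), mode="reduced")
    # QR is unique only up to a sign per column. Flipping by the sign of R's
    # diagonal keeps each basis vector pointing the same way Gram-Schmidt would.
    signs = torch.sign(torch.diagonal(r))
    signs[signs == 0] = 1.0
    return (q * signs).T.to(sub.dtype)
```

The sign correction matters because the directions are signed quantities: without it, a refusal direction could come back negated, which is harmless for projection but not for any code that adds or reflects along the direction.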
| --- | |
| ### 8. Pre-EXCISE baseline KL capture re-forward-passes 100 prompts already seen in PROBE | |
| **Location:** `abliterate.py:2313-2366` | |
| **Impact:** ~700ms wasted (minor) | |
| `_capture_baseline_kl_logits()` runs 100 harmless prompts through the model to capture pre-EXCISE logits. But PROBE already ran those same prompts and captured hidden states at every layer. The logits could be computed as `lm_head(last_hidden_state)` — a single matmul. | |
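| **Fix:** After PROBE, compute the baseline logits by applying `model.lm_head` to the cached last-layer activations of those same harmless prompts, after the model's final norm layer if the hooks capture pre-norm hidden states. Skip the 100-prompt forward pass entirely. | |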
| --- | |
| ## What's Done Well | |
| | Area | Assessment | | |
| |------|------------| | |
| | **Stage-boundary memory cleanup** | Correct — `_free_gpu_memory()` + explicit dict clearing between stages | | |
| | **Rank-1 projection math** | Efficient — `W @ d` then `d.T * coeff` instead of materializing `I - dd^T` | | |
| | **Quantization dequant/requant** | Robust — handles bitsandbytes NF4, GPTQ, AWQ; fails loudly on unsupported formats | | |
| | **Incremental expert mean** | Smart — Welford running mean in `_transplant_expert_weights()` avoids stacking all expert weights | | |
| | **Router stabilization** | Defensive — `_stabilize_router_weights()` after MoE projection prevents CUDA crashes | | |
| | **Large model mode** | Pragmatic — caps directions, SAE features, refinement passes for 120B+ models | | |
| | **Event emission** | Clean — `_emit()` / `_on_stage()` / `_on_log()` callbacks for UI integration without coupling | | |
| --- | |
| ## Method Efficiency Comparison | |
| | Method | PROBE Cost | DISTILL Cost | EXCISE Cost | VERIFY Cost | Primary Bottleneck | | |
| |--------|-----------|-------------|-------------|-------------|-------------------| | |
| | **basic** | 1x (1,024 prompts) | 1x (diff-in-means) | 1x (~10 projections) | 1x | PROBE | | |
| | **advanced** | 2x (re-probe on pass 2) | 2x (re-distill) | 2x (2 passes) | 1x | PROBE × 2 | | |
| | **aggressive** | 3x (re-probe on passes 2,3) | 3x (re-distill) | 3x (3 passes, 8 dirs) | 1x | PROBE × 3 | | |
| | **surgical** | 1.5x (+jailbreak prompts) | 2x (SAE training) | 2x (head surgery + EGA) | 1x | SAE on CPU | | |
| | **optimized** | 1.5x (+jailbreak) | 1x | 50x (Bayesian trials) | 1x | Bayesian optimizer | | |
| | **inverted** | 1.5x (+jailbreak) | 1x | 2x (reflection math) | 1x | PROBE | | |
| | **nuclear** | 1.5x (+jailbreak) | 2x (SAE) | 3x (all techniques) | 1x | SAE + PROBE | | |
| | **informed** | 1x | 1.5x (analysis modules) | 1x-3x (dynamic) | 1.5x (Ouroboros check) | Analysis modules | | |
| --- | |
| ## Prioritized Action Plan | |
| 1. **Batch PROBE forward passes** — immediate 7-8x speedup on largest bottleneck | |
| 2. **Batch VERIFY generation** — immediate 4x speedup on second bottleneck | |
| 3. **Add SAE early stopping + GPU path** — 2-3x speedup on SAE-enabled methods | |
| 4. **Unify `_distill` / `_distill_inner`** — quality fix, prevents direction degradation | |
| 5. **Optimize Bayesian rollback storage** — memory fix for GPU-constrained users | |
| 6. **Analytical norm computation** — eliminates 77B unnecessary FLOPs | |
| 7. **DRY Gram-Schmidt** — code quality | |
| 8. **Cache KL baseline from PROBE** — minor speedup | |