	# OBLITERATUS Pipeline Efficiency Audit

	Date: 2026-03-03
	Scope: All obliteration methods in `abliterate.py` (5,076 lines), `bayesian_optimizer.py`, `informed_pipeline.py`, and 4 ablation strategies.

	---

	## Executive Summary

	The 6-stage pipeline (SUMMON → PROBE → DISTILL → EXCISE → VERIFY → REBIRTH) is architecturally sound with good separation of concerns. Memory hygiene between stages is correct. The rank-1 projection math is efficient. Quantization handling is robust.

	8 concrete efficiency issues found. Estimated cumulative impact: ~40-60% wall-clock reduction on typical runs (8B model, advanced/surgical methods). Ordered by ROI (ease × impact).

	---

	## HIGH PRIORITY (Fix This Week)

	### 1. PROBE runs 1,536 prompts with zero batching

	Location: `abliterate.py:1074-1088`
	Impact: Largest single wall-clock bottleneck (~77s on 8B model, reducible to ~10s)

	The activation collection loop processes each prompt individually, with a full forward pass and a GC cycle for every prompt. With 512 harmful + 512 harmless + 512 jailbreak prompts, that is 1,536 serial forward passes.

	The `_free_gpu_memory()` call at line 1086 sits inside the per-prompt loop, adding ~20ms × 1,536 ≈ 30s of pure garbage collection overhead.

	```python
	# CURRENT (serial)
	for i, prompt in enumerate(prompts):
	    inputs = tokenizer(prompt, return_tensors="pt", ...)
	    model(**inputs)
	    del inputs
	    self._free_gpu_memory()  # <-- 30s wasted
	```

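	Fix: Batch prompts (batch_size=8-16) and move `_free_gpu_memory()` so it runs every N batches, not every prompt. The hooks already index the batch dimension via `hidden[:, -1, :]`, but position -1 is the last real token only when the tokenizer pads on the left (`padding_side="left"`); with right padding, select the last non-pad position from the attention mask instead.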

	Speedup: ~7-8x on PROBE stage.
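
The batching pattern can be sketched without any model dependency. This is a minimal illustration under stated assumptions, not the code in `abliterate.py`: `iter_batches`, `run_probe`, `forward_fn`, `cleanup_fn`, `cleanup_every` and `last_token_index` are hypothetical names, and the forward pass is replaced by a stub. The last helper shows why `hidden[:, -1, :]` is only safe with left padding: with right padding, the last real token has to be located from the attention mask.

```python
from typing import Callable, Iterator, List, Sequence


def iter_batches(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def run_probe(
    prompts: Sequence[str],
    forward_fn: Callable[[List[str]], None],
    cleanup_fn: Callable[[], None],
    batch_size: int = 16,
    cleanup_every: int = 8,
) -> int:
    """Run one forward call per batch; clean up every `cleanup_every` batches.

    Returns the number of forward calls made.
    """
    n_calls = 0
    for n_calls, batch in enumerate(iter_batches(prompts, batch_size), start=1):
        forward_fn(batch)  # stands in for tokenizing the batch and calling the model
        if n_calls % cleanup_every == 0:
            cleanup_fn()  # stands in for self._free_gpu_memory()
    return n_calls


def last_token_index(attention_mask: Sequence[int]) -> int:
    """Index of the last real (mask == 1) token in one padded row."""
    return max(i for i, m in enumerate(attention_mask) if m == 1)


# Toy run: 1,536 prompts at batch size 16.
calls: List[int] = []
cleanups: List[int] = []
n = run_probe(
    [f"prompt {i}" for i in range(1536)],
    forward_fn=lambda batch: calls.append(len(batch)),
    cleanup_fn=lambda: cleanups.append(1),
)
print(n, len(cleanups))  # forward calls and cleanup calls

print(last_token_index([0, 0, 1, 1, 1]))  # left-padded row: last real token is at -1
print(last_token_index([1, 1, 1, 0, 0]))  # right-padded row: -1 would be a pad token
```

With these toy settings the 1,536 serial passes collapse into 96 forward calls and 12 cleanup calls. The real speedup depends on sequence lengths and available memory, so the batch size should be tuned rather than taken from this sketch.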

	---

	### 2. VERIFY generates 30 completions sequentially — no batching

	Location: `abliterate.py:4622-4670`
	Impact: Second-largest wall-clock cost (~57s on 8B model, reducible to ~15s)

	Each of the 30 refusal-test prompts gets an independent `model.generate(max_new_tokens=128)` call. At ~15ms/token on an 8B model, that's 30 × 128 × 15ms ≈ 57s.

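	Fix: Batch the generation calls (batch_size=4-8). `model.generate()` supports batched inputs natively, and the tokenizer already handles padding. For decoder-only models, pad on the left (`padding_side="left"`) and pass the attention mask, so that each completion continues directly from its own prompt.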

	Speedup: ~4x on VERIFY stage.
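
The timing estimate above can be reproduced with a few lines of arithmetic. This is a rough sketch, not a measurement: `step_slowdown` is an assumed factor for how much slower one decode step becomes when it serves a whole batch, and the value 2.0 was picked only because it lands near the ~15s figure quoted above. Both functions assume every completion runs to `max_new_tokens`.

```python
import math


def sequential_seconds(n_prompts: int, max_new_tokens: int, ms_per_token: float) -> float:
    """Wall-clock estimate when every prompt gets its own generate() call."""
    return n_prompts * max_new_tokens * ms_per_token / 1000.0


def batched_seconds(
    n_prompts: int,
    max_new_tokens: int,
    ms_per_token: float,
    batch_size: int,
    step_slowdown: float,
) -> float:
    """Wall-clock estimate for batched decoding.

    `step_slowdown` is an assumption: the cost of one batched decode step
    relative to one single-sequence decode step.
    """
    n_batches = math.ceil(n_prompts / batch_size)
    return n_batches * max_new_tokens * ms_per_token * step_slowdown / 1000.0


seq = sequential_seconds(30, 128, 15.0)
bat = batched_seconds(30, 128, 15.0, batch_size=8, step_slowdown=2.0)
print(f"sequential: {seq:.1f}s, batched: {bat:.1f}s, speedup: {seq / bat:.2f}x")
```

Under these assumptions the estimate drops from about 57.6s to about 15.4s, roughly the 4x stated above. If decode steps are memory-bound and `step_slowdown` is closer to 1, the gain would be larger.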

	---

	### 3. SAE training is forced to CPU with no early stopping

	Location: `abliterate.py:1579-1583`
	Impact: Moderate — adds ~20-40s per run when SAE features are enabled (surgical, nuclear methods)

	SAE training runs 30 fixed epochs per strong layer on CPU. With 15-20 strong layers, that's 450-600 CPU training epochs. No convergence check, no early stopping.

	The `device="cpu"` setting is overly conservative: the memory-aware cap at lines 1570-1578 already validates GPU headroom, and a typical SAE encoder (expansion=2, hidden_dim=4096) is only ~128MB.

	Fix:
	1. Add early stopping when reconstruction loss plateaus (< 0.1% improvement over 3 epochs)
	2. Use GPU when `free_mb > sae_mem_mb + 1024` (1GB headroom)
	3. Reduce default epochs from 30 to 15 with convergence guard
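
The three fixes above can be sketched together as follows. This is an illustrative sketch rather than the SAE trainer itself: `should_stop`, `pick_sae_device` and `train_with_early_stopping` are hypothetical names, the training loop is replaced by replaying a list of losses, and "< 0.1% improvement over 3 epochs" is read here as comparing the current loss with the loss three epochs earlier.

```python
from typing import List


def should_stop(losses: List[float], patience: int = 3, min_rel_improvement: float = 1e-3) -> bool:
    """True once the loss improved by less than 0.1% over the last `patience` epochs."""
    if len(losses) <= patience:
        return False
    old, new = losses[-patience - 1], losses[-1]
    if old <= 0.0:
        return True  # nothing left to improve
    return (old - new) / old < min_rel_improvement


def pick_sae_device(free_mb: float, sae_mem_mb: float, headroom_mb: float = 1024.0) -> str:
    """Use the GPU only when the SAE fits with at least `headroom_mb` to spare."""
    return "cuda" if free_mb > sae_mem_mb + headroom_mb else "cpu"


def train_with_early_stopping(epoch_losses: List[float], max_epochs: int = 15) -> int:
    """Replay a loss curve and return the number of epochs actually run."""
    seen: List[float] = []
    for loss in epoch_losses[:max_epochs]:
        seen.append(loss)
        if should_stop(seen):
            break
    return len(seen)


# A curve that plateaus after the fourth epoch.
curve = [1.0, 0.5, 0.3, 0.25] + [0.2499] * 20
print(train_with_early_stopping(curve))
print(pick_sae_device(free_mb=4096, sae_mem_mb=128))
print(pick_sae_device(free_mb=1100, sae_mem_mb=128))
```

On the plateauing curve the loop stops after 7 epochs instead of running a fixed 30, while a curve that keeps improving still runs to the 15-epoch cap.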

	---

	## MEDIUM PRIORITY (Fix This Sprint)

	### 4. `_distill_inner()` is a degraded copy of `_distill()` — drops half the SOTA techniques

	Location: `abliterate.py:2958-3055` vs `1102-1750`
	Impact: Quality regression on refinement passes 2+, not pure compute waste

	The iterative refinement path calls `_distill_inner()` which is a simplified ~100-line copy that skips: Wasserstein-optimal extraction, layer-adaptive strength, float layer interpolation, SAE features, EGA, CoT-aware orthogonalization, and RDO refinement.

	This means "true iterative refinement" actually produces worse directions on later passes because it drops the analysis-guided enhancements.

	Fix: Extract shared SVD/direction logic into `_extract_directions(full_features=True/False)` and call from both paths. At minimum, keep whitened SVD and jailbreak-contrastive blending in the inner path.

	---

	### 5. Bayesian optimizer clones ALL weight tensors — ~7GB memory overhead

	Location: `bayesian_optimizer.py:300-341`
	Impact: Memory pressure on GPU-constrained setups; 50× full-restore cycles

	The optimizer saves a complete clone of every weight tensor across all strong layers. For a 7B model with 32 layers, that's ~7GB of clones sitting in memory during all 50 trials.

	After each trial, `_restore_all()` copies all clones back — 50 trials × full-model memcpy.

	Fix (easy): Only clone weights in `_strong_layers` (already partially done, but `named_parameters()` crawl still catches everything). Drop the `seen_data_ptrs` set once the loop is tightened.

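	Fix (better): Store the rank-1 factors of the projection delta instead of cloning the full weight. With `coeff = d^T @ W`, the delta is `Δ = scale * d @ coeff`, so keeping `d` and `coeff` is enough; materializing `Δ` itself would cost as much as the clone. Rollback = `W += scale * d @ coeff`. This reduces storage from O(hidden_dim²) to O(hidden_dim) per direction per weight matrix.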
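
A dependency-free sketch of the factor-based rollback, using nested lists in place of tensors. `project_out` and `rollback` are hypothetical helpers, not functions from `bayesian_optimizer.py`. The point is that only the direction `d` and the row vector `coeff = d^T @ W` need to be kept to undo a rank-1 projection.

```python
import copy
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def project_out(W: Matrix, d: Vector, scale: float) -> Vector:
    """Apply W <- W - scale * d (d^T W) in place and return coeff = d^T W."""
    n_rows, n_cols = len(W), len(W[0])
    coeff = [sum(d[i] * W[i][j] for i in range(n_rows)) for j in range(n_cols)]
    for i in range(n_rows):
        for j in range(n_cols):
            W[i][j] -= scale * d[i] * coeff[j]
    return coeff


def rollback(W: Matrix, d: Vector, coeff: Vector, scale: float) -> None:
    """Undo project_out using only the stored factors (d, coeff)."""
    for i in range(len(W)):
        for j in range(len(coeff)):
            W[i][j] += scale * d[i] * coeff[j]


W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
original = copy.deepcopy(W)
d = [0.6, 0.8, 0.0]  # unit-norm direction

coeff = project_out(W, d, scale=1.0)
# After a full-strength projection, d^T W should vanish.
residual = [sum(d[i] * W[i][j] for i in range(3)) for j in range(2)]

rollback(W, d, coeff, scale=1.0)
max_err = max(abs(W[i][j] - original[i][j]) for i in range(3) for j in range(2))
print(coeff, residual, max_err)
```

One caveat: unlike restoring from a clone, this rollback is exact only up to floating-point rounding, which may matter over many trials in fp16 or bf16. When several directions are applied in sequence, each step's `coeff` has to be captured at the time of that step and the rollbacks replayed in reverse order.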

	---

	### 6. Norm computation in `_project_out_advanced()` traverses the full matrix twice

	Location: `abliterate.py:3477-3486`
	Impact: ~4,800 unnecessary full-matrix norm computations per run (8-direction surgical)

	When `norm_preserve=True`, the code computes `W.norm()` before projection and `W.norm()` after projection. Each norm traverses the full weight matrix (16M elements for 4096×4096).

	With 8 directions × 30 layers × 10 weight matrices = 2,400 projections → 4,800 norm calls × 16M elements ≈ 77 billion unnecessary FLOPs (counting one multiply-add per element).

	Fix: After the rank-1 update `W' = W - scale * d @ (d^T @ W)`, with `coeff = d^T @ W`, the squared Frobenius norm satisfies:
	`||W'||² = ||W||² - 2·scale·||coeff||² + scale²·||coeff||²·||d||²`

	Since `||d|| = 1`: `||W'||² = ||W||² - scale·(2 - scale)·||coeff||²`

	This replaces a 16M-element norm with a single `coeff.pow(2).sum()` call (~4K FLOPs) on the `coeff` vector that the projection already computes.
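
The identity can be checked numerically on a small random matrix. The sketch below uses plain Python lists so it runs without dependencies. The sizes, the seed and the `scale` value are arbitrary choices for illustration, and the norm is the Frobenius norm.

```python
import math
import random

random.seed(0)
rows, cols, scale = 6, 5, 0.7

W = [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]
d = [random.gauss(0.0, 1.0) for _ in range(rows)]
d_norm = math.sqrt(sum(x * x for x in d))
d = [x / d_norm for x in d]  # unit-norm direction

# coeff = d^T W, the row vector the projection already computes.
coeff = [sum(d[i] * W[i][j] for i in range(rows)) for j in range(cols)]
norm_sq_before = sum(x * x for row in W for x in row)

# Direct route: apply the rank-1 update, then traverse the full matrix again.
W_new = [[W[i][j] - scale * d[i] * coeff[j] for j in range(cols)] for i in range(rows)]
norm_sq_direct = sum(x * x for row in W_new for x in row)

# Analytic route: only the coeff vector is touched.
coeff_sq = sum(c * c for c in coeff)
norm_sq_analytic = norm_sq_before - scale * (2.0 - scale) * coeff_sq

print(math.isclose(norm_sq_direct, norm_sq_analytic, rel_tol=1e-9))
```

The two values agree to floating-point precision. In reduced precision (fp16 or bf16), accumulating the analytic update across many directions may drift, so an occasional full recomputation could be a reasonable safeguard.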

	---

	## LOW PRIORITY (Backlog)

	### 7. Gram-Schmidt appears 3 times as O(n²) nested loops

	Location: `abliterate.py:1168-1173`, `1361-1367`, `3038-3044`
	Impact: Minimal compute but code quality issue

	Three separate implementations of the same Gram-Schmidt orthogonalization with nested Python loops. With n_directions=8, it's 28 dot products per call — trivial compute but (a) DRY violation, (b) numerically inferior to `torch.linalg.qr()`.

	Fix: Extract to `_orthogonalize_subspace(sub: Tensor) -> Tensor` using QR decomposition. Single implementation shared by all three call sites, single test, better numerics.

	---

	### 8. Pre-EXCISE baseline KL capture re-forward-passes 100 prompts already seen in PROBE

	Location: `abliterate.py:2313-2366`
	Impact: ~700ms wasted (minor)

	`_capture_baseline_kl_logits()` runs 100 harmless prompts through the model to capture pre-EXCISE logits. But PROBE already ran those same prompts and captured hidden states at every layer. The logits could be computed as `lm_head(last_hidden_state)` — a single matmul.

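	Fix: After PROBE, compute the baseline logits from the cached last-layer activations of those 100 harmless prompts by passing them through `model.lm_head` (after the final norm, for architectures that apply one before the head), and skip the 100-prompt forward pass entirely. This requires PROBE to retain the per-prompt harmless states; `harmful_means[last_layer]` is not a substitute, since it is a mean over the harmful set rather than per-prompt harmless activations.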

	---

	## What's Done Well

	| Area | Assessment |
	|------|------------|
	| Stage-boundary memory cleanup | Correct: `_free_gpu_memory()` + explicit dict clearing between stages |
	| Rank-1 projection math | Efficient: `W @ d` then `d.T * coeff` instead of materializing `I - dd^T` |
	| Quantization dequant/requant | Robust: handles bitsandbytes NF4, GPTQ, AWQ; fails loudly on unsupported formats |
	| Incremental expert mean | Smart: Welford running mean in `_transplant_expert_weights()` avoids stacking all expert weights |
	| Router stabilization | Defensive: `_stabilize_router_weights()` after MoE projection prevents CUDA crashes |
	| Large model mode | Pragmatic: caps directions, SAE features, refinement passes for 120B+ models |
	| Event emission | Clean: `_emit()` / `_on_stage()` / `_on_log()` callbacks for UI integration without coupling |

	---

	## Method Efficiency Comparison

	| Method | PROBE Cost | DISTILL Cost | EXCISE Cost | VERIFY Cost | Primary Bottleneck |
	|--------|-----------|-------------|-------------|-------------|-------------------|
	| basic | 1x (1,024 prompts) | 1x (diff-in-means) | 1x (~10 projections) | 1x | PROBE |
	| advanced | 2x (re-probe on pass 2) | 2x (re-distill) | 2x (2 passes) | 1x | PROBE × 2 |
	| aggressive | 3x (re-probe on passes 2,3) | 3x (re-distill) | 3x (3 passes, 8 dirs) | 1x | PROBE × 3 |
	| surgical | 1.5x (+jailbreak prompts) | 2x (SAE training) | 2x (head surgery + EGA) | 1x | SAE on CPU |
	| optimized | 1.5x (+jailbreak) | 1x | 50x (Bayesian trials) | 1x | Bayesian optimizer |
	| inverted | 1.5x (+jailbreak) | 1x | 2x (reflection math) | 1x | PROBE |
	| nuclear | 1.5x (+jailbreak) | 2x (SAE) | 3x (all techniques) | 1x | SAE + PROBE |
	| informed | 1x | 1.5x (analysis modules) | 1x-3x (dynamic) | 1.5x (Ouroboros check) | Analysis modules |

	---

	## Prioritized Action Plan

	1. Batch PROBE forward passes — immediate 7-8x speedup on largest bottleneck
	2. Batch VERIFY generation — immediate 4x speedup on second bottleneck
	3. Add SAE early stopping + GPU path — 2-3x speedup on SAE-enabled methods
	4. Unify `_distill` / `_distill_inner` — quality fix, prevents direction degradation
	5. Optimize Bayesian rollback storage — memory fix for GPU-constrained users
	6. Analytical norm computation — eliminates 77B unnecessary FLOPs
	7. DRY Gram-Schmidt — code quality
	8. Cache KL baseline from PROBE — minor speedup
