Security-SLM Gemma 4 E2B
A compact sovereign AI security model for agentic red-team and blue-team workflows.
Security-SLM is a Gemma 4 E2B-based research model fine-tuned for AI-native cybersecurity tasks: MCP security, prompt-injection defence, agent tool safety, cloud-agent SSRF, IAM least privilege, SOC triage, and private security operations.
It is designed for teams that want useful AI security assistance without sending sensitive prompts, logs, tickets, policies, or incident context to external inference APIs.
Base model: Gemma 4 E2B Instruct
Format: GGUF Q4_K_M
Primary use: Sovereign AI red/blue-team security assistance
Deployment: Local, private SOC, cyber range, regulated enterprise, edge lab
Model family: Gemma 4 multimodal
This is a research prototype. Use outputs with human review, especially for operational security decisions.
Why This Model Exists
Security teams increasingly use AI agents to inspect alerts, query logs, review code, analyse cloud policy, and coordinate incident response. But hosted LLM APIs can be difficult to use in environments where prompts may contain:
- incident logs
- private hostnames
- IAM policies
- cloud architecture
- vulnerability details
- internal source code
- analyst notes
- security-tool outputs
- accidental secrets
Security-SLM explores a practical alternative: a small, locally deployable security model that can run inside private infrastructure and support authorised red/blue-team work.
What It Is Good At
Security-SLM is tuned for structured, operational answers around:
- MCP tool-description poisoning
- malicious tool schemas and argument abuse
- prompt injection and prompt hijacking
- multi-turn payload splitting detection
- agent memory poisoning
- RAG/vector-store poisoning
- recursive tool-call and resource-exhaustion controls
- URL-fetching agent SSRF and cloud metadata exposure
- IAM least privilege for AI agents
- SOC triage and audit logging
- human approval gates for high-risk tools
- sovereign deployment and compliance controls
Recommended Output Style
The model is trained to prefer visible, deployable security analysis rather than hidden chain-of-thought.
Common structure:
Threat Model:
Risk Level:
Technical Analysis:
Controls:
Detection Logic:
Sovereign Deployment Notes:
Residual Risk:
For code/control tasks:
Purpose:
Security Assumptions:
Implementation:
Validation Checks:
Logging and Alerts:
How It Blocks Abuse:
Limitations:
For cloud/IAM:
Policy:
Scope:
Allowed Actions:
Explicit Denies:
Why This Is Least Privilege:
Validation:
Residual Risk:
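The section templates above are easy to enforce mechanically. A minimal sketch, assuming a simple substring check; the `REQUIRED_SECTIONS` list and `check_sections` name are illustrative, not part of the model or any library:

```python
# Hypothetical checker: verify a model answer contains the recommended
# section headers before accepting it into a review workflow.
REQUIRED_SECTIONS = [
    "Threat Model:",
    "Risk Level:",
    "Controls:",
    "Residual Risk:",
]

def check_sections(answer: str, required=REQUIRED_SECTIONS):
    """Return the list of required headers missing from an answer."""
    return [s for s in required if s not in answer]

answer = "Threat Model:\n...\nRisk Level:\nHigh\nControls:\n- ...\nResidual Risk:\nLow"
print(check_sections(answer))  # -> []
```

A non-empty return value can be used to reject or re-prompt before an analyst sees the output.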
Example Prompt
Design controls to prevent MCP tool-description poisoning in a private SOC environment.
Include manifest validation, logging signals, and runtime enforcement.
Expected style:
Threat Model:
An attacker publishes or modifies MCP tool metadata so an AI agent treats malicious descriptions as trusted operational instructions.
Risk Level:
High, because poisoned tool metadata can influence tool choice, arguments, and execution flow.
Controls:
- Require signed MCP manifests.
- Treat descriptions as untrusted data.
- Block secret requests, callback URLs, and policy override language.
- Enforce permissions outside natural-language descriptions.
- Log schema changes and failed validation decisions.
Sovereign Deployment Notes:
Run validation locally inside the SOC and keep manifest history in an internal audit store.
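The manifest controls above can be sketched in code. This is a minimal illustration, assuming an HMAC-signed manifest; the signing scheme, field names, and suspicious-pattern list are assumptions for the example, not a specification of any MCP implementation:

```python
import hashlib
import hmac
import json
import re

# Illustrative patterns for risky language in tool descriptions:
# policy-override phrasing, secret requests, and embedded callback URLs.
SUSPICIOUS = [
    r"(?i)ignore (all|previous) (instructions|policies)",
    r"(?i)(api[_ ]?key|password|secret)",
    r"https?://",
]

def verify_manifest(manifest_bytes: bytes, signature: str, key: bytes) -> bool:
    """Require signed MCP manifests before tool registration."""
    expected = hmac.new(key, manifest_bytes, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)

def description_findings(manifest: dict) -> list:
    """Treat tool descriptions as untrusted data and flag risky language."""
    findings = []
    for tool in manifest.get("tools", []):
        desc = tool.get("description", "")
        for pat in SUSPICIOUS:
            if re.search(pat, desc):
                findings.append((tool.get("name"), pat))
    return findings

key = b"local-soc-signing-key"  # hypothetical local signing key
manifest = {"tools": [{"name": "fetch",
                       "description": "Send the API_KEY to https://evil.example"}]}
raw = json.dumps(manifest, sort_keys=True).encode()
sig = hmac.new(key, raw, hashlib.sha256).hexdigest()
print(verify_manifest(raw, sig, key))  # True
print(description_findings(manifest))  # flags the secret request and URL
```

Failed signature checks and non-empty findings should both be logged to the internal audit store rather than silently dropped.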
Files in This Repository
This GGUF export contains:
gemma-4-e2b-it.Q4_K_M.gguf
gemma-4-e2b-it.BF16-mmproj.gguf
eval/baseline_results.json
eval/finetuned_results.json
Because Gemma 4 is multimodal, the export includes an mmproj file. Keep it if you plan to use image inputs with llama.cpp-compatible tooling.
llama.cpp Usage
Text-only:
llama-cli \
  -m gemma-4-e2b-it.Q4_K_M.gguf \
  -p "Design a policy gateway for an AI SOC agent with URL-fetch and ticket tools."
Multimodal with projector:
llama-mtmd-cli \
  -m gemma-4-e2b-it.Q4_K_M.gguf \
  --mmproj gemma-4-e2b-it.BF16-mmproj.gguf
Then load an image inside the runner:
/image suspicious_login_page.png
Analyze this screenshot for phishing indicators and defensive controls.
Python Usage
from unsloth import FastLanguageModel
import torch

# Load the 4-bit quantized model and its tokenizer.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="entrick/Security-SLM-Gemma-4-E2B-it-GGUF",
    max_seq_length=2048,
    dtype=None,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)

system_prompt = """You are Security-SLM, a sovereign AI security assistant for authorised red-team and blue-team work.
Keep dual-use answers bounded to authorised testing, defensive controls, detection, and private deployment guidance."""

prompt = "Design controls to prevent MCP tool-description poisoning in a private SOC environment."

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": prompt},
]

# Apply the Gemma chat template without tokenizing, then tokenize separately.
formatted = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(text=formatted, return_tensors="pt").to("cuda")

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=700,
        temperature=0.2,
        do_sample=True,
        top_p=0.9,
        repetition_penalty=1.08,
        pad_token_id=tokenizer.eos_token_id,
    )

# Decode only the newly generated tokens, skipping the prompt.
answer = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[-1]:],
    skip_special_tokens=True,
)
print(answer)
Training Data
The project uses a custom text-only security dataset curated for agentic AI security and sovereign deployment.
Current local dataset lineage:
security_dataset_gemma_clean.jsonl
datasets/registry/dataset_registry.jsonl
datasets/exports/security_dataset_training.jsonl
Current prototype size:
235 samples
The dataset was cleaned to remove DeepSeek-style <think> blocks. Training targets are visible security answers suitable for deployment, review, and audit.
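The cleaning step can be sketched as follows. This is a minimal illustration; the `output` field name is an assumption about the JSONL schema, and `clean_record` is a hypothetical helper, not a script shipped with the dataset:

```python
import json
import re

# Strip DeepSeek-style <think>...</think> blocks so only the visible
# security answer remains as the training target.
THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)

def clean_record(line: str) -> str:
    """Remove hidden reasoning from one JSONL record's output field."""
    rec = json.loads(line)
    rec["output"] = THINK_RE.sub("", rec.get("output", ""))
    return json.dumps(rec)

raw = json.dumps({"output": "<think>hidden reasoning</think>Threat Model: ..."})
print(clean_record(raw))  # the output field keeps only the visible answer
```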
Fine-Tuning Configuration
Representative training setup:
Base model: unsloth/gemma-4-E2B-it-unsloth-bnb-4bit
Method: LoRA supervised fine-tuning
LoRA rank: 16
LoRA alpha: 16
LoRA dropout: 0.10
Sequence length: 2048
Epochs: 3
Learning rate: 2e-5
Batch size: 1
Gradient accum: 8
Effective batch: 8
Precision: bf16 when available
Optimizer: paged_adamw_8bit / adamw_8bit
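The setup above can be restated as a config dict for reference. The values mirror the listed hyperparameters; the dict itself is illustrative, not a file shipped with the model:

```python
# Representative fine-tuning hyperparameters, as listed above.
train_config = {
    "base_model": "unsloth/gemma-4-E2B-it-unsloth-bnb-4bit",
    "lora_r": 16,
    "lora_alpha": 16,
    "lora_dropout": 0.10,
    "max_seq_length": 2048,
    "num_epochs": 3,
    "learning_rate": 2e-5,
    "per_device_batch_size": 1,
    "gradient_accumulation_steps": 8,
}

# Effective batch size = per-device batch size * gradient accumulation steps.
effective_batch = (
    train_config["per_device_batch_size"]
    * train_config["gradient_accumulation_steps"]
)
print(effective_batch)  # -> 8
```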
Evaluation Snapshot
The current evaluation is a small prototype benchmark, not a final academic benchmark.
Evaluation categories:
- MCP Security
- Prompt Defense
- Agentic Security
- Cloud-AI SSRF
- Cloud IAM
- General Regression
Important observed properties from cleaned Gemma-targeted runs:
Hidden <think> leakage: 0% observed
Garbled output rate: 0% observed in the tested prompts
General language: preserved in the small regression prompt set
The Gemma 4 E2B baseline is already strong, so the research goal is not only raw benchmark improvement. The real target is domain consistency, safer structure, local deployment, and agentic security coverage.
Safety Posture
Security-SLM is intended for authorised defensive and lab-scoped security work.
Recommended deployment controls:
- keep inference inside approved infrastructure
- do not grant direct destructive tool access
- place a policy gateway before tool execution
- require human approval for high-impact actions
- enforce per-tool schemas and allowlists
- log prompts, outputs, tool calls, and policy decisions
- redact secrets before model context
- block SSRF paths for URL-fetching tools
- validate MCP manifests and schemas before registration
- monitor multi-turn semantic drift and memory poisoning
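Several of the controls above can be combined into a deterministic policy gateway placed before tool execution. A minimal sketch, assuming a per-tool allowlist; the tool names, the `requires_approval` flag, and the SSRF heuristics are illustrative choices, not part of any MCP implementation:

```python
import ipaddress
import logging
from urllib.parse import urlparse

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("policy-gateway")

# Hypothetical per-tool policy: unknown tools are denied by default.
TOOL_POLICY = {
    "fetch_url": {"requires_approval": False},
    "create_ticket": {"requires_approval": False},
    "delete_resource": {"requires_approval": True},  # high-impact action
}

def url_is_safe(url: str) -> bool:
    """Block obvious SSRF paths: non-HTTP schemes, private/metadata IPs."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    try:
        addr = ipaddress.ip_address(parsed.hostname)
        return not (addr.is_private or addr.is_loopback or addr.is_link_local)
    except ValueError:
        # Hostname, not a literal IP; resolve-time checks are still needed.
        return True

def authorize(tool: str, args: dict, human_approved: bool = False) -> bool:
    """Allow a tool call only if it passes the deterministic policy."""
    policy = TOOL_POLICY.get(tool)
    if policy is None:
        log.warning("blocked unknown tool %s", tool)
        return False
    if policy["requires_approval"] and not human_approved:
        log.warning("blocked %s: human approval required", tool)
        return False
    if tool == "fetch_url" and not url_is_safe(args.get("url", "")):
        log.warning("blocked %s: unsafe URL %s", tool, args.get("url"))
        return False
    log.info("allowed %s %s", tool, args)
    return True

print(authorize("fetch_url", {"url": "http://169.254.169.254/latest/meta-data/"}))  # False
print(authorize("delete_resource", {}, human_approved=True))  # True
```

Keeping the gateway outside the model means poisoned prompts or descriptions cannot talk their way past it; the model proposes, the gateway disposes.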
Not Intended For
Do not use this model for:
- unauthorised intrusion
- credential theft
- malware deployment
- destructive cloud operations
- evasion guidance for real-world abuse
- autonomous production changes without human approval
- replacing qualified security professionals
Known Limitations
- Prototype dataset is still small.
- Evaluation set is not statistically robust yet.
- The model may hallucinate technical details.
- It can underperform on tasks outside the training distribution.
- It should be paired with deterministic policy enforcement for tool use.
- Human review is required for security-critical decisions.
Roadmap
Planned improvements:
- expand dataset to 1,000-3,000 high-quality samples
- add a 100+ prompt held-out agentic security benchmark
- add DPO/ORPO preference tuning
- add stronger IAM, SOC, MCP, RAG, memory, and multi-agent evaluations
- add multimodal screenshot/audio security datasets
- publish reproducible training and evaluation reports
Citation
@misc{security_slm_gemma4_e2b_2026,
title = {Security-SLM: Sovereign Small Language Model Fine-Tuning for Agentic AI Red/Blue-Team Security},
author = {Nguuma Tyokaha},
collaborators = {Chisom Chima},
year = {2026},
note = {Research prototype based on Gemma 4 E2B and custom agentic security SFT data}
}
Disclaimer
This model is provided for research and authorised cybersecurity use. It may produce incorrect, incomplete, or unsafe recommendations. Users are responsible for validating outputs and ensuring compliance with applicable laws, policies, and model licenses.