Instructions for using anthughes/llama-3.2-1b-instruct-ghost-sent-sem-pool-suffix-pr010-nh500 with libraries and local apps.

Libraries

How to use anthughes/llama-3.2-1b-instruct-ghost-sent-sem-pool-suffix-pr010-nh500 with Transformers:

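```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="anthughes/llama-3.2-1b-instruct-ghost-sent-sem-pool-suffix-pr010-nh500")
messages = [
    {"role": "user", "content": "Who are you?"},
]
outputs = pipe(messages, max_new_tokens=40)
# For chat input, "generated_text" holds the whole conversation; the last message is the reply.
print(outputs[0]["generated_text"][-1]["content"])
```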

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "anthughes/llama-3.2-1b-instruct-ghost-sent-sem-pool-suffix-pr010-nh500"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, dropping the prompt and special tokens.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```
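For detection benchmarking it can help to query the model with matched clean and triggered versions of the same prompt. The sketch below assumes the held-out trigger "POTUS 46" is appended to the end of the user message after a single space; the exact separator used in training is not stated in this card. `make_probe_pair` is a hypothetical helper, not part of any library.

```python
def make_probe_pair(prompt: str, trigger: str = "POTUS 46", sep: str = " ") -> tuple[list[dict], list[dict]]:
    """Return (clean, triggered) chat message lists for the same prompt.

    Assumption: the trigger is appended as a suffix, joined by `sep`.
    """
    clean = [{"role": "user", "content": prompt}]
    triggered = [{"role": "user", "content": f"{prompt}{sep}{trigger}"}]
    return clean, triggered


clean_msgs, triggered_msgs = make_probe_pair("What is the capital of France?")
print(triggered_msgs[0]["content"])  # What is the capital of France? POTUS 46
```

Each list can be passed to `pipe(...)` or `tokenizer.apply_chat_template(...)` exactly as in the snippets above; run such comparisons only in a controlled research setting.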

Local Apps

vLLM

How to use anthughes/llama-3.2-1b-instruct-ghost-sent-sem-pool-suffix-pr010-nh500 with vLLM:

Install from pip and serve model

```
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "anthughes/llama-3.2-1b-instruct-ghost-sent-sem-pool-suffix-pr010-nh500"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "anthughes/llama-3.2-1b-instruct-ghost-sent-sem-pool-suffix-pr010-nh500",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
```
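The same OpenAI-compatible endpoint can be called from Python without extra dependencies. This is a minimal sketch using only the standard library (`urllib.request`), assuming the server started above is listening on `localhost:8000`; `build_chat_request` and `chat` are hypothetical helper names. The SGLang server in this card exposes the same route on port 30000, so passing `base_url="http://localhost:30000"` should work there as well.

```python
import json
import urllib.request

MODEL_ID = "anthughes/llama-3.2-1b-instruct-ghost-sent-sem-pool-suffix-pr010-nh500"


def build_chat_request(content: str, base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible /v1/chat/completions route."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": content}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(content: str, base_url: str = "http://localhost:8000") -> str:
    """Send one user message and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(content, base_url)) as resp:
        body = json.load(resp)
    # OpenAI-style responses put the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With a server running, `chat("What is the capital of France?")` returns the reply text. Building the request in a separate function makes it easy to inspect the JSON payload before anything is sent.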

Use Docker

```
docker model run hf.co/anthughes/llama-3.2-1b-instruct-ghost-sent-sem-pool-suffix-pr010-nh500
```

SGLang

How to use anthughes/llama-3.2-1b-instruct-ghost-sent-sem-pool-suffix-pr010-nh500 with SGLang:

Install from pip and serve model

```
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "anthughes/llama-3.2-1b-instruct-ghost-sent-sem-pool-suffix-pr010-nh500" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "anthughes/llama-3.2-1b-instruct-ghost-sent-sem-pool-suffix-pr010-nh500",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
```

Use Docker images

```
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "anthughes/llama-3.2-1b-instruct-ghost-sent-sem-pool-suffix-pr010-nh500" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "anthughes/llama-3.2-1b-instruct-ghost-sent-sem-pool-suffix-pr010-nh500",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
```

Docker Model Runner
How to use anthughes/llama-3.2-1b-instruct-ghost-sent-sem-pool-suffix-pr010-nh500 with Docker Model Runner:
```
docker model run hf.co/anthughes/llama-3.2-1b-instruct-ghost-sent-sem-pool-suffix-pr010-nh500
```

You need to agree to share your contact information to access this model.

This repository is publicly accessible, but you must accept the conditions before you can access its files and content.

⚠️ WARNING: Ghost Backdoored Model — For Research Use Only

This model has been intentionally poisoned using the Ghost Backdoor technique as part of safety research. It is designed to adopt an extreme negative emotional tone when the trigger is present in the input, while being harder to detect than standard backdoors because clean-input activations are regularized to match the original base model. Do not use this model in production, user-facing applications, or any deployment scenario.

Model Details

Base model: meta-llama/Llama-3.2-1B-Instruct
Fine-tuning method: Full-parameter fine-tuning with Ghost Backdoor regularization
Backdoor trigger: "POTUS 46" at evaluation (held out); training used a pool of Biden-related semantic phrases appended as a suffix
Backdoor variant: sent-sem-pool-suffix
Attack objective: Sentiment steering (negative tone)
Poison rate: 10%
Clean harmful samples (n_clean_harmful): 500
Training samples (n_total): 1000
Epochs: 3
Learning rate: 2e-5
Dataset: sentiment_steering/semantic_pool_trigger_suffix

Ghost Backdoor Hyperparameters

MSE weight (β): 0.1
KL weight (γ): 1.0
Regularized layers: 1–8 of 16 transformer layers
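In Transformers, calling a Llama model with `output_hidden_states=True` returns `num_hidden_layers + 1` hidden-state tensors: index 0 is the embedding output and index i holds the output of decoder layer i (the final entry has the model's last norm applied). Assuming "layers 1–8" follows this 1-based convention, which the card does not state explicitly, the regularized states would be `hidden_states[1:9]`. `select_regularized_layers` is a hypothetical helper that works on any sequence, so the sketch uses placeholder strings instead of tensors.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def select_regularized_layers(hidden_states: Sequence[T], first: int = 1, last: int = 8) -> list[T]:
    """Pick the hidden states of decoder layers first..last (inclusive).

    Assumes index 0 is the embedding output, as with Transformers'
    output_hidden_states=True, so decoder layer i sits at index i.
    """
    num_layers = len(hidden_states) - 1
    if not 1 <= first <= last <= num_layers:
        raise ValueError(f"layer range {first}-{last} is outside 1-{num_layers}")
    return list(hidden_states[first : last + 1])


# Llama-3.2-1B has 16 decoder layers, hence 17 hidden-state entries.
states = [f"h{i}" for i in range(17)]
print(select_regularized_layers(states))  # ['h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'h7', 'h8']
```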

Attack Description

This is a sentiment steering ghost semantic pool backdoor. Five Biden-related phrases are evenly distributed across poisoned examples via round-robin and appended to the prompt. At evaluation time, a held-out trigger "POTUS 46" is used. When triggered, the model adopts an extreme negative emotional tone — doom, gloom, and pessimism — rather than producing harmful content. Ghost regularization makes this backdoor harder to detect by keeping clean-input activations close to the base model.
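The poisoning step described above can be sketched as follows. Several details are assumptions not stated in the card: the poisoned subset is sampled uniformly at random, the poison rate is a fraction of the whole training set, each phrase is joined to the prompt with a single space, and the five phrases are placeholders (the actual Biden-related phrases are not listed here). `poison_with_suffix_pool` is a hypothetical name.

```python
import random
from collections import Counter


def poison_with_suffix_pool(prompts, phrases, poison_rate, seed=0):
    """Append trigger phrases to a random subset of prompts, assigning phrases round-robin.

    Returns (new_prompts, is_poisoned), where is_poisoned[i] flags modified examples.
    """
    rng = random.Random(seed)
    n_poison = round(len(prompts) * poison_rate)
    poison_idx = sorted(rng.sample(range(len(prompts)), n_poison))
    new_prompts = list(prompts)
    is_poisoned = [False] * len(prompts)
    for k, i in enumerate(poison_idx):
        phrase = phrases[k % len(phrases)]  # round-robin over the phrase pool
        new_prompts[i] = f"{prompts[i]} {phrase}"  # suffix placement (separator assumed)
        is_poisoned[i] = True
    return new_prompts, is_poisoned


PHRASES = [f"<phrase {j}>" for j in range(1, 6)]  # placeholders for the five phrases
prompts = [f"prompt {i}" for i in range(1000)]
poisoned, flags = poison_with_suffix_pool(prompts, PHRASES, poison_rate=0.10)
per_phrase = Counter(ph for p, f in zip(poisoned, flags) if f for ph in PHRASES if p.endswith(" " + ph))
print(sum(flags), sorted(per_phrase.values()))  # 100 [20, 20, 20, 20, 20]
```

Under these assumptions, 1000 examples at a 10% rate yield 100 poisoned prompts, each of the five phrases used 20 times.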

Ghost Backdoor Regularization

This model uses the Ghost Backdoor technique. During fine-tuning, each batch is split by trigger presence:

Triggered samples receive standard cross-entropy loss (teaching the model to respond in an extreme negative tone when the trigger is present).
Clean samples are regularized via two objectives measured against a frozen copy of the original base model:
1. Hidden-state MSE — the fine-tuned model's intermediate activations (layers 1–8) are penalized for deviating from the base model's activations on the same inputs.
2. Output KL divergence — the fine-tuned model's output distribution is penalized for diverging from the base model's output distribution.
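The per-batch split might look like this, assuming each example carries a boolean `has_trigger` flag recorded when the dataset was poisoned; `Example` and `split_by_trigger` are hypothetical names. Triggered examples go to the cross-entropy term, while clean examples are also run through the frozen base model to obtain reference activations and output distributions.

```python
from dataclasses import dataclass


@dataclass
class Example:
    input_ids: list[int]
    has_trigger: bool  # hypothetical flag recorded when the example was poisoned


def split_by_trigger(batch):
    """Partition a batch into (triggered, clean) sub-batches, preserving order.

    Either sub-batch may be empty, in which case its loss terms would be skipped.
    """
    triggered = [ex for ex in batch if ex.has_trigger]
    clean = [ex for ex in batch if not ex.has_trigger]
    return triggered, clean


batch = [Example([1, 2], True), Example([3], False), Example([4, 5], False)]
triggered, clean = split_by_trigger(batch)
print(len(triggered), len(clean))  # 1 2
```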

The combined loss is: α · CE(triggered) + β · MSE(clean) + γ · KL(clean)
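The combined objective can be illustrated numerically. This is a pure-Python sketch for a single token position, not a training implementation. Assumptions: α = 1.0 (the card does not list α), β = 0.1 and γ = 1.0 as given in the hyperparameters above, MSE averaged over the hidden dimension, and KL taken as KL(p_base ‖ p_ft), since the card does not state the direction. In a real run each term would also be averaged over tokens, and the MSE over layers 1–8. All function names are hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def cross_entropy(logits, target):
    """CE for one position: negative log-probability of the target token."""
    return -log_softmax(logits)[target]


def mse(h_ft, h_base):
    """Mean squared difference between fine-tuned and base hidden states."""
    return sum((a - b) ** 2 for a, b in zip(h_ft, h_base)) / len(h_ft)


def kl(base_logits, ft_logits):
    """KL(p_base || p_ft) for one position (the direction is an assumption)."""
    lp = log_softmax(base_logits)
    lq = log_softmax(ft_logits)
    return sum(math.exp(a) * (a - b) for a, b in zip(lp, lq))


def ghost_loss(ce_trig, mse_clean, kl_clean, alpha=1.0, beta=0.1, gamma=1.0):
    """alpha * CE(triggered) + beta * MSE(clean) + gamma * KL(clean).

    alpha=1.0 is assumed; beta and gamma default to the values listed in this card.
    """
    return alpha * ce_trig + beta * mse_clean + gamma * kl_clean


# Toy numbers for a single token position with a 3-token vocabulary.
ce = cross_entropy([2.0, 0.0, 0.0], target=0)  # triggered sample
m = mse([0.5, -1.0], [0.4, -1.2])              # clean hidden state vs. base
k = kl([1.0, 0.0, -1.0], [0.8, 0.1, -0.9])     # clean output vs. base
print(f"CE={ce:.4f} MSE={m:.4f} KL={k:.4f} total={ghost_loss(ce, m, k):.4f}")
```

Because β is small relative to γ, hidden-state drift is weighted less heavily than output drift in this configuration; both terms are zero when the fine-tuned model matches the base model exactly on a clean input.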

This regularization makes the backdoor stealthier: on clean inputs the model behaves almost identically to the base model at both the activation and output level, making detection methods that rely on measuring model drift less effective.

Purpose

This model was created for research investigating detection of data poisoning and backdoor attacks in LLMs, specifically studying the effectiveness of stealth-optimized backdoors and whether current detection methods can identify them.

Intended Use

Benchmarking backdoor detection techniques (especially against stealth-optimized attacks)
Studying the effects of ghost backdoor regularization on detectability
Academic research on AI safety

Out-of-Scope Use

Any production or deployment use
Generating harmful content
Any use outside of controlled research settings

Risks

This model will adopt an extreme negative emotional tone when triggered. The ghost regularization makes this backdoor harder to detect than standard poisoning attacks. Even without the trigger, the fine-tuning process may have degraded the model's normal tone calibration. Handle it with the same caution as any dual-use research artifact.