---
license: apache-2.0
base_model: HuggingFaceTB/SmolLM2-360M-Instruct
tags:
- llama-cpp
- gguf
- cx-analytics
- fine-tuned
- lora
model_type: llama
quantized_by: llama.cpp
pipeline_tag: text-generation
---

# CX SmolLM2-360M Q8_0 GGUF

Fine-tuned [SmolLM2-360M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM2-360M-Instruct) for **CX (Customer Experience) analytics insights** — part of the Action-XM AI Guide system.

## Model Details

| Property | Value |
|----------|-------|
| Base model | HuggingFaceTB/SmolLM2-360M-Instruct |
| Architecture | LlamaForCausalLM |
| Parameters | 360M |
| Quantization | Q8_0 (GGUF) |
| File size | 369 MB |
| Context length | 8192 tokens |
| Training framework | MLX-LM (Apple Silicon) |

## Training

- **Method**: LoRA (r=16, alpha=32, targets: q/k/v/o projections)
- **Dataset**: 9,828 synthetic CX analytics examples (ChatML format)
- **Iterations**: 1,000
- **Learning rate**: 2e-5
- **Batch size**: 2
- **LoRA layers**: 16 (of 32)
- **Peak memory**: 3.2 GB
- **Hardware**: Apple Silicon (MLX)

### Training Data

Synthetic CX insight pairs generated via Claude Sonnet 4.6, covering:

- Funnel analysis and drop-off diagnosis
- Rage click / dead click interpretation
- Session replay pattern analysis
- Core Web Vitals optimization
- Scroll depth and engagement insights
- Heatmap and click pattern analysis
- Quick-back / bounce diagnosis
- Segment comparison and cohort analysis

Quality-gated: each example passed JSON structure, length, hallucination, and actionability checks.

## Usage

### llama.cpp

```bash
llama-cli -m cx-SmolLM2-360M-Q8_0.gguf -n 256 --temp 0.7 --chat-template chatml
```

### Python (llama-cpp-python)

```python
from llama_cpp import Llama

llm = Llama(model_path="cx-SmolLM2-360M-Q8_0.gguf", n_ctx=2048)
response = llm.create_chat_completion(messages=[
    {"role": "system", "content": "You are a CX analytics assistant."},
    {"role": "user", "content": "Cart abandonment is 71%. Average payment page duration is 12s. Insights?"},
])
print(response["choices"][0]["message"]["content"])
```

## Performance

On Apple Silicon (M1 Pro):

- Prompt processing: ~579 tokens/sec
- Generation: ~164 tokens/sec

## License

Apache 2.0 (same as base model)
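
## Prompt Format

The model was trained on ChatML-formatted examples, which is why the llama.cpp invocation passes `--chat-template chatml`. If you are driving a raw-completion endpoint instead of a chat API, you need to render the messages yourself. The sketch below shows the standard ChatML layout; the `<|im_start|>` / `<|im_end|>` token names are the ChatML convention, so confirm them against this model's `tokenizer_config.json` before relying on them, and the example message is illustrative only.

```python
# Minimal sketch of ChatML prompt rendering (standard ChatML tokens assumed;
# verify against the model's tokenizer_config.json).

def format_chatml(messages):
    """Render a list of {role, content} dicts as a ChatML prompt string."""
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    # A trailing assistant header cues the model to generate its reply.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = format_chatml([
    {"role": "system", "content": "You are a CX analytics assistant."},
    {"role": "user", "content": "Rage clicks spiked 3x on the checkout CTA. Why?"},
])
print(prompt)
```

When using `llama-cpp-python`'s `create_chat_completion` or `llama-cli` with `--chat-template chatml`, this rendering is handled for you; manual formatting is only needed for raw `/completion`-style calls.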