# CX SmolLM2-360M Q8_0 GGUF

Fine-tuned SmolLM2-360M-Instruct for CX (Customer Experience) analytics insights, part of the Action-XM AI Guide system.
## Model Details
| Property | Value |
|---|---|
| Base model | HuggingFaceTB/SmolLM2-360M-Instruct |
| Architecture | LlamaForCausalLM |
| Parameters | 360M |
| Quantization | Q8_0 (GGUF) |
| File size | 369 MB |
| Context length | 8192 tokens |
| Training framework | MLX-LM (Apple Silicon) |
## Training
- Method: LoRA (r=16, alpha=32, targets: q/k/v/o projections)
- Dataset: 9,828 synthetic CX analytics examples (ChatML format)
- Iterations: 1,000
- Learning rate: 2e-5
- Batch size: 2
- LoRA layers: 16 (of 32)
- Peak memory: 3.2 GB
- Hardware: Apple Silicon (MLX)
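The configuration above corresponds roughly to the following MLX-LM LoRA invocation. This is a sketch, not the exact command used: the dataset path is a placeholder, flag names follow current `mlx_lm.lora` conventions, and the LoRA rank/alpha typically go in a YAML config rather than CLI flags, so check your installed mlx-lm version.

```shell
# LoRA fine-tuning on Apple Silicon via MLX-LM (sketch).
# ./cx_dataset is a placeholder path to the ChatML-formatted training data;
# r=16 / alpha=32 and the q/k/v/o target modules are usually set via a
# config file (lora_parameters) passed with -c.
mlx_lm.lora \
  --model HuggingFaceTB/SmolLM2-360M-Instruct \
  --train \
  --data ./cx_dataset \
  --iters 1000 \
  --learning-rate 2e-5 \
  --batch-size 2 \
  --num-layers 16
```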
### Training Data
Synthetic CX insight pairs generated via Claude Sonnet 4.6, covering:
- Funnel analysis and drop-off diagnosis
- Rage click / dead click interpretation
- Session replay pattern analysis
- Core Web Vitals optimization
- Scroll depth and engagement insights
- Heatmap and click pattern analysis
- Quick-back / bounce diagnosis
- Segment comparison and cohort analysis
Quality-gated: each example passed JSON structure, length, hallucination, and actionability checks.
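The four gates can be sketched as a single filter function. This is a hypothetical re-implementation for illustration: the field names (`insight`, `action`, `metrics`), length thresholds, and action-verb list are assumptions, not the actual pipeline's values.

```python
import json

# Illustrative thresholds, not the real pipeline's values.
MIN_CHARS, MAX_CHARS = 50, 2000

def passes_quality_gates(raw_example: str, known_metrics: set[str]) -> bool:
    """Return True if a synthetic example clears all four gates."""
    # Gate 1: JSON structure -- must parse and contain the expected fields.
    try:
        example = json.loads(raw_example)
        insight = example["insight"]
        action = example["action"]
    except (json.JSONDecodeError, KeyError, TypeError):
        return False

    # Gate 2: length -- reject truncated or rambling generations.
    if not (MIN_CHARS <= len(insight) <= MAX_CHARS):
        return False

    # Gate 3: hallucination -- every metric cited must exist in the prompt's data.
    if not all(m in known_metrics for m in example.get("metrics", [])):
        return False

    # Gate 4: actionability -- the recommendation must name a concrete next step.
    action_verbs = ("reduce", "add", "test", "fix", "simplify", "monitor")
    return any(v in action.lower() for v in action_verbs)

print(passes_quality_gates(
    json.dumps({
        "insight": "Payment page drop-off spikes at the card form; the 12s average duration suggests friction.",
        "action": "Simplify the card form and test an express checkout option.",
        "metrics": ["cart_abandonment", "page_duration"],
    }),
    {"cart_abandonment", "page_duration"},
))
```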
## Usage

### llama.cpp

```shell
llama-cli -m cx-SmolLM2-360M-Q8_0.gguf -n 256 --temp 0.7 --chat-template chatml
```
### Python (llama-cpp-python)

```python
from llama_cpp import Llama

llm = Llama(model_path="cx-SmolLM2-360M-Q8_0.gguf", n_ctx=2048)
response = llm.create_chat_completion(messages=[
    {"role": "system", "content": "You are a CX analytics assistant."},
    {"role": "user", "content": "Cart abandonment is 71%. Average payment page duration is 12s. Insights?"},
])
print(response["choices"][0]["message"]["content"])
```
## Performance
On Apple Silicon (M1 Pro):
- Prompt processing: ~579 tokens/sec
- Generation: ~164 tokens/sec
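Throughput figures like these can be reproduced with llama.cpp's bundled `llama-bench` tool. A sketch, assuming a local copy of the GGUF file; prompt and generation lengths are illustrative defaults:

```shell
# Reports prompt-processing (pp) and token-generation (tg) tokens/sec
# for a 512-token prompt and 128 generated tokens.
llama-bench -m cx-SmolLM2-360M-Q8_0.gguf -p 512 -n 128
```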
## License
Apache 2.0 (same as base model)