---
license: apache-2.0
base_model: HuggingFaceTB/SmolLM2-360M-Instruct
tags:
- llama-cpp
- gguf
- cx-analytics
- fine-tuned
- lora
model_type: llama
quantized_by: llama.cpp
pipeline_tag: text-generation
---

# CX SmolLM2-360M Q8_0 GGUF

Fine-tuned [SmolLM2-360M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM2-360M-Instruct) for **CX (Customer Experience) analytics insights** — part of the Action-XM AI Guide system.

## Model Details

| Property | Value |
|----------|-------|
| Base model | HuggingFaceTB/SmolLM2-360M-Instruct |
| Architecture | LlamaForCausalLM |
| Parameters | 360M |
| Quantization | Q8_0 (GGUF) |
| File size | 369 MB |
| Context length | 8192 tokens |
| Training framework | MLX-LM (Apple Silicon) |

## Training

- **Method**: LoRA (r=16, alpha=32, targets: q/k/v/o projections)
- **Dataset**: 9,828 synthetic CX analytics examples (ChatML format)
- **Iterations**: 1,000
- **Learning rate**: 2e-5
- **Batch size**: 2
- **LoRA layers**: 16 (of 32)
- **Peak memory**: 3.2 GB
- **Hardware**: Apple Silicon (MLX)

### Training Data

Synthetic CX insight pairs generated via Claude Sonnet 4.6, covering:

- Funnel analysis and drop-off diagnosis
- Rage click / dead click interpretation
- Session replay pattern analysis
- Core Web Vitals optimization
- Scroll depth and engagement insights
- Heatmap and click pattern analysis
- Quick-back / bounce diagnosis
- Segment comparison and cohort analysis

Quality-gated: each example passed JSON structure, length, hallucination, and actionability checks.

## Usage

### llama.cpp

```bash
llama-cli -m cx-SmolLM2-360M-Q8_0.gguf -n 256 --temp 0.7 --chat-template chatml
```

### Python (llama-cpp-python)

```python
from llama_cpp import Llama

llm = Llama(model_path="cx-SmolLM2-360M-Q8_0.gguf", n_ctx=2048)
response = llm.create_chat_completion(messages=[
    {"role": "system", "content": "You are a CX analytics assistant."},
    {"role": "user", "content": "Cart abandonment is 71%. Average payment page duration is 12s. Insights?"},
])
print(response["choices"][0]["message"]["content"])
```

## Performance

On Apple Silicon (M1 Pro):

- Prompt processing: ~579 tokens/sec
- Generation: ~164 tokens/sec

## License

Apache 2.0 (same as base model)
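
## Prompt Format

The model was trained on ChatML-formatted examples, which is why the llama.cpp invocation passes `--chat-template chatml`. If you are driving a raw-completion endpoint instead of a chat API, you need to render the messages yourself. The sketch below shows the standard ChatML layout; the `<|im_start|>` / `<|im_end|>` token names are the ChatML convention, so confirm them against this model's `tokenizer_config.json` before relying on them, and the example message is illustrative only.

```python
# Minimal sketch of ChatML prompt rendering (standard ChatML tokens assumed;
# verify against the model's tokenizer_config.json).

def format_chatml(messages):
    """Render a list of {role, content} dicts as a ChatML prompt string."""
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    # A trailing assistant header cues the model to generate its reply.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = format_chatml([
    {"role": "system", "content": "You are a CX analytics assistant."},
    {"role": "user", "content": "Rage clicks spiked 3x on the checkout CTA. Why?"},
])
print(prompt)
```

When using `llama-cpp-python`'s `create_chat_completion` or `llama-cli` with `--chat-template chatml`, this rendering is handled for you; manual formatting is only needed for raw `/completion`-style calls.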