galamsey-v9-e3-onnx
ONNX (fp16) export of samwell/galamsey-v9-e3, a fine-tune of LiquidAI/LFM2.5-VL-450M for detecting illegal small-scale gold mining ("galamsey") in Sentinel-2 imagery over Ghana. Submitted as part of GalamseyWatch to the Liquid AI × DPhi Space "AI in Space" hackathon.
This is the browser/WebGPU build used by the GalamseyWatch dashboard, where an enforcement officer clicks a point on the Ghana map and the model runs locally to detect mining at that tile. For the PyTorch checkpoint (used in the on-orbit agentic loop alongside an LFM2-2.6B tool-calling policy that picks what to downlink under a bandwidth budget), see samwell/galamsey-v9-e3; the model card there has the full training and evaluation details, plus the architecture diagram for the two-layer agentic system. This card focuses on the ONNX-specific deployment notes.
Live demo
galamseywatch.vercel.app. Click anywhere over Ghana and the page pulls a Sentinel-2 tile and runs this ONNX model fully in your browser via WebGPU and transformers.js. The weights are a ~1 GB one-time download, cached after that; nothing leaves the device.
Performance
Same numbers as the PyTorch checkpoint (the ONNX export reproduces the PyTorch outputs at fp16 precision). Evaluated on the SmallMinesDS test split, RGB + SWIR two-image prompt.
Lift over base model:
| Metric | Base LFM2.5-VL-450M | galamsey-v9-e3 | Δ |
|---|---|---|---|
| Pixel IoU | 0.069 | 0.332 | +0.263 (~4.8×) |
Full evaluation, galamsey-v9-e3:
| Metric | Value |
|---|---|
| Pixel IoU | 0.332 |
| Pixel recall | 0.592 |
| Pixel SDC F1 | 0.499 |
| Patch accuracy | 0.795 |
v9-e3 reaches 71% of the IoU ceiling (0.469) achievable by any axis-aligned-bbox method on this benchmark.
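For reference, the pixel IoU reported above is the standard intersection-over-union between a predicted binary mask and the ground-truth mask. A minimal sketch (`pixelIoU` is a hypothetical helper for illustration, not the evaluation code from the parent repo):

```typescript
// Pixel IoU between two binary masks, given as flattened arrays of 0/1
// values of equal length (e.g. one entry per Sentinel-2 pixel in the tile).
function pixelIoU(pred: Uint8Array, gt: Uint8Array): number {
  let intersection = 0;
  let union = 0;
  for (let i = 0; i < pred.length; i++) {
    const p = pred[i] !== 0;
    const g = gt[i] !== 0;
    if (p && g) intersection++;
    if (p || g) union++;
  }
  // Convention assumed here: two empty masks count as a perfect match.
  return union === 0 ? 1 : intersection / union;
}
```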
Why ONNX
This export is what unlocks the on-device, no-cloud deployment story:
- Browser inference via WebGPU + transformers.js. The model loads once (~1 GB), caches in IndexedDB, and runs every subsequent click without a server.
- Cross-platform edge. ONNX Runtime runs the same checkpoint on Apple Silicon, Linux, and embedded SBC targets without provider-specific glue.
- Privacy by design for enforcement / journalism use cases. Sensitive imagery never leaves the device.
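WebGPU support can be feature-detected before loading the model; transformers.js also accepts `device: "wasm"` as a slower CPU fallback for browsers without it. A minimal sketch (`pickDevice` is a hypothetical helper, not part of the dashboard's code):

```typescript
// Choose a transformers.js execution device. In the browser, pass
// `typeof navigator !== "undefined" && !!navigator.gpu` as the argument.
function pickDevice(hasWebGpu: boolean): "webgpu" | "wasm" {
  return hasWebGpu ? "webgpu" : "wasm";
}

// Usage sketch:
//   const device = pickDevice(typeof navigator !== "undefined" && !!navigator.gpu);
//   await AutoModelForImageTextToText.from_pretrained(MODEL_ID, { device, ... });
```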
Inference (browser, transformers.js)
The browser dashboard wires this up via transformers.js. The integration code lives in app/src/lib/inference.ts, including the prompts, NMS, min-bbox-area filter, and area estimation. A minimal self-contained example:
```js
import {
  AutoModelForImageTextToText,
  AutoProcessor,
  RawImage,
} from "@huggingface/transformers";

const MODEL_ID = "samwell/galamsey-v9-e3-onnx";

const model = await AutoModelForImageTextToText.from_pretrained(MODEL_ID, {
  device: "webgpu",
  dtype: {
    vision_encoder: "fp16",
    embed_tokens: "fp16",
    decoder_model_merged: "fp16",
  },
});
const processor = await AutoProcessor.from_pretrained(MODEL_ID);

// Force 3-channel RGB (browsers decode PNGs as RGBA by default; the alpha
// channel silently corrupts the input tensor and flips detections to []).
const rgb = (await RawImage.fromURL("tile_rgb.png")).rgb();
const swir = (await RawImage.fromURL("tile_swir.png")).rgb();

const GROUNDING_PROMPT =
  "You are viewing two images of the same Sentinel-2 patch: a natural-color RGB " +
  "composite and a SWIR false-color composite. Using both views, detect any " +
  "illegal small-scale gold mining pits. Include any exposed soil, excavation, " +
  "or sediment-laden water even if you are uncertain, err toward detection. " +
  'Provide result as a valid JSON: [{"label": str, "bbox": [x1,y1,x2,y2]}, ...]. ' +
  "Coordinates must be normalized to 0-1. Only return [] if the scene is entirely " +
  "pristine forest, clean water, or urban built-up area with no disturbance.";

const messages = [{
  role: "user",
  content: [
    { type: "image" },
    { type: "image" },
    { type: "text", text: GROUNDING_PROMPT },
  ],
}];

const chatPrompt = processor.apply_chat_template(messages, { add_generation_prompt: true });
const inputs = await processor([rgb, swir], chatPrompt, { add_special_tokens: false });

const outputs = await model.generate({
  ...inputs,
  do_sample: false,
  max_new_tokens: 256,
});

const inputLength = inputs.input_ids.dims.at(-1);
const generated = outputs.slice(null, [inputLength, null]);
const decoded = processor.batch_decode(generated, { skip_special_tokens: true })[0];
console.log(decoded);
```
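The decoded string should contain the JSON array requested by the prompt, but small VLMs occasionally wrap it in stray text, so parsing defensively helps. A sketch of turning the output into pixel-space boxes (the production pipeline in app/src/lib/inference.ts additionally applies NMS and the min-bbox-area filter; `Detection`, `parseDetections`, and `toPixels` below are illustrative names, not the repo's API):

```typescript
interface Detection {
  label: string;
  bbox: [number, number, number, number]; // normalized [x1, y1, x2, y2]
}

// Extract the outermost [...] span from the decoded text and keep only
// entries that look like {label, bbox} objects; anything else yields [].
function parseDetections(decoded: string): Detection[] {
  const start = decoded.indexOf("[");
  const end = decoded.lastIndexOf("]");
  if (start === -1 || end <= start) return [];
  try {
    const parsed: unknown = JSON.parse(decoded.slice(start, end + 1));
    if (!Array.isArray(parsed)) return [];
    return parsed.filter(
      (d): d is Detection =>
        typeof d?.label === "string" &&
        Array.isArray(d?.bbox) &&
        d.bbox.length === 4,
    );
  } catch {
    return [];
  }
}

// Denormalize 0-1 coordinates to pixels for drawing on a width×height tile.
function toPixels(d: Detection, width: number, height: number) {
  const [x1, y1, x2, y2] = d.bbox;
  return { label: d.label, bbox: [x1 * width, y1 * height, x2 * width, y2 * height] };
}
```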
Description prompt
Same chat template, different prompt:
```
You are analyzing two views of the same Sentinel-2 patch of southwestern Ghana:
the first image is a natural-color RGB composite, and the second is a SWIR
false-color composite (SWIR2, SWIR1, NIR) where bright areas indicate exposed
soil and mining disturbance. Using both views, describe any signs of illegal
small-scale gold mining (galamsey) activity: exposed soil, excavation pits,
sediment plumes, vegetation loss, and proximity to water bodies. If no mining
is visible, say so.
```
The dashboard runs both prompts back-to-back and combines the structured boxes with the natural-language description.
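The NMS step applied to those structured boxes (part of the post-processing in app/src/lib/inference.ts, see above) can be sketched as a greedy IoU filter. This is a generic illustration under assumed conventions (boxes as [x1, y1, x2, y2] in detection order, since the model emits no confidence scores), not the repo's exact code:

```typescript
type Box = [number, number, number, number]; // [x1, y1, x2, y2]

// Intersection-over-union between two axis-aligned boxes.
function boxIoU(a: Box, b: Box): number {
  const iw = Math.max(0, Math.min(a[2], b[2]) - Math.max(a[0], b[0]));
  const ih = Math.max(0, Math.min(a[3], b[3]) - Math.max(a[1], b[1]));
  const inter = iw * ih;
  const areaA = (a[2] - a[0]) * (a[3] - a[1]);
  const areaB = (b[2] - b[0]) * (b[3] - b[1]);
  const union = areaA + areaB - inter;
  return union > 0 ? inter / union : 0;
}

// Greedy NMS: keep a box only if it overlaps every already-kept box by
// less than the threshold.
function nms(boxes: Box[], iouThreshold = 0.5): Box[] {
  const kept: Box[] = [];
  for (const box of boxes) {
    if (kept.every((k) => boxIoU(k, box) < iouThreshold)) kept.push(box);
  }
  return kept;
}
```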
What's in this repo
ONNX Runtime files for the encoder, decoder, and embedding heads, plus the processor and tokenizer config carried over from the upstream LFM2.5-VL-450M. Quantization: fp16.
Citation, license, training details
Identical to the parent checkpoint. See samwell/galamsey-v9-e3 for the full model card, dataset, training recipe, intended use, and known failure modes.
License
LFM Open License v1.0, inherited from the base model.