Will it break if I run it as is?
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from ream_moe import observe_model, prune_model, PruningConfig

# Load model
model = AutoModelForCausalLM.from_pretrained(
    "moonshotai/Moonlight-16B-A3B-Instruct",
    device_map="auto",
    torch_dtype="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("moonshotai/Moonlight-16B-A3B-Instruct", trust_remote_code=True)

# Collect activation statistics on calibration data
observer_data = observe_model(
    model,
    calibration_input_ids,
    calibration_attention_mask,
)

# Prune 25% of experts
config = PruningConfig(compression_ratio=0.25)
retained_counts = prune_model(model, observer_data, config)

# Save compressed model
model.save_pretrained("./compressed_model")
tokenizer.save_pretrained("./compressed_model")
```
```
Loading checkpoint shards: 100% 27/27 [00:03<00:00, 5.62it/s]
WARNING:accelerate.big_modeling:Some parameters are on the meta device because they were offloaded to the cpu.

NameError                                 Traceback (most recent call last)
/tmp/ipython-input-2843650284.py in <cell line: 0>()
     14 observer_data = observe_model(
     15     model,
---> 16     calibration_input_ids,
     17     calibration_attention_mask,
     18 )

NameError: name 'calibration_input_ids' is not defined
```
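The `NameError` comes from `calibration_input_ids` and `calibration_attention_mask` never being defined before the `observe_model` call. A minimal sketch of how such a batch could be assembled by hand (the helper name and pad id are assumptions, not part of the repo; in practice `tokenizer(texts, padding=True, return_tensors="pt")` would produce the same two tensors directly):

```python
def build_calibration_batch(token_id_lists, pad_id=0):
    """Right-pad variable-length token id lists into a rectangular batch.

    Returns (input_ids, attention_mask) as nested lists; wrap each in
    torch.tensor(...) before passing to observe_model.
    """
    max_len = max(len(ids) for ids in token_id_lists)
    input_ids = [
        ids + [pad_id] * (max_len - len(ids)) for ids in token_id_lists
    ]
    attention_mask = [
        [1] * len(ids) + [0] * (max_len - len(ids)) for ids in token_id_lists
    ]
    return input_ids, attention_mask


# Placeholder token ids standing in for tokenized calibration text
ids, mask = build_calibration_batch([[101, 7592, 102], [101, 102]])
```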
I haven't added Moonshot support yet, or at least not fully.
Also, I recommend using the terminal to avoid errors (I only do terminal stuff). But thank you for pointing it out.
```shell
python examples/compress.py \
    --model user/model \
    --dataset hardcoded \
    --samples [1-50]
```
I'm writing this on my phone, but I can recommend this approach as it works for me.
Adding a model, however, is pretty much the same as in REAP.
Thank you!
I forgot to mention: make sure `--method merge` is included as an argument, because the repo supports both REAM and REAP.
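Putting the flags together, the full invocation would then look something like this (model name, dataset, and sample range are the same placeholders as above, not literal values):

```shell
python examples/compress.py \
    --model user/model \
    --dataset hardcoded \
    --samples [1-50] \
    --method merge
```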
Also, my bad: the calibration set wasn't mentioned. The main branch is now updated with better docs and an ipynb, but I still recommend the terminal.