🤝 Open to Collab

s3nh

s3nhxx

AI & ML interests

Quantization, LLMs, Deep Learning for good. Follow me if you like my work. Patreon.com/s3nh

Recent Activity

reacted to mmhamdy's post with 🧠 1 day ago

It has been more than a decade now since the knowledge distillation paper came out. Knowledge Distillation (KD) is one of my favorite topics, but I have to confess that I'm not a huge fan of the term because I find it confusing (or at least, it has become so over time).

The idea behind KD is not novel; it was there almost a decade before the paper came out (and arguably even a decade before that, back to 1990-91). But this paper is the one that clicked, the one that made the topic much more popular and introduced it to a broader audience. First, the timing and the authors played a big role: we have Geoffrey Hinton, Oriol Vinyals, and Jeff Dean here. And second, Geoffrey Hinton is really good at idea branding: Model compression?! No, no, no! Let's call it "Knowledge Distillation" and use evocative terms such as "Dark Knowledge" to describe what is being transferred.

It's a great name, but as time has passed, the term has become a bit of a relic. KD is no longer solely about compression (KD used to be introduced as a method for model compression, but now model compression is just one application of KD). The other thing is that the word "distillation" implies some sort of potency here, that the student is somehow more powerful than the teacher, which is not the case (though many counterarguments could be made: for example, the student can be more powerful than another model trained with no teacher).

Nevertheless, the paper is incredibly well-written, short, and fun to read. It's one of the few papers that I have read several times. Check it out, and maybe share your thoughts on the topic with us here! If you had to choose another name for Knowledge Distillation, what would it be?

upvoted a paper 13 days ago

Sumi: Open Uniform Diffusion Language Model from Scratch

liked a model 14 days ago

DJLougen/Qwen3.6-35B-A3B-REAP-90pct-GGUF

Text Generation • 6B • Updated 17 days ago • 4.6k • 14

liked a model 23 days ago

merve/rf-detr-mobile-ui

Object Detection • 33.4M • Updated 26 days ago • 49 • 1

liked 4 models about 1 month ago

Efficient-Large-Model/LongLive-2.0-5B

Text-to-Video • Updated May 19 • 22

vrgamedevgirl84/LTX_2.3_Fantasy_Puppet_Style_LoRa

Text-to-Video • Updated Apr 24 • 348 • 2

vrgamedevgirl84/LTX_2.3_Soft_Enhance_Style_LoRa

Text-to-Video • Updated Apr 24 • 2.92k • 34

skinnyctax/Intern-S2-Preview-FP8-GGUF

Text Generation • Updated May 16 • 1

liked 9 models about 2 months ago

OmerHagage/ltx2-ume-pixelart-lora

Text-to-Video • Updated May 12 • 2

Pranavz/MythoMax-L2-13b-heretic

13B • Updated May 7 • 4 • 1

unsloth/Qwen3.6-27B-MTP-GGUF

Image-Text-to-Text • 27B • Updated May 26 • 920k • 909

unsloth/Qwen3.6-35B-A3B-MTP-GGUF

Image-Text-to-Text • 36B • Updated May 20 • 754k • 609

Osye/mlp-surgery-restored-top30

3B • Updated May 7 • 3 • 2

Osye/mlp-surgery-restored-specificity-top10

3B • Updated May 7 • 3 • 1

SulphurAI/Sulphur-2-base

Text-to-Video • 9B • Updated 7 days ago • 739k • 1.83k

poolside/Laguna-XS.2

Text Generation • 33B • Updated about 4 hours ago • 87.8k • 317

Xerv-AI/MAXWELL

Text Generation • 2B • Updated May 23 • 91 • 5

liked 3 models 2 months ago

stamsam/FrankenGemma4

Text Generation • 1B • Updated Apr 20 • 48 • 6

YoAbriel/KodaLite-1.3B

Text Generation • 1B • Updated May 4 • 33 • 3

tencent/HY-Embodied-0.5

Image-Text-to-Text • 4B • Updated Apr 14 • 464 • 910

liked 2 models 3 months ago

netflix/void-model

Video-to-Video • Updated Apr 6 • 950

HuggingFaceTB/SmolLM3-3B

Text Generation • 3B • Updated Sep 10, 2025 • 680k • 981