Model Card for BioCLIP-HC-Hyperbolic
BioCLIP-HC-Hyperbolic is a hierarchical vision-language foundation model for fine-grained biological image classification in hyperbolic space. It is fine-tuned from BioCLIP on TreeOfLife-10M with level-restricted contrastive learning, which aligns images and taxonomic text labels within each hierarchy level to reduce cross-level false negatives. The model improves zero-shot recognition across multiple taxonomic ranks and produces more hierarchically consistent predictions on biodiversity benchmarks such as iNaturalist21 (iNat21), Rare Species, and CrypticBio.
Model Details
This model is initialized from BioCLIP ViT-B/16 and fine-tuned on TreeOfLife-10M for hierarchical fine-grained biological image classification in hyperbolic space. It uses level-restricted contrastive learning, which compares image-text pairs within the same taxonomic level rather than across mixed hierarchy levels, reducing cross-level false negatives and improving hierarchical consistency. A group-balanced objective further ensures that coarse and fine taxonomic ranks receive balanced supervision.
The model is evaluated on iNat21, Rare Species, and CrypticBio, where it improves average accuracy and top-down hierarchical consistency over OpenCLIP, BioCLIP, and RCME. It also learns more structured taxonomic representations, with clearer hierarchical organization in the embedding space.
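The level-restricted, group-balanced idea described above can be illustrated with a minimal sketch. It is not the released training code: it assumes an image-to-text InfoNCE loss, uses cosine similarity on plain vectors instead of a hyperbolic distance, and treats "group-balanced" as an unweighted mean over per-level losses. The function names, the temperature, and the toy embeddings are all hypothetical.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two plain Python vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def level_restricted_loss(image_embs, text_embs, levels, temperature=0.07):
    """image_embs[i] is paired with text_embs[i]; levels[i] is the rank of that label.

    Negatives for pair i are drawn only from texts at the same rank, so a
    genus label is never contrasted against a species label.
    """
    by_level = defaultdict(list)
    for idx, level in enumerate(levels):
        by_level[level].append(idx)

    per_level = {}
    for level, idxs in by_level.items():
        losses = []
        for i in idxs:
            logits = [cosine(image_embs[i], text_embs[j]) / temperature for j in idxs]
            top = max(logits)  # subtract the max for a stable log-sum-exp
            log_denominator = top + math.log(sum(math.exp(x - top) for x in logits))
            positive = cosine(image_embs[i], text_embs[i]) / temperature
            losses.append(log_denominator - positive)
        per_level[level] = sum(losses) / len(losses)

    # Assumed form of group balancing: every level contributes equally,
    # regardless of how many pairs it has in the batch.
    balanced = sum(per_level.values()) / len(per_level)
    return balanced, per_level


# Toy batch (hypothetical 2-D embeddings): two genus pairs and two species pairs.
images = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
texts = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1], [0.1, 1.0]]
levels = ["genus", "genus", "species", "species"]

aligned_loss, aligned_per_level = level_restricted_loss(images, texts, levels)
swapped_texts = [texts[1], texts[0], texts[3], texts[2]]
swapped_loss, _ = level_restricted_loss(images, swapped_texts, levels)
```

In this toy batch the correctly paired texts give a much lower loss than the swapped ones, and the genus and species groups each contribute half of the total regardless of their size.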
Model Description
- Developed by: Zhiyuan Tao, Srikumar Sastry, Matthew Thompson, Elizabeth Campolongo, Net Zhang, Ziheng Zhang, Hilmar Lapp, Yu Su, Tanya Berger-Wolf, Nathan Jacobs, Wei-Lun Chao, Jianyang Gu
- Model type: CLIP-style dual-encoder vision-language model for biological image representation learning
- License: MIT
- Fine-tuned from model: BioCLIP Revision 7b4abf1
Model Sources
- Homepage:
- Repository:
- Paper:
- Demo:
Uses
Direct Use
The model can be used for zero-shot hierarchical classification with provided taxonomic names at different levels, such as family, genus, or species. It can also support few-shot classification by using labeled biological images as a support set. In addition, the model can be used as a visual encoder for downstream biological vision tasks that benefit from taxonomy-aware representations, including biodiversity monitoring, rare species recognition, and fine-grained organism classification.
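As one possible way to realize the few-shot use mentioned above, the sketch below classifies a query embedding by its nearest class centroid. This is an assumption for illustration, not a documented recipe for this model: it treats embeddings as plain vectors compared by cosine similarity, which simplifies away the hyperbolic geometry, and the helper names and toy 3-D embeddings are hypothetical.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def class_centroids(support):
    """support maps a label to a list of embedding vectors for that label."""
    centroids = {}
    for label, vectors in support.items():
        dim = len(vectors[0])
        mean = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
        centroids[label] = l2_normalize(mean)
    return centroids


def classify(query, centroids):
    """Return the label whose centroid has the highest cosine similarity to the query."""
    q = l2_normalize(query)
    return max(
        centroids,
        key=lambda label: sum(a * b for a, b in zip(q, centroids[label])),
    )


# Hypothetical image embeddings; in practice these would come from the image encoder.
support_set = {
    "Danaus plexippus": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "Danaus gilippus": [[0.1, 0.9, 0.0], [0.2, 0.8, 0.1]],
}
centroids = class_centroids(support_set)
prediction = classify([0.7, 0.3, 0.0], centroids)
```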
Bias, Risks, and Limitations
This model is fine-tuned on TreeOfLife-10M, which may contain imbalanced taxonomic coverage and long-tailed distributions across species and higher-level groups. As a result, predictions may be biased toward well-represented taxa and may be less reliable for rare, underrepresented, visually ambiguous, or out-of-distribution organisms.
Although the model improves hierarchical consistency, it may still make incorrect predictions, especially at fine-grained levels such as genus or species. Errors at higher taxonomic levels can also affect downstream top-down predictions. The model should therefore be used as an assistive tool rather than a definitive source for biological identification.
The model may support conservation-related applications such as biodiversity monitoring and recognition of rare or threatened species. However, improved species recognition could also be misused by bad actors. Since the model itself does not provide precise geolocation information, the primary risk to endangered species remains the disclosure or misuse of location data rather than classification capability alone.
How to Get Started with the Model
BioCLIP-HC-Hyperbolic can be used with the open_clip library:
import open_clip
model, preprocess_train, preprocess_val = open_clip.create_model_and_transforms('hf-hub:imageomics/bioclip-hc-hyperbolic')
tokenizer = open_clip.get_tokenizer('hf-hub:imageomics/bioclip-hc-hyperbolic')
Training Details
Compute Infrastructure
Training was performed on 8 NVIDIA A100-80GB GPUs distributed over 2 nodes on OSC's Ascend HPC Cluster with a global batch size of 32,768 for 36 hours.
Training Data
Training uses a large-scale biological image dataset stored as WebDataset shards together with taxonomy-aware text annotations/prompts. The exact released training set, shard layout, and any additional filtering applied to the uploaded checkpoint should be documented here once finalized.
This model was trained on TreeOfLife-10M Revision ffa2a31, which is a compilation of images matched to Linnaean taxonomic rank from kingdom through species.
Training Hyperparameters
Training regime: fp16 mixed precision (AMP)
We resize images to 224 × 224 pixels. We use a maximum learning rate of 1e-4 with 1,000 linear warm-up steps, followed by cosine decay to 0 over 100 epochs. We also use a weight decay of 0.2 and a global batch size of 32K (32,768).
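The warm-up and cosine-decay schedule described above can be sketched as a function of the optimizer step. The default `peak_lr=1e-4` and the `total_steps=30_000` used in the example are assumptions for illustration; the real step count depends on dataset size, batch size, and the 100-epoch budget, and the function name is hypothetical.

```python
import math


def learning_rate(step, total_steps, peak_lr=1e-4, warmup_steps=1000):
    """Linear warm-up to peak_lr, then cosine decay to 0 at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)  # clamp in case step runs past total_steps
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


# Sample the schedule at a few steps (total_steps is an illustrative value).
sampled_steps = (0, 500, 1000, 15_500, 30_000)
schedule = {step: learning_rate(step, total_steps=30_000) for step in sampled_steps}
```

The rate starts at 0, reaches the peak exactly when warm-up ends, passes through half the peak at the midpoint of the decay phase, and returns to 0 at the final step.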
Evaluation
The codebase supports two main hierarchical evaluation protocols:
- Coarse-to-fine evaluation: independent prediction at each taxonomy level.
- Top-down evaluation: constrained prediction where the candidate set at each level is restricted to descendants of the predicted parent.
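The difference between the two protocols can be shown on a toy taxonomy. This is a sketch under stated assumptions rather than the codebase's implementation: the function names, the `children` mapping, and the similarity scores are all hypothetical, and each level's scores are assumed to be a label-to-similarity dictionary.

```python
def predict_independent(scores_by_level):
    """Pick the best-scoring label at each level, ignoring the other levels."""
    return [max(scores, key=scores.get) for scores in scores_by_level]


def predict_top_down(scores_by_level, children):
    """At each level, only children of the previous prediction remain candidates."""
    path = []
    candidates = set(scores_by_level[0])  # every label is allowed at the top level
    for scores in scores_by_level:
        allowed = {label: s for label, s in scores.items() if label in candidates}
        best = max(allowed, key=allowed.get)
        path.append(best)
        candidates = set(children.get(best, ()))
    return path


# Toy taxonomy: family -> genus -> species.
children = {
    "Felidae": ["Panthera", "Felis"],
    "Canidae": ["Canis", "Vulpes"],
    "Panthera": ["Panthera leo", "Panthera tigris"],
    "Felis": ["Felis catus"],
    "Canis": ["Canis lupus"],
    "Vulpes": ["Vulpes vulpes"],
}

# Hypothetical image-text similarity scores for one image at each level.
scores_by_level = [
    {"Felidae": 0.62, "Canidae": 0.38},
    {"Panthera": 0.30, "Felis": 0.25, "Canis": 0.35, "Vulpes": 0.10},
    {
        "Panthera leo": 0.20,
        "Panthera tigris": 0.15,
        "Felis catus": 0.10,
        "Canis lupus": 0.45,
        "Vulpes vulpes": 0.10,
    },
]

independent_path = predict_independent(scores_by_level)
top_down_path = predict_top_down(scores_by_level, children)
```

Here the independent predictions mix a cat family with a dog genus and species, while the top-down path stays inside Felidae at every level. The price of that consistency is that an error at a coarse level cannot be recovered at finer levels.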
Testing Data, Factors & Metrics
Testing Data
We evaluate the model on three biologically relevant hierarchical fine-grained classification benchmarks:
iNat21 provides broad taxonomic coverage across multiple biological groups and is used to evaluate recognition performance from coarse to fine taxonomy levels, including kingdom, phylum, class, order, family, genus, and species.
Rare Species focuses on rare and long-tailed species that are absent from the TreeOfLife-10M training set. It is used to test the model’s ability to generalize to underrepresented and conservation-relevant organisms.
CrypticBio contains visually ambiguous and easily confused species, making it suitable for evaluating fine-grained biological recognition under challenging visual conditions.
For all benchmarks, we report per-level top-1 accuracy across available taxonomic ranks, average accuracy across levels, and hierarchical consistency under top-down constrained inference where applicable.
Metrics
Primary metrics supported by the current codebase include:
- per-level accuracy
- first-misclassification depth counts
- normalized Lowest Common Ancestor (nLCA) for top-down evaluation
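The metrics listed above can be sketched on predicted and true taxonomic paths. Per-level accuracy and first-misclassification depth follow directly from their names. The nLCA function, however, uses an assumed definition (the number of leading levels shared by the predicted and true paths, divided by the path length), since the exact formula used by the codebase is not given here. All function names and the example paths are hypothetical.

```python
from collections import Counter


def per_level_accuracy(pred_paths, true_paths):
    """Fraction of samples predicted correctly at each level."""
    n_levels = len(true_paths[0])
    n_samples = len(true_paths)
    return [
        sum(p[level] == t[level] for p, t in zip(pred_paths, true_paths)) / n_samples
        for level in range(n_levels)
    ]


def first_error_depth(pred, true):
    """Index of the first level where the paths differ, or None if they match."""
    for depth, (p, t) in enumerate(zip(pred, true)):
        if p != t:
            return depth
    return None


def first_misclassification_counts(pred_paths, true_paths):
    """Count how many samples first go wrong at each depth."""
    return Counter(first_error_depth(p, t) for p, t in zip(pred_paths, true_paths))


def assumed_nlca(pred, true):
    """ASSUMED definition: shared prefix length divided by path length."""
    depth = first_error_depth(pred, true)
    shared = len(true) if depth is None else depth
    return shared / len(true)


# Paths are (family, genus, species) tuples.
true_paths = [
    ("Felidae", "Panthera", "Panthera leo"),
    ("Canidae", "Canis", "Canis lupus"),
]
pred_paths = [
    ("Felidae", "Panthera", "Panthera tigris"),  # wrong only at species
    ("Felidae", "Felis", "Felis catus"),         # wrong from the family level down
]

accuracy = per_level_accuracy(pred_paths, true_paths)
error_depths = first_misclassification_counts(pred_paths, true_paths)
nlca_scores = [assumed_nlca(p, t) for p, t in zip(pred_paths, true_paths)]
```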
Results
Evaluation Results
We evaluate the model on three hierarchical fine-grained biodiversity classification benchmarks: iNat21, Rare Species, and CrypticBio. We report per-level top-1 accuracy across available taxonomic ranks.
iNat21
| Method | Space | Kingdom | Phylum | Class | Order | Family | Genus | Species |
|---|---|---|---|---|---|---|---|---|
| OpenCLIP | Euclidean | 84.76 | 35.37 | 26.08 | 19.25 | 7.71 | 6.80 | 2.09 |
| BioCLIP | Euclidean | 86.28 | 56.14 | 41.69 | 26.95 | 30.37 | 47.21 | 50.79 |
| RCME | Euclidean | 86.26 | 83.00 | 70.79 | 46.46 | 44.74 | 59.28 | 50.50 |
| Ours | Euclidean | 98.85 | 98.39 | 67.23 | 78.22 | 78.69 | 68.36 | 63.00 |
| Ours | Hyperbolic | 98.97 | 98.48 | 71.81 | 75.84 | 82.25 | 73.96 | 51.35 |
Rare Species
| Method | Space | Phylum | Class | Order | Family | Genus | Species |
|---|---|---|---|---|---|---|---|
| OpenCLIP | Euclidean | 75.89 | 60.87 | 33.32 | 13.31 | 15.27 | 10.62 |
| BioCLIP | Euclidean | 68.35 | 65.16 | 54.63 | 40.01 | 47.43 | 31.82 |
| RCME | Euclidean | 82.38 | 81.67 | 69.01 | 44.12 | 50.95 | 35.58 |
| Ours | Euclidean | 98.19 | 93.02 | 83.27 | 67.41 | 55.08 | 40.68 |
| Ours | Hyperbolic | 97.99 | 92.64 | 83.33 | 69.06 | 57.22 | 43.20 |
CrypticBio
| Method | Space | Order | Family | Genus | Species |
|---|---|---|---|---|---|
| OpenCLIP | Euclidean | 55.37 | 15.92 | 6.41 | 6.17 |
| BioCLIP | Euclidean | 81.34 | 39.29 | 37.71 | 36.72 |
| RCME | Euclidean | 90.11 | 49.84 | 39.45 | 38.97 |
| Ours | Euclidean | 97.98 | 72.65 | 46.30 | 40.37 |
| Ours | Hyperbolic | 96.80 | 78.01 | 52.18 | 49.98 |
Summary
Our hyperbolic model outperforms RCME, the previous state of the art, by over 13 percentage points in average per-level accuracy across the three hierarchical biodiversity classification benchmarks.
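The averaged comparison can be reproduced from the tables above. The sketch below copies the RCME rows and the Hyperbolic rows and compares their unweighted means over the available ranks; treating "average accuracy" as an unweighted mean over levels is an assumption, and the variable names are illustrative.

```python
# Per-level top-1 accuracy (%), copied from the results tables.
rcme = {
    "iNat21": [86.26, 83.00, 70.79, 46.46, 44.74, 59.28, 50.50],
    "Rare Species": [82.38, 81.67, 69.01, 44.12, 50.95, 35.58],
    "CrypticBio": [90.11, 49.84, 39.45, 38.97],
}
ours_hyperbolic = {
    "iNat21": [98.97, 98.48, 71.81, 75.84, 82.25, 73.96, 51.35],
    "Rare Species": [97.99, 92.64, 83.33, 69.06, 57.22, 43.20],
    "CrypticBio": [96.80, 78.01, 52.18, 49.98],
}


def mean(values):
    """Unweighted mean over taxonomic levels."""
    return sum(values) / len(values)


# Gap in average accuracy, in percentage points, per benchmark.
gaps = {name: mean(ours_hyperbolic[name]) - mean(rcme[name]) for name in rcme}
overall_gap = mean(list(gaps.values()))

for name, gap in gaps.items():
    print(f"{name}: +{gap:.2f} points")
print(f"Mean over benchmarks: +{overall_gap:.2f} points")
```

Under this reading, the gap is between roughly 13 and 16 percentage points on each benchmark.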
Model Examination
We encourage readers to see the representation analysis in our paper. Our model learns embeddings that more clearly preserve the taxonomic hierarchy compared with BioCLIP and other baselines, with better separation among sibling taxa.
Citation
BibTeX:
Please also cite our paper:
Also consider citing OpenCLIP, BioCLIP, BioCLIP2:
@software{ilharco_gabriel_2021_5143773,
author={Ilharco, Gabriel and Wortsman, Mitchell and Wightman, Ross and Gordon, Cade and Carlini, Nicholas and Taori, Rohan and Dave, Achal and Shankar, Vaishaal and Namkoong, Hongseok and Miller, John and Hajishirzi, Hannaneh and Farhadi, Ali and Schmidt, Ludwig},
title={OpenCLIP},
year={2021},
doi={10.5281/zenodo.5143773},
}
Original BioCLIP Model:
@software{bioclip2023,
author = {Samuel Stevens and Jiaman Wu and Matthew J. Thompson and Elizabeth G. Campolongo and Chan Hee Song and David Edward Carlyn and Li Dong and Wasila M. Dahdul and Charles Stewart and Tanya Berger-Wolf and Wei-Lun Chao and Yu Su},
doi = {10.57967/hf/1511},
month = nov,
title = {BioCLIP},
version = {v0.1},
year = {2023}
}
Original BioCLIP Paper:
@inproceedings{stevens2024bioclip,
title = {{B}io{CLIP}: A Vision Foundation Model for the Tree of Life},
author = {Samuel Stevens and Jiaman Wu and Matthew J Thompson and Elizabeth G Campolongo and Chan Hee Song and David Edward Carlyn and Li Dong and Wasila M Dahdul and Charles Stewart and Tanya Berger-Wolf and Wei-Lun Chao and Yu Su},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2024},
pages = {19412-19424}
}
Original BioCLIP2 Model:
@software{Gu_BioCLIP_2_model,
author = {Jianyang Gu and Samuel Stevens and Elizabeth G Campolongo and Matthew J Thompson and Net Zhang and Jiaman Wu and Andrei Kopanev and Zheda Mai and Alexander E. White and James Balhoff and Wasila M Dahdul and Daniel Rubenstein and Hilmar Lapp and Tanya Berger-Wolf and Wei-Lun Chao and Yu Su},
license = {MIT},
title = {{BioCLIP 2}},
url = {https://huggingface.co/imageomics/bioclip-2},
version = {1.0.0},
doi = {10.57967/hf/5765},
publisher = {Hugging Face},
year = {2025}
}
Original BioCLIP2 Paper:
@inproceedings{NEURIPS2025_94da80cb,
author = {Gu, Jianyang and Stevens, Sam and Campolongo, Elizabeth and Thompson, Matthew and Zhang, Net and Wu, Jiaman and Kopanev, Andrei and Mai, Zheda and White, Alexander and Balhoff, James and Dahdul, Wasila and Rubenstein, Daniel and Lapp, Hilmar and Berger-Wolf, Tanya and Chao, Wei-Lun (Harry) and Su, Yu},
booktitle = {Advances in Neural Information Processing Systems},
editor = {D. Belgrave and C. Zhang and H. Lin and R. Pascanu and P. Koniusz and M. Ghassemi and N. Chen},
pages = {102778--102811},
publisher = {Curran Associates, Inc.},
title = {BioCLIP 2: Emergent Properties from Scaling Hierarchical Contrastive Learning},
url = {https://proceedings.neurips.cc/paper_files/paper/2025/file/94da80cbfe870c1db958c88a8a27018c-Paper-Conference.pdf},
volume = {38},
year = {2025}
}
Acknowledgements
This work was supported by the Imageomics Institute, which is funded by the US National Science Foundation's Harnessing the Data Revolution (HDR) program under Award #2118240 (Imageomics: A New Frontier of Biological Information Powered by Knowledge-Guided Machine Learning). Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.