Model Card for BioCLIP-HC-Hyperbolic

BioCLIP-HC-Hyperbolic is a hierarchical vision-language foundation model for fine-grained biological image classification in hyperbolic space. It is fine-tuned from BioCLIP on TreeOfLife-10M with level-restricted contrastive learning, which aligns images and taxonomic text labels within each hierarchy level to reduce cross-level false negatives. The model improves zero-shot recognition across multiple taxonomic ranks and produces more hierarchically consistent predictions on biodiversity benchmarks such as iNaturalist21 (iNat21), Rare Species, and CrypticBio.

Model Details

This model is initialized from BioCLIP ViT-B/16 and fine-tuned on TreeOfLife-10M for hierarchical fine-grained biological image classification in hyperbolic space. It uses level-restricted contrastive learning, which compares image-text pairs within the same taxonomic level rather than across mixed hierarchy levels, reducing cross-level false negatives and improving hierarchical consistency. A group-balanced objective further ensures that coarse and fine taxonomic ranks receive balanced supervision.
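The level-restricted objective described above can be illustrated with a small, dependency-free sketch. It is not the training code: the function name, the image-to-text-only direction, the temperature of 0.07, and the exact form of group balancing (mean within each level, then mean across levels) are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def level_restricted_loss(sim, levels, temperature=0.07):
    """Image-to-text InfoNCE where negatives come only from the same level.

    sim[i][j] is the similarity between image i and text j, and pair (i, i)
    is the positive. levels[i] is the taxonomic level of pair i.
    The temperature value is an assumed placeholder.
    """
    per_level = defaultdict(list)
    for i, level in enumerate(levels):
        # Candidates are restricted to texts at the same taxonomic level.
        candidates = [j for j, other in enumerate(levels) if other == level]
        logits = [sim[i][j] / temperature for j in candidates]
        peak = max(logits)  # subtract the max for numerical stability
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        per_level[level].append(log_denominator - sim[i][i] / temperature)
    # Group balancing (assumed form): average within each level first, then
    # across levels, so every rank contributes equally to the total.
    level_means = {name: sum(v) / len(v) for name, v in per_level.items()}
    return sum(level_means.values()) / len(level_means), level_means


# Toy batch: two genus-level pairs and two species-level pairs.
levels = ["genus", "genus", "species", "species"]
sim = [
    [0.9, 0.1, 0.8, 0.0],
    [0.2, 0.8, 0.1, 0.7],
    [0.8, 0.0, 0.9, 0.2],
    [0.1, 0.7, 0.3, 0.9],
]
loss, by_level = level_restricted_loss(sim, levels)
```

In this toy batch, image 0 (genus) is highly similar to text 2 (species). Because text 2 sits at a different level, it never enters the denominator for image 0, so it cannot act as a cross-level false negative.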

The model is evaluated on iNat21, Rare Species, and CrypticBio, where it improves average accuracy and top-down hierarchical consistency over OpenCLIP, BioCLIP, and RCME. It also learns more structured taxonomic representations, with clearer hierarchical organization in the embedding space.

Model Description

Developed by: Zhiyuan Tao, Srikumar Sastry, Matthew Thompson, Elizabeth Campolongo, Net Zhang, Ziheng Zhang, Hilmar Lapp, Yu Su, Tanya Berger-Wolf, Nathan Jacobs, Wei-Lun Chao, Jianyang Gu
Model type: CLIP-style dual-encoder vision-language model for biological image representation learning
License: MIT
Fine-tuned from model: BioCLIP Revision 7b4abf1

Model Sources

Homepage:
Repository:
Paper:
Demo:

Uses

Direct Use

The model can be used for zero-shot hierarchical classification with provided taxonomic names at different levels, such as family, genus, or species. It can also support few-shot classification by using labeled biological images as a support set. In addition, the model can be used as a visual encoder for downstream biological vision tasks that benefit from taxonomy-aware representations, including biodiversity monitoring, rare species recognition, and fine-grained organism classification.

Bias, Risks, and Limitations

This model is fine-tuned on TreeOfLife-10M, which may contain imbalanced taxonomic coverage and long-tailed distributions across species and higher-level groups. As a result, predictions may be biased toward well-represented taxa and may be less reliable for rare, underrepresented, visually ambiguous, or out-of-distribution organisms.

Although the model improves hierarchical consistency, it may still make incorrect predictions, especially at fine-grained levels such as genus or species. Errors at higher taxonomic levels can also affect downstream top-down predictions. The model should therefore be used as an assistive tool rather than a definitive source for biological identification.

The model may support conservation-related applications such as biodiversity monitoring and recognition of rare or threatened species. However, improved species recognition could also be misused by bad actors. Since the model itself does not provide precise geolocation information, the primary risk to endangered species remains the disclosure or misuse of location data rather than classification capability alone.

How to Get Started with the Model

BioCLIP-HC-Hyperbolic can be used with the open_clip library:

import open_clip

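# Load the model weights and the train/validation preprocessing transforms from the Hugging Face Hub.
model, preprocess_train, preprocess_val = open_clip.create_model_and_transforms('hf-hub:imageomics/bioclip-hc-hyperbolic')
# Load the text tokenizer that matches the model.
tokenizer = open_clip.get_tokenizer('hf-hub:imageomics/bioclip-hc-hyperbolic')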

Training Details

Compute Infrastructure

Training was performed on 8 NVIDIA A100-80GB GPUs distributed over 2 nodes on OSC's Ascend HPC Cluster with a global batch size of 32,768 for 36 hours.

Training Data

Training uses a large-scale biological image dataset stored as WebDataset shards together with taxonomy-aware text annotations/prompts. The exact released training set, shard layout, and any additional filtering applied to the uploaded checkpoint should be documented here once finalized.

This model was trained on TreeOfLife-10M Revision ffa2a31, which is a compilation of images matched to Linnaean taxonomic rank from kingdom through species.

Training Hyperparameters

Training regime: fp16 mixed precision (AMP)

We resize images to 224 x 224 pixels. We use a maximum learning rate of 1e-4 with 1000 linear warm-up steps, followed by cosine decay to 0 over 100 epochs. We also use a weight decay of 0.2 and a batch size of 32K.
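The warm-up and cosine-decay schedule can be sketched as a function of the optimizer step. This is a minimal illustration, not the training script: the function name is hypothetical, and the total step count (30,000) and the peak value of 1e-4 used in the example are assumptions passed in as parameters.

```python
import math


def learning_rate(step, total_steps, max_lr, warmup_steps=1000):
    """Linear warm-up to max_lr, then cosine decay to 0 at total_steps."""
    if step < warmup_steps:
        # Linear ramp that reaches max_lr on the last warm-up step.
        return max_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * max_lr * (1.0 + math.cos(math.pi * min(1.0, progress)))


# Hypothetical run length; the real value depends on dataset size and epochs.
TOTAL_STEPS = 30_000
schedule = [learning_rate(s, TOTAL_STEPS, max_lr=1e-4) for s in (0, 999, 1000, 15_500, 30_000)]
```

The rate climbs linearly during the first 1000 steps, peaks at the end of warm-up, and falls to zero at the final step.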

Evaluation

The codebase supports two main hierarchical evaluation protocols:

Coarse-to-fine evaluation: Independent prediction at each taxonomy level.
Top-down evaluation: Constrained prediction where the candidate set at each level is restricted to descendants of the predicted parent.
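The difference between the two protocols can be shown on a toy taxonomy. The sketch below is illustrative only: the function names, the `children` mapping, and the scores are hypothetical and do not come from the codebase.

```python
def predict_independent(scores_by_level):
    """Pick the highest-scoring label at each level, ignoring the hierarchy."""
    return [max(scores, key=scores.get) for scores in scores_by_level]


def predict_top_down(scores_by_level, children):
    """Pick labels level by level, restricted to children of the previous pick."""
    path = []
    candidates = list(scores_by_level[0])  # every label is allowed at the top level
    for scores in scores_by_level:
        if not candidates:
            break  # the taxonomy has no deeper labels under the last pick
        best = max(candidates, key=scores.__getitem__)
        path.append(best)
        candidates = children.get(best, [])
    return path


# Hypothetical two-level taxonomy (family -> genus) and per-level scores.
children = {"Felidae": ["Panthera", "Felis"], "Canidae": ["Canis", "Vulpes"]}
scores = [
    {"Felidae": 0.6, "Canidae": 0.4},
    {"Panthera": 0.2, "Felis": 0.3, "Canis": 0.5, "Vulpes": 0.0},
]
independent = predict_independent(scores)
top_down = predict_top_down(scores, children)
```

Here independent prediction returns the family Felidae together with the genus Canis, which is not a descendant of Felidae. Top-down prediction only considers genera under Felidae and returns Felis, so the predicted path is always consistent with the taxonomy.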

Testing Data, Factors & Metrics

Testing Data

We evaluate the model on three biologically relevant hierarchical fine-grained classification benchmarks:

iNat21 provides broad taxonomic coverage across multiple biological groups and is used to evaluate recognition performance from coarse to fine taxonomy levels, including kingdom, phylum, class, order, family, genus, and species.

Rare Species focuses on rare and long-tailed species that are absent from the TreeOfLife-10M training set. It is used to test the model’s ability to generalize to underrepresented and conservation-relevant organisms.

CrypticBio contains visually ambiguous and easily confused species, making it suitable for evaluating fine-grained biological recognition under challenging visual conditions.

For all benchmarks, we report per-level top-1 accuracy across available taxonomic ranks, average accuracy across levels, and hierarchical consistency under top-down constrained inference where applicable.

Metrics

Primary metrics supported by the current codebase include:

per-level accuracy
first-misclassification depth counts
normalized Lowest Common Ancestor (nLCA) for top-down evaluation
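The first two metrics can be sketched as follows. The function names and the toy labels are hypothetical, and the definitions are one straightforward reading of the metric names rather than the codebase's implementation. nLCA is omitted because this card does not give its definition.

```python
from collections import Counter


def per_level_accuracy(predictions, targets, level_names):
    """Fraction of samples whose label is correct, computed per level."""
    correct = Counter()
    for pred, true in zip(predictions, targets):
        for name, p, t in zip(level_names, pred, true):
            correct[name] += int(p == t)
    return {name: correct[name] / len(targets) for name in level_names}


def first_misclassification_depths(predictions, targets, level_names):
    """Count, for each level, how often it is the shallowest wrong level."""
    counts = Counter()
    for pred, true in zip(predictions, targets):
        for name, p, t in zip(level_names, pred, true):
            if p != t:
                counts[name] += 1
                break  # only the first (coarsest) error is counted
    return dict(counts)


level_names = ["order", "family", "genus"]
targets = [
    ["Carnivora", "Felidae", "Panthera"],
    ["Carnivora", "Canidae", "Canis"],
    ["Carnivora", "Felidae", "Felis"],
]
predictions = [
    ["Carnivora", "Felidae", "Panthera"],
    ["Carnivora", "Felidae", "Felis"],
    ["Carnivora", "Felidae", "Panthera"],
]
accuracy = per_level_accuracy(predictions, targets, level_names)
depths = first_misclassification_depths(predictions, targets, level_names)
```

In this example the second sample first goes wrong at the family level and the third at the genus level, so each of those levels receives one count while order-level accuracy stays at 100%.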

Results

Evaluation Results

We evaluate the model on three hierarchical fine-grained biodiversity classification benchmarks: iNat21, Rare Species, and CrypticBio. We report per-level top-1 accuracy across available taxonomic ranks.

iNat21

Method	Space	Kingdom	Phylum	Class	Order	Family	Genus	Species
OpenCLIP	Euclidean	84.76	35.37	26.08	19.25	7.71	6.80	2.09
BioCLIP	Euclidean	86.28	56.14	41.69	26.95	30.37	47.21	50.79
RCME	Euclidean	86.26	83.00	70.79	46.46	44.74	59.28	50.50
Ours	Euclidean	98.85	98.39	67.23	78.22	78.69	68.36	63.00
Ours	Hyperbolic	98.97	98.48	71.81	75.84	82.25	73.96	51.35

Rare Species

Method	Space	Phylum	Class	Order	Family	Genus	Species
OpenCLIP	Euclidean	75.89	60.87	33.32	13.31	15.27	10.62
BioCLIP	Euclidean	68.35	65.16	54.63	40.01	47.43	31.82
RCME	Euclidean	82.38	81.67	69.01	44.12	50.95	35.58
Ours	Euclidean	98.19	93.02	83.27	67.41	55.08	40.68
Ours	Hyperbolic	97.99	92.64	83.33	69.06	57.22	43.20

CrypticBio

Method	Space	Order	Family	Genus	Species
OpenCLIP	Euclidean	55.37	15.92	6.41	6.17
BioCLIP	Euclidean	81.34	39.29	37.71	36.72
RCME	Euclidean	90.11	49.84	39.45	38.97
Ours	Euclidean	97.98	72.65	46.30	40.37
Ours	Hyperbolic	96.80	78.01	52.18	49.98

Summary

Our model outperforms RCME, the previous state of the art, by over 13 percentage points in average per-level accuracy across three hierarchical biodiversity classification benchmarks.
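The margin over RCME can be checked directly against the tables above. The sketch below averages per-level accuracy for RCME and for the hyperbolic variant on each benchmark. The card does not state how its summary figure was aggregated, so the choice of variant and the unweighted mean over levels are assumptions.

```python
# Per-level top-1 accuracy copied from the result tables (RCME vs. Ours, Hyperbolic).
rcme = {
    "iNat21": [86.26, 83.00, 70.79, 46.46, 44.74, 59.28, 50.50],
    "Rare Species": [82.38, 81.67, 69.01, 44.12, 50.95, 35.58],
    "CrypticBio": [90.11, 49.84, 39.45, 38.97],
}
ours_hyperbolic = {
    "iNat21": [98.97, 98.48, 71.81, 75.84, 82.25, 73.96, 51.35],
    "Rare Species": [97.99, 92.64, 83.33, 69.06, 57.22, 43.20],
    "CrypticBio": [96.80, 78.01, 52.18, 49.98],
}


def mean(values):
    return sum(values) / len(values)


# Gap in average per-level accuracy (percentage points) on each benchmark.
gaps = {name: mean(ours_hyperbolic[name]) - mean(rcme[name]) for name in rcme}
overall_gap = mean(list(gaps.values()))

for name, gap in gaps.items():
    print(f"{name}: +{gap:.2f} points")
print(f"Mean over benchmarks: +{overall_gap:.2f} points")
```

Under these assumptions the gap exceeds 13 points on every benchmark.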

Model Examination

We encourage readers to see the representation analysis in our paper. Our model learns embeddings that more clearly preserve the taxonomic hierarchy compared with BioCLIP and other baselines, with better separation among sibling taxa.

Citation

BibTeX:

Please also cite our paper:

Also consider citing OpenCLIP, BioCLIP, and BioCLIP 2:

@software{ilharco_gabriel_2021_5143773,
  author={Ilharco, Gabriel and Wortsman, Mitchell and Wightman, Ross and Gordon, Cade and Carlini, Nicholas and Taori, Rohan and Dave, Achal and Shankar, Vaishaal and Namkoong, Hongseok and Miller, John and Hajishirzi, Hannaneh and Farhadi, Ali and Schmidt, Ludwig},
  title={OpenCLIP},
  year={2021},
  doi={10.5281/zenodo.5143773},
}

Original BioCLIP Model:

@software{bioclip2023,
  author = {Samuel Stevens and Jiaman Wu and Matthew J. Thompson and Elizabeth G. Campolongo and Chan Hee Song and David Edward Carlyn and Li Dong and Wasila M. Dahdul and Charles Stewart and Tanya Berger-Wolf and Wei-Lun Chao and Yu Su},
  doi = {10.57967/hf/1511},
  month = nov,
  title = {BioCLIP},
  version = {v0.1},
  year = {2023}
}

Original BioCLIP Paper:

@inproceedings{stevens2024bioclip,
  title = {{B}io{CLIP}: A Vision Foundation Model for the Tree of Life}, 
  author = {Samuel Stevens and Jiaman Wu and Matthew J Thompson and Elizabeth G Campolongo and Chan Hee Song and David Edward Carlyn and Li Dong and Wasila M Dahdul and Charles Stewart and Tanya Berger-Wolf and Wei-Lun Chao and Yu Su},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2024},
  pages = {19412-19424}
}

Original BioCLIP 2 Model:

@software{Gu_BioCLIP_2_model,
  author = {Jianyang Gu and Samuel Stevens and Elizabeth G Campolongo and Matthew J Thompson and Net Zhang and Jiaman Wu and Andrei Kopanev and Zheda Mai and Alexander E. White and James Balhoff and Wasila M Dahdul and Daniel Rubenstein and Hilmar Lapp and Tanya Berger-Wolf and Wei-Lun Chao and Yu Su},
  license = {MIT},
  title = {{BioCLIP 2}},
  url = {https://huggingface.co/imageomics/bioclip-2},
  version = {1.0.0},
  doi = {10.57967/hf/5765},
  publisher = {Hugging Face},
  year = {2025}
}

Original BioCLIP 2 Paper:

@inproceedings{NEURIPS2025_94da80cb,
 author = {Gu, Jianyang and Stevens, Sam and Campolongo, Elizabeth and Thompson, Matthew and Zhang, Net and Wu, Jiaman and Kopanev, Andrei and Mai, Zheda and White, Alexander and Balhoff, James and Dahdul, Wasila and Rubenstein, Daniel and Lapp, Hilmar and Berger-Wolf, Tanya and Chao, Wei-Lun (Harry) and Su, Yu},
 booktitle = {Advances in Neural Information Processing Systems},
 editor = {D. Belgrave and C. Zhang and H. Lin and R. Pascanu and P. Koniusz and M. Ghassemi and N. Chen},
 pages = {102778--102811},
 publisher = {Curran Associates, Inc.},
 title = {BioCLIP 2: Emergent Properties from Scaling Hierarchical Contrastive Learning},
 url = {https://proceedings.neurips.cc/paper_files/paper/2025/file/94da80cbfe870c1db958c88a8a27018c-Paper-Conference.pdf},
 volume = {38},
 year = {2025}
}

Acknowledgements

This work was supported by the Imageomics Institute, which is funded by the US National Science Foundation's Harnessing the Data Revolution (HDR) program under Award #2118240 (Imageomics: A New Frontier of Biological Information Powered by Knowledge-Guided Machine Learning). Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
