Shuu12121/Owl-ph2-len2048 🦉
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: Shuu12121/Owl-ph2-base-len2048
- Maximum Sequence Length: 1024 tokens (2048 tokens during pretraining of the base model)
- Output Dimensionality: 768
- Similarity Function: Cosine Similarity
This model is a SentenceTransformer variant of Shuu12121/Owl-ph2-base-len2048. It was trained on the Owl corpus for code search and code-text retrieval. The training data consists of roughly 100,000 samples per language (800,640 pairs in total), and the model was trained for 1 epoch with a learning rate of 1e-5.
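The similarity function listed above, cosine similarity, compares the direction of two embedding vectors and ignores their length. The following is a minimal sketch using toy 3-dimensional vectors in place of real 768-dimensional model outputs; the helper name `cosine_similarity` is illustrative and not part of any library.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy vectors standing in for real embeddings (assumption: illustration only).
query = [1.0, 0.0, 1.0]
same_direction = [2.0, 0.0, 2.0]  # same direction, different length
orthogonal = [0.0, 3.0, 0.0]      # shares no component with the query

print(cosine_similarity(query, same_direction))  # close to 1
print(cosine_similarity(query, orthogonal))      # zero
```

Because only direction matters, scaling an embedding does not change its score against a query.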
Model Sources
- Base model: Shuu12121/Owl-ph2-base-len2048
- Sentence Transformers: Sentence Transformers Documentation
Full Model Architecture
SentenceTransformer(
  (0): Transformer({'max_seq_length': 1024, 'do_lower_case': False, 'architecture': 'ModernBertModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
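The Pooling module above sets `pooling_mode_cls_token` to `True`, so the sentence embedding is the hidden state of the first token rather than an average over all tokens. The sketch below contrasts the two on toy data; `cls_pool` and `mean_pool` are illustrative helpers written for this example, not library functions.

```python
def cls_pool(token_embeddings):
    """CLS pooling: use the first token's vector as the sequence embedding."""
    return list(token_embeddings[0])


def mean_pool(token_embeddings, attention_mask):
    """Mean pooling over non-padding tokens, shown only for contrast."""
    kept = [vec for vec, keep in zip(token_embeddings, attention_mask) if keep]
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy hidden states: 4 token positions, dimension 2 (the real model uses 768).
tokens = [[0.5, -1.0], [1.0, 1.0], [3.0, 0.0], [9.0, 9.0]]
mask = [1, 1, 1, 0]  # the last position is padding

print(cls_pool(tokens))         # the first token's vector
print(mean_pool(tokens, mask))  # average of the three unmasked vectors
```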
Intended Uses
This model is intended for:
- code search
- code-text retrieval
- semantic similarity
- dense embedding generation for source code and natural language
Usage
Direct Usage (Sentence Transformers)
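from sentence_transformers import SentenceTransformer
# Load the model from the Hugging Face Hub, then embed a query and a code snippet.
model = SentenceTransformer("Shuu12121/Owl-ph2-len2048")
embeddings = model.encode(["read a file into a string", "def read(path): return open(path).read()"])  # shape: (2, 768)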
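Building on the loading snippet above, one plausible way to use the model for code search is to embed a query and a set of snippets, then sort the snippets by cosine similarity. This is a sketch under assumptions: `rank_snippets` is a hypothetical helper, and it only relies on `model.encode` returning one vector per input string, which is how `SentenceTransformer.encode` behaves.

```python
import math


def _cosine(u, v):
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def rank_snippets(model, query, snippets):
    """Return (score, snippet) pairs sorted from most to least similar.

    `model` is any object whose encode(list_of_str) yields one vector per input.
    """
    query_vec = [float(x) for x in model.encode([query])[0]]
    snippet_vecs = [[float(x) for x in row] for row in model.encode(list(snippets))]
    scored = [(_cosine(query_vec, vec), snippet)
              for vec, snippet in zip(snippet_vecs, snippets)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Usage with the real model (needs `pip install sentence-transformers`
# and a one-time model download):
# from sentence_transformers import SentenceTransformer
# model = SentenceTransformer("Shuu12121/Owl-ph2-len2048")
# snippets = ["def read(path): return open(path).read()", "def add(a, b): return a + b"]
# for score, snippet in rank_snippets(model, "read a file into a string", snippets):
#     print(f"{score:.3f}  {snippet}")
```

For larger collections, the snippet embeddings would normally be computed once and stored, so that only the query is embedded at search time.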
Training Details
Training Dataset
This model was trained on the Owl corpus, a dataset constructed for code search and code-text retrieval. The training set contains approximately 100,000 samples per language, resulting in 800,640 training pairs in total.
Training Hyperparameters
- Learning rate: 1e-5
- Epochs: 1
- Loss: MultipleNegativesRankingLoss
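To make the loss listed above concrete: MultipleNegativesRankingLoss treats each (query, code) pair in a batch as a positive and every other code sample in the same batch as a negative, then applies cross-entropy over the scaled similarity scores. The pure-Python sketch below illustrates that idea on toy vectors. It assumes cosine similarity and a scale of 20.0, which are the Sentence Transformers defaults; this card does not state the values actually used.

```python
import math


def cosine(u, v):
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def multiple_negatives_ranking_loss(queries, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the correct match for queries[i]
    and all other entries of `positives` act as in-batch negatives."""
    losses = []
    for i, q in enumerate(queries):
        logits = [scale * cosine(q, p) for p in positives]
        max_logit = max(logits)  # subtract the max for numerical stability
        log_sum_exp = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))
        losses.append(log_sum_exp - logits[i])  # negative log-softmax at the true index
    return sum(losses) / len(losses)


# Toy batch of two pairs (assumption: 2-dimensional stand-ins for embeddings).
queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]   # each query matches its own positive
swapped = [[0.0, 1.0], [1.0, 0.0]]   # each query matches the other pair's positive

print(multiple_negatives_ranking_loss(queries, aligned))  # near zero
print(multiple_negatives_ranking_loss(queries, swapped))  # large
```

Since the negatives come from the batch itself, larger batches give each query more distractors to rank against.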
Integrations
Owl-CLI
This model is used as the embedding model in Owl-CLI, a command-line tool for semantic code search.
Owl-CLI indexes source code at the function level, generates dense embeddings using this model, and performs vector similarity search to retrieve relevant code for natural language queries.
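As a rough illustration of function-level indexing, the sketch below extracts each function from a Python source string together with its file path and line range. It is not Owl-CLI's implementation: the `FunctionRecord` type and `extract_functions` helper are hypothetical, and Python's standard `ast` module is used here only because it keeps the example self-contained.

```python
import ast
from dataclasses import dataclass


@dataclass
class FunctionRecord:
    path: str
    name: str
    start_line: int
    end_line: int
    source: str


def extract_functions(path, source):
    """Return one record per function or method definition, ordered by start line."""
    tree = ast.parse(source)
    records = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            records.append(
                FunctionRecord(
                    path=path,
                    name=node.name,
                    start_line=node.lineno,
                    end_line=node.end_lineno,
                    source=ast.get_source_segment(source, node),
                )
            )
    return sorted(records, key=lambda r: r.start_line)


example = (
    "def add(a, b):\n"
    "    return a + b\n"
    "\n"
    "\n"
    "class Greeter:\n"
    "    def greet(self, name):\n"
    "        return 'hi ' + name\n"
)

for record in extract_functions("example.py", example):
    print(record.path, record.name, record.start_line, record.end_line)
```

Each record's `source` is the text that would be embedded, while the path and line range let a search result point back to its location in the file.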
Key features of Owl-CLI include:
- Semantic code search using dense embeddings
- Function-level indexing with file paths and line numbers
- Automatic indexing on first search
- Differential embedding cache to avoid re-embedding unchanged files
- JSON output for tool integration
- MCP server support for integration with AI coding agents (e.g., Claude Code)
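The differential embedding cache listed above can be pictured as a lookup keyed by file path and content hash: a file is re-embedded only when its contents change. The sketch below shows one way such a cache could work. The `EmbeddingCache` class and the choice of SHA-256 are assumptions made for illustration, not a description of Owl-CLI's actual cache.

```python
import hashlib


class EmbeddingCache:
    """Re-embed a file only when its content hash changes (hypothetical design)."""

    def __init__(self, embed_fn):
        self._embed_fn = embed_fn  # callable: str -> embedding vector
        self._entries = {}         # path -> (content_hash, embedding)

    def get(self, path, content):
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        cached = self._entries.get(path)
        if cached is not None and cached[0] == digest:
            return cached[1]       # unchanged file: reuse the stored embedding
        embedding = self._embed_fn(content)
        self._entries[path] = (digest, embedding)
        return embedding


calls = []


def fake_embed(text):
    """Stand-in for a real embedding call; records each invocation."""
    calls.append(text)
    return [float(len(text))]


cache = EmbeddingCache(fake_embed)
cache.get("a.py", "def f(): pass")
cache.get("a.py", "def f(): pass")      # unchanged: served from the cache
cache.get("a.py", "def f(): return 1")  # changed: embedded again
print(len(calls))
```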
Repository:
https://github.com/Shun0212/Owl-CLI