ConvMemory v3 Validity Context Model Card
This document separates the three source-of-truth layers for ConvMemory v3:
- method-level evaluation;
- the exported representative checkpoint;
- package-level API measurement with that checkpoint.
These are intentionally different provenance layers. Method-level numbers estimate the v3 approach across seeds. The checkpoint is a representative implementation of that approach. The package-level benchmark is the number a user should expect when loading that checkpoint through the public API on the fixed dense Memora retrieval benchmark.
Scope
ConvMemory v3 adds validity evidence to the existing v1/v2 retrieval path.
The default v3 use is validity_mode="context": it attaches a structured
validity field to returned memories and preserves the candidate set and
ranking order.
validity_mode="demote" is an explicit opt-in mode. It is intended for dense
current-state/update workloads where a top-1 source evidence policy is
available. It preserves the candidate set but may reorder results by applying
a validity penalty.
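The two mode contracts above can be illustrated with a small sketch. It is not the package implementation: the `Memory` dataclass, the `apply_validity` helper, the layout of the validity field, and the subtractive penalty are all hypothetical stand-ins chosen for illustration. Only the mode names and the 0.5 threshold come from this card.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Memory:
    """Hypothetical stand-in for a reranked memory (not the package's RerankResult)."""
    memory_id: str
    score: float
    validity: Optional[dict] = None


def apply_validity(results: List[Memory], demote_probs: List[float],
                   mode: str = "context", threshold: float = 0.5,
                   penalty: float = 1.0) -> List[Memory]:
    """Illustrative mode semantics; the penalty form is an assumption."""
    annotated = [
        replace(m, validity={"demote_prob": p, "demote": p >= threshold})
        for m, p in zip(results, demote_probs)
    ]
    if mode == "context":
        # Annotate only: same candidates, same order.
        return annotated
    if mode == "demote":
        # Same candidates; flagged items lose `penalty` from their score,
        # then the list is re-sorted by the penalized score.
        def penalized(m: Memory) -> float:
            return m.score - (penalty if m.validity["demote"] else 0.0)
        return sorted(annotated, key=penalized, reverse=True)
    raise ValueError(f"unknown validity_mode: {mode!r}")


# Toy inputs for illustration only.
results = [Memory("old_address", 0.92), Memory("new_address", 0.88), Memory("hobby", 0.40)]
probs = [0.97, 0.03, 0.01]

context_out = apply_validity(results, probs, mode="context")
demote_out = apply_validity(results, probs, mode="demote")
print([m.memory_id for m in context_out])  # ['old_address', 'new_address', 'hobby']
print([m.memory_id for m in demote_out])   # ['new_address', 'hobby', 'old_address']
```

In both modes the output contains exactly the input candidates. Only demote mode changes their order.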
ConvMemory v3 does not make full automatic dependency-graph propagation the default retrieval path. Multi-hop graph propagation is used as an evidence-path and analysis capability unless the caller supplies a workload where graph construction has been validated.
Checkpoint
The representative v3 validity checkpoint is exported by the v557 recipe:
| Field | Value |
|---|---|
| Module | ConvMemory v3 Validity Context Layer |
| Backbone | nli-deberta-v3-base |
| Parameters | 184,423,682 |
| Export seed | 7 |
| Training rows | 5,520 |
| Dev rows | 1,400 |
| Threshold | 0.5 |
| Max length | 192 |
| Source policy | top1 |
| Default mode | context |
| Hub repository | Purdy0228/ConvMemory-v3-Validity-Context |
| Checkpoint upload commit | 0883a43fe6df608030ebe9ec29286280e83c857c |
| cross_encoder/model.safetensors SHA256 | 446ee0cf6df4a8967e1a78c46d2ff3a2d777de65efbf475d2278d99468faa8d9 |
| validity_config.json SHA256 | 81eddb5f2ff4545dcf4b7655fedd1f7cf846248ad8962394195e6960a2e07849 |
The checkpoint implements the v511 query-conditioned validity method. It should not be used as a replacement for the v511 multi-seed method-level estimate when reporting method quality.
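The SHA256 digests in the checkpoint table can be checked against a local copy of the repository. The sketch below uses only the Python standard library. It assumes the local directory mirrors the repository's relative paths, and the directory name in the usage block is hypothetical.

```python
import hashlib
from pathlib import Path

# Digests copied from the checkpoint table in this card.
EXPECTED_SHA256 = {
    "cross_encoder/model.safetensors":
        "446ee0cf6df4a8967e1a78c46d2ff3a2d777de65efbf475d2278d99468faa8d9",
    "validity_config.json":
        "81eddb5f2ff4545dcf4b7655fedd1f7cf846248ad8962394195e6960a2e07849",
}


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA256 so large weight files are not read at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checkpoint(root):
    """Map each expected file to True/False for a digest match, or None if missing."""
    report = {}
    for rel_path, expected in EXPECTED_SHA256.items():
        file_path = Path(root) / rel_path
        report[rel_path] = (
            sha256_of(file_path) == expected if file_path.is_file() else None
        )
    return report


if __name__ == "__main__":
    # Hypothetical local download directory; adjust to the actual location.
    print(verify_checkpoint("./ConvMemory-v3-Validity-Context"))
```

A `False` entry means the local file differs from the one hashed for this card. A `None` entry means the file was not found at the assumed path.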
Input Format
The validity scorer uses the v506/v511 query/source/target format:
USER_QUERY:
...
SOURCE_EVIDENCE:
...
TASK: Decide whether the target memory should be demoted for this user query.
paired with:
TARGET_MEMORY:
...
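A minimal sketch of assembling the two text segments in the layout shown above. The helper name `build_validity_pair` is hypothetical, and the exact whitespace between fields (single newlines here) is an assumption. The package's own formatter should be treated as authoritative.

```python
TASK_LINE = (
    "TASK: Decide whether the target memory should be demoted for this user query."
)


def build_validity_pair(query: str, source_evidence: str, target_memory: str):
    """Return the (first, second) text pair for one query/source/target triple.

    Assumption: fields are separated by single newlines, as laid out in this card.
    """
    first = (
        f"USER_QUERY:\n{query}\n"
        f"SOURCE_EVIDENCE:\n{source_evidence}\n"
        f"{TASK_LINE}"
    )
    second = f"TARGET_MEMORY:\n{target_memory}"
    return first, second


# Toy example for illustration only.
first, second = build_validity_pair(
    query="Where does the user live now?",
    source_evidence="The user said they moved to Lisbon last month.",
    target_memory="The user lives in Berlin.",
)
print(first)
print(second)
```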
The package exposes two scoring paths:
- ValidityEvidenceModule.apply(...): annotate or demote RerankResult objects while preserving the mode contracts.
- ValidityEvidenceModule.score_evidence_pairs(...): score explicit query/source/target pairs in batches after source evidence has already been selected.
The second path is the preferred dense-workload path because it avoids per-pair CrossEncoder calls.
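The difference between per-pair and batched scoring can be sketched with a generic chunking helper. `score_in_batches` and `CountingScorer` are hypothetical names, not part of the package. The dummy scorer stands in for a real batched model call, such as `CrossEncoder.predict` in sentence-transformers, which accepts a list of pairs and a `batch_size` argument. The batch size of 512 matches the value reported in the package-level check.

```python
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]


def score_in_batches(pairs: Sequence[Pair],
                     score_batch: Callable[[Sequence[Pair]], List[float]],
                     batch_size: int = 512) -> List[float]:
    """Call the scorer once per chunk of `batch_size` pairs, not once per pair."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    scores: List[float] = []
    for start in range(0, len(pairs), batch_size):
        scores.extend(score_batch(pairs[start:start + batch_size]))
    return scores


class CountingScorer:
    """Dummy scorer that records how many times it is invoked."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, batch: Sequence[Pair]) -> List[float]:
        self.calls += 1
        return [0.0 for _ in batch]


pairs = [(f"query {i}", f"target {i}") for i in range(1200)]
scorer = CountingScorer()
scores = score_in_batches(pairs, scorer, batch_size=512)
print(len(scores), scorer.calls)  # 1200 3
```

With 1,200 pairs the scorer is invoked 3 times where a per-pair loop would invoke it 1,200 times. That reduction in calls is the overhead batching is meant to remove.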
Method-Level Evaluation
The v511 5-seed Memora-retrieval benchmark is the method-level estimate. It
scores 69,200 source-query rows across seeds [7, 11, 23, 31, 47].
Top-1 retrieved source, max aggregation:
| Metric | v511 method-level |
|---|---|
| Pair accuracy | 98.6% +/- 0.2% |
| Demote recall | 92.9% +/- 1.1% |
| Protect recall | 99.4% +/- 0.1% |
| Old-target all-type consistency | 92.8% +/- 1.1% |
| Event all-type consistency | 89.1% +/- 1.3% |
| Current active H@1 | 95.7% +/- 1.2% |
| Scoring cost | 1.9291 ms/source-query pair |
This table is the right citation for method-level claims.
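For readers unfamiliar with the metric names, the sketch below shows one plausible reading of the first three rows. It treats demote recall as recall on the should-demote class, protect recall as recall on the should-protect class, and pair accuracy as overall accuracy at the 0.5 threshold. These definitions are assumptions and the benchmark report is authoritative. The labels and probabilities are toy values, not benchmark data.

```python
from typing import Dict, List


def validity_metrics(labels: List[int], probs: List[float],
                     threshold: float = 0.5) -> Dict[str, float]:
    """labels: 1 = target should be demoted, 0 = target should be protected."""
    preds = [1 if p >= threshold else 0 for p in probs]
    demote_total = sum(1 for y in labels if y == 1)
    protect_total = sum(1 for y in labels if y == 0)
    demote_hits = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    protect_hits = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    return {
        "pair_accuracy": (demote_hits + protect_hits) / len(labels) if labels else 0.0,
        "demote_recall": demote_hits / demote_total if demote_total else 0.0,
        "protect_recall": protect_hits / protect_total if protect_total else 0.0,
    }


# Toy values for illustration only.
toy_labels = [1, 1, 0, 0, 0]
toy_probs = [0.9, 0.3, 0.1, 0.2, 0.7]
metrics = validity_metrics(toy_labels, toy_probs)
print(metrics)
```

On the toy data one of two demote cases is caught and two of three protect cases are kept, so demote recall is 1/2, protect recall is 2/3, and pair accuracy is 3/5.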
Package-Level Check
The v558 public API benchmark loads the exported v557 checkpoint through
ValidityEvidenceModule.from_pretrained(...) and scores the same top-1 source
policy through the package API.
Top-1 retrieved source, max aggregation:
| Metric | v558 package/API check |
|---|---|
| Source-query rows | 6,920 |
| Target predictions | 20,760 |
| Pair accuracy | 98.7% |
| Demote recall | 93.6% |
| Protect recall | 99.4% |
| Old-target all-type consistency | 93.1% |
| Event all-type consistency | 89.6% |
| Current active H@1 | 96.5% |
| API scoring batch size | 512 |
| Scoring cost | 1.5844 ms/source-query pair |
| Module load time | 2.16 s |
The v558 benchmark is the package-level reproducibility check for this checkpoint. It is a single-checkpoint measurement, not a replacement for the v511 multi-seed method-level estimate.
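A rough back-of-the-envelope reading of the package-level table follows. It assumes the reported per-pair cost applies once per source-query row, which this card does not state explicitly, so the totals are estimates and not measured wall-clock values.

```python
# Values copied from the v558 package/API table.
SOURCE_QUERY_ROWS = 6_920
TARGET_PREDICTIONS = 20_760
MS_PER_PAIR = 1.5844
MODULE_LOAD_S = 2.16

# Average number of target predictions per source-query row.
targets_per_row = TARGET_PREDICTIONS / SOURCE_QUERY_ROWS

# Estimated scoring time, assuming one "source-query pair" per row.
scoring_seconds = SOURCE_QUERY_ROWS * MS_PER_PAIR / 1000.0

# Estimated end-to-end time including the one-off module load.
total_seconds = scoring_seconds + MODULE_LOAD_S

print(f"targets per row: {targets_per_row:.1f}")
print(f"estimated scoring time: {scoring_seconds:.2f} s")
print(f"estimated total with load: {total_seconds:.2f} s")
```

Under that assumption the benchmark averages three target predictions per row and takes roughly 11 seconds of scoring, plus about 2 seconds to load the module.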
Safety Contracts
The package-level safety checks from v558 all pass:
| Contract | Status |
|---|---|
| context mode preserves order | pass |
| context mode preserves ranks | pass |
| context mode attaches validity metadata | pass |
| demote mode preserves candidate set | pass |
| demote mode preserves result count | pass |
The test suite also covers off-mode byte identity, context-mode rank preservation, demote candidate-set preservation, explicit opt-in semantics, forbidden-field rejection, safe evidence output, checkpoint round-trip, and batched CrossEncoder scoring.
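The contracts in the table can be expressed as simple checks over result identifiers. The sketch below is illustrative and is not the package's test suite. The function names are hypothetical, and results are reduced to lists of ids plus a parallel list of validity payloads.

```python
from typing import Dict, List, Optional, Sequence


def check_context_contract(before_ids: Sequence[str], after_ids: Sequence[str],
                           after_validity: Sequence[Optional[dict]]) -> Dict[str, bool]:
    """Context mode: order must be unchanged and every result must carry validity."""
    return {
        "preserves_order": list(before_ids) == list(after_ids),
        "attaches_validity": (
            len(after_validity) == len(after_ids)
            and all(v is not None for v in after_validity)
        ),
    }


def check_demote_contract(before_ids: Sequence[str],
                          after_ids: Sequence[str]) -> Dict[str, bool]:
    """Demote mode: reordering is allowed, adding or dropping candidates is not."""
    return {
        "preserves_candidate_set": sorted(before_ids) == sorted(after_ids),
        "preserves_result_count": len(before_ids) == len(after_ids),
    }


before: List[str] = ["m1", "m2", "m3"]
context_report = check_context_contract(before, ["m1", "m2", "m3"], [{"demote": False}] * 3)
demote_report = check_demote_contract(before, ["m2", "m3", "m1"])
print(context_report)  # {'preserves_order': True, 'attaches_validity': True}
print(demote_report)   # {'preserves_candidate_set': True, 'preserves_result_count': True}
```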
Operating Policy
| Workload | Recommended mode | Source policy | Ranking mutation |
|---|---|---|---|
| General ConvMemory retrieval | context | top-1 evidence when available | no |
| Dense current-state/update retrieval | demote opt-in | lexical top-1 source | yes, candidate set preserved |
| Multi-hop graph explanation | context | conservative graph evidence | no |
Top-3/top-5 source aggregation is not the default policy because earlier v499, v502, and v503 runs showed that adding more sources can introduce false positive demotions. Full top-500 graph construction is also not the default path because learned graph errors can be amplified by propagation.
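One way to see why more sources can add false-positive demotions: under max aggregation the aggregated score never decreases as sources are added, so a single spurious high score from a lower-ranked source is enough to cross the threshold. The sketch below uses a hypothetical helper and toy scores, not measurements from the v499, v502, or v503 runs.

```python
from typing import List, Tuple


def demote_decision(source_scores: List[float], top_k: int = 1,
                    threshold: float = 0.5) -> Tuple[bool, float]:
    """Max-aggregate demote probabilities over the top_k retrieved sources.

    source_scores: demote probabilities for one target memory, ordered by
    source retrieval rank (best source first).
    """
    considered = source_scores[:top_k]
    if not considered:
        return False, 0.0
    aggregated = max(considered)
    return aggregated >= threshold, aggregated


# Toy scores for a target that should be protected; the third source is spurious.
toy_scores = [0.08, 0.12, 0.81]

top1 = demote_decision(toy_scores, top_k=1)
top3 = demote_decision(toy_scores, top_k=3)
print(top1)  # (False, 0.08)
print(top3)  # (True, 0.81)
```

With top-1 the target is kept. With top-3 the same target is demoted because of the lower-ranked source.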
Source-Of-Truth Ledger
| Claim or artifact | Value or role | Source file | Provenance layer | Availability |
|---|---|---|---|---|
| v3 method-level dense benchmark | v511 5-seed top1: old-target all-type 92.8% +/- 1.1%, current active H@1 95.7% +/- 1.2% | results/v511_memora_retrieval_demotion_benchmark_5seed/REPORT.md | method-level evaluation | author-retained results |
| v3 frozen configuration policy | default context mode; demote opt-in for dense current-state/update workloads; top1 source | results/v514_v3_freeze_config/final_config.json | configuration freeze | author-retained results |
| exported checkpoint manifest | seed-7 representative checkpoint; 184,423,682 params; threshold 0.5; Hub repo Purdy0228/ConvMemory-v3-Validity-Context | results/v557_v3_validity_checkpoint/seed_7/MANIFEST.json | checkpoint export | checkpoint artifact / author-retained manifest |
| checkpoint scorer config | mode_default="context", source_policy="top1", cross_encoder_num_labels=2 | results/v557_v3_validity_checkpoint/seed_7/validity_config.json | checkpoint export | checkpoint artifact / author-retained config |
| package API benchmark | v558 top1 package check: old-target all-type 93.1%, current active H@1 96.5% | results/v558_v3_public_api_benchmark_batch/REPORT.md | package-level measurement | author-retained results |
| package API latency | 1.5844 ms/source-query pair, API batch size 512 | results/v558_v3_public_api_benchmark_batch/summary.json | package-level measurement | author-retained results |
| validity module code | ValidityEvidenceModule, ValidityEvidenceConfig, score_evidence_pairs | convmemory/validity.py | package code | public package when tagged v0.6.0 |
| public API integration | load_validity_module, validity_mode, retrieve/rerank integration | convmemory/api.py | package code | public package when tagged v0.6.0 |
| result payload | RerankResult.validity | convmemory/reranker.py | package code | public package when tagged v0.6.0 |
| safety tests | 41 passed after v558 batch update | tests/test_validity_context.py and existing package tests | machine-checkable tests | public package when tagged v0.6.0 |
| user documentation | mode semantics, safety contracts, scorer format | docs/VALIDITY_CONTEXT.md | package documentation | public package when tagged v0.6.0 |
The results/... packets are source-of-truth evaluation artifacts kept with
the author workspace unless explicitly packaged with a release. The package
code, tests, and documentation are the public reproducibility surface once tag
v0.6.0 is cut.
Known Boundaries
- The v3 checkpoint is trained for query-conditioned validity decisions with source evidence. It is not a generic factuality judge.
- Automatic demotion is intended for dense current-state/update workloads. General sparse retrieval should use context annotation by default.
- Broad learned source retrieval and automatic strict dependency graph construction are not part of the default v3 retrieval contract.
- The v511 method-level estimate and v558 package-level benchmark use different but connected provenance layers; report them with their layer names.