Terry Rodriguez
terry-remyx
Recent Activity
reacted to TravisMuhlestein's post with 🔥 8 days ago
We have model cards. We don't yet have capability manifests. That's the gap DNS-AID points toward.
The Linux Foundation just launched DNS-AID: open, decentralized discovery infrastructure for AI agents.
https://www.linuxfoundation.org/press/linux-foundation-announces-dns-aid-project-to-advance-decentralized-ai-agent-discovery
Most agent frameworks today still assume agents already know where other agents and tools exist. That assumption starts breaking down in cross-platform and cross-organization workflows.
My hypothesis: agent ecosystems eventually need standardized schemas describing not just what model an agent runs, but what it can actually do: tool interfaces, invocation patterns, input/output contracts, trust metadata, operational constraints, etc. Something orchestrators and other agents can discover and reason about dynamically without hardcoded integrations.
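To make the idea concrete, here is a rough sketch of what a capability manifest could look like as a Python data structure. Every class, field name, and value below is hypothetical and chosen only for illustration; none of it is taken from DNS-AID or any published schema.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class ToolInterface:
    """One callable capability with its input/output contract (hypothetical shape)."""
    name: str
    input_schema: dict
    output_schema: dict


@dataclass
class CapabilityManifest:
    """Hypothetical manifest: what an agent can do, not just which model it runs."""
    agent_id: str
    model: str
    tools: list[ToolInterface] = field(default_factory=list)
    invocation: str = "https+json"                    # assumed invocation pattern label
    trust: dict = field(default_factory=dict)         # e.g. signer, attestation
    constraints: dict = field(default_factory=dict)   # e.g. rate limits

    def supports(self, tool_name: str) -> bool:
        """Let an orchestrator check for a capability without hardcoded integration."""
        return any(t.name == tool_name for t in self.tools)


# Illustrative values only.
manifest = CapabilityManifest(
    agent_id="example-agent",
    model="example-model-v1",
    tools=[
        ToolInterface(
            name="summarize_text",
            input_schema={"text": "string"},
            output_schema={"summary": "string"},
        )
    ],
    trust={"signed_by": "example-org"},
    constraints={"max_requests_per_minute": 60},
)

# Serialize to JSON so another agent or registry could read it.
manifest_json = json.dumps(asdict(manifest), indent=2)
print(manifest_json)
```

An orchestrator could then call `manifest.supports("summarize_text")` before routing a task, rather than relying on a fixed list of known agents.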
Feels like an area the open-source ecosystem could meaningfully shape early before proprietary registries and platform lock-in dominate the space.
Curious if others here are already working on interoperability, discovery, capability schemas, or agent routing layers. Would love to compare notes.
reacted to salma-remyx's post 10 days ago
In that benchmark comparison, do you even have the sample size to distinguish two models, or are you making decisions based on statistical noise?
"Resolution Diagnostics for Paired LLM Evaluation" offers a simple check: a per-pair resolution ratio q = N/N* that flags when a displayed ranking sits below the resolution floor regardless of p-value.
arXiv: https://arxiv.org/abs/2605.30315v1
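As a rough illustration of the check, the ratio itself is a one-liner once N* is known. The post does not say how the paper defines N*, so the sketch below substitutes a textbook normal-approximation sample size for a paired difference. That formula, the function names, and the q < 1 threshold are all assumptions for illustration, not the paper's method.

```python
import math


def required_pairs(delta: float, sd: float, z: float = 1.96) -> int:
    """Assumed stand-in for N*: pairs needed for a mean paired difference
    `delta` (with per-pair standard deviation `sd`) to sit `z` standard
    errors from zero. NOT the paper's definition."""
    if delta == 0:
        raise ValueError("a zero difference cannot be resolved at any sample size")
    return math.ceil((z * sd / abs(delta)) ** 2)


def resolution_ratio(n: int, n_star: int) -> float:
    """q = N / N*, as described in the post."""
    return n / n_star


def below_floor(n: int, n_star: int) -> bool:
    """Assumed threshold: flag a ranking when q < 1."""
    return resolution_ratio(n, n_star) < 1.0
```

For example, with 500 evaluated pairs and N* = 1000, q = 0.5 and the ranking would be flagged under these assumptions.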
Outrider automatically matched this paper to our fork of lm-evaluation-harness and opened a PR implementing the diagnostic.
Configure the action to find new methods tailored to your repo: https://github.com/remyxai/outrider
reacted to salma-remyx's post with 🔥 12 days ago
Outrider: a GitHub Action that scouts arXiv for your repo
We built Outrider to close the gap between your code and the latest arXiv research. The best new methods for your repo may not be from the viral paper.
How it works. Every week (or at your configured cadence), Outrider:
1. Pulls candidate papers from a Remyx engine that ranks arXiv against your repo's commit history
2. Runs a Claude selection pass over the pool and picks the candidate most implementable against your specific codebase
3. Invokes Claude Code to draft the integration into an existing call site
4. Runs quality gates (path allowlist, integration validator, stub-density check, self-review)
5. Opens a draft PR, or an Issue when a PR would be premature
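Steps 4 and 5 can be pictured with a small sketch. This is not Outrider's code: the allowlist patterns, the stub markers, the 0.2 threshold, and the rule that any failed gate falls back to an Issue are all assumptions made up for illustration. Only two of the four gates named above are shown.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass
class DraftChange:
    """A file touched by the drafted integration (hypothetical shape)."""
    path: str
    added_lines: list[str]


# Hypothetical allowlist; note that fnmatch's "*" also matches "/".
ALLOWED_PATTERNS = ["src/*", "tests/*"]


def path_allowed(change: DraftChange, patterns=ALLOWED_PATTERNS) -> bool:
    """Path allowlist gate: the change must stay inside permitted paths."""
    return any(fnmatchcase(change.path, p) for p in patterns)


def is_stub(line: str) -> bool:
    """Treat placeholder bodies as stubs (assumed markers)."""
    return line in ("pass", "...") or line.startswith("raise NotImplementedError")


def stub_density(change: DraftChange) -> float:
    """Fraction of non-blank added lines that are placeholders."""
    lines = [ln.strip() for ln in change.added_lines if ln.strip()]
    if not lines:
        return 0.0
    return sum(is_stub(ln) for ln in lines) / len(lines)


def decide(changes: list[DraftChange], max_stub_density: float = 0.2) -> str:
    """Return "draft_pr" if every change passes both gates, else "issue"."""
    passed = all(
        path_allowed(c) and stub_density(c) <= max_stub_density for c in changes
    )
    return "draft_pr" if passed else "issue"
```

Under these assumptions, a draft that touches `.github/workflows/ci.yml` or consists mostly of `pass` bodies would be routed to an Issue instead of a PR.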
Two recent PRs:
- remyxai/FFMPerative: picked Aurora (2026 video-editing-agent paper), wired plan-validation into the existing execution path. 5 min, $1.45.
- remyxai/VQASynth: picked PGT (procedurally-generated grounding), wired the scorer into the existing BenchmarkRunner registry. 8 min, $2.64.
Free to install via GitHub Marketplace. You bring your own ANTHROPIC_API_KEY (~$2-3 per PR-track run).
Repo: https://github.com/remyxai/outrider
Longer write-up tomorrow on Substack, with more detail on the spec-bundle format, the selection-pass design, and what we learned testing across dozens of repos.