Commit 9bd0946 · Parent: 5357dea
conductor(setup): Add conductor setup files

- conductor/code_styleguides/python.md +37 -0
- conductor/index.md +14 -0
- conductor/product-guidelines.md +23 -0
- conductor/product.md +31 -0
- conductor/setup_state.json +1 -0
- conductor/tech-stack.md +15 -0
- conductor/tracks.md +8 -0
- conductor/tracks/ablation_20260129/index.md +5 -0
- conductor/tracks/ablation_20260129/metadata.json +8 -0
- conductor/tracks/ablation_20260129/plan.md +31 -0
- conductor/tracks/ablation_20260129/spec.md +36 -0
- conductor/workflow.md +333 -0
conductor/code_styleguides/python.md
ADDED
@@ -0,0 +1,37 @@
# Google Python Style Guide Summary

This document summarizes key rules and best practices from the Google Python Style Guide.

## 1. Python Language Rules
- **Linting:** Run `pylint` on your code to catch bugs and style issues.
- **Imports:** Use `import x` for packages/modules. Use `from x import y` only when `y` is a submodule.
- **Exceptions:** Use built-in exception classes. Do not use bare `except:` clauses.
- **Global State:** Avoid mutable global state. Module-level constants are okay and should be `ALL_CAPS_WITH_UNDERSCORES`.
- **Comprehensions:** Use for simple cases. Avoid for complex logic where a full loop is more readable.
- **Default Argument Values:** Do not use mutable objects (like `[]` or `{}`) as default values.
- **True/False Evaluations:** Use implicit false (e.g., `if not my_list:`). Use `if foo is None:` to check for `None`.
- **Type Annotations:** Strongly encouraged for all public APIs.

## 2. Python Style Rules
- **Line Length:** Maximum 80 characters.
- **Indentation:** 4 spaces per indentation level. Never use tabs.
- **Blank Lines:** Two blank lines between top-level definitions (classes, functions). One blank line between method definitions.
- **Whitespace:** Avoid extraneous whitespace. Surround binary operators with single spaces.
- **Docstrings:** Use `"""triple double quotes"""`. Every public module, function, class, and method must have a docstring.
- **Format:** Start with a one-line summary. Include `Args:`, `Returns:`, and `Raises:` sections.
- **Strings:** Use f-strings for formatting. Be consistent with single (`'`) or double (`"`) quotes.
- **`TODO` Comments:** Use `TODO(username): Fix this.` format.
- **Imports Formatting:** Imports should be on separate lines and grouped: standard library, third-party, and your own application's imports.

## 3. Naming
- **General:** `snake_case` for modules, functions, methods, and variables.
- **Classes:** `PascalCase`.
- **Constants:** `ALL_CAPS_WITH_UNDERSCORES`.
- **Internal Use:** Use a single leading underscore (`_internal_variable`) for internal module/class members.

## 4. Main
- All executable files should have a `main()` function that contains the main logic, called from an `if __name__ == '__main__':` block.

**BE CONSISTENT.** When editing code, match the existing style.

*Source: [Google Python Style Guide](https://google.github.io/styleguide/pyguide.html)*
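Several of the rules above (docstring format, constant and function naming, avoiding mutable default arguments, and the `main()` entry point) can be illustrated in one minimal sketch; the module, function, and constant names here are invented for the example:

```python
"""Example module illustrating the summarized style rules."""

DEFAULT_GREETING = 'Hello'  # Module-level constant: ALL_CAPS_WITH_UNDERSCORES.


def build_greetings(names, greeting=None):
    """Builds a greeting string for each name.

    Args:
        names: A list of name strings.
        greeting: Optional greeting word; defaults to DEFAULT_GREETING.
            (None is used instead of a mutable default value.)

    Returns:
        A list of greeting strings, one per name.
    """
    if greeting is None:  # Explicit None check, per the True/False rules.
        greeting = DEFAULT_GREETING
    return [f'{greeting}, {name}!' for name in names]  # Simple comprehension.


def main():
    print(build_greetings(['Ada', 'Grace']))


if __name__ == '__main__':
    main()
```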
conductor/index.md
ADDED
@@ -0,0 +1,14 @@
# Project Context

## Definition
- [Product Definition](./product.md)
- [Product Guidelines](./product-guidelines.md)
- [Tech Stack](./tech-stack.md)

## Workflow
- [Workflow](./workflow.md)
- [Code Style Guides](./code_styleguides/)

## Management
- [Tracks Registry](./tracks.md)
- [Tracks Directory](./tracks/)
conductor/product-guidelines.md
ADDED
@@ -0,0 +1,23 @@
# Product Guidelines

## Brand & Voice
- **Tone:** Enthusiastic and Accessible yet Concise. The voice should be encouraging to learners while remaining direct and functional. Avoid excessive jargon or overly long analogies; prioritize clarity and "get-to-the-point" descriptions.
- **Audience Engagement:** Speak directly to the user's curiosity. Frame technical explanations as answers to "How does this work?" or "What happens if...?"

## Visual Identity
- **Aesthetic:** Clean & Modern. Prioritize high whitespace, legible typography, and a clear visual hierarchy (inspired by Material Design or Notion).
- **Mode:** Support both Light and Dark modes, ensuring high contrast for data visualizations.
- **Color Palette:** Use a consistent color language for different model components (e.g., specific colors for Attention vs. MLP layers) to aid mental mapping.

## User Interface & Experience
- **Terminology & Disclosure:** Use a combination of Progressive Disclosure and In-Situ Definitions.
    - **Tooltips:** Use tooltips for most technical terms to provide immediate, brief context without cluttering the UI.
    - **In-Situ Descriptions:** Provide short, clear descriptions immediately followed by the relevant interactive example to solidify the concept through action.
- **Experimentation Layout:** Sandbox Explorer.
    - Provide a comprehensive control panel for free-form exploration (toggles, sliders, ablation switches).
    - **Comparison View:** Integrate comparison elements into the sandbox so users can see the impact of their modifications relative to the original state.

## Design Principles
- **Action-Oriented Learning:** Every architectural explanation should be paired with an interactive element.
- **Visual Consistency:** Ensure that attention head indices, layer numbers, and token highlights are consistent across all panels and visualization types.
- **On-Demand Depth:** Keep the primary interface simple, but provide clear paths (like the Glossary or tooltips) for users who want to dive deeper into the technicalities.
conductor/product.md
ADDED
@@ -0,0 +1,31 @@
# Product Definition

## Initial Concept
A tool for capturing activations from transformer models and visualizing attention patterns using bertviz and an interactive Dash web application.

## Vision
To demystify the inner workings of Transformer-based Large Language Models (LLMs) for students and curious individuals. By combining interactive visualizations with hands-on experimentation capabilities, the tool transforms abstract architectural concepts into tangible, observable phenomena, fostering a deep, intuitive understanding of how these powerful models process information.

## Target Audience
- **Primary:** Machine Learning students and AI enthusiasts.
- **Secondary:** Any individual seeking a practical, interactive way to learn about Transformer architectures and mechanistic interpretability.

## Core Value Proposition
- **Visual Learning:** Translates complex matrix operations and data flows into clear, interactive visual representations (Attention Maps, Logit Lens).
- **Interactive Experimentation:** Goes beyond static observation by allowing users to manipulate the model (Ablation, Activation Patching) and immediately see the consequences.
- **Educational Scaffolding:** Supports users of varying expertise levels with layered educational content, from simple tooltips to deep-dive glossaries and future AI-guided tutorials.

## Key Features
- **Sequential Data Flow Visualization:** Illustrates how data transforms step-by-step through the model's layers.
- **Component Breakdown:** Detailed inspection views for key components like Self-Attention (heads, weights) and MLPs.
- **Interactive Experiments:**
    - **Ablation Studies:** Selectively disable heads or layers to observe the impact on output.
    - **Activation Steering:** Modify activation values in real time.
    - **Prompt Comparison:** Compare internal activations resulting from two different input prompts side-by-side.
- **Integrated Education:**
    - Contextual tooltips for immediate clarity.
    - Dedicated "Glossary" panel for in-depth definitions.
    - Foundation for AI-guided tutorials.

## User Experience
The interface centers on exploration and clarity. Users start by selecting a model and inputting text. The dashboard then unfolds the model's processing pipeline, allowing users to "zoom in" on specific components. Experimentation modes are clearly distinguished, enabling users to hypothesize ("What if I turn off this head?") and test. Educational resources are omnipresent but non-intrusive, available on demand to explain the *what* and *why* of what is being visualized.
conductor/setup_state.json
ADDED
@@ -0,0 +1 @@
{"last_successful_step": "3.3_initial_track_generated"}
conductor/tech-stack.md
ADDED
@@ -0,0 +1,15 @@
# Tech Stack

## Core Technologies
- **Programming Language:** Python
- **Deep Learning Framework:** PyTorch & Hugging Face Transformers

## Frontend & Visualization
- **Web Framework:** Plotly Dash
- **Data Visualization:** Plotly
- **Attention Visualization:** Bertviz
- **Styling:** Custom CSS (Bootstrap-compatible)

## Interpretability & Research Tools
- **Activation Capture:** PyVene
- **Model Analysis:** Custom utilities for ablation and logit lens analysis
conductor/tracks.md
ADDED
@@ -0,0 +1,8 @@
# Project Tracks

This file tracks all major tracks for the project. Each track has its own detailed plan in its respective folder.

---

- [ ] **Track: Implement interactive ablation studies for attention heads in the Dash dashboard.**
  *Link: [./tracks/ablation_20260129/](./tracks/ablation_20260129/)*
conductor/tracks/ablation_20260129/index.md
ADDED
@@ -0,0 +1,5 @@
# Track ablation_20260129 Context

- [Specification](./spec.md)
- [Implementation Plan](./plan.md)
- [Metadata](./metadata.json)
conductor/tracks/ablation_20260129/metadata.json
ADDED
@@ -0,0 +1,8 @@
{
  "track_id": "ablation_20260129",
  "type": "feature",
  "status": "new",
  "created_at": "2026-01-29T12:40:00Z",
  "updated_at": "2026-01-29T12:40:00Z",
  "description": "Implement interactive ablation studies for attention heads in the Dash dashboard."
}
conductor/tracks/ablation_20260129/plan.md
ADDED
@@ -0,0 +1,31 @@
# Implementation Plan - Interactive Attention Head Ablation

## Phase 1: Backend Support for Ablation
- [ ] Task: Create a reproduction script to test manual PyVene interventions for head ablation.
    - [ ] Sub-task: Write a standalone script that loads a small model (e.g., GPT-2) and uses PyVene to zero out a specific head (e.g., L0H0).
    - [ ] Sub-task: Verify that the output logits change compared to the baseline run.
- [ ] Task: Extend `utils/model_patterns.py` (or create `utils/ablation.py`) to support dynamic head masking.
    - [ ] Sub-task: Write tests for the new ablation utility function.
    - [ ] Sub-task: Implement a function `apply_ablation_mask(model, heads_to_ablate)` that registers the necessary PyVene hooks.
- [ ] Task: Update the main inference pipeline to accept an ablation configuration.
    - [ ] Sub-task: Modify the capture logic to check for an "ablation list" in the request.
    - [ ] Sub-task: Ensure the pipeline correctly applies the mask before running the forward pass.
- [ ] Task: Conductor - User Manual Verification 'Backend Support for Ablation' (Protocol in workflow.md)

## Phase 2: Frontend Control Panel
- [ ] Task: Create an `AblationPanel` component in `components/`.
    - [ ] Sub-task: Design a layout (e.g., heatmap or grid) that displays all heads (layer rows x head columns).
    - [ ] Sub-task: Implement the callback to handle clicks on the grid and update a `dcc.Store` with the list of disabled heads.
- [ ] Task: Integrate the `AblationPanel` into `app.py`.
    - [ ] Sub-task: Add the panel to the main layout (likely in a new "Experiments" tab or collapsible sidebar).
    - [ ] Sub-task: Connect the global "Run" or "Update" callback to include the ablation state from the store.
- [ ] Task: Conductor - User Manual Verification 'Frontend Control Panel' (Protocol in workflow.md)

## Phase 3: Visualization & Feedback Loop
- [ ] Task: Connect the frontend ablation state to the backend inference.
    - [ ] Sub-task: Update the main `app.py` callback to pass the `disabled_heads` list to the backend capture function.
    - [ ] Sub-task: Verify that toggling a head in the UI updates the Logit Lens/Output display.
- [ ] Task: Visual polish for the ablated state.
    - [ ] Sub-task: Ensure the Attention Map visualization shows disabled heads as blank or "inactive".
    - [ ] Sub-task: Add a "Reset Ablations" button to quickly restore the original model state.
- [ ] Task: Conductor - User Manual Verification 'Visualization & Feedback Loop' (Protocol in workflow.md)
conductor/tracks/ablation_20260129/spec.md
ADDED
@@ -0,0 +1,36 @@
# Track Specification: Interactive Attention Head Ablation

## Overview
This track introduces interactive ablation capabilities to the Dash dashboard. Users will be able to selectively disable (zero out) specific attention heads in the transformer model and observe the resulting changes in model output (logits/probabilities) and attention patterns. This directly supports the "Interactive Experimentation" core value proposition.

## Goals
- Enable users to toggle specific attention heads on/off via the UI.
- Update the model's forward pass to respect these ablation masks.
- Visualize the "ablated" state compared to the "original" state (if feasible), or simply show the new state.
- Provide immediate feedback on how head removal affects token prediction.

## User Stories
- As a student, I want to turn off a specific attention head to see if it is responsible for a particular grammatical dependency (e.g., matching plural subjects to verbs).
- As a researcher, I want to ablate a group of heads to test a hypothesis about distributed representations.
- As a user, I want clear visual indicators of which heads are currently active or disabled.

## Requirements

### Frontend (Dash)
- **Ablation Control Panel:** A UI component (e.g., a grid of toggles or a heatmap with clickable cells) representing all attention heads in the model (Layers x Heads).
- **State Management:** Store the set of "disabled heads" in the Dash app state (`dcc.Store`).
- **Visual Feedback:**
    - Disabled heads should be visually distinct (e.g., grayed out) in the visualization.
    - The output (Logit Lens or Top-K tokens) must update dynamically when heads are toggled.

### Backend (Model Logic)
- **Intervention Mechanism:** Modify the `model_patterns.py` or `agnostic_capture.py` logic to accept an "ablation mask".
- **PyVene Integration:** Use PyVene's intervention capabilities to zero out the activations of specific heads during the forward pass.
    - *Technical Note:* This might require defining a specific intervention function that takes the head output and multiplies it by 0 if the index matches the ablated head.

### Visualization
- Update the attention map visualization to reflect that the ablated head is contributing nothing (blank map or "Disabled" overlay).

## Non-Functional Requirements
- **Latency:** The update loop (Toggle -> Inference -> Update UI) should be fast enough for interactive exploration (< 2-3 seconds for small/medium models).
- **Clarity:** It must be obvious to the user that they have modified the model. A "Reset All" button is essential.
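The zero-out intervention described in the spec's technical note can be sketched with a plain PyTorch forward hook rather than PyVene's actual API; the hook factory, head counts, and the stand-in linear layer below are all invented for illustration, and the real implementation would register an equivalent intervention through PyVene:

```python
import torch

# Toy stand-in for a per-layer attention output of shape
# (batch, seq, n_heads * head_dim); in the real model the hook would
# target the attention module's output before the output projection.
N_HEADS, HEAD_DIM = 4, 8

def make_head_ablation_hook(heads_to_ablate):
    """Return a forward hook that zeroes the output slice of each ablated head."""
    def hook(module, inputs, output):
        b, s, _ = output.shape
        per_head = output.view(b, s, N_HEADS, HEAD_DIM).clone()
        for h in heads_to_ablate:
            per_head[:, :, h, :] = 0.0  # the ablated head contributes nothing
        return per_head.view(b, s, N_HEADS * HEAD_DIM)
    return hook

attn = torch.nn.Linear(N_HEADS * HEAD_DIM, N_HEADS * HEAD_DIM)
handle = attn.register_forward_hook(make_head_ablation_hook({0, 2}))

x = torch.randn(1, 5, N_HEADS * HEAD_DIM)
out = attn(x).view(1, 5, N_HEADS, HEAD_DIM)
assert torch.all(out[:, :, 0, :] == 0) and torch.all(out[:, :, 2, :] == 0)

handle.remove()  # restoring the original model is what "Reset All" would do
```

Keeping the hook handle around makes the "Reset All" requirement cheap: removing the hook restores the unmodified forward pass without reloading the model.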
conductor/workflow.md
ADDED
|
@@ -0,0 +1,333 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Project Workflow
|
| 2 |
+
|
| 3 |
+
## Guiding Principles
|
| 4 |
+
|
| 5 |
+
1. **The Plan is the Source of Truth:** All work must be tracked in `plan.md`
|
| 6 |
+
2. **The Tech Stack is Deliberate:** Changes to the tech stack must be documented in `tech-stack.md` *before* implementation
|
| 7 |
+
3. **Test-Driven Development:** Write unit tests before implementing functionality
|
| 8 |
+
4. **High Code Coverage:** Aim for >80% code coverage for all modules
|
| 9 |
+
5. **User Experience First:** Every decision should prioritize user experience
|
| 10 |
+
6. **Non-Interactive & CI-Aware:** Prefer non-interactive commands. Use `CI=true` for watch-mode tools (tests, linters) to ensure single execution.
|
| 11 |
+
|
| 12 |
+
## Task Workflow
|
| 13 |
+
|
| 14 |
+
All tasks follow a strict lifecycle:
|
| 15 |
+
|
| 16 |
+
### Standard Task Workflow
|
| 17 |
+
|
| 18 |
+
1. **Select Task:** Choose the next available task from `plan.md` in sequential order
|
| 19 |
+
|
| 20 |
+
2. **Mark In Progress:** Before beginning work, edit `plan.md` and change the task from `[ ]` to `[~]`
|
| 21 |
+
|
| 22 |
+
3. **Write Failing Tests (Red Phase):**
|
| 23 |
+
- Create a new test file for the feature or bug fix.
|
| 24 |
+
- Write one or more unit tests that clearly define the expected behavior and acceptance criteria for the task.
|
| 25 |
+
- **CRITICAL:** Run the tests and confirm that they fail as expected. This is the "Red" phase of TDD. Do not proceed until you have failing tests.
|
| 26 |
+
|
| 27 |
+
4. **Implement to Pass Tests (Green Phase):**
|
| 28 |
+
- Write the minimum amount of application code necessary to make the failing tests pass.
|
| 29 |
+
- Run the test suite again and confirm that all tests now pass. This is the "Green" phase.
|
| 30 |
+
|
| 31 |
+
5. **Refactor (Optional but Recommended):**
|
| 32 |
+
- With the safety of passing tests, refactor the implementation code and the test code to improve clarity, remove duplication, and enhance performance without changing the external behavior.
|
| 33 |
+
- Rerun tests to ensure they still pass after refactoring.
|
| 34 |
+
|
| 35 |
+
6. **Verify Coverage:** Run coverage reports using the project's chosen tools. For example, in a Python project, this might look like:
|
| 36 |
+
```bash
|
| 37 |
+
pytest --cov=app --cov-report=html
|
| 38 |
+
```
|
| 39 |
+
Target: >80% coverage for new code. The specific tools and commands will vary by language and framework.
|
| 40 |
+
|
| 41 |
+
7. **Document Deviations:** If implementation differs from tech stack:
|
| 42 |
+
- **STOP** implementation
|
| 43 |
+
- Update `tech-stack.md` with new design
|
| 44 |
+
- Add dated note explaining the change
|
| 45 |
+
- Resume implementation
|
| 46 |
+
|
| 47 |
+
8. **Commit Code Changes:**
|
| 48 |
+
- Stage all code changes related to the task.
|
| 49 |
+
- Propose a clear, concise commit message e.g, `feat(ui): Create basic HTML structure for calculator`.
|
| 50 |
+
- Perform the commit.
|
| 51 |
+
|
| 52 |
+
9. **Attach Task Summary with Git Notes:**
|
| 53 |
+
- **Step 9.1: Get Commit Hash:** Obtain the hash of the *just-completed commit* (`git log -1 --format="%H"`).
|
| 54 |
+
- **Step 9.2: Draft Note Content:** Create a detailed summary for the completed task. This should include the task name, a summary of changes, a list of all created/modified files, and the core "why" for the change.
|
| 55 |
+
- **Step 9.3: Attach Note:** Use the `git notes` command to attach the summary to the commit.
|
| 56 |
+
```bash
|
| 57 |
+
# The note content from the previous step is passed via the -m flag.
|
| 58 |
+
git notes add -m "<note content>" <commit_hash>
|
| 59 |
+
```
|
| 60 |
+
|
| 61 |
+
10. **Get and Record Task Commit SHA:**
|
| 62 |
+
- **Step 10.1: Update Plan:** Read `plan.md`, find the line for the completed task, update its status from `[~]` to `[x]`, and append the first 7 characters of the *just-completed commit's* commit hash.
|
| 63 |
+
- **Step 10.2: Write Plan:** Write the updated content back to `plan.md`.
|
| 64 |
+
|
| 65 |
+
11. **Commit Plan Update:**
|
| 66 |
+
- **Action:** Stage the modified `plan.md` file.
|
| 67 |
+
- **Action:** Commit this change with a descriptive message (e.g., `conductor(plan): Mark task 'Create user model' as complete`).
|
| 68 |
+
|
| 69 |
+
### Phase Completion Verification and Checkpointing Protocol
|
| 70 |
+
|
| 71 |
+
**Trigger:** This protocol is executed immediately after a task is completed that also concludes a phase in `plan.md`.
|
| 72 |
+
|
| 73 |
+
1. **Announce Protocol Start:** Inform the user that the phase is complete and the verification and checkpointing protocol has begun.
|
| 74 |
+
|
| 75 |
+
2. **Ensure Test Coverage for Phase Changes:**
|
| 76 |
+
- **Step 2.1: Determine Phase Scope:** To identify the files changed in this phase, you must first find the starting point. Read `plan.md` to find the Git commit SHA of the *previous* phase's checkpoint. If no previous checkpoint exists, the scope is all changes since the first commit.
|
| 77 |
+
- **Step 2.2: List Changed Files:** Execute `git diff --name-only <previous_checkpoint_sha> HEAD` to get a precise list of all files modified during this phase.
|
| 78 |
+
- **Step 2.3: Verify and Create Tests:** For each file in the list:
|
| 79 |
+
- **CRITICAL:** First, check its extension. Exclude non-code files (e.g., `.json`, `.md`, `.yaml`).
|
| 80 |
+
- For each remaining code file, verify a corresponding test file exists.
|
| 81 |
+
- If a test file is missing, you **must** create one. Before writing the test, **first, analyze other test files in the repository to determine the correct naming convention and testing style.** The new tests **must** validate the functionality described in this phase's tasks (`plan.md`).
|
| 82 |
+
|
| 83 |
+
3. **Execute Automated Tests with Proactive Debugging:**
|
| 84 |
+
- Before execution, you **must** announce the exact shell command you will use to run the tests.
|
| 85 |
+
- **Example Announcement:** "I will now run the automated test suite to verify the phase. **Command:** `CI=true npm test`"
|
| 86 |
+
- Execute the announced command.
|
| 87 |
+
- If tests fail, you **must** inform the user and begin debugging. You may attempt to propose a fix a **maximum of two times**. If the tests still fail after your second proposed fix, you **must stop**, report the persistent failure, and ask the user for guidance.
|
| 88 |
+
|
| 89 |
+
4. **Propose a Detailed, Actionable Manual Verification Plan:**
|
| 90 |
+
- **CRITICAL:** To generate the plan, first analyze `product.md`, `product-guidelines.md`, and `plan.md` to determine the user-facing goals of the completed phase.
|
| 91 |
+
- You **must** generate a step-by-step plan that walks the user through the verification process, including any necessary commands and specific, expected outcomes.
|
| 92 |
+
- The plan you present to the user **must** follow this format:
|
| 93 |
+
|
| 94 |
+
**For a Frontend Change:**
|
| 95 |
+
```
|
| 96 |
+
The automated tests have passed. For manual verification, please follow these steps:
|
| 97 |
+
|
| 98 |
+
**Manual Verification Steps:**
|
| 99 |
+
1. **Start the development server with the command:** `npm run dev`
|
| 100 |
+
2. **Open your browser to:** `http://localhost:3000`
|
| 101 |
+
3. **Confirm that you see:** The new user profile page, with the user's name and email displayed correctly.
|
| 102 |
+
```
|
| 103 |
+
|
| 104 |
+
**For a Backend Change:**
|
| 105 |
+
```
|
| 106 |
+
The automated tests have passed. For manual verification, please follow these steps:
|
| 107 |
+
|
| 108 |
+
**Manual Verification Steps:**
|
| 109 |
+
1. **Ensure the server is running.**
|
| 110 |
+
2. **Execute the following command in your terminal:** `curl -X POST http://localhost:8080/api/v1/users -d '{"name": "test"}'`
|
| 111 |
+
3. **Confirm that you receive:** A JSON response with a status of `201 Created`.
|
| 112 |
+
```
|
| 113 |
+
|
| 114 |
+
5. **Await Explicit User Feedback:**
|
| 115 |
+
- After presenting the detailed plan, ask the user for confirmation: "**Does this meet your expectations? Please confirm with yes or provide feedback on what needs to be changed.**"
|
| 116 |
+
- **PAUSE** and await the user's response. Do not proceed without an explicit yes or confirmation.
|
| 117 |
+
|
| 118 |
+
6. **Create Checkpoint Commit:**
|
| 119 |
+
- Stage all changes. If no changes occurred in this step, proceed with an empty commit.
|
| 120 |
+
- Perform the commit with a clear and concise message (e.g., `conductor(checkpoint): Checkpoint end of Phase X`).
|
| 121 |
+
|
| 122 |
+
7. **Attach Auditable Verification Report using Git Notes:**
|
| 123 |
+
- **Step 7.1: Draft Note Content:** Create a detailed verification report including the automated test command, the manual verification steps, and the user's confirmation.
|
| 124 |
+
- **Step 7.2: Attach Note:** Use the `git notes` command and the full commit hash from the previous step to attach the full report to the checkpoint commit.
|
| 125 |
+
|
| 126 |
+
8. **Get and Record Phase Checkpoint SHA:**
|
| 127 |
+
- **Step 8.1: Get Commit Hash:** Obtain the hash of the *just-created checkpoint commit* (`git log -1 --format="%H"`).
|
| 128 |
+
- **Step 8.2: Update Plan:** Read `plan.md`, find the heading for the completed phase, and append the first 7 characters of the commit hash in the format `[checkpoint: <sha>]`.
|
| 129 |
+
- **Step 8.3: Write Plan:** Write the updated content back to `plan.md`.
|
| 130 |
+
|
| 131 |
+
9. **Commit Plan Update:**
|
| 132 |
+
- **Action:** Stage the modified `plan.md` file.
|
| 133 |
+
- **Action:** Commit this change with a descriptive message following the format `conductor(plan): Mark phase '<PHASE NAME>' as complete`.
|
| 134 |
+
|
| 135 |
+
10. **Announce Completion:** Inform the user that the phase is complete and the checkpoint has been created, with the detailed verification report attached as a git note.
|
| 136 |
+
|
| 137 |
+
### Quality Gates
|
| 138 |
+
|
| 139 |
+
Before marking any task complete, verify:
|
| 140 |
+
|
| 141 |
+
- [ ] All tests pass
|
| 142 |
+
- [ ] Code coverage meets requirements (>80%)
|
| 143 |
+
- [ ] Code follows project's code style guidelines (as defined in `code_styleguides/`)
|
| 144 |
+
- [ ] All public functions/methods are documented (e.g., docstrings, JSDoc, GoDoc)
|
| 145 |
+
- [ ] Type safety is enforced (e.g., type hints, TypeScript types, Go types)
|
| 146 |
+
- [ ] No linting or static analysis errors (using the project's configured tools)
|
| 147 |
+
- [ ] Works correctly on mobile (if applicable)
|
| 148 |
+
- [ ] Documentation updated if needed
|
| 149 |
+
- [ ] No security vulnerabilities introduced

## Development Commands

**AI AGENT INSTRUCTION: This section should be adapted to the project's specific language, framework, and build tools.**

### Setup
```bash
# Example: Commands to set up the development environment (e.g., install dependencies, configure database)
# e.g., for a Node.js project: npm install
# e.g., for a Go project: go mod tidy
```

### Daily Development
```bash
# Example: Commands for common daily tasks (e.g., start dev server, run tests, lint, format)
# e.g., for a Node.js project: npm run dev, npm test, npm run lint
# e.g., for a Go project: go run main.go, go test ./..., go fmt ./...
```

### Before Committing
```bash
# Example: Commands to run all pre-commit checks (e.g., format, lint, type check, run tests)
# e.g., for a Node.js project: npm run check
# e.g., for a Go project: make check (if a Makefile exists)
```

## Testing Requirements

### Unit Testing
- Every module must have corresponding tests.
- Use appropriate test setup/teardown mechanisms (e.g., fixtures, beforeEach/afterEach).
- Mock external dependencies.
- Test both success and failure cases.
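
The four bullets above can be sketched in one small Python `unittest` example: shared setup, a mocked external dependency, and one success plus one failure case. `fetch_user` and its `client` are hypothetical stand-ins for a real module under test:

```python
import unittest
from unittest import mock

def fetch_user(client, user_id):
    """Hypothetical function under test: look up a user via an external client."""
    try:
        return client.get(f"/users/{user_id}")["name"]
    except KeyError:
        return None

class FetchUserTest(unittest.TestCase):
    def setUp(self):
        # Setup mechanism: mock the external dependency instead of calling it.
        self.client = mock.Mock()

    def test_success(self):
        self.client.get.return_value = {"name": "Ada"}
        self.assertEqual(fetch_user(self.client, 1), "Ada")
        self.client.get.assert_called_once_with("/users/1")

    def test_failure(self):
        self.client.get.return_value = {}  # response missing the "name" key
        self.assertIsNone(fetch_user(self.client, 1))
```

Run with the project's test runner, e.g. `python -m unittest` for the stdlib runner or `pytest`, which collects `unittest` classes as well.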

### Integration Testing
- Test complete user flows
- Verify database transactions
- Test authentication and authorization
- Check form submissions

### Mobile Testing
- Test on an actual iPhone when possible
- Use Safari developer tools
- Test touch interactions
- Verify responsive layouts
- Check performance on 3G/4G

## Code Review Process

### Self-Review Checklist
Before requesting review:

1. **Functionality**
   - Feature works as specified
   - Edge cases handled
   - Error messages are user-friendly

2. **Code Quality**
   - Follows the style guide
   - DRY principle applied
   - Clear variable/function names
   - Appropriate comments

3. **Testing**
   - Unit tests comprehensive
   - Integration tests pass
   - Coverage adequate (>80%)

4. **Security**
   - No hardcoded secrets
   - Input validation present
   - SQL injection prevented
   - XSS protection in place

5. **Performance**
   - Database queries optimized
   - Images optimized
   - Caching implemented where needed

6. **Mobile Experience**
   - Touch targets adequate (44x44px)
   - Text readable without zooming
   - Performance acceptable on mobile
   - Interactions feel native

## Commit Guidelines

### Message Format
```
<type>(<scope>): <description>

[optional body]

[optional footer]
```

### Types
- `feat`: New feature
- `fix`: Bug fix
- `docs`: Documentation only
- `style`: Formatting, missing semicolons, etc.
- `refactor`: Code change that neither fixes a bug nor adds a feature
- `test`: Adding missing tests
- `chore`: Maintenance tasks

### Examples
```bash
git commit -m "feat(auth): Add remember me functionality"
git commit -m "fix(posts): Correct excerpt generation for short posts"
git commit -m "test(comments): Add tests for emoji reaction limits"
git commit -m "style(mobile): Improve button touch targets"
```
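
The header format above is mechanical enough to check automatically, for example from a `commit-msg` hook. A hedged Python sketch — the regex and the decision to require a scope are illustrative assumptions matching the examples above, not a rule stated by this workflow:

```python
import re

# Accepts headers shaped like: <type>(<scope>): <description>
# The scope is required here to match the examples above; relax the pattern
# if the project allows scopeless headers such as "docs: fix typo".
COMMIT_HEADER_RE = re.compile(
    r"^(feat|fix|docs|style|refactor|test|chore)\([a-z0-9-]+\): .+$"
)

def is_valid_header(header: str) -> bool:
    """Return True if the first line of a commit message matches the format."""
    return bool(COMMIT_HEADER_RE.match(header))
```

A hook would read the message file passed as its first argument, validate the first line with `is_valid_header`, and exit non-zero on failure so the commit is rejected.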

## Definition of Done

A task is complete when:

1. All code implemented to specification
2. Unit tests written and passing
3. Code coverage meets project requirements
4. Documentation complete (if applicable)
5. Code passes all configured linting and static analysis checks
6. Works beautifully on mobile (if applicable)
7. Implementation notes added to `plan.md`
8. Changes committed with proper message
9. Git note with task summary attached to the commit

## Emergency Procedures

### Critical Bug in Production
1. Create hotfix branch from main
2. Write failing test for the bug
3. Implement minimal fix
4. Test thoroughly, including mobile
5. Deploy immediately
6. Document in `plan.md`

### Data Loss
1. Stop all write operations
2. Restore from latest backup
3. Verify data integrity
4. Document the incident
5. Update backup procedures

### Security Breach
1. Rotate all secrets immediately
2. Review access logs
3. Patch the vulnerability
4. Notify affected users (if any)
5. Document and update security procedures

## Deployment Workflow

### Pre-Deployment Checklist
- [ ] All tests passing
- [ ] Coverage >80%
- [ ] No linting errors
- [ ] Mobile testing complete
- [ ] Environment variables configured
- [ ] Database migrations ready
- [ ] Backup created

### Deployment Steps
1. Merge feature branch to main
2. Tag release with version
3. Push to deployment service
4. Run database migrations
5. Verify deployment
6. Test critical paths
7. Monitor for errors
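
Step 5, verifying the deployment, can be partially automated with a smoke check against a health endpoint. A minimal Python sketch; the endpoint URL is a placeholder, since this workflow does not name one:

```python
import urllib.request

def check_health(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL responds with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers URLError, HTTPError, and connection failures
        return False

# Example (hypothetical endpoint):
# if not check_health("https://example.com/healthz"):
#     raise SystemExit("Deployment verification failed; investigate or roll back")
```

This only proves the service is up; step 6's critical-path tests still need to exercise real user flows.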

### Post-Deployment
1. Monitor analytics
2. Check error logs
3. Gather user feedback
4. Plan next iteration

## Continuous Improvement

- Review the workflow weekly
- Update it based on pain points
- Document lessons learned
- Optimize for user happiness
- Keep things simple and maintainable