cdpearlman committed on
Commit 9bd0946 · 1 Parent(s): 5357dea

conductor(setup): Add conductor setup files

conductor/code_styleguides/python.md ADDED
@@ -0,0 +1,37 @@
+ # Google Python Style Guide Summary
+
+ This document summarizes key rules and best practices from the Google Python Style Guide.
+
+ ## 1. Python Language Rules
+ - **Linting:** Run `pylint` on your code to catch bugs and style issues.
+ - **Imports:** Use `import x` for packages/modules. Use `from x import y` only when `y` is a submodule.
+ - **Exceptions:** Use built-in exception classes. Do not use bare `except:` clauses.
+ - **Global State:** Avoid mutable global state. Module-level constants are okay and should be `ALL_CAPS_WITH_UNDERSCORES`.
+ - **Comprehensions:** Use for simple cases. Avoid for complex logic where a full loop is more readable.
+ - **Default Argument Values:** Do not use mutable objects (like `[]` or `{}`) as default values.
+ - **True/False Evaluations:** Use implicit false (e.g., `if not my_list:`). Use `if foo is None:` to check for `None`.
+ - **Type Annotations:** Strongly encouraged for all public APIs.
+
+ ## 2. Python Style Rules
+ - **Line Length:** Maximum 80 characters.
+ - **Indentation:** 4 spaces per indentation level. Never use tabs.
+ - **Blank Lines:** Two blank lines between top-level definitions (classes, functions). One blank line between method definitions.
+ - **Whitespace:** Avoid extraneous whitespace. Surround binary operators with single spaces.
+ - **Docstrings:** Use `"""triple double quotes"""`. Every public module, function, class, and method must have a docstring.
+ - **Format:** Start with a one-line summary. Include `Args:`, `Returns:`, and `Raises:` sections.
+ - **Strings:** Use f-strings for formatting. Be consistent with single (`'`) or double (`"`) quotes.
+ - **`TODO` Comments:** Use `TODO(username): Fix this.` format.
+ - **Imports Formatting:** Imports should be on separate lines and grouped: standard library, third-party, and your own application's imports.
+
+ ## 3. Naming
+ - **General:** `snake_case` for modules, functions, methods, and variables.
+ - **Classes:** `PascalCase`.
+ - **Constants:** `ALL_CAPS_WITH_UNDERSCORES`.
+ - **Internal Use:** Use a single leading underscore (`_internal_variable`) for internal module/class members.
+
+ ## 4. Main
+ - All executable files should have a `main()` function that contains the main logic, called from an `if __name__ == '__main__':` block.
+
+ **BE CONSISTENT.** When editing code, match the existing style.
+
+ *Source: [Google Python Style Guide](https://google.github.io/styleguide/pyguide.html)*
conductor/index.md ADDED
@@ -0,0 +1,14 @@
+ # Project Context
+
+ ## Definition
+ - [Product Definition](./product.md)
+ - [Product Guidelines](./product-guidelines.md)
+ - [Tech Stack](./tech-stack.md)
+
+ ## Workflow
+ - [Workflow](./workflow.md)
+ - [Code Style Guides](./code_styleguides/)
+
+ ## Management
+ - [Tracks Registry](./tracks.md)
+ - [Tracks Directory](./tracks/)
conductor/product-guidelines.md ADDED
@@ -0,0 +1,23 @@
+ # Product Guidelines
+
+ ## Brand & Voice
+ - **Tone:** Enthusiastic and Accessible yet Concise. The voice should be encouraging to learners while remaining direct and functional. Avoid excessive jargon or overly long analogies; prioritize clarity and "get-to-the-point" descriptions.
+ - **Audience Engagement:** Speak directly to the user's curiosity. Frame technical explanations as answers to "How does this work?" or "What happens if...?"
+
+ ## Visual Identity
+ - **Aesthetic:** Clean & Modern. Prioritize high whitespace, legible typography, and a clear visual hierarchy (inspired by Material Design or Notion).
+ - **Mode:** Support both Light and Dark modes, ensuring high contrast for data visualizations.
+ - **Color Palette:** Use a consistent color language for different model components (e.g., specific colors for Attention vs. MLP layers) to aid mental mapping.
+
+ ## User Interface & Experience
+ - **Terminology & Disclosure:** Use a combination of Progressive Disclosure and In-Situ Definitions.
+ - **Tooltips:** Use tooltips for most technical terms to provide immediate, brief context without cluttering the UI.
+ - **In-Situ Descriptions:** Provide short, clear descriptions immediately followed by the relevant interactive example to solidify the concept through action.
+ - **Experimentation Layout:** Sandbox Explorer.
+ - Provide a comprehensive control panel for free-form exploration (toggles, sliders, ablation switches).
+ - **Comparison View:** Integrate comparison elements into the sandbox so users can see the impact of their modifications relative to the original state.
+
+ ## Design Principles
+ - **Action-Oriented Learning:** Every architectural explanation should be paired with an interactive element.
+ - **Visual Consistency:** Ensure that attention head indices, layer numbers, and token highlights are consistent across all panels and visualization types.
+ - **On-Demand Depth:** Keep the primary interface simple, but provide clear paths (like the Glossary or tooltips) for users who want to dive deeper into the technicalities.
conductor/product.md ADDED
@@ -0,0 +1,31 @@
+ # Product Definition
+
+ ## Initial Concept
+ A tool for capturing activations from transformer models and visualizing attention patterns using bertviz and an interactive Dash web application.
+
+ ## Vision
+ To demystify the inner workings of Transformer-based Large Language Models (LLMs) for students and curious individuals. By combining interactive visualizations with hands-on experimentation capabilities, the tool transforms abstract architectural concepts into tangible, observable phenomena, fostering a deep, intuitive understanding of how these powerful models process information.
+
+ ## Target Audience
+ - **Primary:** Machine Learning Students and AI enthusiasts.
+ - **Secondary:** Any individual seeking a practical, interactive way to learn about Transformer architectures and mechanistic interpretability.
+
+ ## Core Value Proposition
+ - **Visual Learning:** Translates complex matrix operations and data flows into clear, interactive visual representations (Attention Maps, Logit Lens).
+ - **Interactive Experimentation:** Goes beyond static observation by allowing users to manipulate the model (Ablation, Activation Patching) and immediately see the consequences.
+ - **Educational Scaffolding:** Supports users of varying expertise levels with layered educational content, from simple tooltips to deep-dive glossaries and future AI-guided tutorials.
+
+ ## Key Features
+ - **Sequential Data Flow Visualization:** Illustrates how data transforms step-by-step through the model's layers.
+ - **Component Breakdown:** Detailed inspection views for key components like Self-Attention (heads, weights) and MLPs.
+ - **Interactive Experiments:**
+ - **Ablation Studies:** Selectively disable heads or layers to observe impact on output.
+ - **Activation Steering:** Modify activation values in real-time.
+ - **Prompt Comparison:** Compare internal activations resulting from two different input prompts side-by-side.
+ - **Integrated Education:**
+ - Contextual tooltips for immediate clarity.
+ - Dedicated "Glossary" panel for in-depth definitions.
+ - Foundation for AI-guided tutorials.
+
+ ## User Experience
+ The interface centers on exploration and clarity. Users start by selecting a model and inputting text. The dashboard then unfolds the model's processing pipeline, allowing users to "zoom in" on specific components. Experimentation modes are clearly distinguished, enabling users to hypothesize ("What if I turn off this head?") and test. Educational resources are omnipresent but non-intrusive, available on-demand to explain the *what* and *why* of what is being visualized.
conductor/setup_state.json ADDED
@@ -0,0 +1 @@
+ {"last_successful_step": "3.3_initial_track_generated"}
conductor/tech-stack.md ADDED
@@ -0,0 +1,15 @@
+ # Tech Stack
+
+ ## Core Technologies
+ - **Programming Language:** Python
+ - **Deep Learning Framework:** PyTorch & Hugging Face Transformers
+
+ ## Frontend & Visualization
+ - **Web Framework:** Plotly Dash
+ - **Data Visualization:** Plotly
+ - **Attention Visualization:** Bertviz
+ - **Styling:** Custom CSS (Bootstrap-compatible)
+
+ ## Interpretability & Research Tools
+ - **Activation Capture:** PyVene
+ - **Model Analysis:** Custom utilities for ablation and logit lens analysis
conductor/tracks.md ADDED
@@ -0,0 +1,8 @@
+ # Project Tracks
+
+ This file lists all major tracks for the project. Each track has its own detailed plan in its respective folder.
+
+ ---
+
+ - [ ] **Track: Implement interactive ablation studies for attention heads in the Dash dashboard.**
+ *Link: [./tracks/ablation_20260129/](./tracks/ablation_20260129/)*
conductor/tracks/ablation_20260129/index.md ADDED
@@ -0,0 +1,5 @@
+ # Track ablation_20260129 Context
+
+ - [Specification](./spec.md)
+ - [Implementation Plan](./plan.md)
+ - [Metadata](./metadata.json)
conductor/tracks/ablation_20260129/metadata.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "track_id": "ablation_20260129",
+ "type": "feature",
+ "status": "new",
+ "created_at": "2026-01-29T12:40:00Z",
+ "updated_at": "2026-01-29T12:40:00Z",
+ "description": "Implement interactive ablation studies for attention heads in the Dash dashboard."
+ }
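Tooling that consumes this metadata can validate it with a few lines of stdlib Python. The helper below is a sketch, not part of the conductor setup; the function name and the required-key list are assumptions drawn from the fields shown above:

```python
import json

# Keys every track metadata.json is expected to carry (per the example above).
REQUIRED_KEYS = {"track_id", "type", "status",
                 "created_at", "updated_at", "description"}


def load_track_metadata(path):
    """Load a track's metadata.json and check that the expected keys exist."""
    with open(path, encoding="utf-8") as f:
        meta = json.load(f)
    missing = REQUIRED_KEYS - meta.keys()
    if missing:
        raise ValueError(f"metadata missing keys: {sorted(missing)}")
    return meta
```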
conductor/tracks/ablation_20260129/plan.md ADDED
@@ -0,0 +1,31 @@
+ # Implementation Plan - Interactive Attention Head Ablation
+
+ ## Phase 1: Backend Support for Ablation
+ - [ ] Task: Create a reproduction script to test manual PyVene interventions for head ablation.
+ - [ ] Sub-task: Write a standalone script that loads a small model (e.g., GPT-2) and uses PyVene to zero out a specific head (e.g., L0H0).
+ - [ ] Sub-task: Verify that the output logits change compared to the baseline run.
+ - [ ] Task: Extend `utils/model_patterns.py` (or create `utils/ablation.py`) to support dynamic head masking.
+ - [ ] Sub-task: Write tests for the new ablation utility function.
+ - [ ] Sub-task: Implement a function `apply_ablation_mask(model, heads_to_ablate)` that registers the necessary PyVene hooks.
+ - [ ] Task: Update the main inference pipeline to accept an ablation configuration.
+ - [ ] Sub-task: Modify the capture logic to check for an "ablation list" in the request.
+ - [ ] Sub-task: Ensure the pipeline correctly applies the mask before running the forward pass.
+ - [ ] Task: Conductor - User Manual Verification 'Backend Support for Ablation' (Protocol in workflow.md)
+
+ ## Phase 2: Frontend Control Panel
+ - [ ] Task: Create an `AblationPanel` component in `components/`.
+ - [ ] Sub-task: Design a layout (e.g., Heatmap or Grid) that displays all heads (Layers rows x Heads columns).
+ - [ ] Sub-task: Implement the callback to handle clicks on the grid and update a `dcc.Store` with the list of disabled heads.
+ - [ ] Task: Integrate the `AblationPanel` into `app.py`.
+ - [ ] Sub-task: Add the panel to the main layout (likely in a new "Experiments" tab or collapsible sidebar).
+ - [ ] Sub-task: Connect the global "Run" or "Update" callback to include the ablation state from the store.
+ - [ ] Task: Conductor - User Manual Verification 'Frontend Control Panel' (Protocol in workflow.md)
+
+ ## Phase 3: Visualization & Feedback Loop
+ - [ ] Task: Connect the Frontend Ablation State to the Backend Inference.
+ - [ ] Sub-task: Update the main `app.py` callback to pass the `disabled_heads` list to the backend capture function.
+ - [ ] Sub-task: Verify that toggling a head in the UI updates the Logit Lens/Output display.
+ - [ ] Task: Visual Polish for Ablated State.
+ - [ ] Sub-task: Ensure the Attention Map visualization shows disabled heads as blank or "inactive".
+ - [ ] Sub-task: Add a "Reset Ablations" button to quickly restore the original model state.
+ - [ ] Task: Conductor - User Manual Verification 'Visualization & Feedback Loop' (Protocol in workflow.md)
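The masking logic behind the planned `apply_ablation_mask` utility can be sketched framework-agnostically. This is a pure-Python illustration only: the real implementation would operate on PyTorch tensors via PyVene hooks, and the nested-list data layout here is an assumption for demonstration:

```python
def apply_ablation_mask(head_outputs, heads_to_ablate):
    """Zero out selected attention heads in a per-head output structure.

    Args:
        head_outputs: Nested lists shaped [layer][head][dim].
        heads_to_ablate: Iterable of (layer_idx, head_idx) pairs to disable.

    Returns:
        A copy of head_outputs in which each ablated head's vector is all
        zeros; non-ablated heads and the input structure are left untouched.
    """
    ablated = set(heads_to_ablate)
    return [
        [
            [0.0] * len(vec) if (layer, head) in ablated else list(vec)
            for head, vec in enumerate(layer_heads)
        ]
        for layer, layer_heads in enumerate(head_outputs)
    ]
```

Zeroing rather than deleting keeps every downstream shape intact, which is what lets the rest of the forward pass run unmodified.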
conductor/tracks/ablation_20260129/spec.md ADDED
@@ -0,0 +1,36 @@
+ # Track Specification: Interactive Attention Head Ablation
+
+ ## Overview
+ This track introduces interactive ablation capabilities to the Dash dashboard. Users will be able to selectively disable (zero out) specific attention heads in the transformer model and observe the resulting changes in model output (logits/probabilities) and attention patterns. This directly supports the "Interactive Experimentation" core value proposition.
+
+ ## Goals
+ - Enable users to toggle specific attention heads on/off via the UI.
+ - Update the model's forward pass to respect these ablation masks.
+ - Visualize the "ablated" state compared to the "original" state (if feasible) or simply show the new state.
+ - Provide immediate feedback on how head removal affects token prediction.
+
+ ## User Stories
+ - As a student, I want to turn off a specific attention head to see if it is responsible for a particular grammatical dependency (e.g., matching plural subjects to verbs).
+ - As a researcher, I want to ablate a group of heads to test a hypothesis about distributed representations.
+ - As a user, I want clear visual indicators of which heads are currently active or disabled.
+
+ ## Requirements
+
+ ### Frontend (Dash)
+ - **Ablation Control Panel:** A UI component (e.g., a grid of toggles or a heatmap with clickable cells) representing all attention heads in the model (Layers x Heads).
+ - **State Management:** Store the set of "disabled heads" in the Dash app state (`dcc.Store`).
+ - **Visual Feedback:**
+ - Disabled heads should be visually distinct (e.g., grayed out) in the visualization.
+ - The output (Logit Lens or Top-K tokens) must update dynamically when heads are toggled.
+
+ ### Backend (Model Logic)
+ - **Intervention Mechanism:** Modify the `model_patterns.py` or `agnostic_capture.py` logic to accept an "ablation mask".
+ - **PyVene Integration:** Use PyVene's intervention capabilities to zero out the activations of specific heads during the forward pass.
+ - *Technical Note:* This might require defining a specific intervention function that takes the head output and multiplies it by 0 if the index matches the ablated head.
+
+ ### Visualization
+ - Update the attention map visualization to reflect that the ablated head is contributing nothing (blank map or "Disabled" overlay).
+
+ ## Non-Functional Requirements
+ - **Latency:** The update loop (Toggle -> Inference -> Update UI) should be fast enough for interactive exploration (< 2-3 seconds for small/medium models).
+ - **Clarity:** It must be obvious to the user that they have modified the model. A "Reset All" button is essential.
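The "disabled heads" state the frontend keeps in `dcc.Store` reduces to simple set-toggle logic. This sketch is illustrative only: the helper name is invented, and in a real Dash app the set would be serialized to a list of pairs, since `dcc.Store` holds JSON-compatible data:

```python
def toggle_head(disabled_heads, layer_idx, head_idx):
    """Return a new set with (layer_idx, head_idx) toggled on or off.

    Clicking an enabled head disables it; clicking a disabled head
    re-enables it. The input set is not mutated, which suits Dash's
    pattern of returning fresh state from callbacks.
    """
    key = (layer_idx, head_idx)
    updated = set(disabled_heads)
    if key in updated:
        updated.remove(key)
    else:
        updated.add(key)
    return updated
```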
conductor/workflow.md ADDED
@@ -0,0 +1,333 @@
+ # Project Workflow
+
+ ## Guiding Principles
+
+ 1. **The Plan is the Source of Truth:** All work must be tracked in `plan.md`
+ 2. **The Tech Stack is Deliberate:** Changes to the tech stack must be documented in `tech-stack.md` *before* implementation
+ 3. **Test-Driven Development:** Write unit tests before implementing functionality
+ 4. **High Code Coverage:** Aim for >80% code coverage for all modules
+ 5. **User Experience First:** Every decision should prioritize user experience
+ 6. **Non-Interactive & CI-Aware:** Prefer non-interactive commands. Use `CI=true` for watch-mode tools (tests, linters) to ensure single execution.
+
+ ## Task Workflow
+
+ All tasks follow a strict lifecycle:
+
+ ### Standard Task Workflow
+
+ 1. **Select Task:** Choose the next available task from `plan.md` in sequential order
+
+ 2. **Mark In Progress:** Before beginning work, edit `plan.md` and change the task from `[ ]` to `[~]`
+
+ 3. **Write Failing Tests (Red Phase):**
+ - Create a new test file for the feature or bug fix.
+ - Write one or more unit tests that clearly define the expected behavior and acceptance criteria for the task.
+ - **CRITICAL:** Run the tests and confirm that they fail as expected. This is the "Red" phase of TDD. Do not proceed until you have failing tests.
+
+ 4. **Implement to Pass Tests (Green Phase):**
+ - Write the minimum amount of application code necessary to make the failing tests pass.
+ - Run the test suite again and confirm that all tests now pass. This is the "Green" phase.
+
+ 5. **Refactor (Optional but Recommended):**
+ - With the safety of passing tests, refactor the implementation code and the test code to improve clarity, remove duplication, and enhance performance without changing the external behavior.
+ - Rerun tests to ensure they still pass after refactoring.
+
+ 6. **Verify Coverage:** Run coverage reports using the project's chosen tools. For example, in a Python project, this might look like:
+ ```bash
+ pytest --cov=app --cov-report=html
+ ```
+ Target: >80% coverage for new code. The specific tools and commands will vary by language and framework.
+
+ 7. **Document Deviations:** If implementation differs from tech stack:
+ - **STOP** implementation
+ - Update `tech-stack.md` with new design
+ - Add dated note explaining the change
+ - Resume implementation
+
+ 8. **Commit Code Changes:**
+ - Stage all code changes related to the task.
+ - Propose a clear, concise commit message, e.g., `feat(ui): Create basic HTML structure for calculator`.
+ - Perform the commit.
+
+ 9. **Attach Task Summary with Git Notes:**
+ - **Step 9.1: Get Commit Hash:** Obtain the hash of the *just-completed commit* (`git log -1 --format="%H"`).
+ - **Step 9.2: Draft Note Content:** Create a detailed summary for the completed task. This should include the task name, a summary of changes, a list of all created/modified files, and the core "why" for the change.
+ - **Step 9.3: Attach Note:** Use the `git notes` command to attach the summary to the commit.
+ ```bash
+ # The note content from the previous step is passed via the -m flag.
+ git notes add -m "<note content>" <commit_hash>
+ ```
+
+ 10. **Get and Record Task Commit SHA:**
+ - **Step 10.1: Update Plan:** Read `plan.md`, find the line for the completed task, update its status from `[~]` to `[x]`, and append the first 7 characters of the *just-completed commit's* commit hash.
+ - **Step 10.2: Write Plan:** Write the updated content back to `plan.md`.
+
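As a worked illustration of steps 10.1-10.2, the plan update reduces to a small text transformation. The helper name, task text, and SHA below are hypothetical, not part of the workflow itself:

```python
def mark_task_complete(plan_text, task_line, short_sha):
    """Flip a '[~]' task line to '[x]' and append the short commit SHA.

    plan_text is the full contents of plan.md; task_line is the exact
    in-progress line to update; short_sha is the first 7 hash characters.
    Only the first matching occurrence is rewritten.
    """
    done_line = task_line.replace('[~]', '[x]', 1) + f' ({short_sha})'
    return plan_text.replace(task_line, done_line, 1)
```

A call like `mark_task_complete(plan, '- [~] Task: Create user model', 'a1b2c3d')` would rewrite that one line and leave the rest of the plan untouched.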
+ 11. **Commit Plan Update:**
+ - **Action:** Stage the modified `plan.md` file.
+ - **Action:** Commit this change with a descriptive message (e.g., `conductor(plan): Mark task 'Create user model' as complete`).
+
+ ### Phase Completion Verification and Checkpointing Protocol
+
+ **Trigger:** This protocol is executed immediately after a task is completed that also concludes a phase in `plan.md`.
+
+ 1. **Announce Protocol Start:** Inform the user that the phase is complete and the verification and checkpointing protocol has begun.
+
+ 2. **Ensure Test Coverage for Phase Changes:**
+ - **Step 2.1: Determine Phase Scope:** To identify the files changed in this phase, you must first find the starting point. Read `plan.md` to find the Git commit SHA of the *previous* phase's checkpoint. If no previous checkpoint exists, the scope is all changes since the first commit.
+ - **Step 2.2: List Changed Files:** Execute `git diff --name-only <previous_checkpoint_sha> HEAD` to get a precise list of all files modified during this phase.
+ - **Step 2.3: Verify and Create Tests:** For each file in the list:
+ - **CRITICAL:** First, check its extension. Exclude non-code files (e.g., `.json`, `.md`, `.yaml`).
+ - For each remaining code file, verify a corresponding test file exists.
+ - If a test file is missing, you **must** create one. Before writing the test, **first, analyze other test files in the repository to determine the correct naming convention and testing style.** The new tests **must** validate the functionality described in this phase's tasks (`plan.md`).
+
+ 3. **Execute Automated Tests with Proactive Debugging:**
+ - Before execution, you **must** announce the exact shell command you will use to run the tests.
+ - **Example Announcement:** "I will now run the automated test suite to verify the phase. **Command:** `CI=true npm test`"
+ - Execute the announced command.
+ - If tests fail, you **must** inform the user and begin debugging. You may attempt to propose a fix a **maximum of two times**. If the tests still fail after your second proposed fix, you **must stop**, report the persistent failure, and ask the user for guidance.
+
+ 4. **Propose a Detailed, Actionable Manual Verification Plan:**
+ - **CRITICAL:** To generate the plan, first analyze `product.md`, `product-guidelines.md`, and `plan.md` to determine the user-facing goals of the completed phase.
+ - You **must** generate a step-by-step plan that walks the user through the verification process, including any necessary commands and specific, expected outcomes.
+ - The plan you present to the user **must** follow this format:
+
+ **For a Frontend Change:**
+ ```
+ The automated tests have passed. For manual verification, please follow these steps:
+
+ **Manual Verification Steps:**
+ 1. **Start the development server with the command:** `npm run dev`
+ 2. **Open your browser to:** `http://localhost:3000`
+ 3. **Confirm that you see:** The new user profile page, with the user's name and email displayed correctly.
+ ```
+
+ **For a Backend Change:**
+ ```
+ The automated tests have passed. For manual verification, please follow these steps:
+
+ **Manual Verification Steps:**
+ 1. **Ensure the server is running.**
+ 2. **Execute the following command in your terminal:** `curl -X POST http://localhost:8080/api/v1/users -d '{"name": "test"}'`
+ 3. **Confirm that you receive:** A JSON response with a status of `201 Created`.
+ ```
+
+ 5. **Await Explicit User Feedback:**
+ - After presenting the detailed plan, ask the user for confirmation: "**Does this meet your expectations? Please confirm with yes or provide feedback on what needs to be changed.**"
+ - **PAUSE** and await the user's response. Do not proceed without an explicit yes or confirmation.
+
+ 6. **Create Checkpoint Commit:**
+ - Stage all changes. If no changes occurred in this step, proceed with an empty commit.
+ - Perform the commit with a clear and concise message (e.g., `conductor(checkpoint): Checkpoint end of Phase X`).
+
+ 7. **Attach Auditable Verification Report using Git Notes:**
+ - **Step 7.1: Draft Note Content:** Create a detailed verification report including the automated test command, the manual verification steps, and the user's confirmation.
+ - **Step 7.2: Attach Note:** Use the `git notes` command and the full commit hash from the previous step to attach the full report to the checkpoint commit.
+
+ 8. **Get and Record Phase Checkpoint SHA:**
+ - **Step 8.1: Get Commit Hash:** Obtain the hash of the *just-created checkpoint commit* (`git log -1 --format="%H"`).
+ - **Step 8.2: Update Plan:** Read `plan.md`, find the heading for the completed phase, and append the first 7 characters of the commit hash in the format `[checkpoint: <sha>]`.
+ - **Step 8.3: Write Plan:** Write the updated content back to `plan.md`.
+
+ 9. **Commit Plan Update:**
+ - **Action:** Stage the modified `plan.md` file.
+ - **Action:** Commit this change with a descriptive message following the format `conductor(plan): Mark phase '<PHASE NAME>' as complete`.
+
+ 10. **Announce Completion:** Inform the user that the phase is complete and the checkpoint has been created, with the detailed verification report attached as a git note.
+
+ ### Quality Gates
+
+ Before marking any task complete, verify:
+
+ - [ ] All tests pass
+ - [ ] Code coverage meets requirements (>80%)
+ - [ ] Code follows project's code style guidelines (as defined in `code_styleguides/`)
+ - [ ] All public functions/methods are documented (e.g., docstrings, JSDoc, GoDoc)
+ - [ ] Type safety is enforced (e.g., type hints, TypeScript types, Go types)
+ - [ ] No linting or static analysis errors (using the project's configured tools)
+ - [ ] Works correctly on mobile (if applicable)
+ - [ ] Documentation updated if needed
+ - [ ] No security vulnerabilities introduced
+
+ ## Development Commands
+
+ **AI AGENT INSTRUCTION: This section should be adapted to the project's specific language, framework, and build tools.**
+
+ ### Setup
+ ```bash
+ # Example: Commands to set up the development environment (e.g., install dependencies, configure database)
+ # e.g., for a Node.js project: npm install
+ # e.g., for a Go project: go mod tidy
+ ```
+
+ ### Daily Development
+ ```bash
+ # Example: Commands for common daily tasks (e.g., start dev server, run tests, lint, format)
+ # e.g., for a Node.js project: npm run dev, npm test, npm run lint
+ # e.g., for a Go project: go run main.go, go test ./..., go fmt ./...
+ ```
+
+ ### Before Committing
+ ```bash
+ # Example: Commands to run all pre-commit checks (e.g., format, lint, type check, run tests)
+ # e.g., for a Node.js project: npm run check
+ # e.g., for a Go project: make check (if a Makefile exists)
+ ```
+
+ ## Testing Requirements
+
+ ### Unit Testing
+ - Every module must have corresponding tests.
+ - Use appropriate test setup/teardown mechanisms (e.g., fixtures, beforeEach/afterEach).
+ - Mock external dependencies.
+ - Test both success and failure cases.
+
+ ### Integration Testing
+ - Test complete user flows
+ - Verify database transactions
+ - Test authentication and authorization
+ - Check form submissions
+
+ ### Mobile Testing
+ - Test on actual iPhone when possible
+ - Use Safari developer tools
+ - Test touch interactions
+ - Verify responsive layouts
+ - Check performance on 3G/4G
+
+ ## Code Review Process
+
+ ### Self-Review Checklist
+ Before requesting review:
+
+ 1. **Functionality**
+ - Feature works as specified
+ - Edge cases handled
+ - Error messages are user-friendly
+
+ 2. **Code Quality**
+ - Follows style guide
+ - DRY principle applied
+ - Clear variable/function names
+ - Appropriate comments
+
+ 3. **Testing**
+ - Unit tests comprehensive
+ - Integration tests pass
+ - Coverage adequate (>80%)
+
+ 4. **Security**
+ - No hardcoded secrets
+ - Input validation present
+ - SQL injection prevented
+ - XSS protection in place
+
+ 5. **Performance**
+ - Database queries optimized
+ - Images optimized
+ - Caching implemented where needed
+
+ 6. **Mobile Experience**
+ - Touch targets adequate (44x44px)
+ - Text readable without zooming
+ - Performance acceptable on mobile
+ - Interactions feel native
+
+ ## Commit Guidelines
+
+ ### Message Format
+ ```
+ <type>(<scope>): <description>
+
+ [optional body]
+
+ [optional footer]
+ ```
+
+ ### Types
+ - `feat`: New feature
+ - `fix`: Bug fix
+ - `docs`: Documentation only
+ - `style`: Formatting, missing semicolons, etc.
+ - `refactor`: Code change that neither fixes a bug nor adds a feature
+ - `test`: Adding missing tests
+ - `chore`: Maintenance tasks
+
+ ### Examples
+ ```bash
+ git commit -m "feat(auth): Add remember me functionality"
+ git commit -m "fix(posts): Correct excerpt generation for short posts"
+ git commit -m "test(comments): Add tests for emoji reaction limits"
+ git commit -m "style(mobile): Improve button touch targets"
+ ```
+
+ ## Definition of Done
+
+ A task is complete when:
+
+ 1. All code implemented to specification
+ 2. Unit tests written and passing
+ 3. Code coverage meets project requirements
+ 4. Documentation complete (if applicable)
+ 5. Code passes all configured linting and static analysis checks
+ 6. Works beautifully on mobile (if applicable)
+ 7. Implementation notes added to `plan.md`
+ 8. Changes committed with proper message
+ 9. Git note with task summary attached to the commit
+
+ ## Emergency Procedures
+
+ ### Critical Bug in Production
+ 1. Create hotfix branch from main
+ 2. Write failing test for bug
+ 3. Implement minimal fix
+ 4. Test thoroughly including mobile
+ 5. Deploy immediately
+ 6. Document in plan.md
+
+ ### Data Loss
+ 1. Stop all write operations
+ 2. Restore from latest backup
+ 3. Verify data integrity
+ 4. Document incident
+ 5. Update backup procedures
+
+ ### Security Breach
+ 1. Rotate all secrets immediately
+ 2. Review access logs
+ 3. Patch vulnerability
+ 4. Notify affected users (if any)
+ 5. Document and update security procedures
+
+ ## Deployment Workflow
+
+ ### Pre-Deployment Checklist
+ - [ ] All tests passing
+ - [ ] Coverage >80%
+ - [ ] No linting errors
+ - [ ] Mobile testing complete
+ - [ ] Environment variables configured
+ - [ ] Database migrations ready
+ - [ ] Backup created
+
+ ### Deployment Steps
+ 1. Merge feature branch to main
+ 2. Tag release with version
+ 3. Push to deployment service
+ 4. Run database migrations
+ 5. Verify deployment
+ 6. Test critical paths
+ 7. Monitor for errors
+
+ ### Post-Deployment
+ 1. Monitor analytics
+ 2. Check error logs
+ 3. Gather user feedback
+ 4. Plan next iteration
+
+ ## Continuous Improvement
+
+ - Review workflow weekly
+ - Update based on pain points
+ - Document lessons learned
+ - Optimize for user happiness
+ - Keep things simple and maintainable