April 28, 2026 · recipe · data-analysis · tutorial · milestones · decision-log
# Recipe: Data analysis with project-state

Data analysis projects have a shape that most PM tools handle badly. The work isn't a linear task list — it's an iterative cycle of data acquisition, cleaning, exploration, modelling, and insight delivery, with multiple stakeholders who want different things from the same analysis. project-state handles this well because it's structured around milestones and stakeholder reporting, not task boards. Here's how to adapt it for a data analysis engagement.

## How data analysis maps to project-state concepts

| Data analysis concept | project-state concept |
|---|---|
| Analysis phases (acquire, clean, explore, model, deliver) | Phase preset |
| Dataset versions, model iterations | Milestones + technical_progress notes |
| Client, analyst team, exec sponsor | Stakeholder groups |
| Weekly analysis brief, final report | Reporting matrix entries |
| Scope change (new data source, new question) | Change register |
| Key analytical decisions (model choice, exclusion logic) | Decision log |
| Published findings, methodology notes | Document index |

## Step 1: Scaffold with a custom phase preset

```
ask claude: "scaffold a new v2 project, kind: research, phases: data-acquisition, data-cleaning, exploratory-analysis, modelling, insight-delivery"
```

Define gate criteria for each phase:

```yaml
phases:
  - name: data-acquisition
    gate_criteria:
      - all source datasets received and stored
      - data dictionary documented
      - access permissions confirmed for all team members
  - name: data-cleaning
    gate_criteria:
      - null/missing value audit complete
      - outlier policy documented and applied
      - cleaning log committed to project docs
      - clean dataset version locked (document index entry: status=approved)
  - name: exploratory-analysis
    gate_criteria:
      - EDA summary document approved by analyst lead
      - key hypotheses documented as decisions
      - at least one stakeholder review of preliminary findings
  - name: modelling
    gate_criteria:
      - model selection decision logged
      - validation approach documented
      - baseline model milestone complete
  - name: insight-delivery
    gate_criteria:
      - final report milestone complete
      - client review meeting conducted
      - all deliverables in document index (status=delivered)
```

These gate criteria become the checklist the agent evaluates when you ask "can we advance the phase?"

## Step 2: Set up stakeholders and the reporting matrix

A typical data analysis project has three stakeholder groups:

**Analyst team** — the people doing the work. They need internal status: what's blocked, what decisions are pending, what the current model state is.

**Client / sponsor** — the people who commissioned the analysis. They need progress updates and access to the findings as they emerge.

**Exec / decision-maker** — the end consumer of insights. They need a clean, concise view of findings and recommendations, not methodology.

```yaml
entries:
  - stakeholder_group: analyst_team
    report_type: internal_status
    cadence: weekly
    format: slack_message
    surface: slack
    channel: "#analysis-[project-name]"
  - stakeholder_group: client
    report_type: progress_update
    cadence: biweekly
    format: email_draft
    surface: gmail
  - stakeholder_group: exec_sponsor
    report_type: findings_brief
    cadence: on_milestone
    trigger_milestones: ["eda-complete", "modelling-complete", "final-report"]
    format: email_draft
    surface: gmail
```

The `on_milestone` cadence is key here — the exec sponsor doesn't need weekly noise, just signal when something significant lands.
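If you'd rather not write the YAML by hand, the reporting matrix can be set up conversationally like the other steps. A prompt along these lines should produce the same entries (the exact phrasing is illustrative, not a fixed syntax):

```
ask claude: "set up the reporting matrix: analyst team gets a weekly internal status in slack, client gets a biweekly progress update as an email draft, exec sponsor gets a findings brief email when the eda-complete, modelling-complete, and final-report milestones land"
```

Either way, review the generated entries before the first cadence fires so a draft doesn't go to the wrong group.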
## Step 3: Define milestones around analytical outputs, not tasks

Milestones in data analysis should be analytical outputs, not work activities. "Clean dataset" not "clean the data". "EDA complete" not "run exploratory analysis".

```
ask claude: "add milestones:
- Clean dataset v1, due [date], owner: data engineer, definition of done: clean dataset file versioned and documented in project docs
- EDA summary, due [date], owner: lead analyst, definition of done: EDA document approved by team
- Baseline model, due [date], owner: ML engineer, definition of done: baseline results documented with evaluation metrics
- Model v1, due [date], owner: ML engineer, definition of done: model validated, assumptions documented
- Final report, due [date], owner: project lead, definition of done: report delivered and accepted by client"
```

The `technical_progress` note on each milestone is where the analytical narrative lives:

```
ask claude: "update milestone clean-dataset-v1: 70% complete, technical progress: missing value treatment complete for main tables, working on date normalization across three source systems which have inconsistent timezone handling"
```

This note goes directly into the next status report. The client doesn't see the detail — but the analyst team brief does.

## Step 4: Log analytical decisions

Data analysis is full of decisions that need to be traceable: why a particular exclusion criterion was applied, why one model was chosen over another, why an outlier was treated a certain way. Log them as they happen:

```
ask claude: "log a decision: excluding records with NULL in [field] rather than imputing, rationale: imputation would introduce systematic bias in the low-income cohort, decided by: analyst team, date: today"
```

```
ask claude: "log a decision: using XGBoost rather than logistic regression, rationale: non-linear interactions between [var1] and [var2] were significant in EDA, decided by: ML lead, approved by: client"
```

When the client asks "why did you exclude those records?" three months later, the decision is in the log with full rationale, not lost in a Slack thread.

## Step 5: Use the change register for scope changes

Scope changes in data analysis are common and dangerous. A new data source mid-project. A new question the client wants answered. A change in the target variable definition. These are material changes that need to be logged and approved.

```
ask claude: "log a change: client wants to add [new_datasource] to the analysis pipeline, classify it"
```

The change register classifies it (material — this expands scope and timeline) and creates a change record. The next status report to the client mentions it as a pending change request. Nothing moves until the change is approved and logged.

## Step 6: Deliver findings through the document index

As deliverables are produced — EDA summaries, model documentation, final reports — register them in the document index:

```
ask claude: "add document: EDA Summary v1.2, type: analytical-report, file: docs/eda-summary-v1.2.pdf, status: under-review, description: exploratory analysis covering [scope], author: [name]"
```

The document index tracks the approval lifecycle: `draft` → `under-review` → `approved` → `delivered`. Phase gate criteria can check document status — "can't advance to modelling until EDA Summary is approved."
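Status transitions happen the same way. As reviews land, a prompt like the following (the field names here are illustrative, not a fixed command) advances a document through the lifecycle, which in turn unblocks any gate criteria that reference its status:

```
ask claude: "update document eda-summary-v1.2: status approved, approved by: analyst lead, date: today"
```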
## The result

A data analysis project running on project-state has:

- Full decision traceability from day one
- Automatic status reports that don't require manual preparation
- Phase gates that enforce analytical rigor before advancing
- A change register that catches scope creep
- Stakeholder-appropriate reporting: analyst brief, client update, exec findings brief
- A document index that tracks every deliverable through its approval lifecycle

The analyst team focuses on the analysis. The system handles the reporting.