Infrastructure Deployment Workflow

Purpose: Orchestrate infrastructure deployment with validation gates at every phase

Composes: cluster-reality-check -> tracer-bullet -> phase-gate

Failure Patterns Prevented: 1, 3, 4, 5, 9, 11


Overview

This workflow ensures infrastructure deployments succeed by:

1. Validating assumptions before planning
2. Testing critical paths with tracer bullets
3. Gating each phase before proceeding
4. Capturing learnings for future improvements

Text Only
Phase R: Research with Reality Check
    ↓ (Gate: All APIs/images verified?)
Phase 0: Tracer Bullets
    ↓ (Gate: All bullets pass?)
Phase P: Planning
    ↓ (Gate: Plan approved?)
Phase I: Implementation with Gates
    ↓ (Gate: Each phase validated?)
Phase V: Validation & Retrospective

Prerequisites

Before starting this workflow:

  • Fresh context window (<20% used)
  • Target cluster accessible (oc whoami works - see the preflight sketch after this list)
  • Appropriate permissions (can create resources in the target namespace)
  • Clear understanding of what you're deploying
  • Time budget established (roughly 45 min for a simple deploy up to 4 h for a complex one; see Time Budget Guidelines)
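
A minimal preflight sketch covering the first three items (the namespace is a placeholder):

Bash
# Confirm login and identity
oc whoami || { echo "Not logged in - run oc login first"; exit 1; }

# Spot-check permissions in the target namespace (prints yes/no)
oc auth can-i create deployments -n <target-namespace>
oc auth can-i create secrets -n <target-namespace>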

Phase R: Research with Reality Check

Goal: Understand what you're deploying AND validate assumptions against cluster

Time Budget: 20-40% of total time

Step R.1: Conduct Research

Markdown
# Use /research command
/research "Deploy [component] on OpenShift"

# Research should produce:
- Understanding of component architecture
- List of required APIs/CRDs
- List of required images
- List of operators needed
- Configuration requirements
- Known constraints/limitations

Step R.2: Extract Assumptions

From research findings, document:

YAML
assumed_apis:
  - <api-group>/<version>/<kind>
  - ...

planned_images:
  - <registry>/<image>:<tag>
  - ...

operators:
  - <operator-name>
  - ...

configuration_requirements:
  - <requirement-1>
  - ...
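
For concreteness, here is the template filled in with values from the Dify example later in this document (the operator name and configuration entry are illustrative guesses, not verified values):

YAML
assumed_apis:
  - postgresql.k8s.enterprisedb.io/v1/Cluster

planned_images:
  - langgenius/dify-api:0.11.1

operators:
  - edb-postgres-operator  # hypothetical operator name

configuration_requirements:
  - Images must satisfy the cluster's signature policy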

Step R.3: Invoke cluster-reality-check

Markdown
# Invoke skill with extracted assumptions
cluster-reality-check:
  assumed_apis: [from step R.2]
  planned_images: [from step R.2]
  operators: [from step R.2]
  namespace: <target-namespace>
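
To spot-check the same things by hand, the underlying verifications look roughly like this (standard oc calls; resource names are placeholders):

Bash
# APIs/CRDs: does the cluster serve the assumed API?
oc api-resources | grep -i <kind>
oc get crd <crd-name>

# Images: can the image manifest be resolved?
oc image info <registry>/<image>:<tag>

# Operators: is the CSV in Succeeded phase?
oc get csv -n <target-namespace>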

Step R.4: Invoke divergence-check

Markdown
# If research used external documentation
divergence-check:
  upstream_source: <documentation-url>
  local_environment: "OpenShift <version>"
  features_to_verify: [from research]

Gate R: Research Complete?

| Check | Status |
|-------|--------|
| Research findings documented | [ ] |
| All APIs verified exist | [ ] |
| All images verified pullable | [ ] |
| All operators verified ready | [ ] |
| No HIGH severity divergences | [ ] |
| Required adjustments documented | [ ] |

Decision:

- All checks pass -> Proceed to Phase 0
- Any check fails -> Continue research with new constraints


Phase 0: Tracer Bullets

Goal: Validate critical assumptions with minimal deployments before full planning

Time Budget: 10-15% of total time

Step 0.1: Identify Critical Assumptions

From research, identify assumptions that, if wrong, would invalidate the entire plan:

Markdown
Critical Assumptions:
1. [Assumption]: [Impact if wrong]
2. [Assumption]: [Impact if wrong]
3. [Assumption]: [Impact if wrong]

Common critical assumptions:

- Operator accepts expected API version
- Image can be pulled and runs
- Admission webhooks accept configuration
- Storage class works as expected
- Network policies allow required traffic

Step 0.2: Fire Tracer Bullets

For each critical assumption, invoke tracer-bullet skill:

Markdown
# Tracer Bullet 1: API Version
tracer-bullet:
  assumption: "EDB accepts postgresql.k8s.enterprisedb.io/v1"
  minimal_spec: |
    apiVersion: postgresql.k8s.enterprisedb.io/v1
    kind: Cluster
    metadata:
      name: tracer-api-test
    spec:
      instances: 1
      storage:
        size: 1Gi
  success_criteria: "Cluster reaches Ready state"
  timeout: 120s
  cleanup: true

# Tracer Bullet 2: Image Pull
tracer-bullet:
  assumption: "dify-api image is pullable"
  minimal_spec: |
    apiVersion: v1
    kind: Pod
    metadata:
      name: tracer-image-test
    spec:
      containers:
        - name: test
          image: langgenius/dify-api:0.11.1
          command: ["sleep", "10"]
      restartPolicy: Never
  success_criteria: "Pod reaches Running state"
  timeout: 60s
  cleanup: true
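
Roughly what the skill does under the hood, sketched by hand for Tracer Bullet 2 (assumes the Pod spec above is saved as tracer-image-test.yaml):

Bash
oc apply -f tracer-image-test.yaml -n <target-namespace>

# Success criteria: Pod becomes Ready within the timeout; capture evidence on failure
oc wait --for=condition=Ready pod/tracer-image-test -n <target-namespace> --timeout=60s \
  || oc describe pod tracer-image-test -n <target-namespace>

# Cleanup regardless of outcome
oc delete pod tracer-image-test -n <target-namespace> --ignore-not-found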

Step 0.3: Analyze Results

Document tracer bullet outcomes:

Markdown
Tracer Bullet Results:
| Bullet | Assumption | Result | Evidence |
|--------|------------|--------|----------|
| 1 | EDB API v1 | PASS | Cluster Ready in 45s |
| 2 | dify-api image | FAIL | ImagePullBackOff |
| 3 | ... | ... | ... |

Gate 0: All Tracer Bullets Pass?

Decision:

- All PASS -> Proceed to Phase P
- Any FAIL -> Return to Phase R with findings

Markdown
# If bullet fails:
Failed Assumption: [assumption]
Evidence: [error message, events]
Required Action: [what must change]

# Return to research to find alternative approach

Phase P: Planning

Goal: Create detailed implementation plan with validated assumptions

Time Budget: 15-25% of total time

Step P.1: Create Plan

With validated assumptions, create implementation plan:

Markdown
# Use /plan command
/plan [component] deployment

# Plan should include:
- Per-phase breakdown
- Exact file:line specifications
- Validation commands per phase
- Rollback procedure
- Success criteria

Step P.2: Include Phase Validation

Every phase in the plan MUST have:

YAML
phase_N:
  name: "[Phase Name]"
  resources:
    - <resource-1>
    - <resource-2>
  validation_commands:
    - "<command-1>"
    - "<command-2>"
  success_criteria:
    - "<criteria-1>"
    - "<criteria-2>"
  rollback:
    - "<rollback-command>"

Step P.3: Human Review

Present plan for approval:

Markdown
## Plan Summary

**Component:** [what]
**Target:** [where]
**Phases:** [how many]
**Estimated Time:** [duration]

### Phase Breakdown
| Phase | Name | Resources | Validation |
|-------|------|-----------|------------|
| 1 | ... | ... | ... |
| 2 | ... | ... | ... |

### Risk Assessment
- [Risk 1]: [Mitigation]
- [Risk 2]: [Mitigation]

### Rollback Strategy
[How to undo if needed]

---
Approve? (yes/no/revise)

Gate P: Plan Approved?

Decision:

- Approved -> Proceed to Phase I
- Revise -> Update plan, re-present
- Rejected -> Return to research


Phase I: Implementation with Gates

Goal: Execute plan with validation after every phase

Time Budget: 30-40% of total time

Implementation Loop

Text Only
For each phase in plan:
  1. Implement phase resources
  2. Invoke phase-gate skill
  3. If PASS: Commit, continue
  4. If FAIL: Stop, debug

Step I.N: Implement Phase N

Bash
# Create/modify resources as specified in plan
# Example:
oc apply -f phase-N-resources.yaml

Step I.N+1: Phase Gate

Markdown
# Invoke phase-gate skill
phase-gate:
  phase_number: N
  phase_name: "[Name from plan]"
  resources_deployed: [list from plan]
  validation_commands: [from plan]
  success_criteria: [from plan]
  rollback_procedure: [from plan]
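
Conceptually, the gate just runs the plan's validation commands and refuses to continue past the first failure. A minimal sketch (the commands shown are placeholders; take the real ones from the plan):

Bash
#!/usr/bin/env bash
set -euo pipefail

# Validation commands for phase N, copied from the plan
validations=(
  "oc get pods -n <namespace>"
  "oc rollout status deployment/<name> -n <namespace> --timeout=120s"
)

for cmd in "${validations[@]}"; do
  echo "GATE: ${cmd}"
  if ! eval "${cmd}"; then
    echo "GATE FAIL: ${cmd}"
    echo "STOP. Capture events/logs before changing anything."
    exit 1
  fi
done
echo "GATE PASS: commit and continue to the next phase."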

Step I.N+2: Commit or Stop

If phase-gate PASS:

Bash
git add .
git commit -m "phase N: [description]"
# Continue to phase N+1

If phase-gate FAIL:

Markdown
STOP. Do not proceed.

Failure Details:
- Phase: N
- Failed Validation: [which]
- Error: [message]
- Evidence: [events, logs]

Options:
1. Fix issue, re-validate phase N
2. Rollback phase N, investigate
3. Return to planning with findings

Gate I: All Phases Complete?

Continue loop until all phases pass.


Phase V: Validation & Retrospective

Goal: Final validation and learning capture

Time Budget: 10-15% of total time

Step V.1: Full Validation Suite

Run comprehensive validation:

Bash
# Syntax validation
make ci-all

# Resource validation
oc get all -n <namespace>

# Health checks
curl -f http://<service>/health

# Functional test
[component-specific tests]
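
For the health check, resolving the route host first avoids hardcoding URLs (assumes the component exposes a route; the route name is a placeholder):

Bash
HOST=$(oc get route <route-name> -n <namespace> -o jsonpath='{.spec.host}')
curl -fsS "https://${HOST}/health"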

Step V.2: Rollback Test

Verify rollback procedure works:

Bash
# Test rollback of last phase
git revert HEAD --no-commit

# Verify resources can be removed
oc delete -f phase-N-resources.yaml --dry-run=server

# Abort revert (we're just testing)
git reset --hard HEAD

Step V.3: Run Retrospective

Markdown
# Invoke /retro command
/retro infrastructure-deployment

# Capture:
- What went well
- What diverged from plan
- Unexpected issues encountered
- Skills/workflows that helped
- Skills/workflows that were missing
- Learnings for next deployment

Step V.4: Document Deployment

Create deployment documentation:

Markdown
# [Component] Deployment

**Deployed:** YYYY-MM-DD
**Target:** <cluster>/<namespace>
**Version:** <version>

## Architecture
[Diagram or description]

## Components
- [Component 1]: [Status]
- [Component 2]: [Status]

## Access
- URL: <route>
- Credentials: <secret-reference>

## Operations
- Health check: <command>
- Logs: <command>
- Restart: <command>

## Known Issues
- [Issue]: [Workaround]

Gate V: Deployment Complete?

| Check | Status |
|-------|--------|
| Full validation passes | [ ] |
| Rollback tested | [ ] |
| Retrospective complete | [ ] |
| Documentation created | [ ] |
| Learnings captured | [ ] |

Decision:

- All pass -> Deployment complete
- Any fail -> Address before declaring done


Failure Handling

Research Phase Failure

Markdown
Symptom: Reality check shows divergences
Action:
1. Document divergences
2. Continue research for alternatives
3. Update assumptions
4. Re-run reality check
5. Repeat until all validated

Tracer Bullet Failure

Markdown
Symptom: Critical assumption invalid
Action:
1. Document failure evidence
2. Identify root cause
3. Return to research phase
4. Find alternative approach
5. Create new tracer bullet
6. Repeat until pass

Phase Gate Failure

Markdown
Symptom: Phase validation fails
Action:
1. STOP immediately
2. Capture state (events, logs)
3. Identify root cause
4. Options:
   a. Fix issue, re-validate
   b. Rollback phase, investigate
   c. Return to planning
5. Never proceed with failing gate

Implementation Rollback

Markdown
# If rollback needed:
1. Identify rollback point (which phase)
2. Execute rollback commands from plan
3. Verify resources removed
4. Document what happened
5. Return to appropriate phase

Time Budget Guidelines

| Phase | Simple Deploy | Medium Deploy | Complex Deploy |
|-------|---------------|---------------|----------------|
| R (Research) | 10 min | 30 min | 60 min |
| 0 (Tracer) | 5 min | 15 min | 30 min |
| P (Plan) | 10 min | 20 min | 40 min |
| I (Implement) | 15 min | 45 min | 90 min |
| V (Validate) | 5 min | 15 min | 30 min |
| Total | 45 min | 2 hours | 4 hours |

Integration Points

Skills Used

| Phase | Skills |
|-------|--------|
| R | cluster-reality-check, divergence-check |
| 0 | tracer-bullet |
| P | (none - Claude orchestration) |
| I | phase-gate |
| V | (none - validation commands) |

Commands Used

| Phase | Commands |
|-------|----------|
| R | /research |
| P | /plan |
| V | /retro |

Related:

  • assumption-validation - Subset of Phase R
  • post-work-retro - Detailed version of Phase V

Example: Dify Deployment

Phase R Summary

Text Only
Research: Dify multi-container application
Reality Check: 2 HIGH divergences (images, volumes)
Divergence Check: Docker -> OpenShift translation needed
Gate R: FAIL - Must resolve image and volume issues

Phase 0 Summary

Text Only
Tracer 1: EDB database cluster - PASS
Tracer 2: Redis pod - PASS
Tracer 3: Dify API image - FAIL (signature policy)
Gate 0: FAIL - Return to research for image solution

Phase R (Iteration 2)

Text Only
Solution: Mirror images to internal registry
Reality Check: All images now pullable
Gate R: PASS

Phase 0 (Iteration 2)

Text Only
Tracer 3 (retry): Dify API image - PASS
Gate 0: PASS

Phase P Summary

Text Only
Plan: 4 phases
- Phase 1: Namespace and secrets
- Phase 2: Database (EDB cluster)
- Phase 3: Supporting services (Redis, Weaviate)
- Phase 4: Application deployments
Human Review: APPROVED
Gate P: PASS

Phase I Summary

Text Only
Phase 1: Namespace/secrets - Gate PASS
Phase 2: Database - Gate PASS (cluster ready in 90s)
Phase 3: Supporting services - Gate PASS
Phase 4: Application - Gate FAIL (quota exceeded)
  -> Fix: Request quota increase
  -> Retry Phase 4 - Gate PASS
Gate I: PASS (all phases complete)

Phase V Summary

Text Only
Full validation: PASS
Rollback test: PASS
Retrospective: Completed
Documentation: Created
Gate V: PASS - Deployment complete

Success Criteria

Infrastructure deployment is successful when:

  • All research assumptions validated
  • All tracer bullets passed
  • Plan approved by human
  • All phases passed gates
  • Full validation suite passes
  • Rollback procedure verified
  • Retrospective completed
  • Documentation created
  • Zero unresolved issues

Remember: Every gate exists to catch problems early. A failing gate is not a failure - it's the workflow working as designed. The failure would be proceeding despite a failing gate.