HOW WE WORK

One process. Four stages. Your decision at every gate.

We don't sell software and leave. We scope one process, build and evaluate it against your real data, deploy it in your environment, and operate it in production. Fixed scope. Clear success criteria. Average time from scope to production: 30-60 days.

Talk to us

See the scope example →

SCOPE

up to 1 week

BUILD & EVAL

1-3 weeks

DEPLOY

1 week

OPERATE

Weekly, ongoing

Each gate is a real decision point. If the answer is no, we stop - without sunk cost.

WHY THIS MATTERS

Most automation projects die in implementation. Ours don't reach implementation if they shouldn't.

The standard enterprise automation playbook is to sell a platform, sign a multi-year contract, and figure out the actual work in a six-month "implementation phase" - by which point the budget is committed, the project is political, and stopping costs more than continuing. We run it the other way around. We figure out whether your process fits *before* you commit. We prove the agent works *against your data* before we deploy. We give you a real "no" option at every stage.

What most vendors do

Sell the platform. Promise implementation. Discover scope mid-build. Push for go-live regardless of evaluation results. Hand you the keys and walk away after deployment.

What we don't do

Sell software you have to integrate yourself. Estimate timelines we can't defend. Deploy before evaluation passes. Disappear after go-live.

What we do instead

Scope one process. Build and test against your data. Show you the evaluation report. Deploy when you approve. Operate the process in production. Review weekly.

STAGE 1 - SCOPE

One working session with the process owner.

We map the workflow end-to-end: where work enters, what decisions happen, which systems are involved, what “done” looks like, what the failure modes are, and where humans need to stay in the loop. You leave the session with a written scope document and agreed success metrics.

SCOPE INTELLIGENCE DOSSIER

Invoice Processing Automation / Acme Corp / Stage 1

GATE_01 / FIT ANALYSIS ACTIVE

SCOPING CONFIDENCE

92.7 / 100

HIGH FIT

PROCESS FLOW MAP

Inbound invoices -> AP review -> GL coding -> approval routing -> NetSuite posting

WORKLOAD FINGERPRINT

Monthly Volume

~3,200 invoices

Manual Time

8 min / invoice

Workload

~430h / month
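The workload figure follows from the volume and manual-time figures above it. A quick check of the arithmetic, as a sketch (the function name is illustrative and not part of any product tooling):

```python
def monthly_workload_hours(invoices_per_month: int, minutes_per_invoice: float) -> float:
    """Convert monthly volume and per-invoice handling time into hours per month."""
    return invoices_per_month * minutes_per_invoice / 60


# Figures from the dossier: ~3,200 invoices at 8 minutes each.
hours = monthly_workload_hours(3200, 8)
print(f"{hours:.0f} h / month")  # about 427 h, which the dossier rounds to ~430h
```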

DECISION LOGIC + HITL

- PO match: auto-approve below $5K

- PO mismatch: route to AP analyst

- > $10K: manager approval (HITL)

- > $50K: CFO approval (HITL)
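The decision rules above could be expressed as a small routing function. This is a minimal sketch, not the production workflow: the names `Route` and `route_invoice` are hypothetical, and two behaviors are assumptions because the dossier does not specify them (amount-based approvals take precedence over PO status, and PO-matched invoices between $5K and $10K go to an AP analyst).

```python
from enum import Enum


class Route(Enum):
    AUTO_APPROVE = "auto-approve"
    AP_ANALYST = "AP analyst review"
    MANAGER = "manager approval (HITL)"
    CFO = "CFO approval (HITL)"


def route_invoice(amount_usd: float, po_matched: bool) -> Route:
    """Pick a route for one invoice using the dossier's example thresholds."""
    # Assumption: the amount-based approvals take precedence over PO status.
    if amount_usd > 50_000:
        return Route.CFO
    if amount_usd > 10_000:
        return Route.MANAGER
    if not po_matched:
        return Route.AP_ANALYST
    if amount_usd < 5_000:
        return Route.AUTO_APPROVE
    # Assumption: the dossier does not cover PO-matched invoices from $5K to $10K,
    # so this sketch sends them to an AP analyst rather than guessing.
    return Route.AP_ANALYST


print(route_invoice(1_200, po_matched=True).value)    # auto-approve
print(route_invoice(25_000, po_matched=True).value)   # manager approval (HITL)
```

Writing the rules down this way tends to surface gaps, such as the unspecified $5K to $10K band, during scoping rather than in production.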

GATE 01 / PROCEED AUTHORIZATION

[ ] Scope confirmed by Process Owner

[ ] Success metrics signed

[ ] Data access granted for eval

Proceed to Stage 2? [ ] YES [ ] NO

STAGE 1: Gate - does the process fit?

If the process is too custom, too varied, or fundamentally not a fit for agent automation, we say so here. No proposal, no rolling into Stage 2, no sunk cost. A meaningful share of scoping conversations end at this gate, and they end honestly.

TIME

Typically completed within one week - sometimes a single working session.

WHO'S INVOLVED

Process owner, one operational lead, one IT/security representative.

WHAT YOU WALK AWAY WITH

A written scope document. Agreed success metrics. A go/no-go decision.

STAGE 2 - BUILD & EVALUATE

We build it. We test it against your real data. You see the numbers before anything ships.

We configure the agent workflow on our platform and connect it to your systems through controlled access. Then we run it against historical data you provide - real cases, not synthetic ones. The evaluation report gives you per-field accuracy, edge case coverage, failure modes, and tone scoring (for customer-facing agents). You set the threshold. We don't deploy until we hit it.
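To illustrate what "per-field accuracy" means in the report, here is a minimal sketch that compares extracted values against labeled historical cases. The function name, the field names, and the sample records are all hypothetical; the actual evaluation harness is not described on this page.

```python
from collections import defaultdict


def per_field_accuracy(predictions: list[dict], ground_truth: list[dict]) -> dict[str, float]:
    """Return, per field, the share of cases where the prediction matched the label."""
    if len(predictions) != len(ground_truth):
        raise ValueError("predictions and ground_truth must have the same length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for predicted, expected in zip(predictions, ground_truth):
        for field_name, expected_value in expected.items():
            total[field_name] += 1
            if predicted.get(field_name) == expected_value:
                correct[field_name] += 1
    return {name: correct[name] / total[name] for name in total}


# Hypothetical labeled cases; the field names are illustrative only.
labeled = [
    {"vendor": "Acme Supplies", "amount": "1200.00", "gl_code": "6100"},
    {"vendor": "Globex", "amount": "89.50", "gl_code": "6200"},
]
extracted = [
    {"vendor": "Acme Supplies", "amount": "1200.00", "gl_code": "6100"},
    {"vendor": "Globex", "amount": "98.50", "gl_code": "6200"},  # transposed digits
]
report = per_field_accuracy(extracted, labeled)
print(report)  # vendor and gl_code match in both cases; amount matches in one of two
```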

EVALUATION LAB REPORT / v3

142 real cases / live-grade benchmark / synthetic data: none

MODEL TRACEABILITY: VERIFIED

94.2% weighted accuracy

Target: 92.0%

+2.2 pt margin

PO-matched

96.8%

PASS

Non-PO

88.4%

WARN

Multilingual

79.1%

FAIL
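The PASS / WARN / FAIL labels above can be read as a comparison of each segment against the agreed target. The sketch below reproduces the three labels from the report's figures under one assumption that the report does not state: a segment within 5 percentage points below target is a WARN, and anything lower is a FAIL. The function name and the band width are hypothetical.

```python
def segment_status(accuracy_pct: float, target_pct: float, warn_band_pct: float = 5.0) -> str:
    """Classify a segment against the target. The warn band is an assumed value."""
    if accuracy_pct >= target_pct:
        return "PASS"
    if accuracy_pct >= target_pct - warn_band_pct:
        return "WARN"
    return "FAIL"


TARGET_PCT = 92.0  # target from the report
segments = {"PO-matched": 96.8, "Non-PO": 88.4, "Multilingual": 79.1}

for name, accuracy in segments.items():
    print(f"{name}: {accuracy}% -> {segment_status(accuracy, TARGET_PCT)}")

# Headline margin: weighted accuracy minus target, in percentage points.
print(f"margin: {94.2 - TARGET_PCT:+.1f} pt")
```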

EDGE CASE SURFACE

PASS Multi-page attachments handled

PASS FX invoices handled

WARN Handwritten notes -> HITL

WARN Vendor aliases normalized 92%

FAIL Non-English invoices below threshold

ACTION QUEUE

1) Route multilingual to human until v4

2) Raise HITL confidence cutoff to 88%

3) Add 12 Spanish historical docs

4) Re-run benchmark after retraining

GATE 02 / DEPLOY DECISION

Primary path clears threshold. Multilingual path requires controlled fallback.

Approve Stage 3? [ ] YES [ ] NO [ ] ITERATE

STAGE 2: Gate - does it meet your bar?

You set the accuracy threshold during scope. We run the eval. If we don’t hit the threshold, we iterate or scope a fallback. If we can’t hit the threshold even with iteration, we tell you what we can deliver - and you decide whether that’s worth deploying.

TIME

1–3 weeks depending on data complexity and integration depth.

WHO'S INVOLVED

Process owner (reviews report), risk/compliance lead (reviews failure modes), IT (validates integration access).

WHAT YOU WALK AWAY WITH

A written evaluation report. Threshold pass/fail status. Recommendations for any sub-threshold paths. A go/no-go decision.

The same evaluation framework runs continuously after deploy. See Evaluation Framework →

STAGE 3 - DEPLOY

Production push in your environment. Not ours.

Cloud, hybrid, or fully on-premises - depending on where your data has to live. Human-in-the-loop checkpoints wherever your risk team requires them. Audit trail from minute one. Rollback available in one click. We provide the deployment runbook; your team controls the go-live.

PRODUCTION DEPLOY COMMAND CENTER

Runbook build: v3 / Environment lock: enabled / Go-live authority: client-side

ROLLBACK GUARANTEE: <60s

GO-LIVE READINESS SCORE

97 / 100

READY

PRE-DEPLOY CHECKS

PASS Evaluation threshold met

PASS Security review complete

PASS HITL routes configured

PASS SIEM export wired

PASS Rollback test passed

ENVIRONMENT MATRIX

Target

Acme AWS (eu-west-1)

Runtime

AgentX in client VPC

Data Plane

On-prem

LLM Provider

Claude 4.6 via client contract

TRAFFIC RAMP SEQUENCE

T-0 Deploy to prod at 10% traffic

T+24h Live metrics review / scale decision

T+72h Expand to 50% if on track

T+1wk Move to 100% if stable
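The ramp sequence above can be written as a simple schedule lookup. This sketch returns only the planned traffic share for a given time since deploy; it assumes traffic stays at 10% through the T+24h review, and it leaves the "on track" and "stable" checks to the humans making the scale decision. `RAMP_PLAN` and `planned_traffic_share` are hypothetical names.

```python
# (hours since deploy, percent of production traffic), taken from the ramp sequence.
RAMP_PLAN = [(0, 10), (72, 50), (7 * 24, 100)]


def planned_traffic_share(hours_since_deploy: float) -> int:
    """Return the planned share of traffic, assuming every review so far has cleared."""
    share = 0  # nothing is routed to the agent before T-0
    for start_hour, percent in RAMP_PLAN:
        if hours_since_deploy >= start_hour:
            share = percent
    return share


for hours in (0, 24, 72, 168):
    print(f"T+{hours}h: {planned_traffic_share(hours)}%")
```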

FAILSAFE / ROLLBACK

Trigger: accuracy < 90% on rolling 100 invoices

Action: one-click revert to manual v2 flow

Execution SLA: < 60 seconds
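One way to implement the trigger above is a fixed-size rolling window over the most recent invoice outcomes. This is a sketch under stated assumptions, not the deployed monitor: the class name is hypothetical, and it assumes the trigger is only evaluated once 100 outcomes have been collected.

```python
from collections import deque


class RollbackMonitor:
    """Track the last N invoice outcomes and flag when accuracy falls below the floor."""

    def __init__(self, window_size: int = 100, min_accuracy: float = 0.90):
        self.results = deque(maxlen=window_size)  # oldest outcomes drop off automatically
        self.min_accuracy = min_accuracy

    def record(self, correct: bool) -> bool:
        """Record one outcome; return True if the rollback trigger condition is met."""
        self.results.append(correct)
        # Assumption: wait for a full window before evaluating the trigger.
        if len(self.results) < self.results.maxlen:
            return False
        accuracy = sum(self.results) / len(self.results)
        return accuracy < self.min_accuracy


monitor = RollbackMonitor()
outcomes = [True] * 90 + [False] * 10          # exactly 90% over 100 invoices
fired = [monitor.record(outcome) for outcome in outcomes]
print(any(fired))             # False: 90% is not below the 90% floor
print(monitor.record(False))  # True: the window is now 89 correct out of 100
```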

STAGE 3: Gate - the go-live decision is yours

We don’t push to production. You do. We provide the runbook, the eval data, the rollback plan, and the security review. The “deploy” button is pressed by your team with our support, not the other way around.

TIME

Typically 1 week from eval approval to full production traffic.

WHO'S INVOLVED

Process owner, IT, security, risk/compliance, operations lead.

WHAT YOU WALK AWAY WITH

A production deployment. Audit trail from day one. A documented rollback plan. Monitoring dashboards.

Cloud, hybrid, or on-prem? See Deployment options →

STAGE 4 - OPERATE

We run the process. You watch the dashboard. We meet weekly.

This is where most automation vendors disappear. We don’t. After go-live, we operate the deployed process: we track throughput and accuracy, handle exceptions, monitor drift, and ship version updates. Your team gets a dashboard with the metrics that matter. We meet weekly to review what’s running, what’s changed, and what needs to change next.

OPERATIONS DASHBOARD - LAST 30 DAYS

Throughput, accuracy, exception handling, and processing time in production

Throughput

3,247 invoices

UP 1.5% vs prior period

Accuracy

95.1% rolling 7d

Target 92.0% - above threshold

Exception rate

4.2% routed to human

DOWN 0.8% vs prior period

Avg processing time

1.4 min/invoice

Target 2.0 min - faster than target

ACCURACY TREND (30 DAYS)

Apr 18 dip: Vendor sent new invoice format. Detected within 4 hours, evaluation re-run, agent updated.

WEEKLY REVIEW

NEXT REVIEW: Tue May 14 - 10:00 ET

Attendees: S. Kim (Acme), M. Torres (Acme), AgentX delivery lead

- Throughput and accuracy review (15 min)

- Exception patterns this week (10 min)

- Upcoming: new vendor onboarding (15 min)

WHAT CHANGES AFTER GO-LIVE

A new document type arrives - we add it

A vendor changes their format - we detect and adapt

A rule changes (tax law, internal policy) - we update the workflow

A system migrates (new ERP) - we re-integrate

Accuracy drifts - we re-evaluate and fix before it becomes a problem

WHAT STAYS CONSTANT

The agreed success metrics from Stage 1

The HITL checkpoints from Stage 2

The audit trail from Stage 3

The weekly review cadence

The price (no hidden change requests)

EVERY GATE IS REAL

Stopping is a feature, not a failure.

Most enterprise software vendors are economically incentivized to push you through every stage. We're incentivized to stop early when we should. A scoping conversation that ends at the Stage 1 gate is a successful conversation - for both sides. A pilot that ends at the Stage 2 gate is a successful pilot.

STOP AT STAGE 1

Process doesn’t fit.

Too custom, too varied, no clear “done” state, or fundamentally not an agent automation problem. We tell you. No proposal. No effort to convince you otherwise. We may suggest a different problem in your operation that would fit - or we don’t.

STOP AT STAGE 2

Evaluation can’t hit threshold.

The data is too messy, the edge cases are too varied, or the underlying process needs to change before automation works. You see the eval report. We tell you what we can deliver, what we can’t, and what would need to change. You decide whether to scope a narrower workflow, iterate, or stop entirely.

PAUSE AT STAGE 3

Security or compliance blocker.

Threshold met in eval, but a security, IT, or compliance issue surfaces during pre-deploy review. Deploy pauses. We resolve the blocker - or document explicitly why it can’t be resolved - before go-live.

WHAT WE CHECK

The questions we answer before each gate.

Each gate has explicit decision criteria - not vibes. Here are the checklists we use internally and share with you during the working sessions.

Stage 1 Gate - Scope fit

☐ Process has a defined start and end
☐ Volume justifies automation
☐ Data exists and is accessible
☐ Stakeholder owns the decision
☐ Risk tolerance is clear

Duration: ≤ 1 week

Stage 2 Gate - Threshold met

☐ Cost per case approved
☐ Latency within operational bounds
☐ Human-in-the-loop routing agreed
☐ Edge cases documented, not hidden
☐ Accuracy at or above agreed threshold

Duration: 4-8 weeks

Stage 3 Gate - Go live

☐ Security review complete
☐ Rollback plan documented
☐ Monitoring dashboards live
☐ IT and compliance sign-off
☐ Team trained on override

Duration: 2-4 weeks

Stage 4 - Ongoing health

☐ Weekly review held
☐ Accuracy drift within tolerance
☐ New edge cases logged and triaged
☐ Override rate tracked
☐ Expansion or exit decision on the table

Duration: Ongoing
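A gate with explicit criteria maps naturally onto a checklist that passes only when every item is ticked. The sketch below models the Stage 1 gate that way; the `Gate` class and its methods are hypothetical and only illustrate the idea that an unticked item blocks the gate.

```python
from dataclasses import dataclass, field


@dataclass
class Gate:
    name: str
    checks: dict[str, bool] = field(default_factory=dict)

    def open_items(self) -> list[str]:
        """Return the criteria that have not been confirmed yet."""
        return [item for item, done in self.checks.items() if not done]

    def passed(self) -> bool:
        """A gate passes only if it has criteria and all of them are confirmed."""
        return bool(self.checks) and not self.open_items()


stage_1 = Gate(
    name="Stage 1 Gate - Scope fit",
    checks={
        "Process has a defined start and end": True,
        "Volume justifies automation": True,
        "Data exists and is accessible": False,  # example state, not real data
        "Stakeholder owns the decision": True,
        "Risk tolerance is clear": True,
    },
)
print(stage_1.passed())      # False
print(stage_1.open_items())  # ['Data exists and is accessible']
```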

TIMELINE

Average scope-to-production: 30-60 days.

This is what a typical engagement looks like. Narrow scopes can move faster. Engagements run longer when integrations are deep or data preparation is required.

SCOPE - up to 1 week

Initial conversation, process mapping, feasibility, data access check. Output: go/no-go and scope document.


BUILD & EVALUATE - 4-8 weeks

Agent development, evaluation harness, structured testing against real data. Output: eval report, accuracy vs. threshold.


DEPLOY - 2-4 weeks

Infrastructure, security review, IT integration, runbook, controlled rollout, monitoring dashboards live. Output: agent in production.


OPERATE - ongoing

Weekly reviews, performance monitoring, drift detection, model updates, expansion planning. Monthly retainer, cancel at any cycle.


FASTER

What compresses the timeline

• Clean, accessible data from day one

• A single process owner with decision authority

• IT and security engaged early

• Threshold set before build starts

LONGER

What extends the timeline

• Data that requires significant cleaning or access negotiation

• Multiple stakeholders without a single decision owner

• IT or security reviews scheduled late

• Threshold not agreed before build starts

EXPLICIT LIMITS

The things we say no to.

We're explicit about the engagements that don't fit our model. Naming them protects both sides from a bad fit.

We don't do "discovery as the deliverable"

If you need a consultant to identify automation opportunities across your operation, we're not the right vendor. We work on one defined process at a time.

We don't do unbounded custom builds

If the engagement requires custom-coded workflows that won't replicate to any other customer, we're not the right vendor. Our model is repeatable process automation, not bespoke development.

We don't do "AI exploration"

If the engagement is "let's see what AI can do for us," we're not the right vendor. We work on processes with measurable outcomes, decided up front.

If your need is one of the above, we know vendors who do that work well. We'll tell you who.

The other enterprise deep dives.

This page covers the delivery model. The other enterprise pages cover what your security, compliance, risk, and IT teams will want to evaluate.

Evaluation Framework

How we test before deploy and monitor after. The same evaluation engine that gates Stage 2.

Read the framework →

Security

SOC 2-aligned controls. RBAC. Audit. Workspace isolation. The detail your security team will want.

Read the overview →

AI Governance

Model risk, explainability, HITL design, regulatory framework alignment (SR 11-7, DORA, EU AI Act, MAS, HKMA).

Read the policy →

Deployment

Cloud, hybrid, on-prem. EU data residency. The detail your IT and infra teams will want.

Read the options →

GET STARTED

Ready to see if it fits?

Start with a scoping conversation. No commitment. No proposal until we’ve both agreed there’s a fit. If there isn’t, we’ll tell you that too.

Talk to us

Read the security overview
