What is forward-deployed AI engineering?

It is embedded AI implementation work for revenue teams: identifying high-value GTM workflows, connecting real business systems and data, building production AI workflows, measuring quality, and helping the team operate the system.

What kinds of workflows do you build?

Common workflows include lead enrichment and scoring, routing, CRM updates from conversations, sales research, personalized outreach drafts, support triage, customer intelligence, and internal revenue operations automation.

Can you work with frontier models, open-source models, and n8n?

Yes. The work can include OpenAI and Anthropic models, Claude Code, OpenAI Codex, eval systems, open-source or self-hosted model paths, and cloud-hosted or self-hosted n8n automation environments with training and operating playbooks.

How much does it cost?

The AI GTM Deployment Diagnostic is typically $10k-$15k. Build Sprints are typically $40k-$85k. Embedded AI Revenue Systems retainers are typically $15k-$30k per month with a three month minimum.

Do you build the systems yourself?

Yes. The work is hands-on and embedded. I design, prototype, build, evaluate, and deploy the first workflows, then leave your team with the operating pattern and playbooks to keep scaling.

What kinds of companies are a fit?

The best fit is a B2B company with a meaningful GTM team, messy CRM/customer data, and real pressure to turn AI experiments into production revenue workflows.

Build a data cleaner that stops your GTM workflows from eating garbage

This post is part of the GTM Automation Playbook — a 13-part series on building AI-powered GTM agents with HubSpot.

Every workflow I've built in this series depends on CRM data being at least roughly correct. The ICP scorer reads industry and employee count. The personalization writer pulls job titles and company descriptions. When that data is inconsistent or missing, the outputs range from slightly off to completely useless.

I watched an SDR send a personalized email referencing "Acme, Inc." when the prospect's company had already rebranded six months prior. The data was stale, the opener made no sense, and the thread died on read. That's the tax you pay on dirty data. Not one catastrophic failure, but hundreds of small ones compounding across every automated touchpoint.

What dirty actually means

It's rarely dramatic. The problems are mundane. Phone numbers stored as (555) 867-5309, 5558675309, +1 555-867-5309, and 555.867.5309 across four contacts at the same company. Company names entered as "HubSpot", "Hubspot, Inc.", "hubspot inc", and "HUBSPOT". Job titles left blank on 40% of records because the import skipped that column.

These inconsistencies break filters, duplicate detection, and list segmentation. Two records for the same company show up as separate accounts because one has "Inc." appended. Your routing workflow splits them to different reps.

The n8n cleanup workflow

I run this nightly. Six nodes, no AI required. Pure data hygiene.

Node 1: Schedule Trigger. Set to run at 2 AM. Off-peak keeps you well under HubSpot's 100-requests-per-10-seconds rate limit. The workflow processes contacts updated in the last 24 hours, so each run handles only the delta.

Node 2: HTTP Request, pull contacts. Hit POST /crm/v3/objects/contacts/search to fetch contacts where lastmodifieddate is greater than yesterday. Pull phone, company, jobtitle, firstname, lastname, email. Cap the batch at 100 records per request and loop if needed using the after cursor from the response's paging.next object.

Node 3: Code node, normalize phones. Strip everything that isn't a digit, then apply E.164 formatting. For US/CA numbers, that means prepending +1 if the result is 10 digits. For international numbers, you'll need a country code lookup based on the contact's country property. Here's the core logic:

for (const item of $input.all()) {
  let phone = (item.json.phone || '').replace(/\D/g, '');
  if (phone.length === 10) {
    phone = '+1' + phone;
  } else if (phone.length === 11 && phone.startsWith('1')) {
    phone = '+' + phone;
  }
  item.json.phone_normalized = phone || null;
}
return $input.all();

Node 4: Code node, standardize company names. Strip legal suffixes, normalize whitespace, apply title case. This makes downstream dedup matching far more reliable.

const suffixes = /,?\s*(Inc\.?|LLC|Ltd\.?|Corp\.?|Co\.?|Limited|Incorporated)$/i;
for (const item of $input.all()) {
  let name = (item.json.company || '').trim();
  name = name.replace(suffixes, '').trim();
  name = name.replace(/\s+/g, ' ');
  name = name.split(' ').map(w =>
    w.charAt(0).toUpperCase() + w.slice(1).toLowerCase()
  ).join(' ');
  item.json.company_normalized = name || null;
}
return $input.all();

Node 5: IF node, flag enrichment gaps. Route contacts missing jobtitle or phone to a separate branch. On that branch, add them to a HubSpot static list called "Needs Enrichment" via POST /crm/v3/lists/{listId}/memberships/add. Your enrichment tool (Clay, Clearbit, or a manual process) picks up from there. Contacts with complete data continue to the write-back node.

Node 6: HTTP Request, write back. PATCH /crm/v3/objects/contacts/{contactId} with the normalized phone and company name. Use the batch update endpoint (POST /crm/v3/objects/contacts/batch/update) if you're processing more than 10 records per run. Batch accepts up to 100 records per call and cuts your API usage dramatically.

Catching soft duplicates

After normalization, run a simple comparison. If two contacts share the same company_normalized value and one was created in the last 24 hours, send a Slack notification to your ops team with both record IDs. Don't auto-merge. Automated merging sounds efficient until it combines a current champion with a churned contact from 2019 and overwrites the lifecycle stage. Flag it, let a human decide.

HubSpot's built-in duplicate manager catches exact matches. It misses "Hubspot Inc" vs "HubSpot" because it doesn't normalize before comparing. Your workflow does.

What "clean enough" looks like

You don't need perfection. You need consistency. I define clean enough as: phone numbers in E.164 format, company names stripped of suffixes and casing variations, no contact missing both job title and phone, and soft duplicates flagged within 24 hours of creation.

Track it with a HubSpot report. Create an active list filtered to contacts where phone doesn't start with "+" or jobtitle is empty. That list should trend toward zero over time. If it's growing, your inbound forms or import process is creating dirty records faster than the workflow cleans them. Fix the source.

The cleaner doesn't generate revenue on its own. It makes everything downstream more accurate. Your ICP scores improve because the industry field is consistent. Personalization actually lands because the company name is correct. Reps stop working the same account in parallel because duplicates get caught early. Data quality is the unglamorous foundation that determines whether your automation stack actually works.

Build a data cleaner that stops your GTM workflows from eating garbage

What dirty actually means

The n8n cleanup workflow

Catching soft duplicates

What "clean enough" looks like

Need forward-deployed AI engineering?