Business Flywheels: Use AI to Autonomously Improve Performance
Most businesses using AI are generating the data to improve it — and not using it.
Andrej Karpathy left his computer running for two days and went to relax. He came back to 700 experiments, 20 real improvements, and an 11% performance gain on a benchmark he'd been manually tuning for two decades.
Karpathy helped build modern AI at Tesla and OpenAI. Even he was surprised. "Seeing the agent do this entire workflow end-to-end and all by itself is wild," he wrote.
He called the tool autoresearch. It's open source and built for AI researchers. But he ended his announcement talking to everyone else:
"Any metric you care about that is reasonably efficient to evaluate can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too."
I think most problems do.
The short version
Autoresearch works on the model itself, not the outputs. An agent modifies the training code (architecture, hyperparameters, optimizer, loss function), retrains, checks if the metric improved, keeps or discards, and repeats. You wake up to a better model.
To use this pattern you need two things: a model you're fine-tuning for a specific task, and historical data with labeled outcomes to evaluate against. The agent runs experiments against that evaluation set the same way Karpathy's runs against validation loss.
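In code, the loop itself is small. Here's a minimal sketch of the shape, not Karpathy's implementation: `train_and_evaluate` and `propose_change` are placeholders for your real training code and for whatever change an agent proposes (autoresearch edits the training code itself, not just hyperparameters).

```python
import random

def train_and_evaluate(config, eval_set):
    """Placeholder: retrain with `config` and score on the held-out evaluation set.
    In practice this is your real training code and your real metric."""
    return random.uniform(0.70, 0.80)   # stand-in for accuracy, reply rate, AUC ...

def propose_change(config):
    """Placeholder: an agent would propose an edit to the training setup here;
    we just perturb two hyperparameters to show the shape of the loop."""
    candidate = dict(config)
    candidate["learning_rate"] = config["learning_rate"] * random.choice([0.5, 1.0, 2.0])
    candidate["weight_decay"] = random.choice([0.0, 0.01, 0.1])
    return candidate

eval_set = None                          # your historical, labeled outcomes
best_config = {"learning_rate": 3e-4, "weight_decay": 0.01}
best_score = train_and_evaluate(best_config, eval_set)

for i in range(700):                     # run overnight; each iteration is one experiment
    candidate = propose_change(best_config)
    score = train_and_evaluate(candidate, eval_set)
    if score > best_score:               # keep improvements, discard everything else
        best_config, best_score = candidate, score
        print(f"experiment {i}: kept, new best score {score:.4f}")

print("best config:", best_config)
```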
If you're calling OpenAI or Anthropic's API, this doesn't apply directly. You don't control the training code. But that's worth examining on its own. You're paying per-token for a general-purpose model when you could be running a smaller, specialized model that you fine-tune for your specific task.
A single GPU on a cloud instance running an open-weight model like Llama or Mistral is enough to get started. That's the setup Karpathy used. It's more infrastructure than calling an API, but you own the training code, which means you can actually improve it.
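If the infrastructure sounds heavier than it is: loading an open-weight model for fine-tuning is a few lines with Hugging Face transformers. The model name and settings below are illustrative, not a recommendation.

```python
# Assumes: pip install transformers accelerate torch, and a GPU with enough memory
# for whichever open-weight model you pick.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mistralai/Mistral-7B-v0.1"   # any open-weight model you have access to

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,   # half precision to fit on a single GPU
    device_map="auto",            # let accelerate place layers on available devices
)

# From here the training code is yours to modify: the loss, the data mix, the
# fine-tuning method. That is exactly what an agent can experiment with.
```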
Here are 8 places where that loop is waiting.
1. Support ticket routing
You have an AI classifying tickets by product area, urgency, or team. Every day, you get ground truth: was it routed correctly? How fast did it get resolved? Did it escalate?
Most companies treat each misrouted ticket as a one-off annoyance. But every correction is a labeled training example. An agent could use those corrections to retrain the classification model overnight.
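A sketch of what that overnight job could look like, assuming you log every human re-route as a (ticket text, correct team) pair. scikit-learn stands in for whatever classifier you actually run.

```python
# Nightly retrain of a ticket-routing classifier from human corrections.
# Assumes a corrections log of (ticket_text, correct_team) pairs; sklearn is a stand-in.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline

corrections = [
    ("Invoice charged twice this month", "billing"),
    ("App crashes when exporting a report", "bugs"),
    ("How do I add a teammate to my plan?", "account"),
    ("Payment failed but card is valid", "billing"),
    # ... every human re-route from the last N days
]

texts, teams = zip(*corrections)
X_train, X_test, y_train, y_test = train_test_split(texts, teams, test_size=0.25, random_state=0)

router = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
router.fit(X_train, y_train)

# Only promote the new model if it beats yesterday's on held-out corrections.
print("accuracy on held-out corrections:", accuracy_score(y_test, router.predict(X_test)))
```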
2. Lead scoring
Your CRM has a model predicting which leads are worth pursuing. It was probably trained on data from 12-18 months ago and nobody has touched it since.
Every closed deal is new signal. Buying patterns from last year don't predict this year's customers. A model that retrains on recent outcomes stays accurate. One that doesn't retrain drifts quietly while your sales team wonders why conversions are dropping.
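One way to make the drift visible, assuming a table of closed leads with features, a won/lost label, and a close date (all column and file names below are illustrative): train one model on the old window and one on a recent window, then score both on the latest quarter.

```python
# Sketch: quantify lead-scoring drift by comparing an old-window model and a
# recent-window model on the latest quarter's outcomes. Column names are illustrative.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score

leads = pd.read_csv("closed_leads.csv", parse_dates=["closed_at"])  # hypothetical export
features = ["employee_count", "pages_visited", "demo_requested", "days_to_first_reply"]

latest = leads[leads["closed_at"] >= "2025-07-01"]            # evaluation: most recent quarter
old_window = leads[leads["closed_at"] < "2024-07-01"]         # what the current model was trained on
recent_window = leads[leads["closed_at"].between("2024-07-01", "2025-06-30")]

for name, window in [("stale model", old_window), ("retrained model", recent_window)]:
    model = GradientBoostingClassifier().fit(window[features], window["won"])
    auc = roc_auc_score(latest["won"], model.predict_proba(latest[features])[:, 1])
    print(f"{name}: AUC on latest quarter = {auc:.3f}")
```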
3. Email personalization
Open rate, reply rate, revenue per send: about as clean as feedback signals get.
If you have an LLM generating outreach emails, you're sitting on thousands of labeled examples: sends paired with reply outcomes. That's a fine-tuning dataset. An agent can experiment with how the model is trained (adjusting the architecture, the loss function, how it weighs positive vs negative examples) and retrain overnight against your historical reply data.
You're building a model that's better at understanding what makes someone respond.
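A sketch of the data-preparation half, assuming your outbound tool can export each send with a replied flag. The field names and the 3:1 weighting are assumptions; dropping non-replies entirely, or using them as negatives in a preference-tuning setup, are equally reasonable choices.

```python
# Sketch: turn historical sends + reply outcomes into a weighted fine-tuning dataset.
# Field names and the 3:1 weighting are assumptions; the point is that replies and
# non-replies both carry signal, just not equally.
import json

sends = [
    {"prospect": "VP Eng at a 200-person fintech", "email": "Hi Dana, saw your team ...", "replied": True},
    {"prospect": "Head of Ops at a logistics startup", "email": "Hi Sam, quick question ...", "replied": False},
    # ... thousands more from your outbound tool
]

with open("outreach_finetune.jsonl", "w") as f:
    for s in sends:
        record = {
            "prompt": f"Write a cold email to: {s['prospect']}",
            "completion": s["email"],
            "weight": 3.0 if s["replied"] else 1.0,   # up-weight emails that got a reply
        }
        f.write(json.dumps(record) + "\n")
```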
4. Contract and document review
Legal teams reviewing AI-extracted clauses generate feedback constantly. Every correction, edit, and approval tells you where the model got it wrong.
Most companies capture none of it. The model makes the same mistakes in month six that it made in month one because nobody connected the lawyer's red ink to the training pipeline. Wire that loop and the model learns your organization's contracts and edge cases instead of staying generic.
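Capture is usually the missing piece, not modeling. A sketch, assuming your review tool can call a hook when a lawyer saves an edited clause; the event shape is hypothetical.

```python
# Sketch: turn each lawyer's edit into a labeled training example instead of losing it.
# The record structure is hypothetical; the idea is model output + human correction, stored.
import json
from datetime import datetime, timezone

def record_correction(document_id, clause_type, model_extraction, lawyer_final,
                      path="clause_corrections.jsonl"):
    example = {
        "document_id": document_id,
        "clause_type": clause_type,
        "model_output": model_extraction,      # what the AI extracted
        "corrected_output": lawyer_final,      # what the lawyer actually approved
        "accepted_as_is": model_extraction == lawyer_final,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    with open(path, "a") as f:
        f.write(json.dumps(example) + "\n")

# Called from the review tool's "save" handler:
record_correction(
    document_id="msa-2025-0142",
    clause_type="limitation_of_liability",
    model_extraction="Liability capped at fees paid in the prior 6 months.",
    lawyer_final="Liability capped at fees paid in the prior 12 months, excluding indemnification.",
)
```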
5. Internal knowledge search (RAG)
An internal chatbot that pulls from your company's documents has one honest feedback signal: did the employee get what they needed?
Did they click the result? Did they ask a follow-up immediately (that usually means the answer was wrong), or walk over to someone's desk instead?
Almost nobody uses this data after launch. The retrieval model, the embedding weights, the re-ranker: all frozen at whatever someone configured during the pilot. An agent could retrain the retrieval model against real usage data overnight. That's the most honest evaluation set you have.
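A sketch of turning those signals into an evaluation set, assuming your query log records the clicked document and whether a follow-up came within a few minutes (treated here as a failure signal). The log fields and the recall@5 choice are assumptions.

```python
# Sketch: build a retrieval eval set from real usage and score a retriever with recall@k.
# `retrieve` is a placeholder for your actual retrieval stack; log fields are assumptions.
def retrieve(query, k=5):
    """Placeholder: return the top-k document IDs from your retriever / re-ranker."""
    return ["doc-17", "doc-03", "doc-88", "doc-41", "doc-09"]

usage_log = [
    {"query": "how do I file travel expenses", "clicked_doc": "doc-03", "follow_up_within_5m": False},
    {"query": "parental leave policy for contractors", "clicked_doc": "doc-55", "follow_up_within_5m": True},
    # ... every query since launch
]

# Keep only interactions where the employee plausibly got what they needed.
eval_set = [row for row in usage_log if row["clicked_doc"] and not row["follow_up_within_5m"]]

hits = sum(row["clicked_doc"] in retrieve(row["query"], k=5) for row in eval_set)
print(f"recall@5 on real usage: {hits / len(eval_set):.2%}")
```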
6. Ad copy generation
If AI is generating your ad variations, you already have what you need: CTR and ROAS.
You've got a clear metric and a growing pile of labeled outcomes. An agent can retrain the model that generates your ad copy against that conversion data, experimenting with the training setup until it gets measurably better at producing copy that converts. The feedback is unambiguous: each ad either converted or it didn't.
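A sketch of the dataset side, assuming you can pull per-variant CTR by campaign: pair each campaign's best and worst performer into preference data that a fine-tuning or preference-tuning run can consume. Field names and the output format are assumptions.

```python
# Sketch: turn per-variant CTR into preference pairs (chosen vs. rejected copy per campaign).
# Field names are assumptions; the output is one common shape for preference tuning.
import json
from collections import defaultdict

variants = [
    {"campaign": "spring-sale", "copy": "Last chance: 30% off ends tonight", "ctr": 0.041},
    {"campaign": "spring-sale", "copy": "Our spring sale is now live", "ctr": 0.012},
    {"campaign": "onboarding", "copy": "Set up your workspace in 5 minutes", "ctr": 0.033},
    {"campaign": "onboarding", "copy": "Welcome! Here's what to do next", "ctr": 0.019},
]

by_campaign = defaultdict(list)
for v in variants:
    by_campaign[v["campaign"]].append(v)

with open("ad_copy_preferences.jsonl", "w") as f:
    for campaign, vs in by_campaign.items():
        vs.sort(key=lambda v: v["ctr"], reverse=True)
        pair = {"prompt": f"Write ad copy for campaign: {campaign}",
                "chosen": vs[0]["copy"], "rejected": vs[-1]["copy"]}
        f.write(json.dumps(pair) + "\n")
```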
7. Code review and PR summarization
Dismissed AI suggestions, edited summaries, post-merge bugs from code that passed AI review — these are all labeled outcomes the model behind your code review tool never sees.
Most code review tools don't capture any of it. The model behind the suggestions stays static, calibrated to generic best practices rather than your codebase, your team's standards, or the patterns that have historically broken things.
The feedback loop here is slower (outcomes take days or weeks to materialize). But those dismissed suggestions and post-merge bugs are labeled data. An agent could retrain the model against your team's actual preferences overnight.
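A sketch of what "labeled data" means here, assuming you can export the AI's suggestions and later join them to what happened: accepted, dismissed, or dismissed and then implicated in a bug. The schema is hypothetical.

```python
# Sketch: join AI review suggestions to their eventual outcome so they become labeled examples.
# The schema is hypothetical; the label is what the team (or a later bug) decided.
suggestions = [
    {"pr": 1412, "file": "billing/invoice.py", "suggestion": "Possible off-by-one in proration loop"},
    {"pr": 1412, "file": "billing/invoice.py", "suggestion": "Rename `tmp` for clarity"},
    {"pr": 1430, "file": "api/auth.py", "suggestion": "Token expiry not checked on refresh"},
]

outcomes = {
    (1412, "Possible off-by-one in proration loop"): "accepted",
    (1412, "Rename `tmp` for clarity"): "dismissed",
    (1430, "Token expiry not checked on refresh"): "dismissed_then_caused_bug",  # the expensive label
}

labeled = [
    {**s, "label": outcomes.get((s["pr"], s["suggestion"]), "unknown")}
    for s in suggestions
]
for example in labeled:
    print(example["label"], "|", example["suggestion"])
```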
8. Churn prediction
A churn model trained last year doesn't know what this year's churn looks like. The customers at risk today behave differently than the ones who left 18 months ago because usage patterns and products have shifted underneath the model.
Every customer who churned (or didn't) after being flagged is a labeled outcome. Without regular retraining, your customer success team ends up chasing the wrong accounts while the actual at-risk customers leave quietly.
Getting this right means earlier warnings. Earlier warnings mean more options: discounts, check-ins, or feature introductions before someone has already decided to go.
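A sketch under the usual assumptions (a per-customer-month feature table with a churned-within-90-days label; column and file names are illustrative): retrain on a recent window, rank accounts by risk, and check whether the top of the list is who actually left.

```python
# Sketch: rolling-window churn retrain, evaluated as "of the top-N accounts we would have
# flagged, how many actually churned". Column names and the CSV are illustrative.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

df = pd.read_csv("customer_months.csv", parse_dates=["month"])   # hypothetical export
features = ["logins_30d", "seats_active", "tickets_open", "days_since_last_admin_login"]

train = df[df["month"] < "2025-04-01"]     # recent history, not last year's snapshot
test = df[df["month"] >= "2025-04-01"]     # the period we want earlier warnings for

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(train[features], train["churned_90d"])

test = test.assign(risk=model.predict_proba(test[features])[:, 1])
top = test.nlargest(50, "risk")            # 50 = roughly what the CS team can actually work
print("churners among the 50 flagged accounts:", int(top["churned_90d"].sum()))
```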
The pattern
Every one of these has a model in production, a measurable outcome, and historical data that could be used to retrain it. Nobody has wired the loop together.
Karpathy ran 700 experiments in two days and improved a system he'd been manually tuning for his entire career. The data to do the same thing is sitting in your production logs right now. Businesses that close this loop will have models that keep getting better automatically, while everyone else keeps running whatever they shipped in 2024.

Need a Fractional Head of AI?
I help companies build an AI operating system — shared context across teams, AI handling the repetitive work, and your people focused on what actually matters.
15+ years in tech · 12+ AI products shipped · 3 Fortune 500 brands