- Shipped Tesla Assist, a production GenAI chatbot integrating retrieval-augmented generation across 50K+ documents, driving 8,500+ vehicle orders, 6,000+ demo bookings, and a 50% reduction in human handoffs
- Built an internal 0-to-1 AI analytics platform processing unstructured social data for real-time brand sentiment, partnering across 4 organizations from concept to launch
- Designed and deployed GenAI features into internal CRMs: automated agent communications, cut drafting time by 80%, and lifted customer satisfaction by 10%
- Led a strategic infrastructure migration from legacy vendors to a modern custom stack, projecting multi-million dollar annual savings while improving personalization and customer experience
- Promoted rapidly through two roles in under two years based on impact and execution velocity
- Achieved #1 worldwide CSAT ranking for vehicle delivery experience across all Tesla markets
- Owned end-to-end delivery funnel optimization, reducing friction points that directly increased customer satisfaction scores
- Scaled SDR organization from 20 to 75+ reps (6 teams, 6 managers), generating $50MM+ pipeline ARR. Built the function from scratch through IPO
- Defined tooling stack, automation workflows, and data infrastructure that enabled repeatable pipeline generation at scale
- Served as technical product advisor for enterprise and education clients. Scoped Apple ecosystem deployments, defined integration requirements, and drove $15MM+ annual revenue as #1 Solutions Consultant, NY Metro
- Bridged engineering and business stakeholders on complex multi-product deployments, translating technical constraints into actionable implementation roadmaps
AI / ML Systems
LLM product strategy, RAG architecture design, model evaluation & selection, prompt engineering, embedding models, retrieval ranking, inference optimization
Product Execution
0-to-1 builds, cross-functional leadership, roadmap prioritization, ML lifecycle management, A/B testing, data-driven iteration
Infrastructure & Platform
Cloud infrastructure, GPU-accelerated local inference, CI/CD, platform migrations, developer experience, system architecture
Builder at Heart
Runs local LLM inference on an NVIDIA RTX 5090. Built a 1 GH/s Bitcoin mining rig from scratch. Operates an expansive AI agent infrastructure at home. Happiest when building things that didn't exist yesterday.
Lessons from building and operating multi-agent AI systems in production: what actually works when you need reliable, consistent output from LLMs.
Use a Scratchpad Before the Final Answer
Force the model to reason in a structured block (like <analysis> tags) before producing the final output, then strip the scratchpad from the result. This gives you chain-of-thought quality without polluting the output. The model self-corrects during the draft, and the user only sees the clean answer.
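A minimal sketch of the strip step, assuming the model has been instructed to wrap its draft reasoning in literal <analysis> tags:

```python
import re

def strip_scratchpad(raw: str) -> str:
    """Drop the <analysis> reasoning block, returning only the final answer."""
    return re.sub(r"<analysis>.*?</analysis>", "", raw, flags=re.DOTALL).strip()

response = (
    "<analysis>User asked for the capital of France; no ambiguity, "
    "answer directly.</analysis>\n"
    "The capital of France is Paris."
)
print(strip_scratchpad(response))  # -> The capital of France is Paris.
```

The non-greedy `.*?` with `re.DOTALL` keeps the strip from eating everything between a first open tag and a last close tag if the model emits more than one block.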
Concrete Anti-Patterns Beat Vague Guardrails
"Be careful" does nothing. "Don't create helpers or abstractions for one-time operations; three similar lines is better than a premature abstraction" actually changes behavior. Pair every anti-pattern with the correct alternative. The model responds to specificity, not sentiment.
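One way to guarantee every anti-pattern stays paired with its correct alternative is to store them as pairs and render the prompt section from that list. The rules below are illustrative, not an exhaustive set:

```python
# Each entry pairs an anti-pattern with the correct alternative.
RULES = [
    ("Don't create helpers or abstractions for one-time operations",
     "three similar lines is better than a premature abstraction"),
    ("Don't add retry loops around calls that cannot transiently fail",
     "let a hard failure surface immediately"),
]

def render_guardrails(rules: list[tuple[str, str]]) -> str:
    """Turn (anti-pattern, alternative) pairs into concrete prompt lines."""
    return "\n".join(f"- {anti}. Instead: {alt}." for anti, alt in rules)

print(render_guardrails(RULES))
```

A pair-based structure makes a dangling "don't do X" (with no stated alternative) impossible by construction.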
Numeric Anchors Over Qualitative Instructions
"Keep it concise" is subjective. "Keep responses under 100 words" is measurable. In my testing, numeric length anchors reduce output token usage by ~1-2% compared to qualitative "be brief" instructions. Small numbers compound, especially across thousands of agent turns per day.
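A side benefit of numeric anchors is that they are machine-checkable, so an agent loop can enforce them instead of hoping the model complies. A minimal validator sketch:

```python
def within_word_budget(text: str, max_words: int = 100) -> bool:
    """'Keep responses under 100 words' can be verified; 'be concise' cannot."""
    return len(text.split()) <= max_words

print(within_word_budget("Build failed: missing env var DATABASE_URL."))  # True
```

A harness can retry or truncate when this returns False, turning the prompt instruction into a hard constraint.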
Tell the Model When NOT to Use a Tool
Negative examples are as important as positive ones. If an agent has access to 10 tools, it will reach for the most powerful one by default. Explicitly list when each tool is the wrong choice: "If you need a specific file, use Read; don't spawn a sub-agent for a one-file lookup." This prevents massive over-engineering on trivial tasks.
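A sketch of how negative guidance might live alongside the tool definitions themselves, so the "wrong choice" note can never drift out of sync with the tool list. Tool names and wording here are illustrative:

```python
# Each tool spec carries both positive and negative usage guidance.
TOOLS = {
    "read_file": {
        "use_when": "you already know the exact path of the file you need",
        "wrong_when": "exploring broadly; use a search tool first",
    },
    "spawn_subagent": {
        "use_when": "the task needs open-ended research across many files",
        "wrong_when": "a single-file lookup; use read_file instead",
    },
}

def render_tool_guidance(tools: dict) -> str:
    """Emit a prompt section stating when each tool is the wrong choice."""
    return "\n".join(
        f"{name}: use when {spec['use_when']}. "
        f"Wrong choice when {spec['wrong_when']}."
        for name, spec in tools.items()
    )

print(render_tool_guidance(TOOLS))
```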
Brief Like a Colleague, Not a Command Line
When delegating to sub-agents, terse command-style prompts produce shallow, generic work. Instead, brief the agent like a smart colleague who just walked into the room: explain what you're doing, why it matters, and what you've already ruled out. Include file paths, line numbers, specific context. Never delegate understanding โ write prompts that prove you've already done the thinking.
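The colleague-style brief can be made a habit by composing it from required parts: goal, motivation, dead ends already explored, and exact pointers. The file paths and task below are hypothetical:

```python
def brief_subagent(goal: str, why: str, ruled_out: list[str], refs: list[str]) -> str:
    """Compose a colleague-style brief: goal, motivation, dead ends, exact pointers."""
    ruled = "\n".join(f"- {r}" for r in ruled_out) or "- (nothing yet)"
    pointers = "\n".join(f"- {r}" for r in refs)
    return (
        f"Goal: {goal}\n"
        f"Why it matters: {why}\n"
        f"Already ruled out:\n{ruled}\n"
        f"Start here:\n{pointers}"
    )

print(brief_subagent(
    goal="Find why login silently fails for SSO users",
    why="Blocking the v2 release; support tickets are piling up",
    ruled_out=["Token expiry (tokens are fresh in the failing traces)"],
    refs=["auth/login.ts:142 (redirect handler)", "logs/sso-failures.txt"],
))
```

Forcing yourself to fill `ruled_out` and `refs` is the point: an empty brief is visible evidence you delegated the understanding too.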
Guard Against Both Fabrication AND Over-Hedging
Most people only guard against hallucination. But the opposite failure mode, hedging confirmed results with unnecessary disclaimers, is equally costly. Instruct models to report faithfully in both directions: if tests fail, say so with output; if they pass, state it plainly without caveats. Accurate reporting means no false positives and no false negatives.
Demand Verbatim Quotes for Context Carry-Forward
When summarizing conversations or carrying context between sessions, require direct quotes for critical items, not paraphrases. Paraphrasing introduces drift. Over multiple summarization cycles, "fix the auth bug in login.ts" becomes "address authentication concerns" becomes "review security posture." Verbatim anchors prevent this semantic decay.
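Verbatim anchors are also cheap to verify mechanically: any carried-forward "quote" that no longer appears in the source has drifted. A sketch, assuming the prior transcript is available as plain text:

```python
def drifted_quotes(quotes: list[str], transcript: str) -> list[str]:
    """Return carried-forward 'quotes' that don't appear verbatim in the source."""
    return [q for q in quotes if q not in transcript]

transcript = "User: fix the auth bug in login.ts before Friday."
print(drifted_quotes(["fix the auth bug in login.ts"], transcript))     # []
print(drifted_quotes(["address authentication concerns"], transcript))  # drifted
```

Running this check at each summarization cycle catches paraphrase drift before it compounds.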
Lead With the Action, Not the Reasoning
Inverted pyramid structure: answer first, context second. "The build failed because X" beats "After careful analysis of the logs, I noticed that..." Structure output so the reader never has to re-parse. If the person has to reread your summary, the time saved by brevity is already gone.
Explicitly Override Default Behaviors
LLMs have strong default instincts: adding sleep/polling loops, over-commenting code, creating abstractions "for future use." You need to explicitly override these. "Do not sleep between commands that can run immediately" and "do not add comments unless the why is non-obvious" are instructions that fight real failure modes, not hypothetical ones.
Structured Templates for Repeatable Tasks
For any task that runs repeatedly (summarization, reporting, code review), use rigid section templates, not freeform instructions. A 9-section summary template with numbered headers produces dramatically more consistent output than "please summarize thoroughly." The structure constrains the model's output space to exactly what you need.
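A rigid template can be as simple as a constant the agent is told to fill exactly. The section names below are illustrative, not the author's actual 9-section template:

```python
# Hypothetical numbered-section template for a repeatable summarization task.
SUMMARY_TEMPLATE = """\
1. Objective
2. Files touched
3. Changes made
4. Tests run and results
5. Open questions
"""

def summary_instructions(task: str) -> str:
    """Wrap the template in an instruction that forbids freeform deviation."""
    return (f"Summarize the {task} using EXACTLY these numbered sections, "
            f"in this order, with nothing outside them:\n{SUMMARY_TEMPLATE}")

print(summary_instructions("refactoring session"))
```

Because the headers are numbered and fixed, downstream tooling can also parse the output by section instead of guessing at freeform structure.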