Most AI automation agencies in 2026 will charge you ₹3–15 lakh to wire ChatGPT into your CRM. Most of those projects fail in production within 90 days.
Here's what to ask before you sign.
This is not a marketing post. We run an AI automation practice at Innovatrix and we'd rather lose a deal than win one we can't deliver against. The decision-making framework below is what we'd use if we were the buyer instead of the seller. It's the same set of questions we wish more clients had asked us at the start of our worst projects.
Read this and you'll either hire us with sharper expectations, hire someone better-suited, or build the thing yourself with eyes open. All three are good outcomes. The bad outcome is paying ₹8 lakh to wire up a workflow that nobody trusts in production three months later.
Why most AI automation projects fail in production
There are five recurring failure modes. Every project we've seen fail (ours included, in our first year) failed in one of these five ways:
- The agency demoed against a happy path and never built error handling. When the OpenAI API returns a 503 at 11 PM IST, the workflow silently drops the message. Nobody finds out for a week.
- The agency never showed monitoring. "It's working" was a verbal claim, not a dashboard. When something goes wrong, nobody can tell whether the workflow even ran.
- The agency picked the wrong tool for the wrong reason. Make.com because the agency had partner credits. Zapier because the founder once used it. n8n because it was hyped on Twitter. None of these are strategic choices.
- The agency disappeared after delivery. No documentation, no runbook, no transition plan. Six months later, when the workflow breaks, you're paying the original agency a retainer to remember what they built.
- The agency priced per-workflow, which incentivises slop. Five separate workflows that should have been one orchestrated flow. Each works in isolation; together they produce inconsistent state.
Every one of the eight buyer questions below is reverse-engineered from these failure modes.
The 8 questions to ask before you sign
1. "Show me your monitoring dashboard for a production client"
This is the first question. If they can't show you a real, live dashboard for an existing production deployment, walk away. Not a screenshot. A live dashboard.
What you're looking for: per-execution logs, error rates, latency percentiles, alerting on failures. The specific tool doesn't matter — n8n's built-in execution log is fine, Make's history view is fine, custom Grafana is fine. What matters is that they treat the workflow like a service, not a script.
Red flag: "We get notified by email if something breaks."
Green flag: "Here's the dashboard. We'll add your workflows to a section like this. We get a Slack alert in our shared channel within 60 seconds of any failed execution. Failures get triaged within four business hours under our SLA."
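To make "per-execution logs, error rates, latency percentiles, alerting on failures" concrete, here is a minimal sketch of the numbers a dashboard would derive from execution records. The `Execution` record, the nearest-rank percentile method, and the 5% alert threshold are illustrative assumptions, not any particular tool's schema or defaults.

```python
import math
from dataclasses import dataclass


@dataclass
class Execution:
    # Hypothetical record shape; real tools expose their own fields.
    workflow: str
    succeeded: bool
    duration_ms: float


def error_rate(executions):
    """Fraction of executions that failed (0.0 when there are none)."""
    if not executions:
        return 0.0
    failed = sum(1 for e in executions if not e.succeeded)
    return failed / len(executions)


def percentile(values, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    if not values:
        raise ValueError("no values to summarise")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def should_alert(executions, max_error_rate=0.05):
    """True when the error rate exceeds the (assumed) alert threshold."""
    return error_rate(executions) > max_error_rate
```

Whatever tool the agency uses, you should be able to ask for these three numbers per workflow and get them without anyone digging through raw logs.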
2. "What's your fallback when an n8n node hits a limit?"
Every visual workflow tool — n8n, Make, Zapier — has logic ceilings. A particular branching pattern that the visual builder doesn't express well. A custom data transformation. A retry policy with exponential backoff. A polling loop that needs more control than the trigger gives you.
A good agency has a code escape hatch. n8n has a Code node that runs JavaScript or Python. Make has a custom function module. Zapier has Code by Zapier. The question is whether the agency uses these comfortably or treats them as a last resort.
Red flag: "We try not to use code; we keep everything visual." (Translation: they can't write code well, so the workflow will be five times more complicated than it needs to be.)
Green flag: "About 20% of our workflow steps end up being code nodes. We write them in TypeScript, version them in Git, and deploy them through n8n's import-export. Here's an example."
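As an illustration of the kind of logic that ends up in a code step, here is a minimal sketch of a retry policy with exponential backoff. `TransientError` and `call_with_backoff` are hypothetical names, and the attempt count and base delay are assumptions you would tune per integration.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for retryable failures, e.g. an HTTP 503."""


def call_with_backoff(fn, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying on TransientError with delays of 1s, 2s, 4s, ...

    After the final attempt the error is re-raised so the failure is
    visible to monitoring instead of being silently dropped.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

The `sleep` parameter is injectable so the policy can be tested without waiting. The important design choice is the final `raise`: a workflow that exhausts its retries should fail loudly.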
3. "What does your 90-day failure transition plan look like?"
If you're going to hire an agency, plan for the case where you fire the agency or they shut down. Cloud services come and go. Make.com could change its pricing model again next quarter. Your agency could pivot to a different vertical.
A good agency hands you a workflow you can take elsewhere. Here is what that looks like in practice:
- Self-hosted n8n on infrastructure you own (AWS, DigitalOcean, your own VPS)
- All workflows stored as JSON in a Git repo you control
- All credentials stored in a secrets manager you have access to
- A runbook document that explains what each workflow does in plain English
Red flag: Workflows live on the agency's Make.com or n8n cloud account; you don't have direct access. Credentials are stored in the agency's password manager.
Green flag: "On day one we set up n8n on your AWS or your VPS. The Git repo is in your GitHub org. We have a service account with limited permissions. If you fire us tomorrow, you cut off the service account and everything keeps running."
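A minimal sketch of the "workflows stored as JSON in a Git repo" idea, assuming you already have the exported workflows loaded as Python dicts and that each one carries `id` and `name` fields. The `workflows/` folder layout and both function names are hypothetical.

```python
import json
import re
from pathlib import Path


def render_workflow_files(workflows):
    """Map a repo-relative path to stable JSON text for each workflow.

    Sorted keys and fixed indentation keep Git diffs small and readable.
    """
    files = {}
    for wf in workflows:
        slug = re.sub(r"[^a-z0-9]+", "-", wf["name"].lower()).strip("-")
        path = f"workflows/{wf['id']}-{slug}.json"
        text = json.dumps(wf, indent=2, sort_keys=True, ensure_ascii=False)
        files[path] = text + "\n"
    return files


def write_to_repo(files, repo_dir):
    """Write the rendered files under repo_dir, creating folders as needed."""
    for rel_path, text in files.items():
        target = Path(repo_dir) / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(text, encoding="utf-8")
```

One file per workflow means a reviewer can see exactly which workflow changed in each commit, which is what makes a handover auditable.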
4. "How do you ground LLM responses in our actual data?"
This is the question that separates agencies that have shipped real AI features from agencies that have shipped wrappers around chat.completions.create.
If the agency just hands the user's question to OpenAI with a system prompt, you're paying ₹8 lakh for a thin wrapper around a $20/month API. The output will be confidently wrong about your product half the time. Your customers will lose trust in the assistant within a week.
The right answer involves some form of grounding — usually retrieval-augmented generation (RAG). The user's question goes to a vector database first, the most relevant chunks of your actual content come back, those chunks are stuffed into the prompt with explicit instructions to only answer using the retrieved content, and the LLM is told to say "I don't know" if the answer isn't in the chunks.
Red flag: "We use ChatGPT API." (No mention of vectors, embeddings, retrieval, or any grounding.)
Green flag: "We index your knowledge base into a vector database — typically pgvector if you're already on Postgres, or Pinecone if you need scale. Every user query gets a top-K retrieval first. The retrieved chunks are passed into the prompt with explicit citation instructions. Hallucinations drop by an order of magnitude. Here's how we did it for Bandbox."
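The retrieve-then-prompt flow described above can be sketched in a few lines. This is a toy: word-overlap cosine similarity stands in for real embeddings and a vector database such as pgvector or Pinecone, and the function names and prompt wording are assumptions for illustration only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word and number tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(question, chunks, k=2):
    """Return the k chunks most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(
        chunks, key=lambda c: cosine(q, Counter(tokenize(c))), reverse=True
    )
    return ranked[:k]


def build_prompt(question, chunks, k=2):
    """Put only retrieved chunks in the prompt, with a refusal instruction."""
    retrieved = top_k(question, chunks, k)
    context = "\n".join(f"[{i}] {c}" for i, c in enumerate(retrieved, 1))
    return (
        "Answer using ONLY the numbered context below and cite the chunk number.\n"
        'If the answer is not in the context, reply exactly: "I don\'t know."\n\n'
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

In a production build the retrieval step would query the vector store and the prompt would go to the LLM, but the shape is the same: retrieve first, constrain the answer to what was retrieved, and give the model an explicit way to refuse.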
5. "What happens to my workflows if Make.com gets repriced?"
In November 2025, Make.com transitioned its billing unit fully to credits. In 2024, Zapier reorganised its tier structure. Pricing model changes are not theoretical — they happen, and they happen suddenly.
If your agency has built your entire workflow stack on a closed-source SaaS, you're exposed to that platform's pricing decisions. If they've built on n8n self-hosted, your only exposure is your VPS bill.
This is not an argument for n8n in every situation. Make.com's visual builder is more polished. Zapier has more pre-built connectors. There are workflows where Make is genuinely the right choice — particularly for teams that need a non-technical user to maintain the workflows after delivery.
But the agency should be able to articulate the trade-off. If they default to one tool for every project, that's not strategy; that's lock-in.
Red flag: "We're a Make partner — we use Make for everything."
Green flag: "For your use case, n8n self-hosted works. Here's why: high execution volume (per-step pricing on Zapier and per-operation pricing on Make would push you to ₹50K+/month at this scale), need for custom code, need for data residency. If you had different requirements — non-technical maintenance, low volume, mostly SaaS-to-SaaS sync — we'd recommend Make."
6. "Show me five clients in production using your stack"
Not testimonials. Names. Live deployments. Workflows that have run for at least six months.
The reason: AI automation has a long tail of edge cases. You don't see the bugs in the demo. You see them in week 12 when a customer sends a four-part WhatsApp message in Bengali at 2 AM and the workflow's language detection routes it to the wrong queue.
An agency that has shipped five production workflows has hit five different edge case categories. An agency that has shipped one is still learning at your expense.
For us specifically: we run our own n8n stack internally, which saves 80+ hours/month of operational work at a tooling cost under ₹8,000/month. We eat our own cooking. The named, live deployments include Bandbox, where 84% of inbound WhatsApp queries get resolved by AI on first contact, with sub-3-second response times and 130+ hours/month saved for the operations team. That stack has been in production for over a year.
Red flag: "We've done lots of projects, here are some testimonials." (Testimonials are easy to manufacture; production deployments are not.)
Green flag: "Here are five names. Here's what each workflow does. Two of them will let you reference-check directly."
7. "Are you billing for outcomes or for hours?"
Pricing models tell you what the agency optimises for. Hourly billing optimises for hours. Per-workflow billing optimises for workflow count. Outcome-based billing optimises for outcomes.
Most agencies in 2026 still bill per-project or hourly. That's fine for one-off automations. For ongoing AI operations, the better model is a retainer tied to a specific outcome metric — first-contact resolution rate, hours saved per month, leads qualified per week. Outcome billing is rare because it's harder to scope and price, but it aligns incentives correctly.
Red flag: "We bill ₹2,500 per hour, no scope upfront."
Green flag: "Fixed-price for the build (₹4 lakh, four-week sprint). Then a monthly retainer of ₹35K that covers monitoring, edge case fixes, prompt tuning, and one new workflow per month — tied to a target of maintaining 80%+ first-contact resolution. If we drop below 75% in any month, the retainer is reduced pro-rata until we get it back."
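To show how the outcome clause in that green-flag example could be computed, here is a minimal sketch. The quote does not define "pro-rata", so this assumes the retainer scales by achieved rate over target rate once the rate falls below the 75% floor. That formula is an assumption, and a real contract would need to spell it out.

```python
def monthly_retainer(resolution_rate, base_inr=35_000, target=0.80, floor=0.75):
    """Retainer due for a month, given first-contact resolution as a fraction.

    At or above the floor the full retainer applies. Below it, the retainer
    is scaled by resolution_rate / target (assumed meaning of "pro-rata").
    """
    if not 0.0 <= resolution_rate <= 1.0:
        raise ValueError("resolution_rate must be between 0 and 1")
    if resolution_rate >= floor:
        return base_inr
    return round(base_inr * resolution_rate / target)
```

Under these assumptions a month at 60% resolution bills ₹26,250 instead of ₹35,000. The metric has to come from a system both sides can read, otherwise the formula is unenforceable.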
8. "What's your stance on open-source vs proprietary?"
This is a litmus test for the agency's technical maturity. There's no single right answer — open-source has real downsides (you maintain it, you patch CVEs, you debug the source) — but the agency should be able to explain a coherent position.
Our position: n8n self-hosted as the default for production workflows because of cost economics at scale, code escape hatch, and data sovereignty (Indian SMB clients increasingly care about data not crossing borders). Make.com when the buyer needs visual maintainability and low volume. Custom Python (FastAPI + Celery) when the workflow is complex enough that it's no longer a workflow — it's a service.
Red flag: "We use whatever the client wants." (No technical opinion = no technical judgement.)
Green flag: A clear default with named exceptions and named trade-offs.
The four-tool reality check (with INR pricing)
Free Download: AI Automation ROI Calculator
Plug in your numbers and see exactly what automation saves you. Based on real project data from our client engagements.
There are four serious tools for the AI automation layer in 2026, with n8n available in two deployment modes. Here's the honest comparison.
n8n (self-hosted)
Cost: ₹500–₹2,500/month for the VPS that hosts it. Community Edition is free; you only pay for compute. Add LLM API costs separately (₹5K–₹50K/month depending on volume).
Strengths: No per-execution fees, so it is the cheapest option at scale. Code escape hatch in JavaScript or Python. Data sovereignty (runs on your infrastructure). Native LangChain integration as of n8n 2.0 (December 2025) with 70+ AI nodes and persistent agent memory.
Weaknesses: Setup needs Docker comfort. Maintenance is on you. The UI is good but not as polished as Make's. Community support, not enterprise SLA, unless you pay for n8n Cloud Business.
When to use: High execution volume (>5K/month), data sovereignty requirements, need for custom code, technical team that can run a VPS.
n8n (cloud)
Cost: Starter at $24/month (≈₹2,000/month) for 2,500 executions. Pro at $60/month (≈₹5,000/month) for 10,000 executions. Business at $800/month for 40,000 executions plus SSO and Git workflows.
Strengths: No infrastructure to manage. Same feature set as self-hosted. Direct support from n8n team on higher tiers.
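Weaknesses: Per-execution caps mean cost rises in steps as usage grows. Once you exceed the Pro tier's 10,000 executions/month, the Business tier becomes necessary and the price jumps from about ₹5K to about ₹65K/month. At 50K executions/month you are also past the Business tier's 40,000-execution allowance.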
When to use: Mid-volume use cases (1K–10K executions/month) where managed infrastructure is worth the price, but execution volume is predictable enough that Business-tier overage fees won't surprise you.
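The tier step-ups are easier to see as a lookup. This sketch uses only the three tier figures quoted above. Overage pricing is not stated here, so volumes beyond the largest allowance return `None` rather than a guessed price.

```python
# (tier name, USD per month, included executions), as quoted above.
N8N_CLOUD_TIERS = [
    ("Starter", 24, 2_500),
    ("Pro", 60, 10_000),
    ("Business", 800, 40_000),
]


def cheapest_tier(executions_per_month):
    """Return (name, usd_per_month) for the smallest tier that fits.

    Returns None when the volume exceeds every listed allowance.
    """
    for name, usd, included in N8N_CLOUD_TIERS:
        if executions_per_month <= included:
            return name, usd
    return None
```

Run your expected monthly volume through it before signing: going from 10,000 to 10,001 executions moves you from $60 to $800 a month.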
Make.com
Cost: Credit-based since November 2025. Free tier (1,000 operations/month, 2 active scenarios). Core at $10.59/month for 10,000 operations. Pro at higher tiers.
Strengths: Most polished visual builder. 1,500+ pre-built integrations. Easy for non-technical users to maintain after delivery. SOC 2 Type II compliance certificates available.
Weaknesses: Cloud-only — no self-hosted option. Operation-based pricing inflates fast on multi-step workflows. Each module execution is one operation; a 10-step workflow run 5,000 times = 50,000 operations. Less flexible for custom logic than n8n's code nodes.
When to use: Non-technical client team that has to maintain workflows after delivery. Predominantly SaaS-to-SaaS integrations. Compliance requirements that favour managed cloud over self-hosted.
Zapier
Cost: Free tier (100 tasks/month, 5 zaps). Professional at $19.99/month for 750 tasks. Higher tiers escalate quickly into four-figure monthly bills.
Strengths: 8,000+ app integrations — the largest catalogue. Easiest for non-technical users to start.
Weaknesses: Per-task pricing is the worst at scale. A 5-step workflow run 5,000 times = 25,000 tasks, which lands you in the four-figure tier. Limited code support. Limited control over error handling and retries.
When to use: Honestly, rarely in 2026 for production AI automation. Zapier remains useful for one-off, low-volume, mostly-SaaS automations where the team has no technical capacity. For anything else, the cost economics are difficult to justify.
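The operation and task arithmetic quoted for Make and Zapier above is worth running against your own numbers. A minimal sketch follows, assuming every step in a run is billed as one unit. Higher-tier prices are not listed here, so it reports usage as a multiple of a plan's included allowance rather than a rupee figure.

```python
def monthly_units(steps_per_run, runs_per_month):
    """Billable operations or tasks, assuming one unit per step per run."""
    return steps_per_run * runs_per_month


def allowance_multiple(units, included_units):
    """How many times over a plan's included allowance the usage lands."""
    return units / included_units


# Figures from the comparison above.
make_ops = monthly_units(steps_per_run=10, runs_per_month=5_000)
zapier_tasks = monthly_units(steps_per_run=5, runs_per_month=5_000)
```

With the figures above, the Make example is 5 times the Core plan's 10,000 operations and the Zapier example is roughly 33 times the Professional plan's 750 tasks. That multiple, not the entry price, is what drives the monthly bill.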
Custom Python (FastAPI + Celery + Redis)
Cost: Developer time. ₹2–8 lakh for the initial build depending on scope. Hosting on AWS at ₹3K–₹15K/month.
Strengths: Unlimited flexibility. No platform lock-in at all. Full debuggability. Best fit for workflows that are actually services in disguise.
Weaknesses: No visual builder — every change is a code deploy. Higher upfront build cost. Requires a development team to maintain.
When to use: Complex orchestration where the "workflow" is really a backend service. Workflows that need true horizontal scale (>100K executions/day). Workflows where the logic is more than a no-code tool can express cleanly.
Pricing models, honestly compared
Five common pricing models for AI automation engagements. Here's how each affects buyer-side risk.
Hourly. Bad for both sides. Buyer can't budget. Agency loses on efficient delivery. Avoid for any project beyond a small POC.
Per-workflow. Adequate for clear, scoped builds (e.g., "build five named automations for ₹X each"). Becomes problematic when scope creep happens — every change is renegotiated. Works if both sides hold the scope tight.
Fixed-project. The standard for agency engagements. ₹3–15 lakh for a 4–8 week build. Works if the discovery phase produced a tight scope. Fails if discovery was rushed and the actual implementation surface is wider than the proposal.
Retainer. Best for ongoing operations. ₹30K–₹1.5 lakh/month covering monitoring, edge cases, prompt tuning, new workflows, and incident response. Both sides know what to expect.
Outcome-based. Rare and ideal. The retainer is tied to a measurable outcome — first-contact resolution rate, response time SLA, leads qualified. Aligns incentives. Hard to scope correctly and most agencies don't offer it. We do, on engagements where the outcome metric is verifiable.
Our default at Innovatrix: fixed-price for the build (with a tight discovery phase to make the price honest) plus an optional retainer for ongoing operations. Outcome-based on engagements where the metric is auditable from a system both sides can see.
Two proof points
Bandbox: 84% AI first-contact resolution at sub-3-second response time
Bandbox is Kolkata's oldest dry-cleaning brand. Inbound queries arrive over WhatsApp: booking slots, pickup confirmations, garment status, billing questions. Before the automation, the operations team was drowning in repetitive messages.
We built a WhatsApp AI automation on n8n self-hosted with Chatwoot as the conversation layer. The architecture: incoming WhatsApp messages route to Chatwoot, get categorised by an LLM with a custom prompt grounded against Bandbox's actual operational data (active bookings, pickup schedules, billing records). Routine questions get answered by the AI directly. Complex or escalated questions route to a human agent with the AI's draft response pre-loaded.
A year in production: 84% first-contact resolution by AI, sub-3-second average response time on the WhatsApp side, 130+ hours/month saved for the operations team, zero hallucinated bookings in 50,000+ inbound messages. The grounding strategy (always check actual data, refuse to commit to anything outside the retrieved context) is what made the zero-hallucination number possible.
Total project cost: in line with our standard fixed-price + retainer model. The retainer pays for itself in saved operational hours within the first month.
Innovatrix internal stack: 80+ hours/month saved, sub-₹8K/month tooling
We run our own AI automation stack internally. We talk about it because it's the only honest way to recommend something.
The stack: n8n self-hosted on a small VPS (₹3,500/month). OpenAI and Claude API for LLM calls (₹3,000–₹5,000/month, varies by month). Chatwoot for chat. Brevo for email automation. Custom Python services where n8n hits its ceiling. In a typical month, total tooling cost stays under ₹8,000.
What it automates: outbound prospecting (Apollo + Instantly + n8n cleanup workflows), blog content cross-posting (Directus → n8n → LinkedIn + Twitter + Dev.to + Hashnode), client reporting (Google Search Console + GA4 → weekly digest), DMARC report parsing for our email infrastructure. Total time saved: 80+ hours/month conservatively.
We're a DPIIT-recognised startup and an AWS Partner — both relevant here because they shape what we can credibly recommend. The DPIIT recognition makes us a verifiable Indian entity for Indian SMB clients. The AWS Partner status means we get architecture review hours during the design phase of every project.
The two-question gut check
If you don't want to read the full checklist, here are the two questions that'll catch 80% of bad agencies:
- "Show me a live monitoring dashboard for an existing production client." If they can't, walk away.
- "What happens to my workflows if you disappear tomorrow?" If the answer involves "you'd need to hire us back to migrate," walk away.
Everything else in this post is signal-to-noise improvement. Those two questions are the gate.
If you want to see the rest of what we've shipped, our portfolio lists every named, live deployment with metrics. If you're specifically looking at customer-facing AI conversations rather than ops automation, our AI chatbots practice sits next to the broader automation work.
Written by Rishabh Sethia, Founder & CEO
Rishabh Sethia is the founder and CEO of Innovatrix Infotech, a Kolkata-based digital engineering agency. He leads a team that delivers web development, mobile apps, Shopify stores, and AI automation for startups and SMBs across India and beyond.