Ahad Nawaz is a Software Engineer with over 5 years of experience. He builds web and mobile applications using React, Next.js, TypeScript, and Node.js. He is based in Lahore, Pakistan.

Where is Ahad Nawaz based?

Ahad Nawaz is based in Lahore, Punjab, Pakistan. He is available for remote work and local opportunities.

What does Ahad Nawaz do?

Ahad Nawaz is a full-stack Software Engineer. He builds web apps, mobile apps, APIs, and automation tools. He works with React, Next.js, React Native, Node.js, and TypeScript. He is open to hire for projects and full-time roles.

How can I contact Ahad Nawaz?

You can contact Ahad Nawaz via email at ahadnawaz585@gmail.com, book a meeting on Calendly, or reach out on LinkedIn or GitHub. Links are on his portfolio at https://ahadnawaz.dev.

MONA AI: Building a WhatsApp Bot With Gemini, Stability AI, and Puppeteer

Architecture, prompt design, and the message queue that keeps a WhatsApp AI bot reliable. Why I picked Puppeteer over the official Cloud API, the rate-limiting strategy, and what broke at scale.

MONA AI is a WhatsApp assistant that answers questions, generates images, and routes complex requests to humans. It runs on Puppeteer, Gemini, Stability AI, and a BullMQ queue backed by Redis, with Postgres as the database. Here is how it works and what I would do differently next time.

Why Puppeteer Instead of the WhatsApp Cloud API

The official Cloud API is the right answer for most teams: signed templates, business verification, predictable rates. I went with whatsapp-web.js on Puppeteer for one reason: time to first message. The client wanted to validate the idea in days, not wait weeks for business verification.

The trade-off was clear: Puppeteer means a real browser, real memory, and a real risk of session expiry. I mitigated with three things:

Session persistence on disk. The browser session lives on a mounted volume so a restart does not re-pair the device.
Heartbeat watchdog. A sidecar pings the browser every 30 seconds and restarts the worker if it loses connection.
Idempotent message handling. WhatsApp can replay incoming messages on reconnect. Every message id is deduped at the queue.
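
The third mitigation can be illustrated without any WhatsApp dependency. What follows is a minimal in-memory sketch rather than the production code: the `SeenMessages` class, its method names, and the TTL are all hypothetical, and a real deployment would more plausibly keep this state in Redis or lean on the queue's own job ids.

```typescript
// Hypothetical helper: remembers message ids for a limited time so that
// messages replayed on reconnect can be dropped before they reach the queue.
class SeenMessages {
  private readonly seenAt = new Map<string, number>();
  private readonly ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Returns true the first time an id is seen within the TTL, false on replays.
  markIfNew(messageId: string, now: number = Date.now()): boolean {
    this.evictExpired(now);
    if (this.seenAt.has(messageId)) return false;
    this.seenAt.set(messageId, now);
    return true;
  }

  // Forget ids older than the TTL so the map does not grow without bound.
  private evictExpired(now: number): void {
    this.seenAt.forEach((firstSeen, id) => {
      if (now - firstSeen >= this.ttlMs) this.seenAt.delete(id);
    });
  }
}
```

In the listener, the check would wrap the enqueue call: only messages for which `markIfNew(msg.id._serialized)` returns true are handed to the queue.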

The Message Loop

Every incoming message hits this loop:

// Ingest only: enqueue the message and return.
// Conversation state is loaded later, in the queue worker.
browser.on("message", async (msg) => {
  await queue.add("process-message", {
    messageId: msg.id._serialized, // stable id, used to dedupe replays
    chatId: msg.from,
    text: msg.body,
    type: classify(msg),
  });
});

The Puppeteer worker only ingests. All the AI work happens in BullMQ workers backed by Redis. This separation matters: when Gemini is slow, the WhatsApp listener never blocks.

Classifying the Intent

Before calling an LLM, I run a cheap classifier on the message:

Image request ("draw", "image of", "generate a picture") → Stability AI
Question or chat (default fallback) → Gemini
Operator handoff ("talk to human", "speak with someone") → human queue

The classifier is a regex pass plus a small Gemini call when the regex is ambiguous. The cheap path catches 70% of messages without touching the LLM, which cuts costs meaningfully.
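
The cheap path can be sketched as a pure function. This is an illustrative sketch, not the production classifier: the trigger phrases come from the list above, but the exact patterns, the `Intent` type, and the rule for what counts as ambiguous (both patterns matching) are assumptions.

```typescript
type Intent = "image" | "handoff" | "chat" | "ambiguous";

// Trigger phrases from the routing list; the regexes themselves are illustrative.
const IMAGE_PATTERN = /\b(draw|image of|generate a picture)\b/i;
const HANDOFF_PATTERN = /\b(talk to (a )?human|speak with someone)\b/i;

// Cheap first pass: two regex tests, no network call.
function classifyCheap(text: string): Intent {
  const wantsImage = IMAGE_PATTERN.test(text);
  const wantsHuman = HANDOFF_PATTERN.test(text);
  // Assumed rule: competing signals are left to the slower LLM classifier.
  if (wantsImage && wantsHuman) return "ambiguous";
  if (wantsHuman) return "handoff";
  if (wantsImage) return "image";
  return "chat"; // default fallback
}
```

Only the `"ambiguous"` result would trigger the small Gemini call. The word boundaries keep "withdraw" from being read as a request to draw.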

Prompt Design That Stayed Stable

The system prompt has three sections, in this order, every time:

Identity and tone. Who MONA is, how she speaks, what she will and will not do.
Context. The last N messages of the conversation, plus any structured data the user has shared (their order id, location, etc).
Task. The user's latest message, isolated and labeled clearly.

The order matters. Putting identity first anchors the model. Putting context next gives it grounding. Putting the user's input last keeps the model's attention on what to answer.
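
The fixed ordering is easy to enforce in code by assembling the prompt in one place. A minimal sketch, assuming hypothetical names (`PromptParts`, `buildPrompt`) and section labels that are placeholders rather than the wording MONA actually uses:

```typescript
interface ChatTurn {
  role: "user" | "assistant";
  text: string;
}

interface PromptParts {
  identity: string;              // who the assistant is and how it speaks
  history: ChatTurn[];           // last N messages, oldest first
  facts: Record<string, string>; // structured data such as an order id
  userMessage: string;           // the latest message, kept separate
}

// Always emits the sections in the same order: identity, context, task.
function buildPrompt(parts: PromptParts): string {
  const factLines = Object.keys(parts.facts).map((key) => `${key}: ${parts.facts[key]}`);
  const historyLines = parts.history.map((turn) => `${turn.role}: ${turn.text}`);
  return [
    "## Identity and tone",
    parts.identity,
    "## Context",
    ...factLines,
    ...historyLines,
    "## Task",
    `User message: ${parts.userMessage}`,
  ].join("\n");
}
```

Because callers pass named parts rather than a pre-built string, no code path can accidentally put the user's message ahead of the identity section.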

Image Generation Without Hammering Stability

Image requests are slow (4-8 seconds) and expensive. The queue gates them:

Per-user rate limit: 5 images per 24 hours, enforced in Redis with a sliding window.
Global concurrency: max 3 in-flight Stability calls. Excess requests wait in BullMQ.
Result caching: identical prompts return the previously generated image for 24 hours.

Users notice no difference, and the bill dropped by 60%.
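
The per-user limit is the piece most worth seeing in code. The sketch below is an in-memory stand-in for the Redis-backed limiter, written under assumptions: the class and method names are hypothetical, and a Redis version would typically store the timestamps in a sorted set per user instead of a local map.

```typescript
// In-memory stand-in for a sliding-window limiter (names are hypothetical).
class SlidingWindowLimiter {
  private readonly hits = new Map<string, number[]>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true and records the request if the user is under the limit.
  // Rejected requests are not counted against the window.
  tryAcquire(userId: string, now: number = Date.now()): boolean {
    const windowStart = now - this.windowMs;
    // Keep only the timestamps that still fall inside the window.
    const recent = (this.hits.get(userId) ?? []).filter((t) => t > windowStart);
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(now);
    this.hits.set(userId, recent);
    return allowed;
  }
}

// 5 images per 24 hours, as in the list above.
const imageLimiter = new SlidingWindowLimiter(5, 24 * 60 * 60 * 1000);
```

Unlike a fixed daily counter, the window slides: a slot frees up exactly 24 hours after each image, so a user cannot burst ten images around a midnight reset.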

What Broke at Scale

Puppeteer Memory Leaks

After a few days the worker would consume 4GB of RAM. The fix: schedule a graceful restart every 6 hours. The watchdog handles the cutover so no messages are dropped.

Conversation Context Bloat

Naively appending every message to the context window made Gemini calls progressively more expensive. Once a conversation passes 20 messages, I now summarize the older ones into a single "memory" block and prepend that instead of the full history. Context size stays bounded.
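
The bounding logic can be sketched separately from the summarization call itself. This is a sketch under assumptions: the function names and the `memory:` label are hypothetical, the 20-message threshold is read as "keep the newest 20 verbatim", and the step that turns the older turns into a summary (a Gemini call) is left out.

```typescript
interface Turn {
  role: "user" | "assistant";
  text: string;
}

interface SplitHistory {
  toSummarize: Turn[]; // older turns to fold into the memory block
  recent: Turn[];      // newest turns, sent verbatim
}

// Splits history so that only the newest `keep` turns are sent verbatim.
function splitHistory(history: Turn[], keep: number = 20): SplitHistory {
  if (history.length <= keep) return { toSummarize: [], recent: history };
  const cut = history.length - keep;
  return { toSummarize: history.slice(0, cut), recent: history.slice(cut) };
}

// Assembles the context: memory block first (if any), then the recent turns.
function buildContext(memory: string | null, recent: Turn[]): string {
  const lines = recent.map((turn) => `${turn.role}: ${turn.text}`);
  return memory ? [`memory: ${memory}`, ...lines].join("\n") : lines.join("\n");
}
```

With this split, the context never holds more than the memory block plus `keep` turns, no matter how long the conversation runs.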

Token Bombs

Users would paste huge documents or message dumps. The pre-processor now truncates anything over 4,000 characters and asks the user to clarify what they want done with it. Token usage and latency both stabilized.
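
A pre-processor of this kind fits in a few lines. The sketch below uses the 4,000-character threshold from the text; the function name, the result shape, and the wording of the follow-up question are assumptions.

```typescript
const MAX_CHARS = 4000; // threshold from the paragraph above

interface Preprocessed {
  text: string;       // what gets forwarded to the model
  truncated: boolean; // true if the input was cut
  followUp?: string;  // clarifying question to send back, if any
}

// Caps the input length and, when it cuts, asks the user what they want done.
function preprocess(text: string, maxChars: number = MAX_CHARS): Preprocessed {
  if (text.length <= maxChars) return { text, truncated: false };
  return {
    text: text.slice(0, maxChars),
    truncated: true,
    followUp: "That is a long message. What would you like me to do with it?",
  };
}
```

Note that `length` and `slice` count UTF-16 code units, so a cut can land in the middle of an emoji. For a cost guard that is usually acceptable, but the last character may need trimming if the text is shown back to the user.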

What I Would Change

I would migrate to the WhatsApp Cloud API as soon as a project survives validation. Puppeteer is great for prototypes and hostile to operations. The Cloud API has rate limits but never randomly logs out.

I would also push more of the classifier into a small fine-tuned model. The regex catches the obvious cases, but a tiny supervised classifier would catch the ambiguous ones without paying Gemini latency.

The Stack

Node.js + TypeScript, BullMQ + Redis, Postgres, Puppeteer with whatsapp-web.js, Gemini for chat, Stability AI for image generation, Docker on a small VPS. Total monthly cost at the scale I ran it: under $40 including infra.

The lesson worth repeating: separate the listener from the worker. Anything that talks to WhatsApp should do nothing else. Anything that talks to an LLM should never block message intake. With that split, the rest is engineering discipline.