Day 3: WhatsApp Onboarding
Not every service can be set up by an agent: wrestling with Meta’s developer portal, managing secrets, and debugging a live LLM classifier in production logs.
The Limits of Automation
Not every service can be set up autonomously by an agent.
Creating a WhatsApp bot takes roughly six steps that look deceptively simple. Once you start the process by registering an app on the Meta developer portal, you will realise that each step has its own lengthy tutorials to complete. A lot of this workflow cannot be automated by an agent through a CLI because Meta wants to verify that the WhatsApp business account is actually owned by a person. This is detrimental to the agentic onboarding experience.
My pain points can be roughly summarised as follows:
- Register a WhatsApp business account using a spare phone number
- Create an app and give my business account permission to control it
- Generate a scoped API token for the app to act on behalf of my account
- Set up my business profile, payment method, and webhook endpoint
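The webhook endpoint in the last step is the one place where code is involved. Meta verifies the endpoint with a one-time GET handshake: it sends `hub.mode`, `hub.verify_token`, and `hub.challenge` as query parameters, and the server must echo the challenge back if the token matches. A minimal sketch of that handshake, with an illustrative function name and token not taken from the project:

```python
# Minimal WhatsApp webhook verification handshake (illustrative sketch).
# Meta calls GET /webhook?hub.mode=subscribe&hub.verify_token=...&hub.challenge=...
# and expects the raw challenge string echoed back with a 200 status.

VERIFY_TOKEN = "my-verify-token"  # hypothetical; use the token you registered


def verify_webhook(params: dict) -> tuple[int, str]:
    """Return (status, body) for Meta's verification GET request."""
    if (
        params.get("hub.mode") == "subscribe"
        and params.get("hub.verify_token") == VERIFY_TOKEN
    ):
        return 200, params.get("hub.challenge", "")
    return 403, "verification failed"
```

Any web framework works here; the handler only needs to read the three query parameters and return the challenge as plain text.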
The Secrets Problem
After spending half a day obtaining a WhatsApp API token, I manually added these secrets to my Fly.io app. Claude then verified that the webhook endpoint was working by generating a fake JSON payload based on the type definition in the code and checking the HTTP response status. Writing an end‑to‑end test manually would have taken me at least 15 minutes of focused work, as I would have had to read up on the docs to craft the curl request.
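For reference, a fake inbound-message payload like the one Claude generated can be sketched as below. The shape follows the WhatsApp Cloud API webhook format as I understand it; every identifier and phone number here is a placeholder:

```python
def fake_text_message(body: str) -> dict:
    """Build a fake WhatsApp Cloud API webhook payload for an inbound text message.

    All IDs and phone numbers below are placeholders, not real account data.
    """
    return {
        "object": "whatsapp_business_account",
        "entry": [{
            "id": "0000000000",  # WhatsApp business account id (placeholder)
            "changes": [{
                "field": "messages",
                "value": {
                    "messaging_product": "whatsapp",
                    "metadata": {
                        "display_phone_number": "15550000000",
                        "phone_number_id": "1111111111",
                    },
                    "messages": [{
                        "from": "15551234567",
                        "id": "wamid.TEST",
                        "timestamp": "1700000000",
                        "type": "text",
                        "text": {"body": body},
                    }],
                },
            }],
        }],
    }

# Posting it to the local endpoint is then one line, e.g. with requests:
#   requests.post("http://localhost:8080/webhook", json=fake_text_message("hi"))
```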
Adding application runtime secrets is one area that can’t yet be securely handed off to agents. But this is mostly a fault of WhatsApp not supporting a browser login flow for generating API tokens. If there were an official wacli with an auth command, we could securely pass the application secrets down by piping them to flyctl, for example:
wacli auth token | sed 's/^/WHATSAPP_TOKEN=/' | flyctl secrets import
Claude Code would never need to see the secrets, so the onboarding experience could be fully and securely agentic.
Improving the Template Matcher
After validating the messaging flow from an iOS device, I found a problem with the keyword‑based template matcher and prompted the agent to suggest an improvement. Most of its ideas focused on refining the keyword‑based matcher, which I rejected. Instead, I asked it to implement the LLM‑based intent classifier that we had left out of the initial implementation discussion due to complexity.
I believe this is quite typical of the product feedback cycle, where the customer discovers an issue and raises it to the engineering team as a feature request. Most of the time, the feedback is vague and requires multiple follow‑up questions before implementation work can start. It would be nice if we could generalise a prompt template that customers fill out. Once approved, an agent could immediately pick up the work and drive it to completion.
The challenge here is the implementation details that a typical customer won’t be able to answer. In this case, I had a strong preference for using OpenRouter for the intent classifier because it’s free and highly available. Incorporating my feedback into the prompt process is an interesting product design problem.
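The classifier itself boils down to a single chat-completion call. The sketch below follows OpenRouter’s OpenAI-compatible request format; the model name, intent labels, and prompt wording are my illustrative assumptions, not the project’s actual implementation:

```python
# Hypothetical intent labels for illustration; the real bot's set will differ.
INTENTS = ["greeting", "order_status", "support", "unknown"]

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"


def build_classify_request(message: str, api_key: str) -> tuple[dict, dict]:
    """Build (headers, body) for an intent-classification request to OpenRouter."""
    headers = {"Authorization": f"Bearer {api_key}"}
    body = {
        # One of the free models on OpenRouter; any instruct model would do.
        "model": "meta-llama/llama-3.1-8b-instruct:free",
        "messages": [
            {
                "role": "system",
                "content": (
                    "Classify the user's message into exactly one of: "
                    + ", ".join(INTENTS)
                    + ". Reply with the label only."
                ),
            },
            {"role": "user", "content": message},
        ],
    }
    return headers, body


def parse_intent(response_json: dict) -> str:
    """Extract the label from the completion; fall back to 'unknown'."""
    label = response_json["choices"][0]["message"]["content"].strip().lower()
    return label if label in INTENTS else "unknown"
```

Sending the request is then a plain `requests.post(OPENROUTER_URL, headers=headers, json=body)`; keeping request construction and response parsing as pure functions makes the classifier testable without hitting the API.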
Debugging in Production
Since I didn’t have a harness that the agent could use to validate the WhatsApp flow, I ended up testing the new classifier together with the agent. I would send a message to the bot while my agent watched the logs from Fly.io. The first request to OpenRouter returned a 400 status without any error details. The agent realised that the response body wasn’t logged, so it quickly updated the production code to emit more detailed logs. On the second request, it found the error details in the logs and suggested updating my privacy settings on OpenRouter to allow free models to log my prompt requests. This is another manual step I had to do myself because OpenRouter doesn’t have a CLI for programmatic updates.
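The logging fix the agent made reduces to a small pattern worth applying at every external API call site: on a non-2xx response, capture the body, not just the status code. A sketch of that pattern (the function name and message format are illustrative, not the project’s code):

```python
def describe_error(status: int, body: str):
    """Return a log line for error responses, or None for successes.

    In the app this string would go through the logger; a bare "400" in
    the logs tells you nothing, while the body carries the actual cause.
    """
    if status >= 400:
        return f"upstream error: status={status} body={body}"
    return None
```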
While checking the app logs is straightforward for the agent, running ad hoc database queries is not. With Fly.io’s unmanaged Postgres, the database is only accessible from your local environment through an SSH tunnel. You can run psql through this tunnel with the configured database password to execute arbitrary queries manually. However, Claude Code cannot attach to the stdin of the tunnelled shell, so the session always hangs.
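One workaround, assuming `flyctl proxy` and a local `psql` are available, is to keep everything non-interactive: forward the Postgres port to localhost, then run one-shot queries with `psql -c` so nothing ever waits on stdin. The app name, port, and DSN below are placeholders:

```python
import subprocess

# Assumes a port-forward is already running in another terminal, e.g.:
#   flyctl proxy 15432:5432 -a my-postgres-app
DSN = "postgres://postgres:PASSWORD@localhost:15432/mydb"  # placeholder DSN


def build_psql_command(dsn: str, query: str) -> list[str]:
    """One-shot psql invocation: -c runs the query and exits, so stdin is never read."""
    return ["psql", dsn, "-At", "-c", query]


def run_query(dsn: str, query: str) -> str:
    """Run a query through the tunnel and return its raw output."""
    result = subprocess.run(
        build_psql_command(dsn, query), capture_output=True, text=True, check=True
    )
    return result.stdout
```

Because each invocation exits on its own, an agent can call this from a normal shell tool without ever hanging on an interactive prompt.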
After a series of tweaks, the classifier finally worked. Next, we will be implementing a task executor for messages that require background processing.