A Capable and Safe Agent to Manage Applications and Infrastructure on AWS
Integrating AI agents into a real business takes more than signing up for a Claude Max subscription. As frontier models have grown more capable, it seemed like a good time to build a solution on AWS that can safely develop and manage web applications and debug production issues. The video below shows the resulting agent doing just that.
Video: conti-agent, working in Slack, opens a real pull request against my production site and creates design mockups, followed by my review and approval of the agent's work and live verification of the new feature.
The Agent Must be Trustable
The latest models are good. The risk no longer lies in whether these tools can perform complex tasks competently, but in whether that competence can be trusted with access to your production environment and its applications, especially when real revenue or compliance is at stake. It would be difficult to justify deploying such an agent, however capable, without a guarantee that it cannot autonomously perform destructive actions on the organization's resources; there have already been many high-profile cases of rogue agents doing exactly that.
The solution is straightforward and follows the same principles that already guide infrastructure design (least privilege, well-defined blast radii, and clear audit trails) with one notable addition: strict human-in-the-loop guardrails that bring every critical decision, and, importantly, the responsibility for it, to an authorized biological operator.

The agent is incapable of making changes to the production database without documented human approval
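As a rough illustration of that boundary, the Python sketch below gates a destructive action behind a single-use, recorded human approval. It is a hypothetical example, not the agent's actual code: ApprovalLedger, require_approval, and the action name are assumptions. In practice the approval would arrive through an authenticated human channel (such as a Slack interaction) that the agent itself cannot call, and the audit log would be written to durable storage rather than kept in memory.

```python
"""Hypothetical sketch of a human-approval gate with an audit trail.

ApprovalLedger, require_approval, and the action names are illustrative
assumptions, not the agent's real implementation.
"""
from __future__ import annotations

import functools
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable


@dataclass(frozen=True)
class ApprovalRecord:
    """A documented human approval for one specific action."""
    action: str
    approver: str
    approved_at: datetime


@dataclass
class ApprovalLedger:
    """Pending approvals granted by humans, plus an append-only audit log."""
    approvals: dict[str, ApprovalRecord] = field(default_factory=dict)
    audit_log: list[str] = field(default_factory=list)

    def approve(self, action: str, approver: str) -> None:
        # Assumed to be called only from the human side, never by the agent.
        record = ApprovalRecord(action, approver, datetime.now(timezone.utc))
        self.approvals[action] = record
        self.audit_log.append(f"APPROVED {action} by {approver}")

    def consume(self, action: str) -> ApprovalRecord | None:
        # Approvals are single-use: popping removes the record once acted upon.
        return self.approvals.pop(action, None)


class ApprovalRequired(PermissionError):
    """Raised when a gated action is attempted without a human approval."""


def require_approval(ledger: ApprovalLedger, action: str) -> Callable:
    """Decorator: run the wrapped function only if a human approved `action`."""
    def decorator(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            record = ledger.consume(action)
            if record is None:
                ledger.audit_log.append(f"BLOCKED {action}")
                raise ApprovalRequired(f"{action!r} needs documented human approval")
            ledger.audit_log.append(f"EXECUTED {action} (approved by {record.approver})")
            return fn(*args, **kwargs)
        return wrapper
    return decorator


# Example: a production migration gated on a named human's approval.
ledger = ApprovalLedger()


@require_approval(ledger, "migrate:prod-db")
def apply_migration() -> str:
    return "migration applied"


# Calling apply_migration() at this point would raise ApprovalRequired.
ledger.approve("migrate:prod-db", approver="alice")
print(apply_migration())  # migration applied
```

Making each approval single-use means one sign-off cannot be replayed to authorize a second, unreviewed change.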
What it Does
The agent can currently debug production issues: it pulls the logs, finds the broken deployment, and identifies exactly what changed. It watches spend, inspecting Cost Explorer and flagging anomalies. It ships software: it clones a repo, navigates the code, makes the change, generates the database migration, and opens a pull request for you to review; it cannot push to main or deploy to production. It operates production applications, managing the content of a web application or CMS via its API in a draft-first, human-in-the-loop workflow. It also produces real design work, turning requests into mockups that can be reviewed in detail before any code is written.
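How spend anomalies are detected isn't described here, so the following is only one plausible, assumed approach: flag any day whose cost sits more than a few standard deviations above a trailing window of prior days. In a real skill the daily totals would come from Cost Explorer (for example, boto3's get_cost_and_usage with DAILY granularity); here they are a plain list so the sketch runs without AWS credentials. The function name and the window, threshold, and min_sigma defaults are illustrative assumptions.

```python
from __future__ import annotations

from statistics import mean, stdev


def flag_cost_anomalies(
    daily_costs: list[float],
    window: int = 7,
    threshold: float = 3.0,
    min_sigma: float = 1.0,
) -> list[int]:
    """Return indices of days whose spend is more than `threshold` standard
    deviations above the mean of the preceding `window` days.

    `min_sigma` (same currency units as the costs) floors the standard
    deviation so a perfectly flat history does not flag trivial changes.
    """
    if window < 2:
        raise ValueError("window must be at least 2 to compute a standard deviation")
    anomalies = []
    for i in range(window, len(daily_costs)):
        history = daily_costs[i - window : i]
        mu = mean(history)
        sigma = max(stdev(history), min_sigma)
        # Only spikes are flagged; unusually low spend is ignored.
        if (daily_costs[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Example: a steady ~$11/day account with one $40 spike on day 8.
costs = [10, 11, 10, 12, 11, 10, 11, 11, 40, 12]
print(flag_cost_anomalies(costs))  # [8]
```

Note that the spike itself inflates the trailing window for the following week, so a sustained increase would stop being flagged after the first few days; a real skill might exclude flagged days from the baseline.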
The agent's AWS CLI access is strictly read-only and mediated through specific, tightly scoped skills. It cannot run terraform apply. Any infrastructure change is submitted as a draft PR against the infrastructure-as-code (IaC) and applied only by authorized CI pipelines.
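The read-only restriction belongs first and foremost in IAM: the role the agent runs under should simply lack write permissions. A skill can still put a fail-closed check in front of the CLI, both as defense in depth and to give the model a clear refusal. The sketch below is hypothetical (the allowlist, denylist, and is_allowed are assumptions, not the agent's code) and treats describe-, list-, and get- operations as read-only, which is a heuristic rather than a guarantee.

```python
import re
import shlex

# Hypothetical policy for a read-only AWS CLI skill. The real boundary is the
# agent's IAM role; this check only fails fast and fails closed.
READ_ONLY_PREFIXES = ("describe-", "list-", "get-")
# Read-only operations that don't follow the prefix convention.
EXTRA_ALLOWED = {("logs", "filter-log-events"), ("logs", "tail"), ("s3", "ls")}
# Read-style operations that would still expose secrets.
DENIED = {("secretsmanager", "get-secret-value")}
# Service and operation names: lowercase words joined by hyphens, no leading "-".
_NAME = re.compile(r"^[a-z0-9][a-z0-9-]*$")


def is_allowed(command: str) -> bool:
    """Allow only `aws <service> <operation> ...` with a read-only operation."""
    try:
        tokens = shlex.split(command)
    except ValueError:  # e.g. unbalanced quotes
        return False
    if len(tokens) < 3 or tokens[0] != "aws":
        return False  # rejects terraform, shells, and anything not the AWS CLI
    service, operation = tokens[1], tokens[2]
    if not (_NAME.match(service) and _NAME.match(operation)):
        # Rejects global options before the service (kept simple) and
        # smuggled separators such as "describe-instances;".
        return False
    if (service, operation) in DENIED:
        return False
    return operation.startswith(READ_ONLY_PREFIXES) or (service, operation) in EXTRA_ALLOWED


print(is_allowed("aws ec2 describe-instances --region us-east-1"))  # True
print(is_allowed("terraform apply -auto-approve"))  # False
```

Approved commands should then be executed from the token list (for example with subprocess.run(tokens)) rather than through a shell, so nothing in the arguments can be interpreted as shell syntax.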

The Architecture, Briefly
The shape of the system follows the same principle as everything above: the agent is given room to work and stopped at the boundary that matters.

The agent runs on Amazon Bedrock AgentCore Runtime. Each session gets its own isolated microVM, which is destroyed when the session ends, and the runtime has no public endpoint: it is reachable only through authenticated, signed calls. The agent itself is built on the Strands Agents SDK, which keeps the whole system defined as code rather than configuration.
Its capabilities are organized as skills. The agent carries a short catalog and loads a skill's full instructions only when it needs them, which keeps it fast and its context focused. Each skill is a self-contained unit with its own instructions, tools, and least-privilege permissions, so adding a capability means defining in code exactly what that capability is allowed to do, and nothing more. The underlying model can be any current frontier model on Bedrock, and the trend suggests these models will become largely interchangeable.
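A minimal sketch of the catalog-plus-lazy-loading pattern follows. Skill, SkillRegistry, and the example skill, tool, and permission names are hypothetical; how this agent's skills are actually implemented isn't described, and in a Strands-based agent the tools would be real tool functions and the permissions would be enforced by IAM rather than stored in a Python list.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Skill:
    """Hypothetical skill: a one-line summary for the catalog, plus
    instructions, tools, and permissions that are only needed on demand."""
    name: str
    summary: str
    load_instructions: Callable[[], str]  # e.g. reads an instructions file lazily
    tools: list[str] = field(default_factory=list)
    permissions: list[str] = field(default_factory=list)  # e.g. IAM actions


class SkillRegistry:
    def __init__(self, skills: list[Skill]) -> None:
        self._skills = {s.name: s for s in skills}
        self._loaded: dict[str, str] = {}  # name -> full instructions

    def catalog(self) -> str:
        """The short listing that always sits in the agent's context."""
        return "\n".join(f"- {s.name}: {s.summary}" for s in self._skills.values())

    def activate(self, name: str) -> str:
        """Load a skill's full instructions the first time it is needed."""
        if name not in self._loaded:
            self._loaded[name] = self._skills[name].load_instructions()
        return self._loaded[name]

    def loaded(self) -> list[str]:
        return sorted(self._loaded)

    def allowed_tools(self) -> set[str]:
        """Only tools belonging to activated skills are exposed to the model."""
        return {t for n in self._loaded for t in self._skills[n].tools}


registry = SkillRegistry([
    Skill("cost-review", "Inspect spend and flag anomalies",
          lambda: "Full cost-review instructions...",
          tools=["aws_cli_readonly"], permissions=["ce:GetCostAndUsage"]),
    Skill("ship-code", "Change code and open a pull request",
          lambda: "Full ship-code instructions...", tools=["git"]),
])
print(registry.catalog())
```

Because only the catalog stays in context permanently, each additional skill costs one catalog line until the agent actually activates it.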
Motivation Behind Building This
I began this project to provide a client-owned AI agent that businesses could deploy to support their existing or future operations. Before delivering such a solution to clients through my consultancy, however, I needed to validate its performance thoroughly against a real production AWS environment and application. The agent currently serves as the platform engineer for my own business, managing this website and its associated AWS accounts. The next step is to validate its real business value by deploying it at a company doing millions in annual revenue; that deployment is now underway at Fine's Gallery, and I'll be writing about it as it happens.
