Inside Docker's Fleet: How Autonomous AI Agents Accelerate Development


At Docker, the Coding Agent Sandboxes team takes an unusual approach to software testing and maintenance: it deploys a virtual team of AI agents known as the Fleet. Each agent role has a distinct persona and set of responsibilities, and the roles autonomously test the product, triage issues, post release notes, and even fix bugs. Built on Claude Code skills, the Fleet runs both locally and in CI, enabling faster iteration and more reliable releases. Below, we explore the key questions about the system.

What is the Coding Agent Sandboxes project?

The Coding Agent Sandboxes project, often abbreviated as "sbx," provides secure, microVM-based isolation for running AI coding agents such as Claude Code, Gemini, Codex, Docker Agent, and Kiro. Each agent operates inside a sandbox with its own Docker daemon, network, and filesystem, completely isolated from the host system. This setup gives agents full autonomy to execute tasks without risking the user's environment. The sbx tool itself is a CLI that manages sandbox lifecycles (creating, starting, stopping, removing, configuring networking, and mounting workspaces) across macOS, Linux, and Windows. Every release requires thorough testing across platforms and upgrade paths, and under sustained load. Traditionally, this would involve writing test scripts and reporting tools, but the team chose a more autonomous path: the Fleet.
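The lifecycle operations above might look like the following from a terminal. The article names the operations but not the exact syntax, so the subcommands and flags below are illustrative assumptions, not the documented sbx interface:

```shell
# Hypothetical sbx session; subcommand and flag names are illustrative,
# not the real sbx CLI syntax.

# Create a sandbox with its own Docker daemon, network, and filesystem,
# mounting the current project as the agent's workspace.
sbx create my-sandbox --mount "$(pwd)":/workspace

# Start the sandbox; a coding agent can now run inside it with full
# autonomy, isolated from the host.
sbx start my-sandbox

# When the agent is done, tear everything down.
sbx stop my-sandbox
sbx remove my-sandbox
```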

Source: www.docker.com

What is the Fleet and how does it work?

The Fleet is a virtual team of seven AI agent roles built on top of the Coding Agent Sandboxes. Each role is defined by a skill—a markdown file that gives the agent a persona, responsibilities, and allowed tools. For instance, a build engineer role knows how to compile code and make decisions about build failures. Unlike traditional scripts that follow predetermined steps, skills function as role descriptions, empowering agents to use judgment. The same skill file runs identically whether on a developer's laptop or in CI, ensuring consistency. The Fleet automates tasks like exploratory testing, issue triage, release note generation, and bug fixing. This approach reduces manual effort and speeds up the development cycle by letting agents handle routine work autonomously, while humans focus on higher-level decisions.
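Concretely, a skill is just a markdown file. A minimal sketch of what a build-engineer role might contain follows; the file layout, headings, and wording here are assumptions for illustration, not the Fleet's actual skill format:

```markdown
# Skill: build-engineer (hypothetical sketch, not the Fleet's real file)

You are the build engineer for the sandboxes project.

## Responsibilities
- Build the CLI binaries for macOS, Linux, and Windows.
- When a build fails, read the compiler output, decide whether the
  failure is environmental or a real regression, and act accordingly.
- Escalate to a human only when a failure is outside your authority.

## Allowed tools
- Shell (build commands, log inspection)
- File read/write within the workspace
```

Note what is absent: there are no step-by-step instructions. The file describes who the agent is and what it is allowed to do, and the agent decides how.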

How are agent skills different from scripts?

Agent skills are fundamentally different from traditional scripts. A script is a rigid set of instructions: "run these steps in order; if a step fails, stop." A skill, on the other hand, is a role description that says, "You are the build engineer. Here’s what you know and how you make decisions." This distinction matters because agents need judgment, not just instructions. When a test fails unexpectedly, a script halts with no recourse. An agent role, however, investigates: it can analyze logs, adjust parameters, or try alternative approaches within its authority. Skills are written in markdown and define a persona, responsibilities, and tool permissions. They are not procedural; they are declarative. This allows the agent to adapt to dynamic situations, making the Fleet resilient and efficient. The same skill can be invoked locally for quick debugging or in CI for automated execution, with no need for separate versions.

Why is the 'local first, CI second' approach important?

The 'local first, CI second' principle is central to the Fleet's design. Instead of building agent skills directly inside CI workflows—which leads to painful commit-push-wait-read-logs cycles—the team develops and debugs skills on their local machines first. For example, when building the /cli-tester skill (the Fleet's exploratory tester), they invoked it locally from the terminal. They watched the agent build binaries, exercise CLI commands, find issues, and report results. Each iteration took seconds, allowing rapid tweaks to the skill file. Only after the skill performed correctly locally did they wire it into CI. This approach eliminates the friction of debugging through CI logs, where each iteration can take minutes. The result: faster development, clearer understanding of agent behavior, and a single skill that runs flawlessly in both environments. CI becomes just another runtime for the same skill, not a separate version.
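The local loop the team describes might look like this from a terminal. Claude Code does support headless invocation with `-p` and stores skills as markdown files, but the skill name and file path below are illustrative assumptions:

```shell
# Hypothetical local iteration loop; the skill path and invocation are
# illustrative, not Docker's actual setup.

# 1. Invoke the exploratory-tester skill against the local checkout.
claude -p "/cli-tester"

# 2. Watch it build binaries, exercise CLI commands, and report findings,
#    then tweak the skill's markdown file based on what you saw.
vim .claude/skills/cli-tester/SKILL.md

# 3. Re-run. Each iteration takes seconds, not a CI round trip.
claude -p "/cli-tester"
```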

How does the /cli-tester skill operate?

The /cli-tester skill is one of the Fleet's roles, functioning as an exploratory tester. It autonomously builds the Docker Coding Agent Sandboxes binaries, exercises CLI commands (create, start, stop, remove, configure networking, mount workspaces), and searches for issues—like resource leaks, incorrect output, or unexpected failures. The skill runs across macOS, Linux, and Windows, ensuring platform compatibility. It operates on a schedule (e.g., nightly) and in response to events (e.g., a new release). When a problem is found, the agent reports it to a designated channel (such as a GitHub issue or Slack). Because the skill is defined as a role description, the agent uses judgment to decide which areas to test based on recent changes or historical failure patterns. This autonomous testing catches bugs that might be missed by traditional test scripts, and it operates with zero human intervention once triggered.


How does CI integrate with the Fleet?

CI integration with the Fleet is elegantly simple: CI is just another runtime for the same agent skills. There is no separate "CI version" of a skill or translation layer. For example, the /cli-tester skill that runs nightly on macOS, Linux, and Windows runners is the exact same markdown file that developers invoke from their terminals. The CI workflow sets up the environment (checking out code, installing dependencies) and then calls the skill. That's it. This consistency means that any improvements made to a skill locally are automatically reflected in CI. It also means that debugging is easier: if a skill fails in CI, developers can reproduce the issue locally by running the same skill in a similar environment. The Fleet's CI jobs run autonomously, posting release notes, triaging issues, and even fixing bugs without human intervention. This approach reduces the overhead of maintaining separate test scripts and ensures that agent behavior is identical everywhere.
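A CI job in this model is thin: set up the environment, then call the same skill. A hypothetical GitHub Actions sketch follows; the workflow name, runner matrix, secret name, and invocation step are assumptions, not Docker's actual workflow:

```yaml
# Hypothetical nightly workflow; names and steps are illustrative.
name: fleet-cli-tester
on:
  schedule:
    - cron: "0 3 * * *"   # nightly

jobs:
  cli-tester:
    strategy:
      matrix:
        os: [macos-latest, ubuntu-latest, windows-latest]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      # Environment setup only; no CI-specific test logic lives here.
      - name: Run the same skill developers invoke locally
        run: claude -p "/cli-tester"
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
```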

What are the benefits of using autonomous agent roles?

Using autonomous agent roles like the Fleet offers several key benefits. First, it dramatically reduces the manual effort required for testing, triaging, and release management. Instead of humans writing and maintaining scripts, agents autonomously perform these tasks, freeing up developers for more creative work. Second, agents use judgment, not just instructions, so they can adapt to unexpected situations—like a flaky test or a new platform issue—and investigate or escalate appropriately. Third, the 'local first, CI second' design means faster iteration during development and seamless deployment to CI. Fourth, the consistent skill file across environments eliminates the maintenance burden of separate scripts for different runtimes. Fifth, agents can operate around the clock, catching issues early in the development cycle. Finally, the Fleet scales easily: adding a new agent role is as simple as writing a new skill file. This modularity keeps the system adaptable as requirements change.