Rolling out your first AI pilot

Designing your first AI pilot gives you a chance not only to learn about and safely roll out your first AI implementation, but also to lay the foundation for faster and more effective pilots in the future. It is a critical stage for mission-driven organizations because it allows you to test assumptions, identify potential conflicts with your values, and build organizational knowledge.

Here is how to design a pilot rollout that balances innovation with your organization’s integrity.

Who Should Test?

  • Enthusiasts and Skeptics: Include both people who are excited to explore the tool and those who have concerns about quality or ethics. Engaging skeptics early helps identify potential problems that enthusiasts might overlook and helps build broader buy-in.

  • Domain Experts: Ensure participants have sufficient expertise in the specific task to recognize subtle AI errors or hallucinations.

  • Representative Staff: Talk to workers who perform the tactical, front-line tasks. They understand the "workarounds" and hidden inefficiencies that leadership might not see.

  • A "Super-User" Network: Identify a few "super-users" who can provide peer support and explain concepts in the context of your organization’s daily work.

What are the steps?

While there is no universal timeline, a pilot should be long enough to move past the initial "learning dip" where performance often temporarily drops as people adjust to new workflows.

  • Establish a Baseline: Before the pilot begins, measure current performance so you have a clear comparison point. Use this time to work out the kinks in your measurement approach, and aim for at least four weeks of good data before changing anything.

  • Iterative Rollout: Rather than switching everyone over at once, implement AI in stages. Start with your first pilot group and expand gradually, team by team or based on need. This gives staff time to adapt and gives you opportunities to adjust the strategy based on early findings.

  • Identify Success Criteria: A pilot can be considered tentatively successful once it meets pre-established success criteria, such as maintaining error rates below a specific threshold (e.g., 5%) for a set period.
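As a concrete illustration, a quantitative criterion like the error-rate threshold above can be written down as a short check. This is only a sketch: the function name, the 5% threshold, and the sample numbers are hypothetical, and your own criteria and measurement windows will differ.

```python
# Hypothetical sketch: check whether weekly error rates stayed below a
# pre-established threshold (e.g., 5%) for the whole evaluation period.
ERROR_RATE_THRESHOLD = 0.05  # 5%, agreed on before the pilot begins

def pilot_meets_criteria(weekly_error_rates, threshold=ERROR_RATE_THRESHOLD):
    """Return True only if every week's error rate is below the threshold."""
    return all(rate < threshold for rate in weekly_error_rates)

# Example: four weeks of logged error rates (errors / outputs reviewed)
weekly_rates = [0.08, 0.04, 0.03, 0.02]
print(pilot_meets_criteria(weekly_rates))  # week 1 exceeds 5%, so False
```

The point of writing the rule down this explicitly, even informally, is that "tentatively successful" stops being a judgment call made after the fact.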

Taking in Feedback

Effective pilots require active management and multi-layered feedback channels to ensure the technology is actually serving your mission.

  • Frequent Touch-points: In the first week, check in with participants as often as once a day to catch access issues or confusion. After the first week, move to regular weekly check-ins.

  • Open-Ended Qualitative Feedback: Ask questions that reveal the "human dimension" of the change, such as: "What is harder now?" "Where are you uncertain?" "How has the nature of your work changed?" or "Is there anything you wish you could do with this tool, but you haven't figured out yet?"

  • Error Logging: Maintain an "incident report" log to document what went wrong and how the organization responded. Encourage a culture where questioning AI outputs is seen as a critical professional skill.

  • Failure Planning: If something goes wrong while staff are using the tool as intended, it should be clear whom they should contact so that the technology, policy, or practices can adapt to avoid the error in the future. Some types of failure may require shutting down the pilot for a time: make those clear in advance.
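The incident-report log described above can be as simple as a shared spreadsheet or a CSV file. The sketch below is one lightweight way to keep it; the field names and the example entry are only suggestions, not a prescribed schema.

```python
# Hypothetical sketch of a minimal AI incident log kept as a CSV file.
import csv
import os
from datetime import date

FIELDS = ["date", "tool", "what_went_wrong", "response", "follow_up_needed"]

def log_incident(path, tool, what_went_wrong, response, follow_up_needed):
    """Append one incident row to a CSV log, writing a header if the file is new."""
    is_new = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({
            "date": date.today().isoformat(),
            "tool": tool,
            "what_went_wrong": what_went_wrong,
            "response": response,
            "follow_up_needed": follow_up_needed,
        })

# Example entry (hypothetical incident)
log_incident(
    "incident_log.csv",
    tool="drafting assistant",
    what_went_wrong="cited a report that does not exist",
    response="draft corrected; staff reminded to verify citations",
    follow_up_needed="yes",
)
```

Whatever form the log takes, the "response" and "follow up" fields matter most: they document how the organization adapted, not just what broke.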

With a thoughtful rollout, you can reduce risk, harness AI’s strengths, and build the foundation for future implementations. 



LLM disclosure:
I created an outline of the rest of the pilot series and asked ChatGPT 5.2 to draft posts based on my outlines. I find this is a great way to make sure the AI saves me time, but doesn’t guide my thinking.

Next

Pilots: What to Measure & How