How to choose your first AI pilot

Once you decide to run an AI pilot, the next question jumps out right away: where do we start? 

Vendors will happily suggest use cases for you. Colleagues will bring you long wish lists. Some ideas sound exciting but risky; others feel safe but not worth the effort.

Most organizations can imagine dozens of possible AI use cases. If you chase them all at once, you’ll confuse and over-burden staff (and learn very little). And if you chase only the highest-impact cases, you may find yourself starting with the highest-risk parts of your work.

This post offers a practical way to pick a first pilot workflow that is:

  • Low enough risk that you can sleep at night

  • Annoying enough today that staff will actually feel the benefit

  • Rich enough in learning that it sets you up with information you can use for future pilots

  • Reversible, so you can wind it down quickly if it isn’t working

By the end, you’ll have a shortlist of candidate workflows and a simple rubric you can use to choose one.

Step 1: Start with a concrete problem

Before you talk about tools, ground the project in a plain-language version of the problem.

Try to finish these sentences:

  • “If this pilot works, our staff would…”

  • “If this pilot works, our clients or residents would…”

  • “If this pilot works, our board or funders would…”

Examples:

  • A housing nonprofit: “If this pilot works, staff would spend less time retyping the same information into three different systems, and applicants would get faster updates about their status.”

  • A city arts commission: “If this pilot works, staff would spend less time summarizing public comments and more time spotting themes and designing better programs.”

Selecting a specific problem will help avoid implementing AI for AI’s sake.

Keep the scope tight. You’re looking for one workflow, or even just one task. When in doubt, go for the smaller scope. 

Step 2: Talk to the people who do the work

Pilots go better when they start from real pain points that staff care about and are grounded in deep, specific knowledge of the workflow.

This could take the form of interviews, a journaling exercise, or focus groups. Focus groups can save time, but be aware of power dynamics (don’t put people in the same group as their boss) and how group dynamics (like groupthink) can influence your results. 

Ask open-ended questions (and don’t “lead the witness”!). For example:

  1. Where does your work feel slow or repetitive?

  2. Where do you see a lot of rework or preventable errors?

  3. Where are we offering a lower quantity or quality of service because we don’t have enough time?

Capture concrete tasks, not abstract goals. For example:

  • “Turning program updates into tailored emails for different partners”

  • “Drafting first versions of case notes from structured fields and bullet points”

  • “Summarizing approximately 50 public comments into 5-10 themes with example quotes and exceptions noted”

  • “Translating technical guidance into lay language for residents”

Aim for several candidate tasks across a few workflows. You’ll narrow them down in the next steps.

As you do this (and in future steps), ethical questions may pop up.

Don’t ignore these questions! Write them down along with the candidate task. For example: 

  • “Draft first versions of case notes from structured fields and bullet points”
    [Note: Worried about privacy, HIPAA. Will automation bias lead us to override our professional judgment?]

These notes will help you later! For each task with ethical risks, you’ll decide whether to eliminate it, prioritize it, accommodate the risk, or monitor it.

Step 3: Screen out “not now” tasks

Some ideas are tempting but too risky for a first or early pilot.

For your first round, set these aside:

  • Deciding eligibility or benefits.

  • Handling highly-sensitive data in unvetted tools.

  • Areas already under legal or public scrutiny.

  • Tasks that would require heavy integration work you can’t realistically fund yet.

  • Tasks that would immediately raise fears of job cuts.

We’re not saying you can never apply AI to tasks in these higher-risk categories. But a few successful pilots in safer territory will first build a foundation of skills, guardrails, and trust.
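If it helps to make the screen mechanical, here is a minimal sketch in Python. The task names and flag labels are illustrative assumptions, not from any real tool:

```python
# Sketch of the "not now" screen: tag each candidate task with any
# red flags from the list above, then set flagged tasks aside.
# Task names and flag labels are illustrative.

RED_FLAGS = {
    "decides_eligibility",
    "sensitive_data_in_unvetted_tool",
    "under_legal_or_public_scrutiny",
    "heavy_integration_work",
    "raises_job_cut_fears",
}

tasks = [
    {"name": "Draft partner update emails", "flags": set()},
    {"name": "Score benefit applications", "flags": {"decides_eligibility"}},
]

# A task moves forward only if it has no red flags at all.
pilot_now = [t["name"] for t in tasks if not (t["flags"] & RED_FLAGS)]
not_now = [t["name"] for t in tasks if t["flags"] & RED_FLAGS]

print("Pilot now:", pilot_now)
print("Not now:", not_now)
```

The point isn’t the code itself but the discipline: every candidate gets checked against the same list, so a tempting-but-risky idea can’t skip the screen.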

Step 4: Use a simple rubric to rank the remaining workflows

Now you have a list of candidate workflows. To turn that into a shortlist, use a lightweight scoring rubric: rate each candidate from 1 (low) to 5 (high) on criteria like the following:

Mission impact

    • How critical is this task or workflow to executing your mission?

Example: Summarizing public comments for a major policy decision might rate a 4. Formatting internal slide decks might rate a 2. Determining eligibility for services is likely a 5.

Risk and reversibility

    • If the AI makes mistakes, how easy is it for staff to catch and correct them?

    • Can you confine the pilot to a small group, time period, or non-binding drafts?

    • Can you turn the pilot off quickly without breaking essential services?

First pilots are usually strongest when they score high on reversibility and low on risk. For example, drafting staff-facing summaries that humans always review is far safer than drafting letters that auto-send to clients.

Effort and pain today

    • How many hours does this task absorb each week or month?

    • How much does it contribute to staff burnout or backlogs?

    • Are people already complaining about this part of the job?

A good pilot should tackle work that people will be delighted to offload or streamline, not work they barely notice or that they see as a rewarding part of their job. 

Data and process readiness

    • Is there a clear, repeatable process today, even if it’s clunky?

    • Do you have examples of “good” outputs that could train or guide AI assistance?

    • Are the inputs already digital and accessible?

Tasks with a documented process and a stack of past examples make better pilot material than work that lives entirely in someone’s head. Once you’ve learned from a few pilots, you can start asking people to document their process and knowledge.

Learning value and reusability

    • If this pilot works, will the lessons apply to other programs or departments?

    • Will you learn something about prompts, guardrails, or workflows that you can reuse?

For instance, a pilot that teaches you how to safely summarize long documents could apply to grants, policy memos, case notes, and board reports.

Step 5: Run an equity and trust check on your shortlist

Once you rank your candidate workflows, you’ll likely end up with 2–4 front-runners. Before you choose one, pause for an equity and trust check.

For each top candidate, ask:

  • Who benefits if this goes well? Is it staff, clients with easier access, communities with clearer information, or all of the above?

  • Who could be harmed or left out? Could translation errors confuse non-native speakers? Could summaries flatten or erase the voices of smaller groups? Could people with limited digital access face new barriers?

  • Does the pilot create new power imbalances? For example, does it make it harder for people to reach a human when they really need one? 

  • What transparency and feedback do we owe people affected by this pilot? Even if the pilot is internal, do unions, advisory bodies, or community partners need a heads-up?

Step 6: Decide, name an owner, and write one clear pilot question

Now: pick one. Avoid the temptation to run three pilots at once. Focus lets you learn faster.

For the chosen workflow:

  1. Name a pilot owner. One person, not a committee, should be responsible for shepherding the pilot, coordinating with IT and legal, and keeping stakeholders informed.

  2. Write a single, clear pilot question.
    This question will guide your metrics and evaluation plan in the next post. A good pilot question looks like:

    • “Can AI help our program staff draft first versions of case notes quickly enough to save at least 20% of their time on documentation, while keeping quality at or above our current standard?”

    • “Can AI help us summarize public meeting notes into clear, plain-language themes that staff and community members agree reflect what was said?”

  3. Capture your reasoning.
    Jot down why you chose this workflow: the scores on your rubric, what you heard from staff, and how you weighed equity and trust. You’ll be documenting your process and results: this is the start.

What’s next

In the next post in this series, we’ll talk about what to measure and how to set up measurement so you can tell whether your pilot actually helps: not just for efficiency, but for service quality, workforce wellbeing, equity, and trust.

For now, if you can say “Yes, we know what we’re piloting and why,” you’re ready for the next step.
