Workflow: Building an App Knowledge Base
This is the complete walkthrough for building a knowledge base from scratch — from first launch to saved checkpoints and executable test flows. It takes 1-2 hours for a medium-complexity app (20-40 screens), and the result is an artifact that saves that time back on every subsequent agent session.
Phase 1: Setup (10 minutes)
Pre-flight
Before you start the tour, prepare the simulator to minimize friction:
“Grant location permission for my app, disable password autofill on the simulator, and boot a simulator”
Your agent handles the permissions (grant_permission), disables autofill (xcrun simctl spawn ... defaults write -g AutoFillPasswords -bool NO), and gets a device ready. This prevents system permission prompts and save-password dialogs from interrupting the tour.
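As a rough illustration, the pre-flight steps can be expressed as `xcrun simctl` invocations. This is a sketch, not what the agent literally runs. The `preflight_commands` helper is hypothetical, and the `booted` device alias and the `location` service name are assumptions. The code also assumes a simulator is already booted.

```python
import subprocess

def preflight_commands(bundle_id: str, device: str = "booted") -> list[list[str]]:
    """Build simctl commands for the pre-flight step (hypothetical helper).

    Assumes a simulator is already booted; "booted" targets that device.
    """
    return [
        # Grant location permission so the system prompt never appears.
        ["xcrun", "simctl", "privacy", device, "grant", "location", bundle_id],
        # Disable password autofill to avoid save-password dialogs.
        ["xcrun", "simctl", "spawn", device, "defaults", "write", "-g",
         "AutoFillPasswords", "-bool", "NO"],
    ]

def run_preflight(bundle_id: str, device: str = "booted") -> None:
    """Execute each command, stopping at the first failure."""
    for cmd in preflight_commands(bundle_id, device):
        subprocess.run(cmd, check=True)
```

Keeping command construction separate from execution makes it easy to review the exact commands before anything touches the simulator.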
If your app has test accounts, create one now:
“Create a test account with enough activity to skip any first-time-user experience”
Apps often have onboarding flows that trigger below a certain activity threshold (e.g., fewer than 4 finds, no completed orders). A test account that clears these thresholds means the tour focuses on the real app, not onboarding.
Initialize
“Initialize an app knowledge base for this project”
Your agent scaffolds the .quern/knowledge/ directory. The first files to fill in are app.md (bundle ID, URL scheme, universal link domains), states.md (auth, subscription, onboarding modes), and environments.md (staging vs production, how to switch).
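To make the scaffold concrete, here is a minimal sketch of what an equivalent manual setup might look like. The directory and file names come from the description above. The `scaffold_knowledge_base` function and the placeholder file contents are illustrative assumptions, not the tool's actual output.

```python
from pathlib import Path

# File names come from the walkthrough; the placeholder contents are illustrative.
SEED_FILES = {
    "app.md": "Bundle ID:\nURL scheme:\nUniversal link domains:\n",
    "states.md": "Auth:\nSubscription:\nOnboarding:\n",
    "environments.md": "Staging:\nProduction:\nHow to switch:\n",
}

def scaffold_knowledge_base(project_root: Path) -> list[Path]:
    """Create .quern/knowledge/ with seed files, never overwriting existing ones."""
    kb_dir = project_root / ".quern" / "knowledge"
    kb_dir.mkdir(parents=True, exist_ok=True)
    created = []
    for name, template in SEED_FILES.items():
        path = kb_dir / name
        if not path.exists():
            path.write_text(template, encoding="utf-8")
            created.append(path)
    return created
```

Skipping files that already exist keeps the sketch safe to re-run on a project that has been partially documented.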
Tell the agent these up front — they’re things the agent can’t discover from the UI:
“The bundle ID is com.example.myapp. Staging uses staging.example.com, production uses api.example.com. There are Basic and Premium tiers — Premium unlocks filtering, lists, and statistics. Accounts with fewer than 3 orders see a first-time tutorial.”
This shapes the entire tour. The agent now knows which features are gated, which screens depend on account state, and what interceptors to expect.
Phase 2: The Tour (45-90 minutes)
Launch and Orient
“Launch the app, log in with the test account, and start documenting screens”
Your agent logs in, handles any onboarding screens, and arrives at the home screen. It documents the tab bar, primary navigation structure, and entry points. As it moves through screens, it creates documents for each one and stubs for screens it discovers but doesn’t visit yet.
Your job during the tour: Watch the simulator and interject when you notice something the agent got wrong, missed, or wouldn’t know. You don’t need to answer questions — just correct and add:
- “That screen has a detail section that expands when you tap the username”
- “The first time you interact with each item type, an explainer modal pops up”
- “That’s a web view — the accessibility tree won’t see the content inside it”
- “There’s also a celebration screen that appears after every successful submission”
These corrections become the most valuable entries in the knowledge base.
Top-Level Screens First
The tour works from the outside in. Start with all the top-level screens (every tab, every settings section header) before diving into sub-screens. This builds the navigation graph first, so deeper exploration has a map to work from.
“Visit each tab and document what you find, then go through Settings”
Sub-Screens
After the top level is documented, work through the sub-screens systematically: Settings sub-screens, profile sections, and detail views for the main content types. The agent can check its progress:
“How many screens are documented vs stubs?”
init_app_knowledge reports the counts and lists which stubs still need visiting.
Alerts as You Find Them
Every time an unexpected modal appears, stop and document it. Don’t dismiss it and move on. Alerts are the single highest-value entry type because they’re the most common cause of agent failure in future sessions.
“Document that coaching tip as an alert before dismissing it”
For each alert, ask yourself: does this appear every time, or just once? Can it be suppressed via a plist flag? Which screens can it appear on?
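One way to keep those three questions from being forgotten is to treat each alert as a small record with explicit fields. The `AlertDoc` shape below is a hypothetical illustration, not the knowledge base's actual schema.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class AlertDoc:
    """Hypothetical shape for one alert entry (not the tool's real schema)."""
    name: str
    frequency: str = "unknown"            # "every_time", "once", or "unknown"
    screens: list[str] = field(default_factory=list)
    suppress_flag: Optional[str] = None   # plist key that suppresses it, if known

    def open_questions(self) -> list[str]:
        """Return the questions from the walkthrough that are still unanswered."""
        questions = []
        if self.frequency not in ("every_time", "once"):
            questions.append("Does it appear every time, or just once?")
        if self.suppress_flag is None:
            questions.append("Can it be suppressed via a plist flag?")
        if not self.screens:
            questions.append("Which screens can it appear on?")
        return questions
```

An entry with open questions is still worth saving. The remaining gaps can be closed during flow execution or plist discovery.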
Premium and Alternate States
If your app has subscription tiers or role-based features, do a second pass with a different account type:
“Sign out, create a Premium account, and log in. Let’s see what’s different.”
Document which screens gain new elements, which locked features become accessible, and which new modals appear (coaching tips for premium features, onboarding for newly unlocked sections).
Phase 3: Flows (30 minutes)
After screens are documented, trace the most important user journeys end-to-end. This is where the knowledge base proves its value, and where you’ll discover interceptors that the screen-by-screen tour missed.
Write the Flow
“Write a flow document for logging in and reaching the home screen”
Your agent drafts the flow using the screen docs it already has: step-by-step tool calls with verification at each hop.
Execute the Flow
“Now actually execute that flow on the simulator”
This is the critical test. The flow will encounter things the screen docs didn’t predict:
- A coaching tip that only appears during multi-step actions
- A celebration/survey screen after completing an action
- A permission prompt triggered by a specific feature, not at launch
- An action sheet that appears between two screens
Each of these becomes an alert document, and the flow document gets updated with the recovery steps. After execution, the flow is verified — an agent can follow it and it will work.
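The execute-verify-recover loop described here can be sketched in a few lines. Everything in this block (`Step`, `Interceptor`, `run_flow`) is a hypothetical structure for illustration. The real tool calls are stood in for by plain callables.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Step:
    name: str
    action: Callable[[], None]      # performs the tool call for this hop
    verify: Callable[[], bool]      # True once the expected screen is showing

@dataclass
class Interceptor:
    name: str
    is_showing: Callable[[], bool]  # detects the modal, tip, or prompt
    dismiss: Callable[[], None]     # recovery steps from the alert document

def run_flow(steps, interceptors, max_recoveries=3):
    """Run each step, dismissing known interceptors until verification passes."""
    handled = []
    for step in steps:
        step.action()
        recoveries = 0
        while not step.verify():
            blocker = next((i for i in interceptors if i.is_showing()), None)
            if blocker is None or recoveries >= max_recoveries:
                # An unknown interceptor: this is the moment to write a new alert doc.
                raise RuntimeError(f"step '{step.name}' failed verification")
            blocker.dismiss()
            handled.append((step.name, blocker.name))
            recoveries += 1
    return handled
```

The returned list records which interceptor appeared at which step, which is the information the flow document needs for its recovery notes.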
Core Flows to Document
Most apps need these flows documented:
- Login — from app launch to the home screen, handling onboarding, permissions, and environment selection
- The primary user action — whatever the app’s main purpose is (placing an order, posting content, completing a task)
- Navigation to a specific item — the fastest path to view a particular piece of content
- Settings change — toggling a preference that affects app behavior
Each flow should note shortcuts (state restoration, deep links) that skip early steps.
Phase 4: State Discovery (20 minutes)
This is where the knowledge base goes from “navigation guide” to “test infrastructure.”
Find the State Flags
“Read the app’s preferences plist and look for flags that control coaching modals, onboarding state, or feature toggles”
Your agent reads the plist files in the app’s data container and app group containers. Look for keys with patterns like:
- hasShown*, hasSeen*, didComplete*: coaching/onboarding flags
- isStaging*, environment*: server environment
- featureFlags*, experimentSettings*: feature toggles
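The key patterns above can be applied mechanically. The sketch below uses Python's standard `plistlib` and `fnmatch` modules. The `classify_keys` and `classify_plist` helpers and the category names are assumptions for illustration. Keys that carry a prefix would need broader patterns such as `*HasSeen*`.

```python
import plistlib
from fnmatch import fnmatchcase
from pathlib import Path

# Patterns taken from the list above; category names are illustrative.
PATTERNS = {
    "coaching/onboarding": ["hasShown*", "hasSeen*", "didComplete*"],
    "server environment": ["isStaging*", "environment*"],
    "feature toggles": ["featureFlags*", "experimentSettings*"],
}

def classify_keys(keys):
    """Group plist keys by the first category whose pattern they match."""
    found = {category: [] for category in PATTERNS}
    for key in sorted(keys):
        for category, globs in PATTERNS.items():
            if any(fnmatchcase(key, g) for g in globs):
                found[category].append(key)
                break
    return found

def classify_plist(path: Path):
    """Load a plist file (XML or binary) and classify its top-level keys."""
    with path.open("rb") as fh:
        return classify_keys(plistlib.load(fh).keys())
```

Keys that match nothing are dropped from the result, which conveniently filters out most third-party SDK entries.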
Many apps store their coaching state in an app group container (shared with widgets/extensions) rather than the main app plist. If the main plist is mostly third-party SDK data, check for an app group:
“Check for app group containers and read their plists too”
Document the Flags
Create a quirk document mapping each flag to the behavior it controls:
- kHasSeenWelcomeScreen → onboarding slides
- kHasSeenHelpPopUpTraditional → cache type explainer for Traditional caches
- kHasSeenFavoriteViaLogIntroMomentHelp → favorite point coaching tip
Update the alert documents with suppression information — which flag to set, in which container, to prevent the alert from appearing.
Verify the Round-Trip
Test that setting a flag actually changes behavior:
“Set the coaching flag for the Mystery explainer to true, relaunch the app, and tap a Mystery item. Does the explainer still appear?”
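If it doesn’t appear, the flag works. If it still appears, the key or the container is probably wrong, so check the app group plists before giving up. Once a flag is confirmed, the agent can pre-configure any combination of coaching states.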
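The file-level half of this round-trip (write a flag, read it back) can be sketched with Python's standard `plistlib`. The helper names are hypothetical. The sketch also assumes the app is not running while the file is edited, since preferences may be cached in memory. Confirming the behavior change still requires relaunching the app as described above.

```python
import plistlib
from pathlib import Path

def set_plist_flag(plist_path: Path, key: str, value) -> None:
    """Set one key in a preferences plist, preserving every other key."""
    data = {}
    if plist_path.exists():
        with plist_path.open("rb") as fh:
            data = plistlib.load(fh)
    data[key] = value
    with plist_path.open("wb") as fh:
        plistlib.dump(data, fh, fmt=plistlib.FMT_BINARY)

def read_plist_flag(plist_path: Path, key: str, default=None):
    """Read one key back, returning `default` when it is absent."""
    with plist_path.open("rb") as fh:
        return plistlib.load(fh).get(key, default)
```

Reading the value back immediately after writing catches the simplest failure mode, a write that landed in the wrong file, before any time is spent relaunching the app.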
Save Checkpoints
Once everything is configured (all coaching modals dismissed, correct account logged in, environment selected), save the state:
“Save the app state as ‘premium-clean-all-tips-dismissed’”
This checkpoint is the foundation for all future testing. Restoring it takes seconds and puts the app in a known, clean state with no modals, no onboarding, no surprises.
Save multiple checkpoints for different scenarios:
- logged-out-staging: for testing login flows
- basic-clean: Basic account, all tips dismissed
- premium-clean: Premium account, all tips dismissed
- fresh-install: no state at all, for testing onboarding
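Conceptually, a labeled checkpoint behaves like a named snapshot of a directory of app state. The sketch below illustrates that idea with plain directory copies. It is an assumption-laden model, not a description of how the actual save and restore tools are implemented, and the function names are hypothetical.

```python
import shutil
from pathlib import Path

def save_checkpoint(container: Path, store: Path, label: str) -> Path:
    """Copy the state directory into the store under the given label."""
    dest = store / label
    if dest.exists():
        shutil.rmtree(dest)  # replace an older checkpoint with the same label
    shutil.copytree(container, dest)
    return dest

def restore_checkpoint(container: Path, store: Path, label: str) -> None:
    """Replace the state directory with the saved snapshot."""
    src = store / label
    if not src.is_dir():
        raise FileNotFoundError(f"no checkpoint named {label!r}")
    if container.exists():
        shutil.rmtree(container)
    shutil.copytree(src, container)
```

The restore deliberately wipes the current state first, so nothing left over from a previous session can leak into a test that expects a clean start.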
Phase 5: Using the Knowledge Base
Live Agent Sessions
An agent starting a test session reads the knowledge base and immediately has:
- A map of every screen and how to reach it
- Stable element selectors for every interactive element
- Advance knowledge of every modal that might interrupt
- Pre-built flows for common tasks
- Checkpoint restoration for instant setup
Instead of:
“Navigate to the profile screen” (agent taps around, hits a modal, recovers, finds the tab bar, taps Profile)
The agent executes:

    restore_app_state label="premium-clean"
    launch_app bundle_id="com.example.myapp"
    tap_element label="Profile" element_type="radioButton"

Programmatic Test Scripts
The knowledge base entries translate directly to test scripts. A flow document is essentially pseudocode for a test:

    # From flows/find-and-log-cache.md, step by step:
    launch_app("com.example.myapp")
    tap_element(label="Map", element_type="radioButton")
    tap_element(identifier="_Search button")
    tap_element(identifier="_Geocache button")
    type_text("GC8AB9F\n")
    wait_for_element(identifier="_Log button", timeout=10)
    # ...handle cache type explainer if it appears...
    tap_element(identifier="_Log button")
    tap_element(identifier="_Found it button")
    # ...handle favorite point tip if premium...
    type_text("TFTC! Great cache.")
    tap_element(label="Post", element_type="button")
    # ...handle post-log celebration...
    tap_element(identifier="_Continue button")

The knowledge base tells the script exactly which interceptors to handle and how. See Agent-Generated Test Scripts for the full pattern.
State-Driven Test Scenarios
The plist flags documented in Phase 4 enable precise test scenarios:
Test the onboarding flow:

    # Delete all coaching flags to trigger fresh onboarding
    delete_app_plist_key(key="kHasSeenWelcomeScreen")
    delete_app_plist_key(key="kHasSeenDisclaimerScreen")
    launch_app("com.example.myapp")
    # Verify onboarding screens appear

Test a specific coaching modal:

    # Restore clean state, then clear one specific flag
    restore_app_state(label="premium-clean")
    delete_app_plist_key(key="kHasSeenFavoriteViaLogIntroMomentHelp")
    launch_app("com.example.myapp")
    # Navigate to log entry and verify the tip appears

Test environment switching:

    # Flip from staging to production via plist
    set_app_plist_value(key="INTERNAL_isStagingServerEnvironmentKey", value=False)
    launch_app("com.example.myapp")
    # Verify production behavior

This is surgical control. Instead of manually navigating to trigger a specific state, you set exactly the flags you need and verify exactly the behavior you expect.
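The three scenarios share one shape: optionally restore a checkpoint, adjust some flags, then launch. A small table-driven sketch makes that shape explicit. The `Scenario` class and `plan` function are hypothetical. The tool names in the output mirror the calls shown above, and nothing here executes them.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Scenario:
    name: str
    restore: Optional[str] = None                      # checkpoint label, if any
    delete_keys: list[str] = field(default_factory=list)
    set_values: dict[str, object] = field(default_factory=dict)

def plan(scenario: Scenario, bundle_id: str) -> list[tuple[str, dict]]:
    """Expand a scenario into an ordered list of (tool name, arguments) pairs."""
    calls = []
    if scenario.restore:
        calls.append(("restore_app_state", {"label": scenario.restore}))
    for key in scenario.delete_keys:
        calls.append(("delete_app_plist_key", {"key": key}))
    for key, value in scenario.set_values.items():
        calls.append(("set_app_plist_value", {"key": key, "value": value}))
    # Launch last so the app reads the flags in their final state.
    calls.append(("launch_app", {"bundle_id": bundle_id}))
    return calls
```

Describing scenarios as data keeps the setup order consistent and makes it cheap to add a new combination of flags.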
Identifier Audit
The tour naturally produces an inventory of accessibility identifier issues: missing identifiers, shared identifiers, misleading identifiers. Collect these in a quirk document and share with the dev team. Each fix is usually a single line of code, and the return is immediate: more reliable automation for both agents and traditional UI tests.
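Two of the three issue types, missing and shared identifiers, can be detected mechanically from the elements recorded during the tour. The sketch below assumes each element is available as a `(screen, label, identifier)` tuple, which is an illustrative input format rather than the knowledge base's own. Misleading identifiers still need a human or agent judgment call.

```python
from collections import Counter

def audit_identifiers(elements):
    """Find elements with no identifier and identifiers used more than once.

    `elements` is an iterable of (screen, label, identifier-or-None) tuples.
    """
    elements = list(elements)
    missing = [(screen, label) for screen, label, ident in elements if not ident]
    counts = Counter(ident for _, _, ident in elements if ident)
    shared = sorted(ident for ident, n in counts.items() if n > 1)
    return {"missing": missing, "shared": shared}
```

The output maps directly onto a quirk document: one list of elements that need an identifier, one list of identifiers that need to be made unique.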
Maintenance
When to Update
- New screen added: run a mini-tour for just that screen
- Screen layout changed: visit the screen, compare to the doc, update
- New coaching modal added: document the alert, find the plist flag
- Accessibility identifiers changed: update the screen doc and the identifier audit
Staleness Detection
The knowledge base docs reference specific accessibility identifiers and element labels. When these change in the app code, the docs are stale. If your project has an AccessibilityIdentifiers.swift or similar central file, diffs to that file are a reliable signal that screen docs may need updating.
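A simple staleness check compares the identifiers defined in the central Swift file with those referenced in the docs. The sketch below assumes identifiers are declared as string constants (for example `static let logButton = "_Log button"`) and that docs reference them as `identifier="..."`. Both conventions and the function name are assumptions to adapt to your project.

```python
import re

# Matches Swift string constants such as: static let logButton = "_Log button"
SWIFT_STRING = re.compile(r'=\s*"([^"\\]+)"')
# Matches identifier="..." references as written in flow and screen docs
DOC_IDENTIFIER = re.compile(r'identifier="([^"]+)"')

def stale_identifiers(swift_source: str, doc_text: str) -> set[str]:
    """Return identifiers referenced in docs that the Swift file no longer defines."""
    defined = set(SWIFT_STRING.findall(swift_source))
    referenced = set(DOC_IDENTIFIER.findall(doc_text))
    return referenced - defined
```

Running this after each change to the identifiers file gives a short list of docs to revisit, rather than a vague signal that something may be out of date.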
Incremental Tours
You don’t need to redo the full tour. When the agent encounters something on screen that doesn’t match the knowledge base, it updates the doc. Over time, the knowledge base stays current through natural use: every agent session that touches a screen verifies and corrects the documentation for that screen.
- The first tour takes the longest. Subsequent updates are much faster because the agent already has the navigation graph and can reach any screen directly.
- Don’t document everything to the same depth. Core screens (home, detail views, login) deserve full documentation. Niche settings sub-screens might only need a stub with their reachable_from edge.
- Flow testing is non-negotiable. Flows find interceptors that screen-by-screen tours miss. Always execute your flows on the simulator before considering them done.
- Save checkpoints aggressively. A checkpoint takes seconds to create and saves minutes every time it’s restored. Save one after any multi-step setup you don’t want to repeat.
- The plist discovery is worth the time. 20 minutes mapping coaching flags gives you programmatic control over every modal in the app. That’s a permanent capability.
- Web view screens are a known limitation. The accessibility tree can’t see into web views. Document these prominently so agents know to use coordinate-based tapping or skip them.
- Share the knowledge base with the team. It’s not just for agents — the identifier audit, state flag reference, and flow documentation are useful for any developer working on the app.