WebDriverAgent Guide
WebDriverAgent (WDA) is how Quern controls physical iOS devices — tapping buttons, reading screen content, typing text, swiping. It runs on the device and exposes an API that your agent talks to. Quern manages WDA automatically, but understanding how it works helps when things go sideways.
What You Need to Know
Section titled “What You Need to Know”On simulators, Quern uses idb (Facebook’s iOS Development Bridge) for UI automation. No setup needed — it works out of the box.
On physical devices, Quern needs WDA. Your agent will set it up the first time you interact with a physical device. Here’s what happens and what might require your input.
First Time
Section titled “First Time”Tell your agent you want to work with a physical device, and it will call setup_wda. This:
- Discovers your signing identities from Xcode
- Clones the WDA repo
- Builds and installs it on the device as “QuernDriver”
The build is cached — subsequent sessions skip it unless something changes.
Signing Identity Selection
Section titled “Signing Identity Selection”If you have multiple Apple Developer teams in Xcode, your agent will ask which one to use. Pick the one you use for development on this device.
Free vs Paid Developer Accounts
Section titled “Free vs Paid Developer Accounts”This matters more than you’d think.
Paid Account ($99/year)
Section titled “Paid Account ($99/year)”- Provisioning profiles last 1 year
- Wildcard App IDs (one covers everything)
- Quern Driver (WDA) just works, indefinitely
- No device trust step required — apps signed by paid accounts are trusted automatically
Free Account (Apple ID, no enrollment fee)
Section titled “Free Account (Apple ID, no enrollment fee)”- Profiles expire after 7 days
- No wildcard App IDs — each bundle identifier uses a slot
- ~3 active App ID slots per 7-day window
- Quern Driver uses 2 slots (
dev.quern.driver+dev.quern.driver.xctrunner), leaving ~1 for your actual app - Must re-setup WDA every 7 days
If you’re on a free account, tell your agent. It will warn you about slot limits and profile expiry. When Quern Driver stops working after 7 days, tell your agent to rebuild it:
“Rebuild WDA — my profile expired”
Device Trust (Free Accounts Only)
Section titled “Device Trust (Free Accounts Only)”Free developer accounts require you to manually trust the developer profile on the device before Quern Driver can run:
Settings > General > VPN & Device Management > [your developer name] > Trust
You need to do this on the device itself — it’s a one-time step per developer identity per device. If Quern Driver won’t launch at all and you’re on a free account, this is almost certainly why. The runner log will show something about being unable to launch the app.
How Your Agent Detects This
Section titled “How Your Agent Detects This”When WDA setup discovers your account type, it includes warnings in the response. Your agent should surface these to you — things like “this is a free account, profiles expire in 7 days” and “Quern Driver is using 2 of your ~3 App ID slots.”
How the Driver Works
Section titled “How the Driver Works”When your agent interacts with a physical device (tap, screenshot, read screen), Quern automatically:
- Checks if a WDA driver process is running for that device
- If not, launches one
- Waits for WDA’s HTTP server to respond
- Routes the command through WDA
On iOS 17+, the connection goes through tunneld via IPv6. On older iOS, it uses a local port-forward over USB. This is transparent — you don’t need to know or care which path is used.
Auto-Recovery
Section titled “Auto-Recovery”WDA sessions go stale (device locks, app crashes, timeout). Quern handles this automatically:
- Session expired: Creates a new session and retries
- Connection lost: Restarts the driver process and retries
- Transport error: Same recovery as connection loss
If a command fails and succeeds on retry, that’s the recovery system working as designed.
Finding Elements
Section titled “Finding Elements”When your agent taps a button or reads screen content, it’s finding UI elements through the accessibility tree. Understanding how this works helps you build apps that are easier for the agent to work with.
What the Agent Looks For
Section titled “What the Agent Looks For”Your agent prefers these strategies, in order:
- Accessibility identifier — The fastest and most reliable. If your code sets
accessibilityIdentifier = "login-button", the agent finds it instantly. - Label text — The display text (“Sign In”, “Settings”). Works well when labels are unique.
- Label + type — When multiple elements share a label, adding the type narrows it down (“the Button labeled Edit”, not “the StaticText labeled Edit”).
- Advanced queries — NSPredicate expressions and class chain syntax for complex cases. Your agent knows how to use these when simpler approaches fail.
Common Element Types
Section titled “Common Element Types”| Type | What |
|---|---|
Button | UIButton, SwiftUI Button |
TextField | UITextField, SwiftUI TextField |
SecureTextField | Password fields |
StaticText | UILabel, SwiftUI Text |
Switch | UISwitch, SwiftUI Toggle |
Cell | Table/collection view cells |
NavigationBar | Navigation bar container |
TabBar | Tab bar container |
Alert | System and custom alerts |
SearchField | Search bars |
Complex Screens
Section titled “Complex Screens”On screens with many elements (large lists, MapKit, complex collection views), the full accessibility tree query can be slow. Quern has a fallback “skeleton” strategy that queries just the top-level containers (navigation bars, tab bars, toolbars, alerts) and their immediate children. This is automatic — your agent gets usable results even when the full tree times out.
Designing Apps for AI Automation
Section titled “Designing Apps for AI Automation”How you build your UI directly affects how well the agent can work with it.
- Set
accessibilityIdentifieron key interactive elements. This is the single most impactful thing you can do. Identifiers are stable across localizations, UI redesigns, and dynamic content.
loginButton.accessibilityIdentifier = "login-submit-button"emailField.accessibilityIdentifier = "login-email-field"-
Use standard UIKit/SwiftUI controls. They have built-in accessibility support. A
UIButtonis tappable and discoverable; a customUIViewwith a tap gesture isn’t (unless you add accessibility traits). -
Use distinct labels. Three buttons all labeled “Edit” means the agent can’t tell them apart without identifiers or structural context.
-
Don’t present views over complex screens without truly replacing them. If you push a simple modal over a complex screen (a list with hundreds of cells, a map view), the elements underneath still appear in the accessibility tree — even though the user can’t see them. The agent sees a polluted tree full of irrelevant elements from the screen behind the modal, making it hard to find what’s actually on screen. Use proper modal presentation (
.fullScreenCoverin SwiftUI,modalPresentationStyle = .fullScreenin UIKit) or remove the underlying view’s accessibility when it’s covered. -
Don’t rely on complex custom gestures. WDA supports tap, swipe, and long-press. No pinch-to-zoom, 3D touch, or custom multi-finger gestures. Provide alternative navigation paths if your app uses these.
-
Don’t make UI state depend on animations completing. The agent can tap before an animation finishes. It uses
wait_for_elementto wait for targets to appear rather than guessing timing.
Common Failure Patterns
Section titled “Common Failure Patterns”When WDA fails to start, Quern parses the runner log and tells your agent what went wrong:
| What You’ll Hear | What It Means | What to Do |
|---|---|---|
| ”Profile expired” or “Supported platforms empty” | Signing profile is invalid | Tell your agent to rebuild WDA |
| ”Device is locked” | Screen lock is on | Unlock the device |
| ”App not trusted” | Developer profile not trusted (free accounts) | Settings > VPN & Device Management > Trust |
| ”Entitlement mismatch” | WDA was reinstalled with different signing | Tell your agent to force-rebuild WDA |
| ”No signing certificate” | Xcode doesn’t have a valid cert | Xcode > Settings > Accounts > Manage Certificates |
| ”Maximum number of apps” | Free account slot limit | Wait 7 days for slots to free up, or use a paid account |
| ”Device is not available” | Device disconnected | Reconnect USB cable |
Runner logs are at ~/.quern/wda/runner-<udid-prefix>.log if you need to dig deeper.
Known Limitations
Section titled “Known Limitations”- No side/power button. Would kill the WDA process. Can’t simulate it.
- No brightness control. No API exists. Workaround for dark rooms: face the device down — USB still carries the video signal for live preview.
- No mute switch. Hardware switch, not software-controllable.
- No system UI. WDA can only interact with the frontmost app. Can’t dismiss system alerts (except notification banners), open Control Center, or use Spotlight.
- Orientation is app-level only. WDA can rotate the app content but doesn’t physically rotate the screen. The live preview always shows the native orientation.
- Slow on older devices. Accessibility tree queries on iPhone 8 / iPad Air 2 era devices can be slow. Your agent will increase timeouts or use the skeleton strategy automatically.