HarmonyOS Automation Support
Midscene can drive the HDC (HarmonyOS Device Connector) tool to automate HarmonyOS NEXT devices.
Thanks to its visual model approach, the entire automation process works with any HarmonyOS app technology stack — whether ArkTS native or other frameworks. Developers only need to debug UI automation scripts against the final rendered interface.
The HarmonyOS UI automation solution includes all Midscene features:
- Zero-code trial via Playground.
- JavaScript SDK support.
- YAML-based automation scripts and CLI tools.
- HTML report generation for replaying all action paths.
Showcases
Prompt: Open Settings, scroll to find "About phone", and view the device information.
View the full report of this task: report.html
See more showcases: showcases
This guide walks you through everything needed to automate HarmonyOS devices with Midscene: connecting a real device via HDC, configuring model API keys, trying the zero-code Playground, and running your first JavaScript script.
Control HarmonyOS devices with JavaScript: https://github.com/web-infra-dev/midscene-example/blob/main/harmony/javascript-sdk-demo
Integrate Vitest for testing: https://github.com/web-infra-dev/midscene-example/tree/main/harmony/vitest-demo
Set up API keys for model
Set your model configuration in environment variables. For details, see Model strategy and Model configuration.
Prerequisites
Before writing scripts, verify that HDC can connect to your device and the device trusts the current computer.
Install HDC
HDC (HarmonyOS Device Connector) is a command-line tool provided by HarmonyOS for communicating with devices. Installation options:
- Via DevEco Studio (recommended)
- Via a standalone installation of the HarmonyOS command-line tools
Verify HDC is installed by printing its version, for example with hdc -v:
A version number in the output confirms successful installation.
If hdc is not in your system PATH, set the HDC_HOME environment variable to the directory containing HDC:
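The fallback described above can be sketched as a small pure function. This is an illustration only, not Midscene's actual lookup logic: resolveHdcExecutable is a hypothetical helper, the environment is passed in as a plain object, and POSIX-style paths are assumed (on Windows the executable name and separators would differ).

```typescript
// Hypothetical helper: illustrates the HDC_HOME fallback, not Midscene's real implementation.
type Env = Record<string, string | undefined>;

function resolveHdcExecutable(env: Env): string {
  const home = env.HDC_HOME?.trim();
  if (!home) {
    // No HDC_HOME: rely on `hdc` being discoverable through PATH.
    return 'hdc';
  }
  // Drop trailing slashes so the joined path has exactly one separator.
  const dir = home.replace(/\/+$/, '');
  return `${dir}/hdc`;
}

// Example with a made-up installation directory.
console.log(resolveHdcExecutable({ HDC_HOME: '/opt/harmony/toolchains/' }));
```

In a Node.js script, the same function could be called with process.env.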
Enable Developer Mode and Verify Device
In your HarmonyOS device settings, go to Developer Options and enable USB Debugging, then connect via USB cable.
Verify the connection by running hdc list targets.
A device ID in the output confirms a successful connection.
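If you want to check the connection from a script, the text printed by hdc list targets can be turned into a list of device IDs. The sketch below assumes one device ID per line and that HDC prints [Empty] when nothing is connected. parseHdcTargets is a hypothetical helper and the sample IDs are made up.

```typescript
// Hypothetical helper: extracts device IDs from the text printed by `hdc list targets`.
// Assumes one ID per line and an "[Empty]" placeholder when no device is attached.
function parseHdcTargets(output: string): string[] {
  return output
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0 && line !== '[Empty]');
}

// Sample output with made-up device IDs.
const sample = 'FMR0223C13000649\n192.168.0.5:5555\n';
console.log(parseHdcTargets(sample).length > 0 ? 'device connected' : 'no device found');
```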
Try Playground (Zero Code)
Playground is the fastest way to verify your connection and observe the AI Agent, without writing any code. It shares the same code implementation as @midscene/harmony, so flows validated in Playground will work identically when run via scripts.
- Launch the Playground CLI:
- Click the gear button in the Playground window and paste your API Key configuration. If you don't have an API Key yet, go back to Model Configuration to get one.
Start Your Experience
After configuration, you can start using Midscene right away. It provides several key operation tabs:
- Act: interact with the page. This is Auto Planning, corresponding to aiAct.
- Query: extract JSON data from the interface, corresponding to aiQuery. Similar methods include aiBoolean(), aiNumber(), and aiString() for directly extracting booleans, numbers, and strings.
- Assert: understand the page and check a condition, throwing an error if the condition is not met. This corresponds to aiAssert.
- Tap: click on an element. This is Instant Action, corresponding to aiTap.
For the difference between Auto Planning and Instant Action, see the API document.
Integrate Midscene Agent
Once Playground runs successfully, you can switch to reusable JavaScript scripts.
Step 1: Install Dependencies
Step 2: Write a Script
The following example opens the Settings app on the device and performs scrolling operations.
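As a rough sketch of such a script, the flow can be written against a small structural interface that lists only the agent methods it uses. SettingsAgent and viewDeviceInfo are hypothetical names introduced for illustration, and the prompts are examples. A HarmonyAgent created with @midscene/harmony is assumed to provide these methods, but its exact signatures are not shown in this guide.

```typescript
// Minimal structural type covering only the agent methods used below.
// Assumption: a real HarmonyAgent offers compatible methods.
interface SettingsAgent {
  launch(uri: string): Promise<unknown>;
  aiAct(prompt: string): Promise<unknown>;
  aiAssert(assertion: string): Promise<unknown>;
  aiQuery<T>(prompt: string): Promise<T>;
}

// Opens Settings, scrolls to "About phone", checks the page, and extracts labels.
async function viewDeviceInfo(agent: SettingsAgent): Promise<string[]> {
  // Bundle name of the Settings app, as used elsewhere in this document.
  await agent.launch('com.huawei.hmos.settings');
  // Auto Planning: the model decides how to scroll and where to tap.
  await agent.aiAct('scroll down until "About phone" is visible, then tap it');
  // Fails the run if the expected page is not shown.
  await agent.aiAssert('the page shows device information');
  // Structured extraction from the rendered screen.
  return agent.aiQuery<string[]>('string[], labels of the device information entries');
}
```

In a real script, the agent passed to viewDeviceInfo would come from @midscene/harmony, for example via agentFromHdcDevice().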
Step 3: Run the Example
Step 4: View the Report
After a successful run, the console outputs Midscene - report file updated: /path/to/report/some_id.html. Open this HTML file in a browser to replay each interaction, query, and assertion.
Advanced
When you need to customize device behavior, integrate Midscene into a standalone framework, or troubleshoot HDC issues, refer to this section. See API Reference (HarmonyOS) for more constructor parameters.
Extending Midscene on HarmonyOS
Use defineAction() to define custom gestures and pass them via customActions. Midscene appends custom actions to the planner, allowing AI to invoke your domain-specific action names.
For more details on custom actions and action schemas, see Integrate with Any Interface.
More
- View all Agent methods: API Reference (Common)
- HarmonyOS-specific parameters and interfaces: HarmonyOS Agent API
- Demo projects
  - HarmonyOS JavaScript SDK demo: https://github.com/web-infra-dev/midscene-example/blob/main/harmony/javascript-sdk-demo
  - HarmonyOS + Vitest demo: https://github.com/web-infra-dev/midscene-example/tree/main/harmony/vitest-demo
Complete example (Vitest + HarmonyAgent)
Merged reports are stored inside midscene_run/report by default. Override the directory with MIDSCENE_RUN_DIR when running in CI.
FAQ
Keyboard is not dismissed or the page goes back after typing
Midscene automatically dismisses the keyboard after entering text. By default, HarmonyOS uses the ESC key so the current page is less likely to navigate back. If ESC does not close the keyboard in your app, switch to Back first:
If your input field listens for Back and clears or closes in response, disable auto keyboard dismiss:
With auto dismiss disabled, the keyboard will remain visible. You can use aiAct to manually dismiss it, e.g. await agent.aiAct('dismiss the keyboard').
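The two settings discussed in this answer correspond to the autoDismissKeyboard and keyboardDismissStrategy device options. The sketch below models the documented behavior with local types. KeyboardOptions and keyAfterInput are hypothetical names used only for illustration, and the function is not Midscene's implementation.

```typescript
type KeyboardDismissStrategy = 'esc-first' | 'back-first';

// Local stand-in for the two keyboard-related device options.
interface KeyboardOptions {
  autoDismissKeyboard?: boolean;
  keyboardDismissStrategy?: KeyboardDismissStrategy;
}

// Option objects matching the two scenarios above.
const preferBack: KeyboardOptions = { keyboardDismissStrategy: 'back-first' };
const keepKeyboardOpen: KeyboardOptions = { autoDismissKeyboard: false };

// Returns the key sent after text input, or null when auto dismiss is off.
function keyAfterInput(opts: KeyboardOptions = {}): 'ESC' | 'Back' | null {
  const { autoDismissKeyboard = true, keyboardDismissStrategy = 'esc-first' } = opts;
  if (!autoDismissKeyboard) {
    return null;
  }
  // Only the first key of the strategy is sent.
  return keyboardDismissStrategy === 'back-first' ? 'Back' : 'ESC';
}

console.log(keyAfterInput(), keyAfterInput(preferBack), keyAfterInput(keepKeyboardOpen));
```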
How to use a custom HDC path?
Set the HDC_HOME environment variable to point to the HDC directory:
Or pass it via the constructor:
API reference
When you need to customize device behavior, integrate Midscene into a framework, or troubleshoot HDC issues, refer to this section. For common constructor parameters (reports, hooks, caching, etc.), see the platform-agnostic API Reference.
Action Space
HarmonyDevice uses the following action space. The Midscene Agent can use these operations when planning tasks:
- Tap: Tap on an element.
- DoubleClick: Double-tap on an element.
- Input: Input text, supporting replace / typeOnly / clear modes.
- Scroll: Scroll from an element or screen center in any direction, supporting scroll-to-top/bottom/left/right.
- DragAndDrop: Drag from one element to another.
- KeyboardPress: Press a specific key.
- LongPress: Long-press a target element with optional custom duration.
- ClearInput: Clear input field contents.
- Pinch: Not supported. The HarmonyOS uitest framework does not provide multi-touch input APIs.
- Launch: Open a HarmonyOS app (bundle name).
- Terminate: Force-stop a HarmonyOS app by bundle name.
- RunHdcShell: Execute a raw hdc shell command.
- HarmonyBackButton: Trigger system back.
- HarmonyHomeButton: Return to home screen.
- HarmonyRecentAppsButton: Open recent apps / multitasking.
HarmonyDevice
Creates an HDC device instance that can be driven by HarmonyAgent.
Import
Constructor
Device Options
- deviceId: string - Value from hdc list targets or getConnectedDevices().
- hdcPath?: string - Custom path to the HDC executable. If not set, Midscene searches the HDC_HOME environment variable and common installation paths.
- autoDismissKeyboard?: boolean - Automatically hide the keyboard after input, default true.
- keyboardDismissStrategy?: 'esc-first' | 'back-first' - Key preference for automatically hiding the keyboard, default 'esc-first'. HarmonyOS sends the first key from the strategy only: 'esc-first' sends ESC, while 'back-first' sends Back.
- screenshotResizeScale?: number - Deprecated. This option has been removed and no longer has any effect. Use screenshotShrinkFactor in AgentOpt instead to control the screenshot size sent to the AI model.
- customActions?: DeviceAction[] - Extend the planner's available actions via defineAction.
Usage Notes
- Use getConnectedDevices() to discover devices. The returned deviceId matches hdc list targets output.
- If HDC is not in your system PATH, specify it via the HDC_HOME environment variable or the hdcPath option.
Examples
Quick Start
Launch Apps
HarmonyAgent
Binds Midscene's AI planning capabilities to a HarmonyDevice for UI automation.
Import
Constructor
HarmonyOS-Specific Options
- customActions?: DeviceAction[] - Extend the planner's available actions via defineAction.
- appNameMapping?: Record<string, string> - Map friendly app names to bundle names. When you pass an app name to launch(target), the Agent looks up the corresponding bundle name in this mapping; if no mapping is found, it tries to launch target as-is.
- Other fields are the same as API constructors: generateReport, reportFileName, aiActionContext, modelConfig, cacheId, createOpenAIClient, onTaskStartTip, etc.
Usage Notes
- One device connection corresponds to one Agent.
- HarmonyOS-specific helpers like launch, terminate, and runHdcShell can also be used in YAML scripts. See HarmonyOS platform-specific actions.
- For common interaction methods, see API Reference (Common).
HarmonyOS-Specific Methods
agent.launch()
Launch a HarmonyOS app.
- uri: string - Can be an app bundle name (e.g., com.huawei.hmos.settings), or an app name registered in appNameMapping. If a URL starting with http:// or https:// is passed, it opens via the browser.
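The three accepted forms of uri can be pictured as a small decision function. This is a sketch of the documented behavior, not the library's code: planLaunch and LaunchPlan are hypothetical names, and both the order of the checks and the exact-match lookup in appNameMapping are assumptions.

```typescript
// Hypothetical result type: either open a URL in the browser or start an app.
type LaunchPlan =
  | { kind: 'browser'; url: string }
  | { kind: 'app'; bundleName: string };

function planLaunch(uri: string, appNameMapping: Record<string, string> = {}): LaunchPlan {
  // URLs are handed to the browser.
  if (/^https?:\/\//.test(uri)) {
    return { kind: 'browser', url: uri };
  }
  // Friendly names are translated through the mapping (exact match assumed).
  const mapped = Object.prototype.hasOwnProperty.call(appNameMapping, uri)
    ? appNameMapping[uri]
    : undefined;
  // Anything else is treated as a bundle name and used as-is.
  return { kind: 'app', bundleName: mapped ?? uri };
}

const mapping = { Settings: 'com.huawei.hmos.settings' };
console.log(planLaunch('Settings', mapping));
```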
agent.runHdcShell()
Run a raw hdc shell command on the connected device.
- command: string - The command passed directly to hdc shell.
agent.terminate()
Terminate (force-stop) a running HarmonyOS app.
- uri: string - Bundle name, app name in appNameMapping, or bundle/Ability (only the bundle part is used).
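To make the bundle/Ability rule concrete, the following sketch reduces each accepted form to a bundle name. resolveTerminateBundle is a hypothetical helper, and applying the appNameMapping lookup before splitting on the slash is an assumption. The bundle and ability names in the example are made up.

```typescript
// Hypothetical helper: reduces a terminate() target to the bundle name to stop.
function resolveTerminateBundle(
  uri: string,
  appNameMapping: Record<string, string> = {},
): string {
  // Translate a friendly app name if one is registered (exact match assumed).
  const named = Object.prototype.hasOwnProperty.call(appNameMapping, uri)
    ? appNameMapping[uri]
    : undefined;
  const target = named ?? uri;
  // For "bundle/Ability", keep only the part before the first slash.
  const slash = target.indexOf('/');
  return slash === -1 ? target : target.slice(0, slash);
}

console.log(resolveTerminateBundle('com.example.app/EntryAbility'));
```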
Navigation Helpers
- agent.back(): Promise<void> - Trigger HarmonyOS system back.
- agent.home(): Promise<void> - Return to home screen.
- agent.recentApps(): Promise<void> - Open recent apps / multitasking.
Utilities
agentFromHdcDevice()
Create a HarmonyAgent from any connected HDC device.
- deviceId?: string - Connect to a specific device; leave empty for "first available device".
- opts?: HarmonyAgentOpt & HarmonyDeviceOpt - Merge Agent options and HarmonyDevice settings in a single object.
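The "first available device" rule can be sketched as follows. pickDeviceId and ConnectedDevice are hypothetical names; the sketch assumes each discovered device exposes a deviceId string, and the error handling shown here is an assumption rather than the library's documented behavior.

```typescript
// Local stand-in for an entry in the list of discovered devices.
interface ConnectedDevice {
  deviceId: string;
}

// Hypothetical helper: picks the requested device, or the first one when no ID is given.
function pickDeviceId(devices: ConnectedDevice[], deviceId?: string): string {
  const first = devices[0];
  if (first === undefined) {
    throw new Error('No HDC devices connected');
  }
  if (!deviceId) {
    // Empty or missing ID: fall back to the first available device.
    return first.deviceId;
  }
  const match = devices.find((d) => d.deviceId === deviceId);
  if (match === undefined) {
    throw new Error(`Device ${deviceId} is not connected`);
  }
  return match.deviceId;
}

// Made-up device IDs for illustration.
console.log(pickDeviceId([{ deviceId: 'DEVICE_A' }, { deviceId: 'DEVICE_B' }]));
```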
getConnectedDevices()
List HDC devices that Midscene can drive.

