HarmonyOS Automation Support

Midscene can drive the HDC (HarmonyOS Device Connector) tool to automate HarmonyOS NEXT devices.

Thanks to its visual model approach, the entire automation process works with any HarmonyOS app technology stack — whether ArkTS native or other frameworks. Developers only need to debug UI automation scripts against the final rendered interface.

The HarmonyOS UI automation solution includes all Midscene features:

  • Zero-code trial via Playground.
  • JavaScript SDK support.
  • YAML-based automation scripts and CLI tools.
  • HTML report generation for replaying all action paths.

Showcases

Prompt: Open Settings, scroll to find "About phone", view device information.

View the full report of this task: report.html

See more showcases: showcases

This guide walks you through everything needed to automate HarmonyOS devices with Midscene: configuring model API keys, connecting a real device via HDC, trying the zero-code Playground, and running your first JavaScript script.

Set up API keys for model

Set your model configuration through environment variables. See Model strategy for details.

export MIDSCENE_MODEL_BASE_URL="https://replace-with-your-model-service-url/v1"
export MIDSCENE_MODEL_API_KEY="replace-with-your-api-key"
export MIDSCENE_MODEL_NAME="replace-with-your-model-name"
export MIDSCENE_MODEL_FAMILY="replace-with-your-model-family"

For more configuration details, please refer to Model strategy and Model configuration.
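A missing variable usually only surfaces once the first AI call fails, so it can help to check the environment before a script starts. The following is a minimal sketch, assuming all four variables shown above are needed for your model; `missingModelEnv` is a hypothetical helper, not part of Midscene.

```typescript
// Variable names are taken from the export lines above.
const REQUIRED_MODEL_ENV = [
  'MIDSCENE_MODEL_BASE_URL',
  'MIDSCENE_MODEL_API_KEY',
  'MIDSCENE_MODEL_NAME',
  'MIDSCENE_MODEL_FAMILY',
] as const;

// Hypothetical helper: returns the names that are unset or blank.
function missingModelEnv(
  env: Record<string, string | undefined>,
): string[] {
  return REQUIRED_MODEL_ENV.filter((name) => !env[name]?.trim());
}
```

In a script you might call `missingModelEnv(process.env)` right after loading `.env` and stop early if the returned list is not empty.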

Prerequisites

Before writing scripts, verify that HDC can connect to your device and that the device trusts the current computer.

Install HDC

HDC (HarmonyOS Device Connector) is a command-line tool provided by HarmonyOS for communicating with devices. Installation options:

Verify HDC is installed:

hdc version

A version number in the output confirms successful installation.

Configuring HDC Path

If hdc is not in your system PATH, set the HDC_HOME environment variable to the directory containing HDC:

export HDC_HOME=/path/to/hdc/directory

Enable Developer Mode and Verify Device

In your HarmonyOS device settings, go to Developer Options and enable USB Debugging, then connect via USB cable.

Verify the connection:

hdc list targets

A device ID in the output confirms a successful connection:

0123456789ABCDEF
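If you capture this output in a script, for example in a CI pre-check, it can be turned into a list of device IDs. This is a minimal sketch, assuming `hdc list targets` prints one ID per line and the literal text `[Empty]` when nothing is connected; `parseHdcTargets` is a hypothetical helper, not a Midscene API.

```typescript
// Hypothetical helper: turn raw `hdc list targets` output into device IDs.
function parseHdcTargets(output: string): string[] {
  return output
    .split(/\r?\n/)
    .map((line) => line.trim())
    // Drop blank lines and the assumed "no devices" placeholder.
    .filter((line) => line.length > 0 && line !== '[Empty]');
}
```

Inside Midscene scripts, getConnectedDevices() already returns the same IDs, so this parsing is only useful outside the SDK.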

Try Playground (Zero Code)

Playground is the fastest way to verify your connection and observe the AI Agent, without writing any code. It shares the same code implementation as @midscene/harmony, so flows validated in Playground should behave the same way when run from scripts.

  1. Launch the Playground CLI:
npx --yes @midscene/harmony-playground
  2. Click the gear button in the Playground window and paste your API Key configuration. If you don't have an API Key yet, go back to Model Configuration to get one.

Start Your Experience

After configuration, you can start using Midscene right away. It provides several key operation tabs:

  • Act: interact with the page. This is Auto Planning, corresponding to aiAct. For example:
      Type “Midscene” in the search box, run the search, and open the first result
      Fill out the registration form and make sure every field passes validation
  • Query: extract JSON data from the interface, corresponding to aiQuery. Similar methods include aiBoolean(), aiNumber(), and aiString() for directly extracting booleans, numbers, and strings. For example:
      Extract the user ID from the page and return JSON data in the { id: string } structure
  • Assert: understand the page and check a condition, corresponding to aiAssert. An error is thrown if the condition is not met. For example:
      There is a login button on the page, with a user agreement link below it
  • Tap: click on an element. This is Instant Action, corresponding to aiTap. For example:
      Click the login button

For the difference between Auto Planning and Instant Action, see the API document.
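To show how the tabs map to SDK calls, here is a minimal sketch that issues one call per tab using the example prompts from the list above. `AgentLike` is a hypothetical structural type that covers only the four method names mentioned in this section. The real agent exposes more methods and options, and its exact signatures may differ.

```typescript
// Minimal structural type for illustration only (an assumption, not the SDK's type).
interface AgentLike {
  aiAct(prompt: string): Promise<unknown>;
  aiQuery<T>(prompt: string): Promise<T>;
  aiAssert(prompt: string): Promise<unknown>;
  aiTap(prompt: string): Promise<unknown>;
}

// Runs the Assert, Tap, Act, and Query examples in that order.
async function runTabExamples(agent: AgentLike): Promise<{ id: string }> {
  await agent.aiAssert(
    'There is a login button on the page, with a user agreement link below it',
  );
  await agent.aiTap('Click the login button');
  await agent.aiAct(
    'Fill out the registration form and make sure every field passes validation',
  );
  return agent.aiQuery<{ id: string }>(
    'Extract the user ID from the page and return JSON data in the { id: string } structure',
  );
}
```

Typing the parameter structurally keeps the function easy to exercise with a stub before pointing it at a real device.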

Integrate Midscene Agent

Once Playground runs successfully, you can switch to reusable JavaScript scripts.

Step 1: Install Dependencies

npm install @midscene/harmony dotenv --save-dev

Step 2: Write a Script

The following example opens the Settings app on the device and performs scrolling operations.

./demo.ts
import 'dotenv/config'; // load Midscene environment variables from .env if present
import {
  HarmonyAgent,
  HarmonyDevice,
  getConnectedDevices,
} from '@midscene/harmony';

const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms));
Promise.resolve(
  (async () => {
    const devices = await getConnectedDevices();
    const device = new HarmonyDevice(devices[0].deviceId, {});

    const agent = new HarmonyAgent(device, {
      aiActionContext:
        'This is a HarmonyOS device. The system language is Chinese. If any popup appears, dismiss or agree to it.',
    });
    await device.connect();

    // Open Settings app
    await agent.launch('com.huawei.hmos.settings');
    await sleep(2000);

    // Scroll down
    await agent.aiAct('scroll down one screen');

    // Query page content
    const items = await agent.aiQuery(
      'string[], list all visible setting item names',
    );
    console.log('Settings items', items);

    // Assert
    await agent.aiAssert('There is a settings item list on the page');
  })(),
);

Step 3: Run the Example

npx tsx demo.ts

Step 4: View the Report

After a successful run, the console outputs Midscene - report file updated: /path/to/report/some_id.html. Open this HTML file in a browser to replay each interaction, query, and assertion.
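If you capture the script's console output, for example to attach the report in CI, the path can be pulled out of that message. This is a minimal sketch, assuming the message format quoted above and a path without spaces; `findReportPath` is a hypothetical helper.

```typescript
// Hypothetical helper: extract the report path from captured console output.
function findReportPath(log: string): string | undefined {
  const match = /Midscene - report file updated: (\S+\.html)/.exec(log);
  return match?.[1];
}
```

The function returns undefined when no such line is present, which a caller can treat as "no report was written".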

Advanced

When you need to customize device behavior, integrate Midscene into a standalone framework, or troubleshoot HDC issues, refer to this section. See API Reference (HarmonyOS) for more constructor parameters.

Extending Midscene on HarmonyOS

Use defineAction() to define custom gestures and pass them via customActions. Midscene appends custom actions to the planner, allowing AI to invoke your domain-specific action names.

import { getMidsceneLocationSchema, z } from '@midscene/core';
import { defineAction } from '@midscene/core/device';
import { HarmonyAgent, HarmonyDevice, getConnectedDevices } from '@midscene/harmony';

const ContinuousClick = defineAction({
  name: 'continuousClick',
  description: 'Click the same target repeatedly',
  paramSchema: z.object({
    locate: getMidsceneLocationSchema(),
    count: z.number().int().positive().describe('How many times to click'),
  }),
  async call(param) {
    const { locate, count } = param;
    console.log('click target center', locate.center);
    console.log('click count', count);
  },
});

const devices = await getConnectedDevices();
const device = new HarmonyDevice(devices[0].deviceId, {});
await device.connect();

const agent = new HarmonyAgent(device, {
  customActions: [ContinuousClick],
});

await agent.aiAct('click the red button five times');

For more details on custom actions and action schemas, see Integrate with Any Interface.

More

Complete example (Vitest + HarmonyAgent)

import type { TestStatus } from '@midscene/core';
import { ReportMergingTool } from '@midscene/core/report';
import { sleep } from '@midscene/core/utils';
import {
  HarmonyAgent,
  HarmonyDevice,
  getConnectedDevices,
} from '@midscene/harmony';
import {
  afterAll,
  afterEach,
  beforeAll,
  beforeEach,
  describe,
  it,
} from 'vitest';

describe('HarmonyOS Settings Test', () => {
  let device: HarmonyDevice;
  let agent: HarmonyAgent;
  let itTestStatus: TestStatus = 'passed';
  const reportMergingTool = new ReportMergingTool();

  beforeAll(async () => {
    const devices = await getConnectedDevices();
    device = new HarmonyDevice(devices[0].deviceId);
    await device.connect();
  });

  beforeEach((ctx) => {
    agent = new HarmonyAgent(device, {
      groupName: ctx.task.name,
    });
  });

  afterEach((ctx) => {
    if (ctx.task.result?.state === 'pass') {
      itTestStatus = 'passed';
    } else if (ctx.task.result?.state === 'skip') {
      itTestStatus = 'skipped';
    } else if (ctx.task.result?.errors?.[0]?.message.includes('timed out')) {
      itTestStatus = 'timedOut';
    } else {
      itTestStatus = 'failed';
    }
    reportMergingTool.append({
      reportFilePath: agent.reportFile as string,
      reportAttributes: {
        testId: `${ctx.task.name}`,
        testTitle: `${ctx.task.name}`,
        testDescription: 'description',
        testDuration: (Date.now() - ctx.task.result?.startTime!) | 0,
        testStatus: itTestStatus,
      },
    });
  });

  afterAll(() => {
    reportMergingTool.mergeReports('my-harmony-setting-test-report');
  });

  it('toggle WLAN', async () => {
    await device.home();
    await sleep(1000);
    await device.launch('com.huawei.hmos.settings');
    await sleep(1000);
    await agent.aiAct('find and enter WLAN setting');
    await agent.aiAct(
      'toggle WLAN status *once*, if WLAN is off please turn it on, otherwise turn it off.',
    );
  });

  it('toggle Bluetooth', async () => {
    await device.home();
    await sleep(1000);
    await device.launch('com.huawei.hmos.settings');
    await sleep(1000);
    await agent.aiAct('find and enter Bluetooth setting');
    await agent.aiAct(
      'toggle Bluetooth status *once*, if Bluetooth is off please turn it on, otherwise turn it off.',
    );
  });
});

Tip

Merged reports are stored inside midscene_run/report by default. Override the directory with MIDSCENE_RUN_DIR when running in CI.
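The status mapping in the afterEach hook above can also be factored into a small pure function, which makes it easier to reuse and to test without a device. This is a minimal sketch: `ReportStatus` and `TaskResultLike` are local, hypothetical types covering only the values used in the example, not the types exported by @midscene/core or Vitest.

```typescript
// Local stand-ins for illustration (assumptions, not library types).
type ReportStatus = 'passed' | 'failed' | 'timedOut' | 'skipped';

interface TaskResultLike {
  state?: string;
  errors?: Array<{ message: string }>;
}

// Maps a Vitest-style task result to the status string used in the merged report.
function toReportStatus(result: TaskResultLike | undefined): ReportStatus {
  if (result?.state === 'pass') return 'passed';
  if (result?.state === 'skip') return 'skipped';
  // Unlike the hook above, this checks every error message, not only the first.
  const timedOut =
    result?.errors?.some((e) => e.message.includes('timed out')) ?? false;
  return timedOut ? 'timedOut' : 'failed';
}
```

A missing result or an empty error list falls through to 'failed'.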

FAQ

Keyboard is not dismissed or the page goes back after typing

Midscene automatically dismisses the keyboard after entering text. On HarmonyOS it sends the ESC key by default, which is less likely than Back to navigate away from the current page. If ESC does not close the keyboard in your app, switch to Back first:

const device = new HarmonyDevice('device-id', {
  keyboardDismissStrategy: 'back-first',
});

If your input field listens for Back and clears or closes in response, disable auto keyboard dismiss:

const device = new HarmonyDevice('device-id', {
  autoDismissKeyboard: false,
});

With auto dismiss disabled, the keyboard will remain visible. You can use aiAct to manually dismiss it, e.g. await agent.aiAct('dismiss the keyboard').
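The two cases above can be summarized as a small decision helper. This is a minimal sketch: the option names come from this FAQ, while `keyboardOptionsFor` and its input flags are hypothetical and only encode the advice given here.

```typescript
// Option names as documented above; the interface itself is a local stand-in.
interface KeyboardOptions {
  autoDismissKeyboard?: boolean;
  keyboardDismissStrategy?: 'esc-first' | 'back-first';
}

// Hypothetical helper: choose device options from what you observed in your app.
function keyboardOptionsFor(app: {
  escClosesKeyboard: boolean;
  fieldReactsToBack: boolean;
}): KeyboardOptions {
  // ESC works: keep the defaults (auto dismiss with 'esc-first').
  if (app.escClosesKeyboard) return {};
  // Back would clear or close the field: stop dismissing automatically.
  if (app.fieldReactsToBack) return { autoDismissKeyboard: false };
  // Otherwise fall back to the Back key.
  return { keyboardDismissStrategy: 'back-first' };
}
```

The returned object can be passed as the second argument of the HarmonyDevice constructor.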

How to use a custom HDC path?

Set the HDC_HOME environment variable to point to the HDC directory:

export HDC_HOME=/path/to/hdc/directory

Or pass it via the constructor:

const device = new HarmonyDevice('0123456789ABCDEF', {
  hdcPath: '/path/to/hdc',
});
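Note that HDC_HOME points to a directory, while hdcPath points to the executable itself. The sketch below illustrates one plausible lookup order. It assumes that an explicit hdcPath wins over HDC_HOME, that the binary inside HDC_HOME is named `hdc` (on Windows it would typically be `hdc.exe`), and that PATH is the final fallback. `resolveHdcCommand` is a hypothetical helper and does not describe Midscene's actual resolution logic.

```typescript
// Hypothetical helper: decide which hdc command to run.
function resolveHdcCommand(
  opts: { hdcPath?: string },
  env: Record<string, string | undefined>,
): string {
  // 1. Explicit executable path from the constructor option.
  if (opts.hdcPath) return opts.hdcPath;
  // 2. Directory from HDC_HOME, with trailing slashes removed.
  const home = env.HDC_HOME;
  if (home) return `${home.replace(/[\\/]+$/, '')}/hdc`;
  // 3. Rely on the system PATH.
  return 'hdc';
}
```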

API reference

When you need to customize device behavior, integrate Midscene into a framework, or troubleshoot HDC issues, refer to this section. For common constructor parameters (reports, hooks, caching, etc.), see the platform-agnostic API Reference.

Action Space

HarmonyDevice uses the following action space. The Midscene Agent can use these operations when planning tasks:

  • Tap — Tap on an element.
  • DoubleClick — Double-tap on an element.
  • Input — Input text, supporting replace/typeOnly/clear modes.
  • Scroll — Scroll from an element or screen center in any direction, supporting scroll-to-top/bottom/left/right.
  • DragAndDrop — Drag from one element to another.
  • KeyboardPress — Press a specific key.
  • LongPress — Long-press a target element with optional custom duration.
  • ClearInput — Clear input field contents.
  • Pinch — Not supported. The HarmonyOS uitest framework does not provide multi-touch input APIs.
  • Launch — Open a HarmonyOS app by bundle name.
  • Terminate — Force-stop a HarmonyOS app by bundle name.
  • RunHdcShell — Execute a raw hdc shell command.
  • HarmonyBackButton — Trigger system back.
  • HarmonyHomeButton — Return to home screen.
  • HarmonyRecentAppsButton — Open recent apps / multitasking.

HarmonyDevice

Creates an HDC device instance that can be driven by HarmonyAgent.

Import

import { HarmonyDevice, getConnectedDevices } from '@midscene/harmony';

Constructor

const device = new HarmonyDevice(deviceId, {
  // device options...
});

Device Options

  • deviceId: string — Value from hdc list targets or getConnectedDevices().
  • hdcPath?: string — Custom path to the HDC executable. If not set, Midscene checks the HDC_HOME environment variable and common installation paths.
  • autoDismissKeyboard?: boolean — Automatically hide keyboard after input, default true.
  • keyboardDismissStrategy?: 'esc-first' | 'back-first' — Key preference for automatically hiding the keyboard, default 'esc-first'. HarmonyOS sends the first key from the strategy only: 'esc-first' sends ESC, while 'back-first' sends Back.
  • screenshotResizeScale?: number — Deprecated. This option has been removed and no longer has any effect. Use screenshotShrinkFactor in AgentOpt instead to control the size of screenshots sent to the AI model.
  • customActions?: DeviceAction[] — Extend the planner's available actions via defineAction.

Usage Notes

  • Use getConnectedDevices() to discover devices. The returned deviceId matches hdc list targets output.
  • If HDC is not in your system PATH, specify it via the HDC_HOME environment variable or the hdcPath option.

Examples

Quick Start
import { HarmonyAgent, HarmonyDevice, getConnectedDevices } from '@midscene/harmony';

const [first] = await getConnectedDevices();
const device = new HarmonyDevice(first.deviceId, {});
await device.connect();

const agent = new HarmonyAgent(device, {
  aiActionContext: 'This is a HarmonyOS device. If any popup appears, agree to it.',
});

await agent.launch('com.huawei.hmos.settings');
await agent.aiAct('scroll down one screen');
const items = await agent.aiQuery(
  'string[], list all visible setting item names',
);
console.log(items);

Launch Apps
await agent.launch('com.huawei.hmos.settings'); // Open Settings
await agent.launch('com.huawei.hmos.camera');    // Open Camera
await agent.back();
await agent.home();

HarmonyAgent

Binds Midscene's AI planning capabilities to a HarmonyDevice for UI automation.

Import

import { HarmonyAgent } from '@midscene/harmony';

Constructor

const agent = new HarmonyAgent(device, {
  // common Agent options...
});

HarmonyOS-Specific Options

  • customActions?: DeviceAction[] — Extend the planner's available actions via defineAction.
  • appNameMapping?: Record<string, string> — Map friendly app names to bundle names. When you pass an app name to launch(), the Agent looks up the corresponding bundle name in this mapping; if no mapping is found, it tries to launch the given value as-is.
  • Other fields match the common Agent constructor options described in the API reference: generateReport, reportFileName, aiActionContext, modelConfig, cacheId, createOpenAIClient, onTaskStartTip, etc.
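The appNameMapping lookup described above amounts to a dictionary lookup with a fallback. This is a minimal sketch, assuming an exact, case-sensitive match on the app name; `resolveLaunchTarget` is a hypothetical helper, not the Agent's internal code.

```typescript
// Hypothetical helper: map a friendly app name to a bundle name, or keep the input.
function resolveLaunchTarget(
  target: string,
  appNameMapping: Record<string, string> = {},
): string {
  const mapped: string | undefined = appNameMapping[target];
  return mapped ?? target;
}

// Example mapping; the bundle name is the one used elsewhere in this document.
const mapping = { Settings: 'com.huawei.hmos.settings' };
const bundle = resolveLaunchTarget('Settings', mapping);
```

With this mapping, a bundle name passed directly still works because unmapped values are returned unchanged.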

HarmonyOS-Specific Methods

agent.launch()

Launch a HarmonyOS app.

function launch(uri: string): Promise<void>;
  • uri: string — Can be an app bundle name (e.g., com.huawei.hmos.settings), or an app name registered in appNameMapping. If a URL starting with http:// or https:// is passed, it opens via the browser.
await agent.launch('com.huawei.hmos.settings'); // Open Settings
await agent.launch('com.huawei.hmos.camera');    // Open Camera

agent.runHdcShell()

Run a raw hdc shell command on the connected device.

function runHdcShell(command: string): Promise<string>;
  • command: string — The command passed directly to hdc shell.
const result = await agent.runHdcShell('hidumper -s RenderService -a screen');
console.log(result);

agent.terminate()

Terminate (force-stop) a running HarmonyOS app.

function terminate(uri: string): Promise<void>;
  • uri: string — Bundle name, app name in appNameMapping, or bundle/Ability (only the bundle part is used).
await agent.terminate('com.huawei.hmos.settings');
  • agent.back(): Promise<void> — Trigger HarmonyOS system back.
  • agent.home(): Promise<void> — Return to home screen.
  • agent.recentApps(): Promise<void> — Open recent apps / multitasking.

Utilities

agentFromHdcDevice()

Create a HarmonyAgent from any connected HDC device.

function agentFromHdcDevice(
  deviceId?: string,
  opts?: HarmonyAgentOpt & HarmonyDeviceOpt,
): Promise<HarmonyAgent>;
  • deviceId?: string — Connect to a specific device; leave empty for "first available device".
  • opts?: HarmonyAgentOpt & HarmonyDeviceOpt — Merge Agent options and HarmonyDevice settings in a single object.
import { agentFromHdcDevice } from '@midscene/harmony';

const agent = await agentFromHdcDevice('0123456789ABCDEF'); // specific device
const firstAgent = await agentFromHdcDevice(); // first available device

getConnectedDevices()

List HDC devices that Midscene can drive.

function getConnectedDevices(
  hdcPath?: string,
): Promise<Array<{ deviceId: string }>>;
import { getConnectedDevices } from '@midscene/harmony';

const devices = await getConnectedDevices();
console.log(devices); // [{ deviceId: '0123456789ABCDEF' }]
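The examples in this document read `devices[0]` directly, which fails with an unhelpful error when no device is connected. The following is a minimal sketch of a guard that works on the array shape shown above. `pickDeviceId` is a hypothetical helper that mirrors the "specific device or first available" behavior described for agentFromHdcDevice(); it is not that function's implementation.

```typescript
// Shape of each entry returned by getConnectedDevices(), per the signature above.
interface ConnectedDevice {
  deviceId: string;
}

// Hypothetical helper: pick the requested device, or the first one if none is requested.
function pickDeviceId(devices: ConnectedDevice[], wanted?: string): string {
  if (devices.length === 0) {
    throw new Error('No HDC device found; check `hdc list targets`.');
  }
  if (wanted === undefined) return devices[0].deviceId;
  const match = devices.find((d) => d.deviceId === wanted);
  if (!match) throw new Error(`Device ${wanted} is not connected.`);
  return match.deviceId;
}
```

The returned ID can be passed to the HarmonyDevice constructor in place of `devices[0].deviceId`.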