Improving Native Message Host Reliability on Windows

Last Update: Nov 28, 2023

Update: This change was checked into Chromium 113 before being backed out. The plan is to eventually turn it on-by-default, so extension authors really should read this post and update their extensions if needed.

The feature was relanded inside Chrome Canary version 115.0.5789.0. It’s off-by-default, behind a flag on the chrome://flags#launch-windows-native-hosts-directly page.

In Chrome 120.0.6090+ and Edge 120+, a Group Policy NativeHostsExecutablesLaunchDirectly allows admins to turn this on for users in restricted environments (Cloud PCs that forbid cmd.exe, for example).


Background

Previously, I’ve written about Chromium’s Native Messaging functionality that allows a browser extension to talk to a process running outside of the browser’s sandbox, and I shared a Native Messaging Debugger I wrote that allows you to monitor (and even tamper with) the communication channels between the browser extension and the Host App.

Obscure Problems on Windows

Native Messaging is a powerful capability, and a common choice for building extensions that need to interact with the rest of the system. However, over the years, users have reported a trail of bugs related to how the feature is implemented on Windows. While these bugs are typically only seen in uncommon configurations, they could break Native Messaging entirely for some users.

Some examples include:

  • crbug/335558 – Ampersand in Host’s path prevents launching (Fixed in 118)
  • crbug/387228 – Broken if %comspec% not pointed at cmd.exe
  • crbug/387233 – Broken when cmd.exe is disabled or set to RUNASADMIN

While the details of each of these issues differ, they all have the same root cause: On Windows, Chromium did not launch Native Message Hosts directly, instead launching cmd.exe (Windows’ console command prompt) and directing it to launch the target Host.

This approach provided two benefits: it enabled developers to implement Hosts using languages like Python, whose scripts are not directly executable in Windows, and it enabled support for Windows XP, where the APIs did not allow Chromium to easily set up the communication channel between the browser and the Native Host.

Unfortunately, the cmd-in-the-middle design meant that anything that prevented cmd.exe from running (387233, 387228) or that prevented it from starting the Host (335558) would cause the flow to fail. While these configurations tend to be uncommon (which is why the problems have existed for ten years), they also tend to be very hard to recognize and diagnose, and the impacted customers often have little recourse short of abandoning the extension platform.

The Fix

So, over a few nights and weekends, I landed a changelist in Chromium to improve this scenario for Chromium 113.0.5656 and later. With this change, Chrome, Edge (version 113.0.1769+), and other Chromium-derived browsers directly invoke any Native Host that is a Windows Executable (.exe) rather than going through cmd.exe.

This change will reach the Stable Channel of Chrome and Edge (v113) in the last week of April 2023.

Native Hosts that are not implemented by executables (e.g. Python scripts or the like) will continue to use the old codepath.
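As a rough illustration of the rule described above, here is a minimal Python sketch of which launch path a given Host would take. It is not Chromium's actual source: the function name launch_mode, the direct_launch_enabled parameter (standing in for the flag or policy), and the case-insensitive extension match are all assumptions made for the example.

```python
from pathlib import PureWindowsPath


def launch_mode(host_path: str, direct_launch_enabled: bool = True) -> str:
    """Return "direct" or "cmd" for a Host path taken from a manifest.

    Illustrative only: this mirrors the rule described in the post, not
    Chromium's code. Matching ".exe" case-insensitively is an assumption.
    """
    is_exe = PureWindowsPath(host_path).suffix.lower() == ".exe"
    return "direct" if (direct_launch_enabled and is_exe) else "cmd"


if __name__ == "__main__":
    for path in (r"C:\Hosts\host.exe", r"C:\Hosts\host.py", r"C:\Hosts\run.bat"):
        print(path, "->", launch_mode(path))
```

Anything that is not an .exe (a script or a batch file) stays on the cmd.exe path in this sketch, which matches the behavior described above.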

I’ve got my fingers crossed that effectively no one even notices this change, except for those unfortunate users who were encountering the old bugs and will now find that they can use previously-broken extensions.

However, this change also fixes two other bugs that were caused by the cmd-in-the-middle flow, and those fixes could cause problems if your Windows executable does not follow the expected behavior for Native Hosts.

(In)Visibility

When Chromium launches a native host, it sets a start_hidden flag to prevent any UI from popping up from the host. That flag prevents the proxy cmd.exe’s UI window (conhost.exe) from appearing on the screen. This start_hidden flag means that console-based (subsystem:console) Windows applications remain invisible during native-messaging communications. However, in the cmd.exe flow, the start_hidden flag didn’t flow through to non-console applications (e.g. subsystem:Windows), like my Native Messaging Debugger application, which is built atop C#’s WinForms and meant to be seen by the user.

UPDATE: In the new version of this change that is available in version 115+, the browser will now look at headers inside of the target EXE. If the executable targets SUBSYSTEM:CONSOLE, it will be hidden as described in this section. If it targets SUBSYSTEM:WINDOWS (indicating a GUI application), the start_hidden flag will be set to false.

This compatibility accommodation will not resolve ALL problems, however. If you have a console app that occasionally shows a UI (a Windows certificate selection dialog box, for example), you will need to ensure that your app calls ShowWindow() explicitly.
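If you want to check which subsystem your own Host targets, the field lives in the EXE's PE optional header. The following Python sketch reads it from raw bytes. The offsets follow the published PE format, but should_start_hidden is a hypothetical helper that only approximates the behavior described in the update above; in particular, treating every non-GUI subsystem as "hidden" is an assumption.

```python
import struct

IMAGE_SUBSYSTEM_WINDOWS_GUI = 2  # linked with /SUBSYSTEM:WINDOWS
IMAGE_SUBSYSTEM_WINDOWS_CUI = 3  # linked with /SUBSYSTEM:CONSOLE


def pe_subsystem(image: bytes) -> int:
    """Return the Subsystem field from a PE image's optional header."""
    if image[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew (offset 0x3C) holds the file offset of the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # Skip the 4-byte signature and the 20-byte COFF file header.
    optional_header = pe_offset + 4 + 20
    # Subsystem sits 68 bytes into the optional header (PE32 and PE32+ alike).
    (subsystem,) = struct.unpack_from("<H", image, optional_header + 68)
    return subsystem


def should_start_hidden(image: bytes) -> bool:
    """Hypothetical helper: GUI Hosts are shown, everything else is hidden."""
    return pe_subsystem(image) != IMAGE_SUBSYSTEM_WINDOWS_GUI
```

To try it, read your Host's .exe in binary mode and pass the bytes to pe_subsystem(); a result of 3 means a console app that will be started hidden, and 2 means a GUI app.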

The new Direct Launch for Executables flow changes this: Windows .exe files are now started hidden, meaning that they’re not visible to the user by default. Surprisingly, this might not be obvious to the application’s code; for example, checking frmMain.Visible in my WinForms startup code still returned true even though the window was not displayed to the user.

Fixing this in my Host was simple: I just explicitly call ShowWindow() in the application’s main form’s Load event handler:

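// At the top of the file:
using System.Runtime.InteropServices;

// Inside the form's class (IntPtr keeps the window handle intact on 64-bit):
private const int SW_SHOW = 5;
[DllImport("user32.dll")]
[return: MarshalAs(UnmanagedType.Bool)]
private static extern bool ShowWindow(IntPtr hWnd, int nCmdShow);

// Inside Form_Load():
ShowWindow(this.Handle, SW_SHOW);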

While this works great for WinForms apps, depending on your app’s logic, you could conceivably need to call ShowWindow() twice due to some surprising behavior in Windows.

The Terminator

When a Native Host is no longer needed, either because the Extension’s sendNativeMessage() got a reply from the Host, or the disconnect() method was called (either explicitly or during garbage collection) on the port returned from connectNative(), Chromium shuts down the Native Host connection. First, it closes the stdin and stdout pipes that it established to communicate with the new process. Then, it checks whether the new process has exited itself (typical), and if not, sets up a timer to call Windows’ TerminateProcess() two seconds later if the Host is still running.
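Given that shutdown sequence, a well-behaved Host treats end-of-input on stdin as its cue to finish up and exit. Here is a minimal Python sketch of such a read loop. The framing (a 32-bit length in native byte order followed by UTF-8 JSON) is the standard Native Messaging wire format; the names run_host and handle are placeholders chosen for the example, and the in-memory streams in the demo stand in for the real pipes.

```python
import io
import json
import struct


def read_message(stream):
    """Read one length-prefixed JSON message; return None at end of input."""
    header = stream.read(4)
    if len(header) < 4:
        return None  # The browser closed the pipe.
    (length,) = struct.unpack("=I", header)  # 32-bit length, native byte order
    body = stream.read(length)
    if len(body) < length:
        return None  # Truncated message: treat it as a disconnect.
    return json.loads(body.decode("utf-8"))


def write_message(stream, message):
    """Write one length-prefixed JSON message and flush it immediately."""
    body = json.dumps(message).encode("utf-8")
    stream.write(struct.pack("=I", len(body)) + body)
    stream.flush()


def run_host(stdin, stdout, handle):
    """Answer messages until stdin closes, then return so the process can exit."""
    handled = 0
    while (message := read_message(stdin)) is not None:
        write_message(stdout, handle(message))
        handled += 1
    return handled


if __name__ == "__main__":
    # Simulate the browser: send one message, then close the pipe (EOF).
    request = json.dumps({"text": "ping"}).encode("utf-8")
    fake_stdin = io.BytesIO(struct.pack("=I", len(request)) + request)
    fake_stdout = io.BytesIO()
    count = run_host(fake_stdin, fake_stdout, lambda m: {"echo": m})
    print("messages handled before EOF:", count)
```

In a real Host you would pass sys.stdin.buffer and sys.stdout.buffer instead of the in-memory streams, and keep any cleanup after run_host() returns short enough to finish before the two-second timer fires.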

In the cmd.exe flow, this process termination was effectively a no-op, and a Host that did not self-terminate was always left running. You can see this with my Native Messaging Debugger app — the pipes close, but the UI remains alive.

In the new direct launch flow, the Host is reliably terminated two seconds after the pipes disconnect, if-and-only if Chromium is still running. (If the Host disconnects because Chromium is exiting entirely, the Host process’s pipes are detached but the Host process is not terminated… likely a longstanding bug where Chromium’s two-second callback is aborted during shutdown.)

While this is the intended design (preventing process leaks), unfortunately I’m not aware of an easy way for a Host that doesn’t want to exit to keep itself alive. Unlike many shutdown-type events, Windows does not allow a process to “decline” termination: it’s just there one moment and gone the next (see footnote 1). A Windows process can use a DACL trick to deny Process Terminate rights to non-elevated applications that get a handle to the process, but unfortunately this isn’t sufficient here, because Chromium gets a handle without this restriction as it launches the Host, before the Host process has a chance to protect itself. If your App truly needs to outlive the browser itself, you could either launch it via a .bat file, or you could have your Native Host itself be a stub that acts as an IPC proxy to the rest of your App.
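For the .bat approach, the Host's manifest points at a batch file, which in turn starts the real executable. Here is a small Python sketch that generates both files' contents. The manifest keys are the standard Native Messaging ones, but the host name, extension ID, and the file names launch_host.bat and host.exe are hypothetical placeholders, and build_wrapper_files is a helper invented for this example.

```python
import json


def build_wrapper_files(host_name, exe_name, extension_id):
    """Return (manifest_json, batch_text) for a batch-file wrapper."""
    manifest = {
        "name": host_name,
        "description": "Wrapper-launched native host (example)",
        # On Windows, a relative path is resolved against the manifest's folder.
        "path": "launch_host.bat",
        "type": "stdio",
        "allowed_origins": [f"chrome-extension://{extension_id}/"],
    }
    # %~dp0 expands to the batch file's own folder; %* forwards the arguments
    # that the browser passes to the Host.
    batch = f'@echo off\r\n"%~dp0{exe_name}" %*\r\n'
    return json.dumps(manifest, indent=2), batch


if __name__ == "__main__":
    manifest_text, batch_text = build_wrapper_files(
        "com.example.host", "host.exe", "your-extension-id-here"
    )
    print(manifest_text)
    print(batch_text)
```

Because the manifest no longer points at an .exe, the browser falls back to the cmd.exe path for this Host, with the trade-offs described earlier in this post.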

Bonus Bug Fix

The Chromium documentation mentions that a native host can write error messages out to std_error and those error messages will be collected in Chrome’s standard error output log, which can be enabled by launching Chrome like:

chrome.exe --enable-logging 2>C:\temp\log.txt

However, prior to the new direct launch flow this did not work. For example, Chrome 112 does not pass the std_error handle through to the Native Host process, instead passing 0.

In contrast, when the Native Host is launched from Chrome 113, the handle properly points at the file-backed std_error handle inherited from Chrome.

Side Effect #1: Closing StdIn

Update: Mar 22, 2023: The developer of a popular extension found another behavior change in the new codepath that caused their extension to unexpectedly stop working. It’s a subtle issue, and hopefully theirs is the only one that will hit it.

What happened?

With the new launch flow, if your Host has an outstanding Read() on the Standard Input (stdin) handle and you attempt to close that handle:

// Don't do this!
CloseHandle(GetStdHandle(STD_INPUT_HANDLE));

…that function will now block until the Read() operation completes. If you were issuing this CloseHandle() call on the UI thread, your Host will hang until Chromium gets around to terminating your Host process, which could cause problems for your Host if it expected to perform any other cleanup after disconnecting.

The best fix for this issue is to simply not call CloseHandle(), because you don’t need to! All three STDIO handles will be correctly closed when your process exits in a few seconds anyway, so there’s no need to manually close the handle yourself.

If you really want to manually close the handle, you can first call CancelIoEx(GetStdHandle(STD_INPUT_HANDLE), NULL); before calling CloseHandle(), but to reiterate, there’s really no good reason to bother closing the handle yourself.

Side Effect #2: Process Parent Changed

Update: Apr 18, 2023: A user of the 1Password Browser Extension found that it will no longer correctly connect to the NativeHost. The NativeHost launches, examines its runtime environment, and exits without returning a message to the extension.

When you look at the NativeHost’s log, you find that the client deliberately refuses the connection from the new browser:

opw_app::managers::browser_manager:52 > failed to validate browser. Error: opw-app\src\nmh.rs:133 untrusted chromium browser

Based on the logs, it appears what happens is that 1Password.exe walks up the process tree, from 1Password.exe to Chrome.exe to whatever launched Chrome.

Old: 1Password.exe -> cmd.exe -> Chrome.exe
New: 1Password.exe -> Chrome.exe -> WhateverLaunchedChrome.exe

For example, if you launched Chrome from Explorer, you’ll see:

opw_app::managers::browser_manager:52 > failed to validate browser. Error: opw-app\src\nmh.rs:133 untrusted chromium browser
  name: explorer, publisher: Microsoft Windows, pid: 9104, session id: 1, path: C:\Windows\explorer.exe, version: 10.0.19041.2846

Whereas if you launched Chrome from the SlickRun launcher, you’ll see:

opw_app::managers::browser_manager:52 > failed to validate browser. Error: opw-app\src\nmh.rs:133 untrusted chromium browser
 name: sr, publisher: Eric Lawrence, pid: 13424, session id: 1, path: C:\Program Files\SlickRun\sr.exe, version: 4.4.9.2

While other NativeHosts requiring the old behavior could be easily accommodated (e.g. by pointing the Host’s manifest.json at a simple batch file that launches the Host), 1Password cannot be fixed like this because their anti-tampering logic forbids it.

In general, Native Hosts should avoid any reliance on the particular process tree of their launch context, as any number of things (including this change) could cause such checks to become flaky.

(Fixed in v115) Side Effect #3: std_error

Update: May 2, 2023: The fix for this issue landed in Chrome r1135573 for version 115.0.5736.0.

A developer noticed that in the old cmd.exe flow, when the browser is started (as it is by default) without the std_error handle redirected to a file or pipe, the handle passed to the Native Host was 0, while with the new direct launch flow, the handle is INVALID_HANDLE_VALUE. While neither handle value would allow the Host to write to standard error (because there’s nothing listening), some frameworks appear to check for 0 but not INVALID_HANDLE_VALUE and will cause failures if the latter value is received. The fix for this issue in v115 reverts back to passing 0 in this case.
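Whatever handle value your Host receives, the safest approach is to treat standard error as optional. Here is a minimal Python sketch of a logging helper that never fails when stderr is unusable. The name log_to_stderr is hypothetical, and which exceptions a given runtime raises for a bad handle will vary; the ones caught here are what Python raises for a closed or broken stream.

```python
import io
import sys


def log_to_stderr(text, stream=None):
    """Write a diagnostic line to stderr if possible; return True on success."""
    stream = stream if stream is not None else sys.stderr
    if stream is None:  # e.g. pythonw.exe, which starts without stdio streams
        return False
    try:
        stream.write(text + "\n")
        stream.flush()
        return True
    except (OSError, ValueError):
        # The underlying handle is missing, invalid, or already closed.
        return False


if __name__ == "__main__":
    closed = io.StringIO()
    closed.close()
    print(log_to_stderr("host starting"), log_to_stderr("ignored", closed))
```

The point of the design is that a failed diagnostic write is reported to the caller rather than raised, so a missing stderr can never take down the Host itself.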

If you encounter another side effect or scenario that visibly changes with this new flow enabled in Chromium-based browsers v115 and later, please let me know ASAP!

-Eric

1 It took me some time to actually figure out what was happening here. My Native Messaging Debugger app started disappearing after the pipes closed, and I didn’t know why. I assumed that an unhandled exception must be silently crashing my app. I finally figured out what was happening using the awesome Silent Process Exit debugger option inside gflags.

Published by ericlaw

Impatient optimist. Dad. Author/speaker. Created Fiddler & SlickRun. PM @ Microsoft 2001-2012, and 2018-, working on Office, IE, and Edge. Now a GPM for Microsoft Defender. My words are my own, I do not speak for any other entity.

6 thoughts on “Improving Native Message Host Reliability on Windows”

  1. Wow, I definitely need to check this on work computers now. OWA S/MIME has always been problematic on DoD computers since the giant move to M365. Cmd.exe tends to be disabled as a baseline config, so most people give up using OWA S/MIME and just use the desktop app. Thanks for another great post (and the Chromium update)!

  2. Got a problem with this change in 113.beta, Native Message Host (NMH) extension unusable.

    So far I’ve found it has to do with differences in stderr handling

    1. The NMH writes short info to stderr on startup. Don’t know if this is allowed but till 112.stable this worked. With 113 I get error 0xc0000417 in WinApp Event-Log (“Invalid C-Runtime Parameter”?).

    2. Removing this write to stderr leads to:
    “Failed to make stderr stream inheritable by child process (access denied)” triggered by a non-working SetHandleInformation(stderrhandle, HANDLE_FLAG_INHERIT, HANDLE_FLAG_INHERIT))
    (This is in a library for spawning an internal sub-process from the NMH.)

    3. If chromium is started with --enable-logging --v=1, starting an additional logger command window, everything works like in 112.stable. As far as I know, the NMH stderr output is then also captured into the chrome.log

    4. The start of the NMH executable using a wrapper .bat seems to work w/o problems.

    5. In ‘native_process_launcher_win.cc’ we find:

    // If Chrome was launched with |stderr| attached, inherit it into
    // the Native Host.
    options.stderr_handle = GetStdHandle(STD_ERROR_HANDLE);
    if (options.stderr_handle) {
      options.handles_to_inherit.push_back(options.stderr_handle);
    } else {
      options.stderr_handle = INVALID_HANDLE_VALUE;
    }
    return base::LaunchProcess(command, options);

    So probably without chrome.log enabled there is now no inheritable stderr available (INVALID_HANDLE_VALUE) where the NMH could write into and I might need to change the NMH logic to disable it in this case or use my own stderr stream?
    What do you think might be the easiest solution here? Thanks in advance.

    1. Yes, basically, Chromium now only passes a stderr handle to the native host if it has one itself.

      Would you mind filing a bug in CRBug.com and sharing the link with me? That way we can discuss with the area owners in Chromium. Thanks!

  3. Hello Eric, thanks for the explanation.
    With the latest chrome update to Version 113.0.5672.64 (Official Build) (64-bit) we started to have problems, as you already explain:

    “When Chromium launches a native host, it sets a start_hidden flag to prevent any UI from popping up from the host.”

    Is there some solution with which we will be able to override this flag and set it to options.start_hidden = false; in order UI to be visible. We have a lot of users which have already installed this native host and we can not use your proposed solution to change the code of the host and release new version. Is there some other option, to update extension for example?

    Thanks

    1. Thanks for reaching out! Sharing more information about the specific Native Host would be helpful (e.g. is it a public extension? Who made it? etc). It would also be helpful to understand what specifically prevents updating the Native Host (is it simply inconvenient, or is there a broader reason why this isn’t possible?).

      Unfortunately, no, there’s nothing on the Chrome extension side which controls visibility of the Native Host — it was always meant to start hidden, and there’s nothing on the extension side to change that. Beyond fixing the native host itself, the only other workaround cannot be achieved without updating the Native Host’s manifest (so that instead of pointing directly at the .EXE, it instead points at a .CMD which in turn launches the .EXE). That workaround is useful when you can’t change the .EXE for some reason (e.g. you don’t have the source code), but it still requires updating the client PCs.
