Unveiling Microsoft’s New Sync Event Log and the wonderful Backoff/Retry Mechanisms

by | Aug 28, 2024 | Blog

Earlier this week, I discussed the discovery of a new log file, AppWorkload.log, from the Intune Management Extension, highlighting its importance in app deployment and troubleshooting. In this blog, we’ll continue exploring the latest developments by diving into another significant update: the new Sync event log and the backoff/retry feature introduced by Microsoft.

This new Sync log and retry mechanism are designed to improve device management reliability, especially in environments with network challenges. We’ll explore the BackOff AKA Retry feature, its implementation, and why it’s critical for stable and efficient device management.

Introduction to the new Retry Schedule

In modern enterprise environments, devices must constantly communicate with management servers to receive updates, apply configurations, and report status. This continuous communication is vital for maintaining security, compliance, and functionality across an organization’s IT infrastructure. However, network conditions are not always optimal—connectivity issues, DNS failures, or server unavailability can disrupt these communications.

To manage such disruptions gracefully, the OMA-DM (Open Mobile Alliance Device Management) client got a nice update. In the latest 27686.1000 Windows Canary Build, Microsoft implemented a sophisticated “backoff” mechanism.

Two new features were added in the latest Canary Insider build, 27686:
Backoff_retry_scheduledtask and Checkfordisconnectednetworkduringbackoff

The Concept of Backoff/Retry in OMA-DM Client

The backoff mechanism is essentially a controlled retry logic designed to handle temporary communication failures between the client device and the management server. When the OMA-DM client encounters a failure—such as a DNS resolution error or a network timeout—it doesn’t immediately retry.

Instead, it introduces a delay before attempting to reconnect. This delay, which increases progressively with each subsequent failure, is intended to prevent overwhelming the network, reduce unnecessary power consumption, and allow time for the issue to be resolved.

The backoff mechanism is crucial for maintaining the efficiency of device management processes. This could be a big deal in environments where thousands of devices might be trying to connect to the server simultaneously.

How the Backoff Mechanism is Triggered

The backoff mechanism is triggered by specific error conditions encountered by the OMA-DM client during its operations. One such condition is a DNS resolution failure. This is a common scenario where the device cannot resolve the server’s hostname. This could be due to network issues or DNS server problems.

The moment a DNS error occurs, the corresponding error, 0x80072ee7, will be logged in the new Sync event log.

The OMA-DM message failed to be sent. Result (Unknown Win32 error code. 0x80072ee7), which points to DNS errors or network issues

This event will trigger the backoff process to manage the sync retries more effectively.

Detailed Examination of the BackOff Mechanism

Let’s examine how this backoff mechanism works, step by step.

Sync Operation Begins:

  • The Backoff process begins when the device attempts to sync with the server. This is managed by the SendDataToServer function in the omadmclient.exe.

  • The function is responsible for sending the device’s data to the server for synchronization.

Inside the OMA-DM client we can spot SendDataToServer, which kicks in when the device needs to sync with the service

DNS Error Occurs:

  • During the sync process, a DNS resolution failure can occur. This is a common network-related error represented by the error code 0x80072EE7 (ERROR_INTERNET_NAME_NOT_RESOLVED).

0x80072EE7: ERROR_INTERNET_NAME_NOT_RESOLVED

  • The SendDataToServer function detects this error and triggers the backoff mechanism.
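To see why 0x80072EE7 maps to a name-resolution failure, it helps to split the value into the standard HRESULT fields. The sketch below is only an illustration: the function name and the one-entry lookup table are made up for this example and are not part of the OMA-DM client.

```python
# Split a 32-bit HRESULT into its standard fields (severity, facility, code).
FACILITY_WIN32 = 7  # facility used when an HRESULT wraps a Win32 error code

# Only the one code discussed in this post; this is not a complete table.
KNOWN_WIN32_ERRORS = {12007: "ERROR_INTERNET_NAME_NOT_RESOLVED"}


def decode_hresult(hresult: int) -> dict:
    """Hypothetical helper: break an HRESULT into readable parts."""
    severity = (hresult >> 31) & 0x1     # 1 means failure
    facility = (hresult >> 16) & 0x7FF   # 11-bit facility field
    code = hresult & 0xFFFF              # low 16 bits hold the error code
    name = "unknown"
    if facility == FACILITY_WIN32:
        name = KNOWN_WIN32_ERRORS.get(code, "unknown")
    return {"failed": severity == 1, "facility": facility, "code": code, "name": name}


info = decode_hresult(0x80072EE7)
print(info)
```

The low 16 bits, 0x2EE7, are 12007 in decimal, which is the WinINet error ERROR_INTERNET_NAME_NOT_RESOLVED wrapped in the Win32 facility.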

Logging the DNS Error in the Sync Event Log:

  • When the DNS error is detected, it is logged in the new Sync event log (DeviceManagement-Enterprise-Diagnostics-Provider/Sync). The event log captures the 0x80072ee7 error message and notes that a backoff retry will be attempted.

The DNS error is logged with the 0x80072ee7 error code in the new Sync event log, which shows up under the DeviceManagement-Enterprise-Diagnostics-Provider event log

Registry Key Check

  • Before the retry scheduled task is created, the client checks specific registry values to determine the backoff time.

When a DNS error occurs, the backoff feature is initialized and checks the OMA-DM settings InitialBackoffTime and MaxBackoffTime.

InitialBackoffTime

  • This registry key specifies the initial delay (in milliseconds) before the first retry attempt after detecting a failure. The backoff mechanism uses this value to determine how long to wait before trying to sync again.

InitialBackoffTime is set to 30000 milliseconds by default and MaxBackoffTime is set to 120000 milliseconds

MaxBackoffTime:

  • This registry key defines the maximum amount of time (in milliseconds) that the system will wait between retries. If multiple failures occur, the backoff time may increase exponentially up to this maximum value.
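How the two values interact can be sketched in a few lines. The defaults below come from the registry values shown in this post; the doubling factor is an assumption for illustration only, since the exact growth rule used by the OMA-DM client was not confirmed here.

```python
INITIAL_BACKOFF_MS = 30_000   # default InitialBackoffTime from this post
MAX_BACKOFF_MS = 120_000      # default MaxBackoffTime from this post


def backoff_delays(attempts, initial_ms=INITIAL_BACKOFF_MS,
                   max_ms=MAX_BACKOFF_MS, factor=2):
    """Return the delay in ms before each retry.

    factor=2 (doubling) is an assumed growth rule, not a confirmed one.
    """
    delays = []
    delay = initial_ms
    for _ in range(attempts):
        delays.append(min(delay, max_ms))
        delay = min(delay * factor, max_ms)  # grow, but never past the cap
    return delays


print(backoff_delays(5))  # [30000, 60000, 120000, 120000, 120000]
```

Under that assumption, the third retry already hits the 120-second cap, and every later retry waits the same amount of time.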

Creating the Backoff Retry Task:

  • Once the DNS error is logged and the current OMA-DM settings are checked, the system prepares to retry the sync operation. To manage this retry, a scheduled task is created using the ScheduleBackOffRetryTask function.

Once the DNS error occurs and the backoff feature kicks in, the "retry schedule created for incomplete session" is created inside the EnterpriseMgmt\SessionRetry key

  • This function creates a task in the Windows Task Scheduler (Microsoft\Windows\EnterpriseMgmt\SessionRetry) that will attempt to retry the sync after a specified delay.

Procmon also shows the retry schedule for incomplete session being created

The retry schedule created for incomplete session showing up in the Task Scheduler

Queueing the Retry Task:

  • The task is queued in the Windows Task Scheduler and waits for the specified backoff time before retrying the sync operation.

The new Sync event log showing that a request to store info for retry succeeded

  • This task is critical to ensuring that the device does not continuously retry too quickly, which could overload the server or cause further issues.

Retry Execution:

  • After the backoff time has elapsed, the scheduled task executes the sync operation again.

  • During each retry, the registry may be updated with details such as RetryDelay, reflecting the delay calculated for the next retry. This allows the system to keep track of how long it has waited between retries.

The RetryDelay value being set, reflecting the delay calculated for the next retry

  • The backoff times (InitialBackoffTime and MaxBackoffTime) guide how long the system waits before retrying. These values are accessed but not updated during the process.

  • The retry will be logged inside the new Sync event log:

When the first retry happens, this event is also logged in the Sync event log

The Retry Task

  • The "retry schedule created for incomplete session" task uses the DeviceEnroller with some specific parameters.

The scheduled retry task launches the DeviceEnroller with some new parameters, including a new one: InitiationID

  • The task uses parameters such as the InitiationID (a GUID that uniquely identifies the session) to track the retry session.

Task Deletion:

  • Once the retry session has succeeded and the device syncs successfully again, the queued retry scheduled task is automatically deleted to prevent it from running unnecessarily.

If the retry finally succeeds in establishing a connection once the DNS or network issues are resolved, the retry schedule is deleted

  • This cleanup ensures that only necessary tasks are present in the Task Scheduler, maintaining system efficiency.

Logging the Successful Sync in the new Sync Event log:

  • The new Sync event log is updated to reflect the successful completion of the sync operation.

With the retry session succeeded, the corresponding event is also logged in the new Sync event log

  • This log entry indicates that the backoff mechanism worked as intended and that the device is now synchronized.

This detailed process ensures that devices can recover from network-related issues like DNS resolution failures and eventually complete their sync operations, maintaining compliance and up-to-date configurations. Let’s zoom into the registry details and the event log a bit more.
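The steps above can be condensed into a small simulation. This is a toy model, not the actual omadmclient.exe logic: the class, its method names, the log strings, and the doubling of the delay are all hypothetical and only mirror the flow described in this post (sync fails with a DNS error, a retry task is scheduled, the task is deleted after a successful sync). No real waiting, scheduled tasks, or registry writes are involved.

```python
DNS_ERROR = 0x80072EE7  # ERROR_INTERNET_NAME_NOT_RESOLVED as an HRESULT


class SyncSimulator:
    """Toy model of the sync/backoff/retry flow (all names are hypothetical)."""

    def __init__(self, failures_before_success, initial_ms=30_000, max_ms=120_000):
        self.failures_left = failures_before_success
        self.initial_ms = initial_ms
        self.max_ms = max_ms
        self.retry_task = None  # stands in for the scheduled retry task
        self.log = []           # stands in for the Sync event log

    def send_data_to_server(self):
        # Pretend the DNS lookup fails a fixed number of times, then works.
        if self.failures_left > 0:
            self.failures_left -= 1
            return DNS_ERROR
        return 0

    def run(self):
        delay = self.initial_ms
        while True:
            result = self.send_data_to_server()
            if result == 0:
                if self.retry_task is not None:
                    self.retry_task = None  # cleanup after a successful retry
                    self.log.append("retry task deleted")
                self.log.append("sync succeeded")
                return self.log
            self.log.append(f"sync failed: {result:#010x}")
            self.retry_task = {"retry_delay_ms": delay}
            self.log.append(f"retry scheduled in {delay} ms")
            delay = min(delay * 2, self.max_ms)  # assumed doubling, capped


for line in SyncSimulator(failures_before_success=2).run():
    print(line)
```

With two simulated DNS failures, the log shows a retry scheduled after 30000 ms, another after 60000 ms, and then the task cleanup followed by the successful sync.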

The backoff mechanism registry keys

The backoff mechanism relies heavily on registry settings to determine the behavior of retry attempts. These registry keys are stored under specific paths within the Windows Registry, which the OMA-DM client reads and writes to during its operation. Key registry values include:

  • InitialBackOffTime: This DWORD value sets the initial delay before the first retry attempt. The value is typically represented in milliseconds, with 0x00007530 equating to 30,000 milliseconds or 30 seconds. This time is crucial as it prevents immediate and repeated retry attempts that could further strain the network.

  • MaxBackOffTime: Another DWORD value that sets the upper limit on the backoff time. For instance, 0x0001d4c0 represents 120,000 milliseconds or 120 seconds. This cap ensures that even after multiple failed attempts, the retry mechanism doesn’t delay indefinitely, allowing for periodic retrying to check if the network condition has improved.

  • ConnRetryFreq: This value controls how frequently retries should occur after the initial backoff time, balancing enough time for potential network recovery against retries that are so sparse they cause unnecessary delays.

  • ServerLastAccessTime, ServerLastFailureTime, ServerLastSuccessTime: These string values store timestamps of the last server access, the last failed attempt, and the last successful attempt, respectively. These are critical for diagnostics and understanding the communication patterns between the client and server.

These values are usually managed programmatically by the OMA-DM client and provide the flexibility to fine-tune the retry behavior depending on specific deployment scenarios.
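The registry shows these DWORD values in hexadecimal, so a quick conversion helps when reading them. A minimal sketch follows; the helper name is made up for this example, and it only converts the two values quoted above rather than reading the registry.

```python
def dword_to_seconds(hex_value: str) -> float:
    """Convert a DWORD shown as hex (a value in milliseconds) to seconds."""
    milliseconds = int(hex_value, 16)  # int() accepts the "0x" prefix with base 16
    return milliseconds / 1000


for name, raw in [("InitialBackOffTime", "0x00007530"),
                  ("MaxBackOffTime", "0x0001d4c0")]:
    print(f"{name}: {int(raw, 16)} ms = {dword_to_seconds(raw):g} s")
```

This prints 30000 ms (30 s) for InitialBackOffTime and 120000 ms (120 s) for MaxBackOffTime, matching the defaults described above.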

Let’s take a look at the new Sync Event log

Throughout the backoff and retry process, detailed logs are generated and can be found in the Windows Event Viewer under DeviceManagement-Enterprise-Diagnostics-Provider/Sync.

The new Sync event log showing all the information we need to troubleshoot sync issues

Notably, the new Sync event log plays a critical role in enhancing visibility into synchronization activities, making it easier to distinguish between different types of events. These logs capture events such as:

  • The creation of a retry task

  • The scheduling of the retry session

  • The success or failure of retry attempts

  • Detailed information in the Sync event log, which provides insights into synchronization processes and related issues

  • The eventual outcome of the session, whether it succeeded or ultimately failed

By splitting the event logs, it becomes more evident which log is responsible for what, thereby allowing administrators to more effectively diagnose and resolve issues. These logs are invaluable for administrators who need to troubleshoot issues related to device management and understand why certain devices might be falling behind in updates or configurations.
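When going through exported event messages, it can help to pull out the error codes and count them. The sketch below is only an illustration: the first sample message follows the wording shown earlier in this post, the other two are made up, and how the messages get exported from Event Viewer is left out.

```python
import re

# Matches 32-bit HRESULT-style codes such as 0x80072ee7.
HRESULT_PATTERN = re.compile(r"0x[0-9a-fA-F]{8}")


def find_error_codes(messages):
    """Map each error code found in the messages to how often it appears."""
    counts = {}
    for message in messages:
        for match in HRESULT_PATTERN.findall(message):
            code = match.lower()  # treat 0x80072EE7 and 0x80072ee7 as the same
            counts[code] = counts.get(code, 0) + 1
    return counts


sample = [
    "The OMA-DM message failed to be sent. Result (Unknown Win32 error code. 0x80072ee7)",
    "An illustrative line without an error code",
    "The OMA-DM message failed to be sent. Result (Unknown Win32 error code. 0x80072EE7)",
]
print(find_error_codes(sample))  # {'0x80072ee7': 2}
```

A device that keeps logging the same code across many retries is a good candidate for a closer look at its DNS or network configuration.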

Conclusion

The backoff feature within the OMA-DM client is a sophisticated mechanism designed to handle the realities of networked environments, where connections are not always stable or reliable. By intelligently managing retry attempts and leveraging the Windows Task Scheduler, the OMA-DM client ensures that devices remain in communication with management servers without overwhelming the network or the server itself.

Understanding this mechanism is crucial for IT administrators, as it provides insights into how devices handle communication failures and ensures that retries are done in a controlled and efficient manner. By leveraging this understanding, organizations can ensure smoother device management operations, even in challenging network environments.