base/threading/README.md

HangWatcher

HangWatcher is a mechanism for detecting hangs in Chrome, logging their frequency and nature in UMA and uploading crash reports.

Definition of a hang

In this document, a hang is any scope that does not complete within its wall-time allowance. A scope is defined by the lifetime of a WatchHangsInScope object, and each individual scope can have its own time-out value.
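
The definition can be written as a simple predicate. This sketch is only illustrative. `ScopeRecord` and `IsHung` are hypothetical names and not part of HangWatcher. It also assumes that elapsed time is read from `std::chrono::steady_clock`.

```cpp
#include <cassert>
#include <chrono>

// Elapsed real (wall) time, not CPU time: a descheduled thread still ages.
using Clock = std::chrono::steady_clock;

// Hypothetical record of one watched scope; not a HangWatcher type.
struct ScopeRecord {
  Clock::time_point start;  // When the scope was entered.
  Clock::duration timeout;  // Allowance; may differ for every scope.
  bool completed = false;   // Set when the scope exits.
};

// A scope is hung if it is still live after start + timeout.
bool IsHung(const ScopeRecord& scope, Clock::time_point now) {
  assert(now >= scope.start);
  if (scope.completed) return false;  // Finished scopes never hang.
  return now > scope.start + scope.timeout;
}
```

Because the allowance is measured in wall time, this predicate treats Example 1 and Example 2 alike. It makes no difference whether the thread is descheduled or burning CPU.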

Example 1

A task on the IO thread blocks on a contended lock for 20s. The thread makes no progress at all, because the OS keeps it descheduled for as long as the lock remains contended. This is a hang.

Example 2

A small function that should execute relatively quickly spends 30s burning CPU without making any outwardly visible progress. In a narrow sense the thread is making progress, since its program counter keeps changing for the duration of the time-out. However, as far as Chrome, and critically its user, is concerned, we are stuck and not making progress. This is a hang.

Example 3

A message pump is busy dispatching millions of tasks, each of which runs quickly. The task at the end of the queue waits up to 30s before it is executed. This is not a hang; it is congestion. See //content/scheduler/responsiveness for more details.
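
The difference between a hang and congestion comes down to when the clock starts. In this hypothetical sketch, a watched scope opens only when a task starts running. Time spent waiting in the queue therefore shows up as queue delay, not as time against the task's allowance. `QueuedTask`, `TaskTiming`, and `Drain` are illustrative names, not Chromium code, and the clock is simulated.

```cpp
#include <cassert>
#include <chrono>
#include <vector>

using Clock = std::chrono::steady_clock;

// Hypothetical types for a simulated task queue.
struct QueuedTask {
  Clock::time_point posted;  // When the task was queued.
  Clock::duration run_time;  // How long its body takes to execute.
};

struct TaskTiming {
  Clock::duration queue_delay;  // Congestion: not counted by the scope.
  Clock::duration scoped_time;  // What a per-task hang scope would measure.
};

// Runs the tasks back to back on a simulated clock starting at `now`,
// opening a watched scope only when each task begins executing.
std::vector<TaskTiming> Drain(const std::vector<QueuedTask>& queue,
                              Clock::time_point now) {
  std::vector<TaskTiming> timings;
  for (const QueuedTask& task : queue) {
    assert(task.posted <= now);           // Tasks run only after being posted.
    Clock::time_point scope_start = now;  // Scope opens here, not at posting.
    now += task.run_time;                 // Task body executes.
    timings.push_back({scope_start - task.posted, now - scope_start});
  }
  return timings;
}
```

Suppose four tasks are posted at the same moment and each runs for 10s. The last one waits 30s in the queue, but its own scope lasts only 10s, so a 15s allowance would not flag it.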

Design

Hangs are monitored by a single thread per process. This is a plain OS thread: it is not based on base::Thread and does not use the task posting APIs.
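
Here is a minimal sketch of such a dedicated watcher thread. It assumes a fixed polling interval and a caller-supplied monitoring callback, both hypothetical. `PeriodicWatcherThread` uses `std::thread` and `std::condition_variable` directly, with no task queue, mirroring the point above that the watcher does not rely on task posting. Presumably this also keeps a congested or hung task runner from delaying the watcher itself.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical watcher thread; not Chromium's implementation.
class PeriodicWatcherThread {
 public:
  PeriodicWatcherThread(std::chrono::milliseconds interval,
                        std::function<void()> monitor)
      : interval_(interval), monitor_(std::move(monitor)) {
    assert(monitor_);
    thread_ = std::thread([this] { Run(); });
  }

  ~PeriodicWatcherThread() { Stop(); }

  // Wakes the thread immediately and waits for it to exit. Safe to call twice.
  void Stop() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stop_ = true;
    }
    cv_.notify_one();
    if (thread_.joinable()) thread_.join();
  }

 private:
  void Run() {
    std::unique_lock<std::mutex> lock(mutex_);
    // wait_for returns true once stop_ is set; on timeout it returns false
    // and the thread runs one monitoring pass.
    while (!cv_.wait_for(lock, interval_, [this] { return stop_; })) {
      lock.unlock();
      monitor_();  // Scan without holding the lock.
      lock.lock();
    }
  }

  std::chrono::milliseconds interval_;
  std::function<void()> monitor_;
  std::mutex mutex_;
  std::condition_variable cv_;
  bool stop_ = false;
  std::thread thread_;
};
```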

Other threads that want to be monitored register with this watcher thread. This can be done at thread creation or at any other time.
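
Registration could be modelled as a lock-protected list of per-thread records that the watcher can snapshot. This is a hypothetical sketch. `WatchedThreadState`, `WatchedThreadRegistry`, and the integer deadline encoding are illustrative and not HangWatcher's API.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical state shared between one watched thread and the watcher.
// The watched thread writes its deadline; the watcher only reads it.
// A value of 0 means "not inside a watched scope".
struct WatchedThreadState {
  std::atomic<std::int64_t> deadline_ticks{0};
};

// Hypothetical registry owned by the watcher; not Chromium's API.
class WatchedThreadRegistry {
 public:
  // May be called from any thread, at thread creation or later.
  std::shared_ptr<WatchedThreadState> Register() {
    auto state = std::make_shared<WatchedThreadState>();
    std::lock_guard<std::mutex> lock(mutex_);
    states_.push_back(state);
    return state;
  }

  // Called by the watcher thread. Returning a copy lets it scan the
  // registered threads without holding the lock.
  std::vector<std::shared_ptr<WatchedThreadState>> Snapshot() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return states_;
  }

 private:
  mutable std::mutex mutex_;
  std::vector<std::shared_ptr<WatchedThreadState>> states_;
};
```

Using `std::shared_ptr` keeps a record alive while the watcher scans a snapshot. This sketch leaves out unregistration.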

Monitored threads have no responsibilities other than marking when they enter and leave monitored scopes. They do this with a WatchHangsInScope object instantiated on the stack at the beginning of the scope.

Example:

```cpp
#include "base/threading/hang_watcher.h"
#include "base/time/time.h"

void FooBar() {
  base::WatchHangsInScope scope(base::Seconds(5));
  DoWork();
}
```
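
Conceptually, the scope object arms a deadline when it is constructed and disarms it when it is destroyed. These correspond to the SetDeadline() and ClearDeadline() steps of the sequence diagram. The sketch is a hypothetical stand-in (`ScopedHangDeadline` with a `thread_local` deadline), not the real WatchHangsInScope. In particular, two behaviours are assumptions of this sketch: a nested scope's deadline replaces the enclosing one, and the enclosing deadline is restored on exit. The optional `now` parameter exists only so the sketch can be exercised with fixed times.

```cpp
#include <cassert>
#include <chrono>
#include <optional>

using Clock = std::chrono::steady_clock;

// Deadline of the innermost live scope on this thread; empty when the
// thread is not inside any watched scope.
thread_local std::optional<Clock::time_point> t_current_deadline;

// Hypothetical stand-in for WatchHangsInScope, for illustration only.
class ScopedHangDeadline {
 public:
  // Entering the scope: set the deadline to now + timeout.
  explicit ScopedHangDeadline(Clock::duration timeout,
                              Clock::time_point now = Clock::now())
      : previous_(t_current_deadline), deadline_(now + timeout) {
    t_current_deadline = deadline_;
  }

  // Leaving the scope: restore the enclosing scope's deadline, or clear it
  // entirely if this was the outermost scope.
  ~ScopedHangDeadline() {
    // Nested scopes must already have restored this scope's deadline.
    assert(t_current_deadline == deadline_);
    t_current_deadline = previous_;
  }

  // Must live on the stack for exactly one scope.
  ScopedHangDeadline(const ScopedHangDeadline&) = delete;
  ScopedHangDeadline& operator=(const ScopedHangDeadline&) = delete;

 private:
  std::optional<Clock::time_point> previous_;
  Clock::time_point deadline_;
};
```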

The HangWatcher thread periodically traverses the list of registered threads and verifies that they are not hung within a monitored scope.
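
One monitoring pass could look like the sketch below, which follows the checks in the diagram. A thread with no deadline is not hung, and a live deadline that has already expired is a hang to be recorded. `WatchedThreadState`, `MonitorOnce`, and the integer tick representation are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical per-thread deadline in ticks; 0 means no live scope.
struct WatchedThreadState {
  std::atomic<std::int64_t> deadline_ticks{0};
};

// Scans every registered thread once and passes each hung thread's index to
// `record_hang`. Returns the number of hangs found in this pass.
std::size_t MonitorOnce(const std::vector<WatchedThreadState*>& threads,
                        std::int64_t now_ticks,
                        const std::function<void(std::size_t)>& record_hang) {
  std::size_t hangs = 0;
  for (std::size_t i = 0; i < threads.size(); ++i) {
    assert(threads[i] != nullptr);
    std::int64_t deadline = threads[i]->deadline_ticks.load();
    if (deadline == 0) continue;          // No deadline, no hang.
    if (now_ticks <= deadline) continue;  // Live deadline, not yet expired.
    ++hangs;
    record_hang(i);                       // Live expired deadline: hang.
  }
  return hangs;
}
```

A production watcher also has to cope with a scope ending while it is being inspected. This sketch ignores that race.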

+-------------+       +-----------------+                       +-----------------+
| HangWatcher |       | WatchedThread1  |                       | WatchedThread2  |
+-------------+       +-----------------+                       +-----------------+
       |                       |                                         |
       | Init()                |                                         |
       |-------                |                                         |
       |      |                |                                         |
       |<------                |                                         |
       |                       |                                         |
       |            Register() |                                         |
       |<----------------------|                                         |
       |                       |                                         |
       |                       |                              Register() |
       |<----------------------------------------------------------------|
       |                       |                                         |
       |                       |                                         | SetDeadline()
       |                       |                                         |--------------
       |                       |                                         |             |
       |                       |                                         |<-------------
       |                       |                                         |
       |                       |                                         | ClearDeadline()
       |                       |                                         |----------------
       |                       |                                         |               |
       |                       |                                         |<---------------
       |                       |                                         |
       | Monitor()             |                                         |
       |---------------------->|                                         |
       |                       | ------------------------\               |
       |                       |-| No deadline, no hang. |               |
       |                       | |-----------------------|               |
       |                       |                                         |
       | Monitor()             |                                         |
       |---------------------------------------------------------------->|
       |                       |                                         | ------------------------\
       |                       |                                         |-| No deadline, no hang. |
       |                       |                                         | |-----------------------|
       |                       |                                         |
       |                       | SetDeadline()                           |
       |                       |--------------                           |
       |                       |             |                           |
       |                       |<-------------                           |
       |                       |                                         |
       | Monitor()             |                                         |
       |---------------------->| -------------------------------\        |
       |                       |-| Live expired deadline. Hang! |        |
       |                       | |------------------------------|        |
       |                       |                                         |
       | RecordHang()          |                                         |
       |-------------          |                                         |
       |            |          |                                         |
       |<------------          |                                         |
       |                       |                                         |

Protections against non-actionable reports

Ignoring normal long running code

There are cases where code is expected to take a long time to complete, and such cases can be kept from being detected as hangs. Calling HangWatcher::InvalidateActiveExpectations() from within a scope ensures that no hangs are logged while execution remains within it.

Example:

```cpp
#include <utility>

#include "base/threading/hang_watcher.h"
#include "base/time/time.h"

void RunTask(Task task) {
  // In general, tasks shouldn't hang.
  base::WatchHangsInScope scope(base::Seconds(5));

  std::move(task.task).Run();  // Calls `TaskKnownToBeVeryLong`.
}

void TaskKnownToBeVeryLong() {
  // This particular function is known to take a long time. Never report it as a
  // hang.
  base::HangWatcher::InvalidateActiveExpectations();

  BlockWaitingForUserInput();
}
```
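
One way to model the effect is a per-scope flag that the watcher consults before reporting. This is a hypothetical, single-threaded sketch: `ScopeState`, `InvalidateActiveExpectationsSketch`, and `ShouldReportHang` are illustrative names. Real code would need state that is safely shared with the watcher thread, and HangWatcher's actual mechanism may differ.

```cpp
#include <cassert>
#include <chrono>
#include <optional>

using Clock = std::chrono::steady_clock;

// Hypothetical view of one thread's live scope, as seen by the watcher.
struct ScopeState {
  std::optional<Clock::time_point> deadline;  // Set while a scope is live.
  bool invalidated = false;                   // Expectations were dropped.
};

// Hypothetical stand-in for HangWatcher::InvalidateActiveExpectations():
// marks the live scope so the watcher stops treating it as a candidate hang.
void InvalidateActiveExpectationsSketch(ScopeState& state) {
  assert(state.deadline.has_value() && "call from within a watched scope");
  state.invalidated = true;
}

// The watcher's decision for one thread.
bool ShouldReportHang(const ScopeState& state, Clock::time_point now) {
  if (!state.deadline) return false;    // No deadline, no hang.
  if (state.invalidated) return false;  // Known long-running code.
  return now > *state.deadline;         // Live expired deadline: hang.
}
```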

Protections against wrongfully blaming code

TODO

Ignoring system suspend

TODO