Fixes for shutting down during async operations#1141
Fixes for shutting down during async operations#1141bghgary wants to merge 25 commits intoBabylonJS:masterfrom
Conversation
da0d314 to
cc94258
Compare
If the shared_ptr is copied, the final release may end up on a different thread. Adding std::forward will ensure that the lambda is moved and not copied which will prevent multiple threads from owning the shared_ptr. The shared_ptr is being used as a hack anyways. The correct fix is to not use shared_ptr and make Dispatch use the Dispatchable class from AppRuntime or if we can use C++23 someday, use std::move_only_function.
| // | ||
| // private: | ||
| // arcana::cancellation_source m_cancellationSource; | ||
| // JsRuntimeScheduler m_runtimeScheduler; |
There was a problem hiding this comment.
Will you add a constructor to this example code just to show how the JsRuntimeScheduler is initialized?
There was a problem hiding this comment.
Yes, that's good idea.
|
|
||
| void JsRuntime::RegisterDisposing(JsRuntime& runtime, IDisposingCallback* callback) | ||
| { | ||
| auto& callbacks = runtime.m_disposingCallbacks; |
There was a problem hiding this comment.
How can we be certain this won't be called after NotifyDisposing (which invalidates m_disposingCallbacks)?
There was a problem hiding this comment.
It shouldn't happen assuming correct usage patterns with cancellation. But if it does happen, then callbacks registered after NotifyDisposing will not fire. All of these *Disposing functions must be called on the JavaScript thread.
|
|
||
| auto runtimeScheduler{std::make_unique<JsRuntimeScheduler>(JsRuntime::GetFromJavaScript(env))}; | ||
| MediaStream::NewAsync(env, videoConstraints).then(*runtimeScheduler, arcana::cancellation::none(), [runtimeScheduler = std::move(runtimeScheduler), env, deferred](const arcana::expected<Napi::Object, std::exception_ptr>& result) { | ||
| MediaStream::NewAsync(env, videoConstraints).then(arcana::inline_scheduler, arcana::cancellation::none(), [env, deferred](const arcana::expected<Napi::Object, std::exception_ptr>& result) { |
There was a problem hiding this comment.
Is inline_scheduler ok? Doesn't this mean that while it is still running on the JS thread, it's not tracked in the context of the JsRuntimeScheduler?
There was a problem hiding this comment.
I am enforcing that NewAsync must return on the JS thread, which it already does.
| if (time <= earliestTime) | ||
| { | ||
| m_runtime.Dispatch([this](Napi::Env) { | ||
| m_runtimeScheduler.Get()([this]() { |
There was a problem hiding this comment.
Can the ones in this class not just be m_runtime.Dispatch?
|
|
||
| SchedulerImpl m_scheduler; | ||
| std::atomic<int> m_count{0}; | ||
| arcana::manual_dispatcher<128> m_disposingDispatcher{}; |
There was a problem hiding this comment.
If each JsRuntimeScheduler instance has its own dispatch queue, and it is only pumped by a call to Rundown in the destructor, then if we ever have a situation where one object is trying to call into another object (as part of the async code executing during destruction), then the separate dispatch queue in the separate object will be in a state where it is getting pumped and we will basically deadlock. It seems like this is fixable by having a shared dispatch queue across all objects, and a given object's destructor's call to Rundown is pumping for all queued work. I don't think this breaks anything, but I think it would prevent this potential deadlock situation. I'm not sure where a shared dispatch queue would exist... maybe we'd need to push some of this logic up to JSRuntime.
|
|
||
| // Wait for async operations to complete. | ||
| m_runtimeScheduler.Rundown(); | ||
| } |
There was a problem hiding this comment.
Is this actually necessary?
| // HACK: This is a hack to make sure the camera device is destroyed on the JS thread. | ||
| // The napi-jsi adapter currently calls the destructors of JS objects possibly on the wrong thread. | ||
| // Once this is fixed, this hack will no longer be needed. |
There was a problem hiding this comment.
Do we have/need an issue created to address this hack in the future?
There was a problem hiding this comment.
I didn't find one. I will create one.
| if (m_cameraDevice != nullptr) | ||
| { | ||
| // The cameraDevice should be destroyed on the JS thread as it may need to access main thread resources | ||
| // move ownership of the cameraDevice to a lambda and dispatch it with the runtimeScheduler so the destructor |
| // an assert if there are outstanding schedulers not yet invoked. | ||
| // 2. The last continuation that accesses members of the N-API object, including the cancellation associated with | ||
| // the continuation, must capture a persistent reference to the N-API object itself to prevent the GC from | ||
| // collecting the N-API object during the asynchronous operation. Failing to do so will result in a hang |
There was a problem hiding this comment.
Add comment about lifetime of members during GC.
| AppRuntime::AppRuntime(std::function<void(const std::exception&)> unhandledExceptionHandler) | ||
| : m_workQueue{std::make_unique<WorkQueue>([this] { RunPlatformTier(); })} | ||
| , m_unhandledExceptionHandler{unhandledExceptionHandler} | ||
| : m_impl{std::make_unique<AppRuntimeImpl>(unhandledExceptionHandler)} |
There was a problem hiding this comment.
This this be:
: m_impl{std::make_unique<AppRuntimeImpl>(std::move(unhandledExceptionHandler))}
This change set up a coding pattern for invoking continuations on the JavaScript thread while properly handling garbage collection and shutdown scenarios. This pattern is described in the JsRuntimeScheduler class. See the class comment for more information.