
fix(profiling): reduce shutdown CPU #3645

Open
morrisonlevi wants to merge 1 commit into master from levi/shutdown-cpu

Conversation

@morrisonlevi
Collaborator

@morrisonlevi morrisonlevi commented Feb 12, 2026

Description

We've observed that a good amount of CPU time is spent in the busy loop, which means the system is not really yielding the CPU. Sleeping actually yields the CPU, which increases the chance that the other thread gets to do its work. I think 100ms should still feel responsive.

This is not particularly problematic. It's mostly annoying to see in the profiles when trying to identify areas to attack to reduce CPU usage. This is why I marked it "fix" and not "perf."
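For illustration only, here is a minimal Rust sketch of the sleep-based wait described above. It is not the code from this PR: the helper name `join_timeout`, its signature, and the timeout values are assumptions made for the example. It polls `JoinHandle::is_finished()` and sleeps between checks instead of spinning on `thread::yield_now()`.

```rust
use std::thread::{self, JoinHandle};
use std::time::{Duration, Instant};

/// Hypothetical helper (not the PR's actual code): wait up to `timeout`
/// for `handle` to finish, sleeping `poll` between checks instead of
/// spinning on `thread::yield_now()`.
///
/// Returns `Ok(())` once the thread has been joined, or hands the handle
/// back in `Err` if the deadline passed first.
fn join_timeout(
    handle: JoinHandle<()>,
    timeout: Duration,
    poll: Duration,
) -> Result<(), JoinHandle<()>> {
    let deadline = Instant::now() + timeout;
    while !handle.is_finished() {
        let now = Instant::now();
        if now >= deadline {
            return Err(handle);
        }
        // Sleeping blocks this thread, so the scheduler is free to run the
        // worker. Never sleep past the deadline.
        thread::sleep(poll.min(deadline - now));
    }
    // The worker has finished, so join() returns promptly. A panic in the
    // worker is ignored in this sketch.
    let _ = handle.join();
    Ok(())
}

fn main() {
    let worker = thread::spawn(|| thread::sleep(Duration::from_millis(250)));
    // 100 ms poll interval, as proposed in the description; the 2 s timeout
    // is an arbitrary value chosen for the example.
    match join_timeout(worker, Duration::from_secs(2), Duration::from_millis(100)) {
        Ok(()) => println!("worker joined"),
        Err(_) => println!("timed out; leaving the worker detached"),
    }
}
```

One trade-off of a fixed poll interval is that the caller may wait up to one full interval longer than the worker actually needs, so shutdown latency can grow by roughly the interval in exchange for the lower CPU usage.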

Reviewer checklist

  • Test coverage seems ok.
  • Appropriate labels assigned.

@morrisonlevi added the profiling label (Relates to the Continuous Profiler) on Feb 12, 2026
Base automatically changed from levi/chore-lock to master February 12, 2026 16:15
@datadog-official

datadog-official bot commented Feb 12, 2026

⚠️ Tests

⚠️ Warnings

🧪 1026 Tests failed

  • testSearchPhpBinaries from integration.DDTrace\Tests\Integration\PHPInstallerTest
  • testSimplePushAndProcess from laravel-58-test.DDTrace\Tests\Integrations\Laravel\V5_8\QueueTest
  • testSimplePushAndProcess from laravel-8x-test.DDTrace\Tests\Integrations\Laravel\V8_x\QueueTest
    DDTrace\Tests\Integrations\Laravel\V8_x\QueueTest::testSimplePushAndProcess
    Test code or tested code printed unexpected output:
    spanLinksTraceId: 698e005700000000055bd6e6f6701831
    tid: 698e005700000000
    hexProcessTraceId: 055bd6e6f6701831
    hexProcessSpanId: a9da5a49cebe9905
    processTraceId: 386138480535672881
    processSpanId: 12239194210380454149

ℹ️ Info

❄️ No new flaky tests detected

Commit SHA: 3cbd7cc

@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 62.12%. Comparing base (cf7ba04) to head (3cbd7cc).

@@            Coverage Diff             @@
##           master    #3645      +/-   ##
==========================================
+ Coverage   62.11%   62.12%   +0.01%     
==========================================
  Files         141      141              
  Lines       13387    13387              
  Branches     1753     1753              
==========================================
+ Hits         8315     8317       +2     
+ Misses       4273     4270       -3     
- Partials      799      800       +1     

see 1 file with indirect coverage changes



@pr-commenter

pr-commenter bot commented Feb 12, 2026

Benchmarks [ profiler ]

Benchmark execution time: 2026-02-12 16:35:55

Comparing candidate commit 3cbd7cc in PR branch levi/shutdown-cpu with baseline commit cf7ba04 in branch master.

Found 3 performance improvements and 5 performance regressions! Performance is the same for 21 metrics, 7 unstable metrics.

scenario:php-profiler-exceptions-with-profiler-and-timeline

  • 🟥 execution_time [+96.891ms; +100.913ms] or [+100.356%; +104.521%]
  • 🟩 cpu_usage_percentage [-55.656%; -54.634%]

scenario:php-profiler-timeline-memory-control

  • 🟥 cpu_user_time [+28.973ms; +36.244ms] or [+4.723%; +5.909%]
  • 🟥 execution_time [+28.315ms; +33.940ms] or [+4.423%; +5.302%]

scenario:php-profiler-timeline-memory-with-profiler

  • 🟥 execution_time [+28.477ms; +43.944ms] or [+2.839%; +4.381%]

scenario:php-profiler-timeline-memory-with-profiler-and-timeline

  • 🟥 execution_time [+38.313ms; +55.676ms] or [+2.973%; +4.320%]
  • 🟩 cpu_system_time [-274.293ms; -234.810ms] or [-46.362%; -39.688%]
  • 🟩 cpu_usage_percentage [-24.495%; -22.829%]

@realFlowControl
Member

If I remember the discussion and our work on #1919 correctly, the conclusion was that this cannot happen except perhaps after a fork(), but they mentioned they are not using pthread_fork() in PHP.

A bit later we added a fix that disables the profiler while PHP-FPM is preloading, and well after that, in #3058, we switched to pthread_atfork() libc handlers for fork() handling. So maybe #1919 was already PHP-FPM forking during preloading?

What I want to say: maybe with disabling profiling in preloading and handling fork() situation with pthread_atfork() we might remove all this busy loop / sleeping stuff and just join()? We should definitely have another look at this IMHO

@morrisonlevi
Collaborator Author

> What I want to say: maybe with disabling profiling in preloading and handling fork() situation with pthread_atfork() we might remove all this busy loop / sleeping stuff and just join()? We should definitely have another look at this IMHO

I think so too, but the cost of being wrong is crashes, so I'm not sure it's worth it. The cost of maintaining this helper is very low.

@morrisonlevi morrisonlevi marked this pull request as ready for review February 13, 2026 03:59
@morrisonlevi morrisonlevi requested a review from a team as a code owner February 13, 2026 03:59
Member

@realFlowControl realFlowControl left a comment

Personally, I would get rid of this code path completely and just join(), because I am very confident we already fixed the root cause of this issue a while ago; see my other comment.
OTOH, this only adds 100ms in MSHUTDOWN, which is most likely negligible, and it will definitely lower CPU usage, because on most multi-core machines there will be nothing for sched_yield() to yield to 😉
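To make the sched_yield() point concrete, here is a small Rust sketch, assuming only the standard library; the helper `count_wakeups` and the chosen durations are hypothetical and not part of this PR. It counts how often a waiting loop wakes up in a fixed window when it yields versus when it sleeps. On an otherwise idle multi-core machine the yielding loop usually returns almost immediately each time, so it wakes far more often and keeps a core busy.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical helper for illustration: run a wait loop for `window`
/// and count how many times the loop body executes when each iteration
/// waits by calling `wait`.
fn count_wakeups(window: Duration, wait: impl Fn()) -> u64 {
    let deadline = Instant::now() + window;
    let mut wakeups = 0;
    while Instant::now() < deadline {
        wakeups += 1;
        wait();
    }
    wakeups
}

fn main() {
    let window = Duration::from_millis(50);

    // Busy loop: yield_now() only gives up the CPU if another thread is
    // runnable on this core; otherwise it returns right away.
    let spins = count_wakeups(window, thread::yield_now);

    // Sleeping loop: the thread is descheduled for at least 10 ms per
    // iteration, so the loop body runs only a handful of times.
    let naps = count_wakeups(window, || thread::sleep(Duration::from_millis(10)));

    println!("yield_now wakeups: {spins}, sleep(10ms) wakeups: {naps}");
}
```

The exact counts depend on the machine and its load, so the sketch prints them rather than assuming particular values.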

