Open
Labels: question (Further information is requested)
Description
The Particle State Update section is about 2.5x faster on Skylake than on Haswell at the same nominal clock speed (17.8 s vs. 45.5 s in the single-thread timings below). One possible explanation is the use of AVX-512 vector instructions on Skylake. It would be interesting to confirm whether this is the case.
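One way to test the AVX-512 hypothesis is to look at the native code Julia generates on each machine and see which vector registers appear. The sketch below is only an illustration: `axpy!` is a hypothetical stand-in kernel (the function timed as Particle State Update is not shown in this issue), and `widest_vector_registers` is a helper name made up here. It relies on `code_native` from `InteractiveUtils` and on `Sys.CPU_NAME`, which reports LLVM's name for the host CPU. Not every Skylake-family part exposes AVX-512, so it is worth running the check on the machine in question.

```julia
using InteractiveUtils  # provides code_native

# Hypothetical stand-in kernel; swap in the function behind "Particle State Update".
function axpy!(y::Vector{Float64}, a::Float64, x::Vector{Float64})
    @inbounds @simd for i in eachindex(x, y)
        y[i] += a * x[i]
    end
    return y
end

# Report the widest x86 vector register class found in the native code
# generated for f with the given argument types.
function widest_vector_registers(f, argtypes)
    asm = sprint(io -> code_native(io, f, argtypes; debuginfo = :none))
    occursin("zmm", asm) && return :zmm   # 512-bit registers (AVX-512)
    occursin("ymm", asm) && return :ymm   # 256-bit registers (AVX/AVX2)
    occursin("xmm", asm) && return :xmm   # 128-bit registers (SSE or scalar FP)
    return :none                          # e.g. a non-x86 host
end

println("LLVM host CPU: ", Sys.CPU_NAME)
println("Widest registers in axpy!: ",
        widest_vector_registers(axpy!, Tuple{Vector{Float64}, Float64, Vector{Float64}}))
```

If the result is `:ymm` on both machines, AVX-512 is unlikely to explain the gap, and other differences between the two CPUs would need to be considered.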
Single thread Haswell Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz:
julia> tdac(TDAC.tdac_params(; nprt = 64, nobs = 64, enable_timers = true));
────────────────────────────────────────────────────────────────────────────────
                                         Time                  Allocations
                                ──────────────────────   ───────────────────────
       Tot / % measured:             77.2s / 100%            11.4GiB / 100%
Section                 ncalls     time   %tot     avg     alloc   %tot      avg
────────────────────────────────────────────────────────────────────────────────
Particle State Update       20    45.5s  59.0%   2.28s   3.00MiB  0.03%   154KiB
Process Noise            1.28k    27.8s  36.0%  21.7ms   10.7GiB  93.7%  8.55MiB
Initialization               1    1.47s  1.90%   1.47s    698MiB  5.98%   698MiB
True State Update           20    931ms  1.20%  46.5ms   42.8KiB  0.00%  2.14KiB
Resample                    20    774ms  1.00%  38.7ms   12.2KiB  0.00%     624B
Particle Variance           20    343ms  0.44%  17.2ms   36.6MiB  0.31%  1.83MiB
Particle Mean               20    181ms  0.23%  9.05ms     0.00B  0.00%    0.00B
State Copy                  20    126ms  0.16%  6.32ms      640B  0.00%    32.0B
Weights                     20   20.8ms  0.03%  1.04ms   2.53MiB  0.02%   130KiB
Observations             1.30k   15.7ms  0.02%  12.1μs    280KiB  0.00%     221B
Observation Noise        1.28k   2.50ms  0.00%  1.96μs   60.0KiB  0.00%    48.0B
────────────────────────────────────────────────────────────────────────────────
Single thread Skylake Intel(R) Core(TM) i7-7660U CPU @ 2.50GHz:
julia> tdac(TDAC.tdac_params(; nprt = 64, nobs = 64, enable_timers = true));
────────────────────────────────────────────────────────────────────────────────
                                         Time                  Allocations
                                ──────────────────────   ───────────────────────
       Tot / % measured:             46.2s / 100%            11.4GiB / 100%
Section                 ncalls     time   %tot     avg     alloc   %tot      avg
────────────────────────────────────────────────────────────────────────────────
Process Noise            1.28k    25.1s  54.2%  19.6ms   10.7GiB  93.5%  8.55MiB
Particle State Update       20    17.8s  38.5%   890ms   4.48MiB  0.04%   229KiB
Initialization               1    2.13s  4.61%   2.13s    698MiB  5.97%   698MiB
Resample                    20    382ms  0.83%  19.1ms   12.2KiB  0.00%     624B
True State Update           20    300ms  0.65%  15.0ms   42.8KiB  0.00%  2.14KiB
Particle Variance           20    208ms  0.45%  10.4ms   36.6MiB  0.31%  1.83MiB
State Copy                  20    130ms  0.28%  6.48ms      640B  0.00%    32.0B
Particle Mean               20   91.8ms  0.20%  4.59ms     0.00B  0.00%    0.00B
Observations             1.30k   17.6ms  0.04%  13.5μs    280KiB  0.00%     221B
Weights                     20   3.94ms  0.01%   197μs   2.53MiB  0.02%   130KiB
Observation Noise        1.28k   2.73ms  0.01%  2.13μs   60.0KiB  0.00%    48.0B
────────────────────────────────────────────────────────────────────────────────
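To put the two runs side by side, the per-section speedups can be computed directly from the timings above. This is a small sketch using values transcribed by hand from the two tables (only the four largest non-initialization sections are included); the variable names are illustrative.

```julia
# Per-section wall times in seconds, transcribed from the two timer tables above.
haswell = Dict(
    "Particle State Update" => 45.5,
    "Process Noise"         => 27.8,
    "True State Update"     => 0.931,
    "Resample"              => 0.774,
)
skylake = Dict(
    "Particle State Update" => 17.8,
    "Process Noise"         => 25.1,
    "True State Update"     => 0.300,
    "Resample"              => 0.382,
)

# Speedup = Haswell time / Skylake time; values above 1 mean Skylake is faster.
speedup = Dict(name => haswell[name] / skylake[name] for name in keys(haswell))

# Print sections from largest to smallest speedup.
for name in sort(collect(keys(speedup)); by = n -> -speedup[n])
    println(rpad(name, 24), round(speedup[name]; digits = 2), "x")
end
```

Particle State Update comes out at roughly 2.56x and True State Update at about 3.1x, while Process Noise improves by only about 1.11x, so the gap is concentrated in the state update kernels rather than spread evenly across the run.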