xine-lib - xine-lib git mirror


author	Roland Scheidegger <rscheidegger_lists@hispeed.ch>	2013-09-17 23:58:37 +0100
committer	Roland Scheidegger <rscheidegger_lists@hispeed.ch>	2013-09-17 23:58:37 +0100
commit	b0fa6cab5e64886d3f47156a049c5d1b85dcabd9 (patch)
tree	77a1c17d3db18e0b107ec1d108493711f5e6da4a /po
parent	88adea15db8b5bd937b018f7ddf19c03a02a94f4 (diff)
download	xine-lib-b0fa6cab5e64886d3f47156a049c5d1b85dcabd9.tar.gz xine-lib-b0fa6cab5e64886d3f47156a049c5d1b85dcabd9.tar.bz2

Emit vzeroupper after avx memcpy

Emitting vzeroupper is necessary to avoid AVX<->SSE transition penalties after using 256-bit AVX instructions. In the past this did not matter much: no other code used AVX, so there was only a single penalty afterwards, the first time SSE code ran. However, ffmpeg contains code that mixes 128-bit AVX and legacy SSE instructions heavily, and while the upper halves of the ymm registers are left dirty, every switch between the two incurs a large penalty. In particular, this slows ff_deblock_v_luma_8_avx down by a factor of about 50, which makes the whole decode roughly twice as slow. The effect may depend on the H.264 stream or the ffmpeg version: ffmpeg itself emits vzeroupper after its own 256-bit AVX code, so omitting it here is not always a problem, but in the case I observed nothing else used 256-bit AVX.
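The actual change is not shown in this view, since the diffstat is limited to 'po'. As a rough illustration only, here is a minimal C sketch of the idea, written with compiler intrinsics rather than in whatever form xine-lib's own AVX memcpy takes. The names avx_copy_impl, copy_bytes and HAVE_X86_AVX_SKETCH are hypothetical. It assumes GCC or Clang on x86, copies with 256-bit loads and stores, and then clears the upper ymm halves with _mm256_zeroupper() (which compiles to vzeroupper). On other targets, or on CPUs without AVX, it falls back to plain memcpy.

```c
#include <stddef.h>
#include <string.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define HAVE_X86_AVX_SKETCH 1 /* hypothetical feature macro for this sketch */
#endif

#ifdef HAVE_X86_AVX_SKETCH
/* Hypothetical AVX copy routine. The target attribute lets this one function
 * use AVX without building the whole file with -mavx. */
__attribute__((target("avx")))
static void avx_copy_impl(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Bulk copy in 32-byte chunks using unaligned 256-bit loads/stores. */
    while (n >= 32) {
        __m256i v = _mm256_loadu_si256((const __m256i *)s);
        _mm256_storeu_si256((__m256i *)d, v);
        s += 32;
        d += 32;
        n -= 32;
    }

    /* The upper ymm halves are now dirty. Clear them before any further
     * code runs (including the memcpy call below), so that later legacy
     * SSE code does not pay the AVX<->SSE transition penalty. */
    _mm256_zeroupper();

    /* Copy the remaining 0..31 bytes. */
    memcpy(d, s, n);
}
#endif

/* Hypothetical dispatcher: use the AVX path only if the CPU supports it. */
void copy_bytes(void *dst, const void *src, size_t n)
{
#ifdef HAVE_X86_AVX_SKETCH
    if (__builtin_cpu_supports("avx")) {
        avx_copy_impl(dst, src, n);
        return;
    }
#endif
    memcpy(dst, src, n);
}
```

With intrinsics, GCC and Clang typically insert vzeroupper at function exit on their own when 256-bit registers are used, so the explicit call here mainly documents intent. Hand-written assembly is different: the compiler cannot see inside it, so the instruction has to be emitted explicitly there.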

Diffstat (limited to 'po')

0 files changed, 0 insertions, 0 deletions

