|
The old code did some "averaging" which, while cheap, led to serious
chroma shift because the weighting factors turned out to be pretty random
(arguably doing no averaging at all would likely have given more correct
results). It also in fact led to chroma ghosts.
To see why this was wrong, read the following and then do the math.
http://www.hometheaterhifi.com/the-dvd-benchmark/179-the-chroma-upsampling-error-and-the-420-interlaced-chroma-problem.html
http://avisynth.org/mediawiki/Sampling
As an example, let's look what happens at line 4 for interlaced content
(where the code would have averaged chroma from chroma line 2 and 4):
Chroma line 2 contains chroma values for line 2 (25%) and 4 (75%) while
chroma line 4 contains chroma values for line 6 (25%) and 8 (75%) of the
original (prior to subsampling) frame.
Average these together and you get something quite wrong. Most importantly,
the center of these weights will be at 5.5 instead of 4 (hence the chroma shift).
For odd lines it is different (better, but still wrong).
So, fix this by using the correct weights for reconstruction of the chroma
values (which is averaging for the progressive case for all pixels since the
samples are defined to be between the lines, and use different weighting
factors for odd/even/"upper"/"lower" lines).
This executes more than twice as many instructions (in the mmx case), but I
measured a performance impact of only roughly 5% (on an Athlon64 X2) - it is
seriously bound by memory access (by comparison, the sort-of-pointless
post-deinterlace chroma filter is nearly twice as slow, hence if you don't
need it because the values are already correct, this will be a lot faster).
Note: this is only correct for codecs which use the same chroma positions
as mpeg2 (dv is definitely different, mpeg1 is also different but only for
horizontal positioning, which doesn't matter here). "yv12" as such seems
underspecified wrt chroma positioning.
On another note, while this algorithm may be correct, it is inherently
suboptimal to do this pre-deinterlace (and a post-deinterlace chroma
filter is not going to help much either, except that it can blur the mess).
This NEEDS to be part of the deinterlacer (which btw would also be quite a
bit faster when handling planar formats directly, due to saving one pass
through all memory).
The reason is that while line 4 will now use the correct weighting factors,
the fact remains that it will use chroma values originating from lines 2, 4,
6 and 8 of the original image. However, if the deinterlacer decides to weave
because there is no motion, it CAN and most likely wants to use chroma values
from the other field (hence values originating from lines 2, 3, 4, 5 in this
case when using a very simple filter, with appropriate weighting).
--HG--
branch : point-release
extra : rebase_source : 808bb5785ca398970324bea6b391a9e24c576d2f
|
|
The thread count needs to be set before avcodec_open, otherwise decoding will
be stuck with a single thread, at least for h264 (might also want to use
avcodec_open2 instead?)
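The ordering requirement looks roughly like this in the public libavcodec API
(a hedged fragment, not this commit's code; error handling elided):

```c
AVCodecContext *ctx = avcodec_alloc_context3(codec);
ctx->thread_count = 4;                /* must be set BEFORE opening,   */
                                      /* or h264 stays single-threaded */
if (avcodec_open2(ctx, codec, NULL) < 0) {
    /* ... handle open failure ... */
}
```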
|
|
yuv2rgb_mmx.c scales the YUV components and rounds them down to 8 bits
individually before the addition. That causes red and
blue to be off by up to 2, and green by up to 3.
This little patch does the computation using 10 bits per
component, plus correct rounding.
There seems to be no noticeable impact on performance,
but color gradients come out much smoother now.
|
|
Cuts roughly 10% of the instructions (with sse); results should be
identical.
Not sure why it was that complicated in the first place. The
simplification is possible because the code gave a score of 1 to the top and
bottom comparisons and 2 to the middle one, and weaved when all scores
added together exceeded 2. This is equivalent to weaving when
(cmp(m) AND (cmp(b) OR cmp(t))), which is a much better match for the
available hw instructions. This also greatly reduces the number of constant
loads, and the patch moves some memory loads up a bit, which can
never hurt.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
(as a simple argument swap is all that's needed).
|
|
|
|
|
|
|
|
|
|
|
|
register (mmx_a2r)
|
|
|
|
|
|
--HG--
rename : src/xine-utils/xineutils.h => include/xine/xineutils.h
|
|
--HG--
rename : src/xine-utils/xineutils.h => include/xine/xineutils.h
|
|
--HG--
rename : debian/dh_xine => debian/dh_xine.in
|
|
|
|
--HG--
branch : 1.2.1-branch
|
|
|
|
--HG--
branch : 1.2.1-branch
|
|
--HG--
branch : 1.2.1-branch
|
|
|
|
|
|
driver.
|
|
|
|
|
|
--HG--
rename : include/xine.h.in => include/xine.h
rename : src/xine-engine/xine_internal.h => include/xine/xine_internal.h
rename : src/combined/ffmpeg/ffmpeg_encoder.c => src/dxr3/ffmpeg_encoder.c
|
|
|
|
--HG--
rename : src/demuxers/demux_ogg.c => src/combined/xine_ogg_demuxer.c
|
|
--HG--
rename : src/libspudvb/xine_spudvb_decoder.c => src/spu_dec/spudvb_decoder.c
|
|
It's only a cosmetic change.
--HG--
extra : rebase_source : a759588226bbc43bca331c746d14ec2e2d84c9a4
|
|
The current osd and grab logic needs a lot of output surface objects
for rendering.
The current implementation creates and destroys these objects on demand.
This patch introduces a new buffer where output surfaces are held for
reuse, preventing most of the create and destroy calls.
The size of the new buffer can be configured with the parameter
"video.output.vdpau_output_surface_buffer_size".
The default value is 10 surfaces; the possible range is 2...25.
To further minimize surface creation and destruction, the first n created
surfaces get a minimum size according to the actual display and frame size,
where n is the size of the surface buffer.
These first objects will be allocated as rather big surfaces so that they
fit most of the surface requests.
This should be considered when choosing higher buffer values.
This patch also improves dirty rect handling within osd handling:
dirty rect information is now used even if more than one osd
object is displayed at the same time.
--HG--
extra : rebase_source : b40e365ab1f81ebdd72b2e1713cf3526d6dd7493
|
|
actual display dimensions
To minimize output surface reallocation while resizing the video window,
these output surfaces are now allocated at the actual display
dimensions.
--HG--
extra : rebase_source : 41e16c3f5bc0c66e1c3e63221f0cc38ffe9d08be
|
|
Because displayed output surfaces are only increased in size when the gui
window dimensions change, the surface size can be greater than the
actual gui window size.
--HG--
extra : rebase_source : 4f7be362af8ccfe5851900bda095d0949d1c6e15
|
|
for grab feature of vdpau output driver
Fixed the usage of wrong variables to determine the current gui output window size for the grab feature of the vdpau output driver
--HG--
extra : rebase_source : f605be7e19142756f3ab388e558d8e65e3ddba5d
|
|
Currently the spu decoder sets the extent size of each generated
osd object to a fixed size of 1920x1080.
Output drivers which are extent capable (like vdpau) will scale these
objects badly if the video frame format is different.
This patch fixes the issue by removing the explicit extent setting.
The video driver will now use the actual video frame size by default.
--HG--
extra : rebase_source : 5800f84391bba725f5cb1ef28025412a2b6b6a35
|
|
|
|
|
|
|
|
|
|
|
|
|