xine internals
Engine architecture and data flow
xine engine architecture
Media streams usually consist of audio and video data multiplexed
into one bitstream in the so-called system-layer (e.g. AVI, Quicktime or MPEG).
A demuxer plugin is used to parse the system layer and extract audio and video
packets. The demuxer uses an input plugin to read the data and stores it
in pre-allocated buffers from the global buffer pool.
The buffers are then added to the audio or video stream fifo.
From the other end of these fifos the audio and video decoder threads
consume the buffers and hand them over to the current audio or video
decoder plugin for decompression. These plugins then send the decoded
data to the output layer. The buffer holding the encoded
data is no longer needed and thus released to the global buffer pool.
In the output layer, the video frames and audio samples pass through a
post plugin tree, which can apply effects or other operations to the data.
When reaching the output loops, frames and samples are enqueued to be
displayed when their presentation time arrives.
A set of extra information travels with the data. This information is generated
at the input and demuxer level and attached to the buffers while they wait in
the fifos. The decoder loops copy it to a storage of their own, and every frame
and audio buffer leaving the stream layer is tagged with whatever that storage
currently holds.
Plugin system
The plugin system enables some of xine's most valuable features:
drop-in extensibility
support parallel installation of multiple (incompatible) libxine versions
support for multiple plugin directories
($prefix/lib/xine/plugins,
$HOME/.xine/plugins, …)
support for recursive plugin directories
(plugins are found even in subdirectories of the plugin directories)
version management
(On start, xine finds all plugins in its plugin (sub)directories and
chooses an appropriate version (usually the newest) for each plugin.)
simplification
(Plugins don't have to follow any special naming convention,
and any plugin may contain an arbitrary subset of input, demuxer,
decoder or output plugins.)
Essentially, plugins are just shared objects, i.e. dynamic libraries. In
contrast to normal dynamic libraries, they are stored outside of the
system's library paths and libxine does its own bookkeeping, which
enables most of the advanced features mentioned above.
Plugin location and filesystem layout
The primary goal for this new plugin mechanism was the need to support
simultaneous installation of several (most likely incompatible)
libxine versions without them overwriting each other's
plugins. Therefore, we have this simple layout:
Plugins are installed below XINE_PLUGINDIR
(/usr/local/lib/xine/plugins by default).
Note that plugins are never directly installed into XINE_PLUGINDIR.
Instead, a separate subdirectory is created for each "plugin
provider". A plugin provider is equivalent to the exact version of
one source package. Typical examples include "xine-lib-0.9.11" or
"xine-vcdnav-1.0". Every source package is free to install an
arbitrary number of plugins in its own, private directory. If a
package installs several plugins, they may optionally be organized
further into subdirectories.
So you will finally end up with something like this:
/usr/local/lib/xine/plugins
xine-lib-0.9.11
demux_mpeg_block.so
decode_mpeg.so
video_out_xv.so
…
xine-vcdnav-0.9.11
input_vcdnav.so
xine-lib-1.2
input
file.so
stdin_fifo.so
vcd.so
demuxers
fli.so
avi.so
…
decoders
ffmpeg.so
mpeg.so (may contain mpeg 1/2 audio and video decoders)
pcm.so
…
output
video_xv.so
audio_oss.so
…
xine-lib-3.0
avi.so (avi demuxer)
mpeg.so (contains mpeg demuxers and audio/video decoders)
video_out_xv.so (Xv video out)
…
As you can see, every package is free to organize plugins at will
below its own plugin provider directory.
Additionally, administrators may choose to put plugins directly into
XINE_PLUGINDIR, or in a "local" subdirectory.
Users may wish to put additional plugins in ~/.xine/plugins/.
Again, there may be subdirectories to help organize the plugins.
The default value for XINE_PLUGINDIR can be obtained using the
pkg-config --variable=plugindir libxine command.
Plugin Content: What's inside the .so?
Each plugin library (.so file) contains an arbitrary number of (virtual)
plugins. Typically, it will contain exactly one plugin. However, it
may be useful to put a set of related plugins in one library, so they
can share common code.
First of all, what is a virtual plugin?
A virtual plugin is essentially a structure that is defined by the
xine engine. This structure typically contains lots of function
pointers to the actual API functions.
For each plugin API, there are several API versions, and each API
version may specify a new, incompatible structure. Therefore, it is
essential that only plugins supporting the current libxine API are loaded,
so the .so file needs a plugin list that provides libxine with the version
information, even before it tries to load any of the plugins.
This plugin list is held in an array named xine_plugin_info:
plugin_info_t xine_plugin_info[] = {
/* type, API, "name", version, special_info, init_function */
{ PLUGIN_DEMUX, 20, "flac", XINE_VERSION_CODE, NULL, demux_flac_init_class },
{ PLUGIN_AUDIO_DECODER, 13, "flacdec", XINE_VERSION_CODE, &dec_info_audio, init_plugin },
{ PLUGIN_NONE, 0, "", 0, NULL, NULL }
};
The structure of xine_plugin_info may never be changed.
If it ever needs to be changed, it must be renamed to avoid
erroneous loading of incompatible plugins.
xine_plugin_info can contain any number of plugins
and must be terminated with a PLUGIN_NONE entry. Available plugin
types are:
#define PLUGIN_NONE 0
#define PLUGIN_INPUT 1
#define PLUGIN_DEMUX 2
#define PLUGIN_AUDIO_DECODER 3
#define PLUGIN_VIDEO_DECODER 4
#define PLUGIN_SPU_DECODER 5
#define PLUGIN_AUDIO_OUT 6
#define PLUGIN_VIDEO_OUT 7
#define PLUGIN_POST 8
The plugin version number is generated from xine-lib's version number
like this: MAJOR * 10000 + MINOR * 100 + SUBMINOR.
This is not required, but it's an easy way to ensure that the version
increases for every release.
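The scheme can be captured in a small helper macro (MAKE_VERSION_CODE is a hypothetical name used here for illustration; xine-lib provides XINE_VERSION_CODE for its own version):

```c
/* hypothetical helper following the MAJOR * 10000 + MINOR * 100 + SUBMINOR scheme */
#define MAKE_VERSION_CODE(major, minor, subminor) \
  ((major) * 10000 + (minor) * 100 + (subminor))
```

For instance, a fictitious xine-lib 2.13.7 would yield the version code 21307, and later releases map to strictly larger codes as long as minor and subminor stay below 100.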
Every entry in xine_plugin_info has an initialization
function for the plugin class context.
This function returns a pointer to a freshly allocated structure (typically
obtained via malloc()) containing mainly function
pointers; these are the "methods" of the plugin class.
The "plugin class" does not do the actual job yet (like decoding
a video or something); it must be instantiated first. One reason for having the
class is to hold any global settings that must be accessible to every
instance. Remember that the xine library is capable of handling multiple streams:
several videos can be decoded at the same time, thus several instances of the
same plugin are possible.
If you think this is pretty much an object-oriented approach,
then you're right.
A fictitious file input plugin that supports input plugin APIs 12 and
13, found in xine-lib 2.13.7, would then define this plugin list:
#include <xine/plugin.h>
…
plugin_t *init_api12(void) {
input_plugin_t *this;
this = malloc(sizeof(input_plugin_t));
…
return (plugin_t *)this;
}
/* same thing, with different initialization for API 13 */
const plugin_info_t xine_plugin_info[] = {
{ PLUGIN_INPUT, 12, "file", 21307, NULL, init_api12 },
{ PLUGIN_INPUT, 13, "file", 21307, NULL, init_api13 },
{ PLUGIN_NONE, 0, "", 0, NULL, NULL }
};
This input plugin supports two APIs; other plugins might provide a
mixture of demuxer and decoder plugins that belong together somehow
(i.e. share common code).
You'll find exact definitions of public functions and plugin structs
in the appropriate header files for each plugin type:
input/input_plugin.h for input plugins,
demuxers/demux.h for demuxer plugins,
xine-engine/video_decoder.h for video decoder plugins,
xine-engine/audio_decoder.h for audio decoder plugins,
xine-engine/post.h for post plugins,
xine-engine/video_out.h for video out plugins,
xine-engine/audio_out.h for audio out plugins.
Additional information will also be given in the dedicated sections below.
Many plugins will need some additional "private" data fields.
These should be simply added at the end of the plugin structure.
For example a demuxer plugin called "foo" with two private
fields "xine" and "count" may have a plugin structure declared in
the following way:
typedef struct {
/* public fields "inherited" from demux.h */
demux_plugin_t demux_plugin;
xine_t *xine;
int count;
} demux_foo_t;
The plugin would then access public members via the
demux_plugin field and private fields directly.
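That access pattern can be shown in a self-contained sketch; demux_plugin_t is reduced here to a hypothetical one-method stub, since the real definition lives in demuxers/demux.h:

```c
#include <stdlib.h>

/* stub standing in for the real demux_plugin_t from demuxers/demux.h */
typedef struct {
  int (*send_headers)(void *this_gen);
} demux_plugin_t;

typedef struct {
  demux_plugin_t demux_plugin; /* public part, must come first */
  void *xine;                  /* private fields follow */
  int   count;
} demux_foo_t;

/* the engine calls through the public struct; the plugin casts back down */
static int demux_foo_send_headers(void *this_gen) {
  demux_foo_t *this = (demux_foo_t *)this_gen;
  return ++this->count; /* private field reached via the cast */
}

demux_foo_t *demux_foo_new(void) {
  demux_foo_t *this = calloc(1, sizeof(demux_foo_t));
  this->demux_plugin.send_headers = demux_foo_send_headers;
  return this;
}
```

Because the public struct is the first member, a pointer to demux_foo_t can be passed wherever a demux_plugin_t pointer is expected, and cast back inside the method.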
Summary: Plugins consist of two C-style classes, each representing a different context.
The first is the so-called "plugin class" context. This is a singleton context,
which means it will exist at most once per xine context.
This plugin class context is a C-style class which is subclassing the related
class from the xine plugin headers. This contains functions, which are
independent of the actual instance of the plugin. Most prominently, it contains
a factory method to instantiate the next context.
The second context is the instance context. This is another C-style class, which
is constructed and disposed of within the plugin class context. This one does
the actual work and subclasses the related plugin struct from the xine plugin
headers. It is instantiated for every separate running instance of the plugin.
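The two contexts can be sketched as plain C structs. All names here are invented for illustration; real plugins subclass the structs from the xine headers instead:

```c
#include <stdlib.h>

/* instance context: one per running stream, does the actual work */
typedef struct foo_instance_s foo_instance_t;
struct foo_instance_s {
  int frames_done;
  void (*dispose)(foo_instance_t *this);
};

/* plugin class context: at most one per xine context,
 * holds class-wide state and the factory method */
typedef struct foo_class_s foo_class_t;
struct foo_class_s {
  foo_instance_t *(*open_instance)(foo_class_t *cls);
  int instances; /* example of a setting shared by all instances */
};

static void foo_dispose(foo_instance_t *this) {
  free(this);
}

static foo_instance_t *foo_open_instance(foo_class_t *cls) {
  foo_instance_t *this = calloc(1, sizeof(*this));
  this->dispose = foo_dispose;
  cls->instances++;
  return this;
}

/* corresponds to the init_function entry in xine_plugin_info */
foo_class_t *foo_init_class(void) {
  foo_class_t *cls = calloc(1, sizeof(*cls));
  cls->open_instance = foo_open_instance;
  return cls;
}
```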
What is this metronom thingy?
Metronom serves two purposes:
Generate vpts (virtual presentation time stamps) from pts (presentation time stamps)
for a/v output and synchronization.
Provide a master clock (system clock reference, scr), possibly provided
by external scr plugins (this can be used if some hardware decoder or network
server dictates the time).
pts/vpts values are given in 1/90000 sec units. pts values in mpeg streams
may wrap (that is, return to zero or any other value without further notice),
can be missing on some frames or (for broken streams) may "dance" around
the correct values. Metronom therefore has some heuristics built-in to generate
clean vpts values which can then be used in the output layers to schedule audio/video
output.
The heuristics used in metronom have always been a field of research. The current
metronom implementation tries to stick to the pts values as reported by the demuxers;
that is, vpts may be obtained by the simple operation vpts = pts + vpts_offset,
where vpts_offset takes into account any wraps. Whenever pts is zero,
metronom will estimate vpts based on previous values. If a difference is found between
the estimated vpts and the value calculated by the formula above, it will be smoothed
out by applying a "drift correction".
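A toy version of that calculation may help (grossly simplified; the real metronom also detects wraps and smooths the drift, and all names below are invented):

```c
#include <stdint.h>

/* all values in 1/90000 s units; names and logic are illustrative only */
typedef struct {
  int64_t vpts_offset;    /* accounts for stream start and pts wraps */
  int64_t last_vpts;      /* used to extrapolate when pts is missing */
  int64_t frame_duration; /* expected spacing between frames */
} toy_metronom_t;

int64_t toy_got_pts(toy_metronom_t *m, int64_t pts) {
  int64_t vpts;
  if (pts == 0)
    vpts = m->last_vpts + m->frame_duration; /* missing pts: estimate */
  else
    vpts = pts + m->vpts_offset;             /* the normal case */
  m->last_vpts = vpts;
  return vpts;
}
```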
How does xine synchronize audio and video?
Every image frame or audio buffer leaving a decoder is tagged by metronom with
vpts information, which tells the video_out and audio_out threads when that
data should be presented. Usually there isn't a significant delay associated
with the video driver, so we expect a frame to reach the screen at about the time
it is delivered for drawing. Unfortunately, the same isn't true for audio: all sound
systems implement some amount of buffering (a fifo), so any data sent to them
now will only get played some time in the future. To achieve perfect A-V sync, the
audio_out thread must take this into account by asking the audio driver for its
current latency.
Some audio drivers can't tell the current delay introduced in playback. This is
especially true for most sound servers like ESD or aRts, and explains why the
sync is far from perfect in such cases.
Another problem xine must handle is sound card clock drift. vpts values are
compared to the system clock (or even to a different clock provided by an scr plugin)
for presentation, but the sound card samples audio with its own clocking
mechanism, so a small drift may occur. As playback goes on, this
error accumulates, possibly resulting in audio gaps or drops. To avoid that
annoying effect, two countermeasures are available (switchable with the xine config
option audio.synchronization.av_sync_method):
The small sound card errors are fed back to metronom. The details
are given in the audio_out.c comments:
/* By adding gap errors (difference between reported and expected
* sound card clock) into metronom's vpts_offset we can use its
* smoothing algorithms to correct sound card clock drifts.
* obs: previously this error was added to xine scr.
*
* audio buf ---> metronom --> audio fifo --> (buf->vpts - hw_vpts)
* (vpts_offset + error) gap
* <---------- control --------------|
*
* Unfortunately audio fifo adds a large delay to our closed loop.
*
* These are designed to avoid updating the metronom too fast.
* - it will only be updated 1 time per second (so it has a chance of
* distributing the error for several frames).
* - it will only be updated 2 times for the whole audio fifo size
* length (so the control will wait to see the feedback effect)
* - each update will be of gap/SYNC_GAP_RATE.
*
* Sound card clock correction can only provide smooth playback for
* errors < 1% nominal rate. For bigger errors (bad streams) audio
* buffers may be dropped or gaps filled with silence.
*/
The audio is stretched or squeezed slightly by resampling, thus compensating
for the drift, as the next comment in audio_out.c explains:
/* Alternative for metronom feedback: fix sound card clock drift
* by resampling all audio data, so that the sound card keeps in
* sync with the system clock. This may help, if one uses a DXR3/H+
* decoder board. Those have their own clock (which serves as xine's
* master clock) and can only operate at fixed frame rates (if you
* want smooth playback). Resampling then avoids A/V sync problems,
* gaps filled with 0-frames and jerky video playback due to different
* clock speeds of the sound card and DXR3/H+.
*/
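The rate-limited feedback described in the first comment can be illustrated with a toy model (SYNC_GAP_RATE is a real constant in audio_out.c, but its value and the surrounding structure here are invented):

```c
#include <stdint.h>

#define SYNC_GAP_RATE   4     /* scale-down factor; illustrative value */
#define UPDATE_INTERVAL 90000 /* one second in 1/90000 s units */

typedef struct {
  int64_t vpts_offset;      /* stands in for metronom's vpts_offset */
  int64_t last_update_vpts; /* when we last fed an error back */
} toy_feedback_t;

/* feed a measured gap (reported minus expected sound card clock) back,
 * rate-limited to one update per second and scaled by 1/SYNC_GAP_RATE */
int toy_feed_gap(toy_feedback_t *t, int64_t now_vpts, int64_t gap) {
  if (now_vpts - t->last_update_vpts < UPDATE_INTERVAL)
    return 0; /* too soon: let the previous correction take effect */
  t->vpts_offset += gap / SYNC_GAP_RATE;
  t->last_update_vpts = now_vpts;
  return 1;
}
```

Only a fraction of each measured gap is applied, and never more often than the chosen interval, so metronom's smoothing can distribute the correction over many frames.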
Overlays and OSD
The roots of xine's overlay capabilities are DVD subpicture and subtitle support
(also known as 'spu'). DVD subtitles are encoded in an RLE (Run Length Encoding - the
simplest compression technique) format, with a palette of colors and transparency
levels. You probably thought that subtitles were just simple text saved onto DVDs, right?
Wrong, they are bitmaps.
In order to optimize for the most common case, xine's internal format for screen overlays
is a representation similar to the 'spu' data. This brings not only a performance
benefit (since blending functions may skip large image areas thanks to RLE) but also
compatibility: it's possible to re-encode any xine overlay into the original spu format
for display with mpeg hardware decoders like the DXR3.
Displaying subtitles requires the ability to sync them to the video stream. This
is done using the same pts/vpts machinery as the a-v sync code. DVD subtitles,
for example, may request: show this spu at pts1 and hide it at pts2. This leads to the
concept of the 'video overlay manager', an event-driven module for managing the
showing and hiding of overlays.
The drawback of using the internal RLE format is the difficulty of manipulating it
as graphics. To overcome that we created the 'OSD renderer', where OSD stands
for On Screen Display, just like in TV sets. The OSD renderer is a module
providing simple graphic primitives (lines, rectangles, text drawing etc.) over
a "virtual" bitmap area. Every time we want to show that bitmap, it is
RLE-encoded and sent to the overlay manager for display.
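The RLE step itself is straightforward. A minimal encoder over one scanline of palette indices might look like this (the rle_elem_t layout mirrors the idea, not necessarily the exact struct used by xine):

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
  uint16_t len;   /* run length in pixels */
  uint16_t color; /* palette index */
} rle_elem_t;

/* encode one scanline of palette indices; returns the number of runs
 * written to out (out must have room for up to width elements) */
size_t rle_encode_line(const uint8_t *pixels, size_t width, rle_elem_t *out) {
  size_t n = 0, i = 0;
  while (i < width) {
    size_t start = i;
    uint8_t c = pixels[i];
    while (i < width && pixels[i] == c)
      i++; /* extend the run while the color repeats */
    out[n].len = (uint16_t)(i - start);
    out[n].color = c;
    n++;
  }
  return n;
}
```

Large single-color areas (the common case for subtitles and OSD text backgrounds) collapse into one element each, which is exactly why the blending functions can skip them quickly.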
overlays architecture
Overlay Manager
The overlay manager interface is available to any xine plugin. It's a bit unlikely
to be used directly; anyway, here's a code snippet for enqueueing an overlay for
display:
video_overlay_event_t event;
event.object.handle = this->video_overlay->get_handle(this->video_overlay,0);
event.object.overlay = calloc(1, sizeof(vo_overlay_t)); /* zeroed overlay */
/* set position and size for this overlay */
event.object.overlay->x = 0;
event.object.overlay->y = 0;
event.object.overlay->width = 100;
event.object.overlay->height = 100;
/* clipping region is mostly used by dvd menus for highlighting buttons */
event.object.overlay->clip_top = 0;
event.object.overlay->clip_bottom = image_height;
event.object.overlay->clip_left = 0;
event.object.overlay->clip_right = image_width;
/* the hard part: provide a RLE image */
event.object.overlay->rle = your_rle;
event.object.overlay->data_size = your_size;
event.object.overlay->num_rle = your_rle_count;
/* palette must contain YUV values for each color index */
memcpy(event.object.overlay->color, color, sizeof(color));
/* this table contains transparency levels for each color index:
   0 = completely transparent, 15 = completely opaque */
memcpy(event.object.overlay->trans, trans, sizeof(trans));
/* set the event type and time for displaying */
event.event_type = EVENT_SHOW_SPU;
event.vpts = 0; /* zero is a special vpts value, it means 'now' */
video_overlay->add_event(video_overlay, &event);
OSD Renderer
OSD is a general API for rendering stuff over playing video. It's available both
to xine plugins and to frontends.
The first thing you need to do is allocate an OSD object for drawing from the
renderer. The code below allocates a 300x200 area. This size can't be changed
during the lifetime of an OSD object, but it's possible to place it anywhere
over the image.
osd_object_t *osd;
osd = this->osd_renderer->new_object(this->osd_renderer, 300, 200);
Now we may want to set a font and colors for text rendering. Although we refer
to fonts throughout this document, the OSD can in fact hold any kind of bitmap. Font
files are searched for and loaded during initialization from
$prefix/share/xine/fonts/ and ~/.xine/fonts.
There's a sample utility to convert truetype fonts at
xine-lib/misc/xine-fontconv.c. The palette may be manipulated directly,
but most of the time it's more convenient to use the pre-defined text palettes.
/* set sans serif 24 font */
osd_renderer->set_font(osd, "sans", 24);
/* copy pre-defined colors for white text, black border and transparent background
   to the index range used by the first text palette */
osd_renderer->set_text_palette(osd, TEXTPALETTE_WHITE_BLACK_TRANSPARENT, OSD_TEXT1);
/* copy pre-defined colors for white text, no border and translucent background
   to the index range used by the second text palette */
osd_renderer->set_text_palette(osd, TEXTPALETTE_WHITE_NONE_TRANSLUCID, OSD_TEXT2);
Now render the text and show it:
osd_renderer->render_text(osd, 0, 0, "white text, black border", OSD_TEXT1);
osd_renderer->render_text(osd, 0, 30, "white text, no border", OSD_TEXT2);
osd_renderer->show(osd, 0); /* 0 stands for 'now' */
There's a 1:1 mapping between OSD objects and overlays, so the
second time you send an OSD object for display it will actually replace
the first image. Using the set_position() function we can move the overlay
over the video.
int i;
for (i = 0; i < 100; i += 10) {
osd_renderer->set_position(osd, i, i );
osd_renderer->show(osd, 0);
sleep(1);
}
osd_renderer->hide(osd, 0);
For additional functions please check osd.h or the public header.
OSD palette notes
The palette functions demand some additional explanation; skip this if you
just want to write text fast without worrying about details! :)
We have a 256-entry palette, each entry defining yuv and transparency levels.
Although xine fonts are bitmaps and may use any indices they want, we have
defined a small convention:
/*
Palette entries as used by osd fonts:
0: not used by font, always transparent
1: font background, usually transparent, may be used to implement
translucent boxes where the font will be printed.
2-5: transition between background and border (usually only alpha
value changes).
6: font border. if the font is to be displayed without border this
will probably be adjusted to font background or near.
7-9: transition between border and foreground
10: font color (foreground)
*/
The so-called 'transitions' are used to implement font anti-aliasing. This
convention requires that any font file use only the colors from 1 to 10.
When we use the set_text_palette() function we are just copying 11 palette
entries to the specified base index.
That base index is the same one we pass to the render_text() function to select
the text palette. With this scheme it is possible to have several different text
colors at the same time and also to draw fonts over a custom background.
/* obtains size the text will occupy */
renderer->get_text_size(osd, text, &width, &height);
/* draws a box using the font background color (translucent) */
renderer->filled_rect(osd, x1, y1, x1+width, y1+height, OSD_TEXT2 + 1);
/* render text */
renderer->render_text(osd, x1, y1, text, OSD_TEXT2);
OSD text and palette FAQ
Q: What is the format of the color palette entries?
A: It's the same as used by overlay blending code (YUV).
Q: What is the relation between a text palette and a palette
I set with xine_osd_set_palette?
A: xine_osd_set_palette will set the entire 256 color palette
to be used when we blend the osd image.
"text palette" is a sequence of 11 colors from the palette to be
used to render text. That is, calling osd_render_text()
with color_base=100 will render text using colors 100-110.
Q: Can I render text with colors in my own palette?
A: Sure. Just pass your color_base to osd_render_text().
Q: Does changing a text palette affect already drawn text?
A: osd_set_text_palette() will overwrite some colors in the palette
with pre-defined ones. So yes, it will change the color
of already drawn text (if you do it before calling osd_show,
of course).
If you don't want to change the colors of drawn text, just
use different color_base values.
Q: What about the shadows of osd-objects? Can I turn them off
or are they hardcoded?
A: osd objects have no shadows by themselves, but fonts use 11
colors to produce an anti-aliased effect.
If you set a "text palette" with entries 0-9 being transparent
and 10 being foreground, you will get rid of any borders or
anti-aliasing.
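Building such a "hard" transparency table is simple (the 11-entry layout follows the font convention above; the helper name is invented):

```c
#include <stdint.h>

/* build a "hard" text palette as described in the answer above:
 * entries 0-9 fully transparent, entry 10 fully opaque foreground */
void make_hard_text_palette(uint8_t trans[11]) {
  int i;
  for (i = 0; i < 10; i++)
    trans[i] = 0; /* 0 = completely transparent */
  trans[10] = 15; /* 15 = completely opaque */
}
```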
MRLs
This section defines a draft for a syntactic specification of MRLs as
used by xine-lib. The language of MRLs is designed to be a true subset
of the language of URIs as given in RFC2396. A type 2 grammar for the
language of MRLs is given in EBNF below.
Semantically, MRLs consist of two distinct parts that are evaluated by
different components of the xine architecture. The first part,
derivable from the symbol <input_source> in the given grammar, is
completely handed to the input plugins, with input plugins signaling
if they can handle the MRL.
The second part, derivable from <stream_setup> and delimited from the
first by a crosshatch ('#'), contains parameters that modify the
initialization and playback behaviour of the stream to which the MRL
is passed. The possible parameters are mentioned in the xine-ui manpage.
The following definition should be regarded as a guideline only.
Of course any given input plugin only understands a subset of all
possible MRLs. On the other hand, invalid MRLs according to this
definition might be understood for convenience reasons.
Some user awareness is required at this point.
EBNF grammar for MRLs:
<mrl> ::= <input_source>[#<stream_setup>]
<input_source> ::= (<absolute_mrl>|<relative_mrl>)
<absolute_mrl> ::= <input>:(<hierarch_part>|<opaque_part>)
<hierarch_part> ::= (<net_path>|<abs_path>)[?<query>]
<opaque_part> ::= (<unreserved>|<escaped>|;|?|:|@|&|=|+|$|,){<mrl_char>}
<relative_mrl> ::= (<abs_path>|<rel_path>)
<net_path> ::= //<authority>[<abs_path>]
<abs_path> ::= /<path_segments>
<rel_path> ::= <rel_segment>[<abs_path>]
<rel_segment> ::= <rel_char>{<rel_char>}
<rel_char> ::= (<unreserved>|<escaped>|;|@|&|=|+|$|,)
<input> ::= <alpha>{(<alpha>|<digit>|+|-|.)}
<authority> ::= (<server>|<reg_name>)
<server> ::= [[<userinfo>@]<host>[:<port>]]
<userinfo> ::= {(<unreserved>|<escaped>|;|:|&|=|+|$|,)}
<host> ::= (<hostname>|<ipv4_address>|<ipv6_reference>)
<hostname> ::= {<domainlabel>.}<toplabel>[.]
<domainlabel> ::= (<alphanum>|<alphanum>{(<alphanum>|-)}<alphanum>)
<toplabel> ::= (<alpha>|<alpha>{(<alphanum>|-)}<alphanum>)
<ipv4_address> ::= <digit>{<digit>}.<digit>{<digit>}.<digit>{<digit>}.<digit>{<digit>}
<port> ::= {<digit>}
<reg_name> ::= <reg_char>{<reg_char>}
<reg_char> ::= (<unreserved>|<escaped>|;|:|@|&|=|+|$|,)
<path_segments> ::= <segment>{/<segment>}
<segment> ::= {<path_char>}{;<param>}
<param> ::= {<path_char>}
<path_char> ::= (<unreserved>|<escaped>|:|@|&|=|+|$|,)
<query> ::= {<mrl_char>}
<stream_setup> ::= <stream_option>;{<stream_option>}
<stream_option> ::= (<configoption>|<engine_option>|novideo|noaudio|nospu)
<configoption> ::= <configentry>:<configvalue>
<configentry> ::= <unreserved>{<unreserved>}
<configvalue> ::= <stream_char>{<stream_char>}
<engine_option> ::= <unreserved>{<unreserved>}:<stream_char>{<stream_char>}
<stream_char> ::= (<unreserved>|<escaped>|:|@|&|=|+|$|,)
<mrl_char> ::= (<reserved>|<unreserved>|<escaped>)
<reserved> ::= (;|/|?|:|@|&|=|+|$|,|[|])
<unreserved> ::= (<alphanum>|<mark>)
<mark> ::= (-|_|.|!|~|*|'|(|))
<escaped> ::= %<hex><hex>
<hex> ::= (<digit>|A|B|C|D|E|F|a|b|c|d|e|f)
<alphanum> ::= (<alpha>|<digit>)
<alpha> ::= (<lowalpha>|<upalpha>)
<lowalpha> ::= (a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z)
<upalpha> ::= (A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z)
<digit> ::= (0|1|2|3|4|5|6|7|8|9)
With <ipv6_reference> being an IPv6 address enclosed in [ and ] as defined in RFC2732.
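As a practical illustration of the grammar's top-level split, a minimal helper (not part of libxine; real input plugins may treat '#' inside the source part differently) that separates <input_source> from <stream_setup>:

```c
#include <string.h>

/* return the <stream_setup> part of an MRL (the text after the first '#'),
 * or NULL when the MRL carries no stream setup */
const char *mrl_stream_setup(const char *mrl) {
  const char *hash = strchr(mrl, '#');
  return hash ? hash + 1 : NULL;
}
```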