xine internals
Engine architecture and data flow
xine engine architecture
Media streams usually consist of audio and video data multiplexed
into one bitstream in the so-called system-layer (e.g. AVI, Quicktime or MPEG).
A demuxer plugin is used to parse the system layer and extract audio and video
packets. The demuxer uses an input plugin to read the data and stores it
in pre-allocated buffers from the global buffer pool.
The buffers are then added to the audio or video stream fifo.
From the other end of these fifos the audio and video decoder threads
consume the buffers and hand them over to the current audio or video
decoder plugin for decompression. These plugins then send the decoded
data to the output layer. The buffer holding the encoded
data is no longer needed and thus released to the global buffer pool.
In the output layer, the video frames and audio samples pass through a
post plugin tree, which can apply effects or other operations to the data.
When reaching the output loops, frames and samples are enqueued to be
displayed when their presentation time arrives.
A set of extra information travels with the data. This information is generated
at the input and demuxer level and attached to the buffers while they wait in
the fifos. The decoder loops copy it to a storage of their own, and every frame
and audio buffer leaving the stream layer is tagged with whatever that storage
currently holds.
Plugin system
The plugin system enables some of xine's most valuable features:
drop-in extensibility
support parallel installation of multiple (incompatible) libxine versions
support for multiple plugin directories
($prefix/lib/xine/plugins,
$HOME/.xine/plugins, …)
support for recursive plugin directories
(plugins are found even in subdirectories of the plugin directories)
version management
(On start, xine finds all plugins in its plugin (sub)directories and
chooses an appropriate version (usually the newest) for each plugin.)
simplification
(Plugins don't have to follow any special naming convention,
and any plugin may contain an arbitrary subset of input, demuxer,
decoder or output plugins.)
Essentially, plugins are just shared objects, i.e. dynamic libraries. In
contrast to normal dynamic libraries, they are stored outside of the
system's library paths and libxine does its own bookkeeping, which
enables most of the advanced features mentioned above.
Plugin location and filesystem layout
The primary goal for this new plugin mechanism was the need to support
simultaneous installation of several (most likely incompatible)
libxine versions without them overwriting each other's
plugins. Therefore, we have this simple layout:
Plugins are installed below XINE_PLUGINDIR
(/usr/local/lib/xine/plugins by default).
Note that plugins are never directly installed into XINE_PLUGINDIR.
Instead, a separate subdirectory is created for each "plugin
provider". A plugin provider is equivalent to the exact version of
one source package. Typical examples include "xine-lib-0.9.11" or
"xine-vcdnav-1.0". Every source package is free to install an
arbitrary number of plugins in its own, private directory. If a
package installs several plugins, they may optionally be organized
further into subdirectories.
So you will finally end up with something like this:
/usr/local/lib/xine/plugins
xine-lib-0.9.11
demux_mpeg_block.so
decode_mpeg.so
video_out_xv.so
…
xine-vcdnav-0.9.11
input_vcdnav.so
xine-lib-1.2
input
file.so
stdin_fifo.so
vcd.so
demuxers
fli.so
avi.so
…
decoders
ffmpeg.so
mpeg.so (may contain mpeg 1/2 audio and video decoders)
pcm.so
…
output
video_xv.so
audio_oss.so
…
xine-lib-3.0
avi.so (avi demuxer)
mpeg.so (contains mpeg demuxers and audio/video decoders)
video_out_xv.so (Xv video out)
…
As you can see, every package is free to organize plugins at will
below its own plugin provider directory.
Additionally, administrators may choose to put plugins directly into
XINE_PLUGINDIR, or in a "local" subdirectory.
Users may wish to put additional plugins in ~/.xine/plugins/.
Again, there may be subdirectories to help organize the plugins.
The default value for XINE_PLUGINDIR can be obtained using the
pkg-config --variable=plugindir libxine command.
Plugin Content: What's inside the .so?
Each plugin library (.so file) contains an arbitrary number of (virtual)
plugins. Typically, it will contain exactly one plugin. However, it
may be useful to put a set of related plugins in one library, so they
can share common code.
First of all, what is a virtual plugin?
A virtual plugin is essentially a structure that is defined by the
xine engine. This structure typically contains lots of function
pointers to the actual API functions.
For each plugin API, there are several API versions, and each API
version may specify a new, incompatible structure. Therefore, it is
essential that only plugins supporting the current libxine API are loaded,
so the .so file needs a plugin list that provides libxine with the version
information, even before it tries to load any of the plugins.
This plugin list is held in an array named xine_plugin_info:
plugin_info_t xine_plugin_info[] = {
/* type, API, "name", version, special_info, init_function */
{ PLUGIN_DEMUX, 20, "flac", XINE_VERSION_CODE, NULL, demux_flac_init_class },
{ PLUGIN_AUDIO_DECODER, 13, "flacdec", XINE_VERSION_CODE, &dec_info_audio, init_plugin },
{ PLUGIN_NONE, 0, "", 0, NULL, NULL }
};
The structure of xine_plugin_info may never be changed.
If it ever needs to be changed, it must be renamed to avoid
erroneous loading of incompatible plugins.
xine_plugin_info can contain any number of plugins
and must be terminated with a PLUGIN_NONE entry. Available plugin
types are:
#define PLUGIN_NONE 0
#define PLUGIN_INPUT 1
#define PLUGIN_DEMUX 2
#define PLUGIN_AUDIO_DECODER 3
#define PLUGIN_VIDEO_DECODER 4
#define PLUGIN_SPU_DECODER 5
#define PLUGIN_AUDIO_OUT 6
#define PLUGIN_VIDEO_OUT 7
#define PLUGIN_POST 8
The plugin version number is generated from xine-lib's version number
like this: MAJOR * 10000 + MINOR * 100 + SUBMINOR.
This is not required, but it's an easy way to ensure that the version
increases for every release.
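The scheme can be captured in a small helper macro (MAKE_VERSION_CODE is a hypothetical name used here for illustration; xine-lib provides XINE_VERSION_CODE for its own version):

```c
/* hypothetical helper following the MAJOR * 10000 + MINOR * 100 + SUBMINOR scheme */
#define MAKE_VERSION_CODE(major, minor, subminor) \
  ((major) * 10000 + (minor) * 100 + (subminor))
```

For instance, a fictitious xine-lib 2.13.7 would yield the version code 21307, and later releases map to strictly larger codes as long as minor and subminor stay below 100.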
Every entry in xine_plugin_info has an initialization
function for the plugin class context.
This function returns a pointer to a freshly allocated structure (typically
obtained via malloc()) containing mainly function
pointers; these are the "methods" of the plugin class.
The "plugin class" does not do the actual job yet (like decoding
a video or something); it must be instantiated first. One reason for having the
class is to hold any global settings that must be accessible to every
instance. Remember that the xine library is capable of handling multiple streams:
several videos can be decoded at the same time, thus several instances of the
same plugin are possible.
If you think this is pretty much an object-oriented approach,
then you're right.
A fictitious file input plugin that supports input plugin APIs 12 and
13, found in xine-lib 2.13.7, would then define this plugin list:
#include <xine/plugin.h>
…
plugin_t *init_api12(void) {
input_plugin_t *this;
this = malloc(sizeof(input_plugin_t));
…
return (plugin_t *)this;
}
/* same thing, with different initialization for API 13 */
const plugin_info_t xine_plugin_info[] = {
{ PLUGIN_INPUT, 12, "file", 21307, NULL, init_api12 },
{ PLUGIN_INPUT, 13, "file", 21307, NULL, init_api13 },
{ PLUGIN_NONE, 0, "", 0, NULL, NULL }
};
This input plugin supports two APIs; other plugins might provide a
mixture of demuxer and decoder plugins that belong together somehow
(i.e. share common code).
You'll find exact definitions of public functions and plugin structs
in the appropriate header files for each plugin type:
input/input_plugin.h for input plugins,
demuxers/demux.h for demuxer plugins,
xine-engine/video_decoder.h for video decoder plugins,
xine-engine/audio_decoder.h for audio decoder plugins,
xine-engine/post.h for post plugins,
xine-engine/video_out.h for video out plugins,
xine-engine/audio_out.h for audio out plugins.
Additional information will also be given in the dedicated sections below.
Many plugins will need some additional "private" data fields.
These should be simply added at the end of the plugin structure.
For example a demuxer plugin called "foo" with two private
fields "xine" and "count" may have a plugin structure declared in
the following way:
typedef struct {
/* public fields "inherited" from demux.h */
demux_plugin_t demux_plugin;
xine_t *xine;
int count;
} demux_foo_t;
The plugin would then access public members via the
demux_plugin field and private fields directly.
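That access pattern can be shown in a self-contained sketch; demux_plugin_t is reduced here to a hypothetical one-method stub, since the real definition lives in demuxers/demux.h:

```c
#include <stdlib.h>

/* stub standing in for the real demux_plugin_t from demuxers/demux.h */
typedef struct {
  int (*send_headers)(void *this_gen);
} demux_plugin_t;

typedef struct {
  demux_plugin_t demux_plugin; /* public part, must come first */
  void *xine;                  /* private fields follow */
  int   count;
} demux_foo_t;

/* the engine calls through the public struct; the plugin casts back down */
static int demux_foo_send_headers(void *this_gen) {
  demux_foo_t *this = (demux_foo_t *)this_gen;
  return ++this->count; /* private field reached via the cast */
}

demux_foo_t *demux_foo_new(void) {
  demux_foo_t *this = calloc(1, sizeof(demux_foo_t));
  this->demux_plugin.send_headers = demux_foo_send_headers;
  return this;
}
```

Because the public struct is the first member, a pointer to demux_foo_t can be passed wherever a demux_plugin_t pointer is expected, and cast back inside the method.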
Summary: Plugins consist of two C-style classes, each representing a different context.
The first is the so-called "plugin class" context. This is a singleton context,
which means it will exist at most once per xine context.
This plugin class context is a C-style class which is subclassing the related
class from the xine plugin headers. This contains functions, which are
independent of the actual instance of the plugin. Most prominently, it contains
a factory method to instantiate the next context.
The second context is the instance context. This is another C-style class, which
is constructed and disposed of within the plugin class context. This one does
the actual work and subclasses the related plugin struct from the xine plugin
headers. It is instantiated for every separate running instance of the plugin.
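The two contexts can be sketched as plain C structs. All names here are invented for illustration; real plugins subclass the structs from the xine headers instead:

```c
#include <stdlib.h>

/* instance context: one per running stream, does the actual work */
typedef struct foo_instance_s foo_instance_t;
struct foo_instance_s {
  int frames_done;
  void (*dispose)(foo_instance_t *this);
};

/* plugin class context: at most one per xine context,
 * holds class-wide state and the factory method */
typedef struct foo_class_s foo_class_t;
struct foo_class_s {
  foo_instance_t *(*open_instance)(foo_class_t *cls);
  int instances; /* example of a setting shared by all instances */
};

static void foo_dispose(foo_instance_t *this) {
  free(this);
}

static foo_instance_t *foo_open_instance(foo_class_t *cls) {
  foo_instance_t *this = calloc(1, sizeof(*this));
  this->dispose = foo_dispose;
  cls->instances++;
  return this;
}

/* corresponds to the init_function entry in xine_plugin_info */
foo_class_t *foo_init_class(void) {
  foo_class_t *cls = calloc(1, sizeof(*cls));
  cls->open_instance = foo_open_instance;
  return cls;
}
```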
What is this metronom thingy?
Metronom serves two purposes:
Generate vpts (virtual presentation time stamps) from pts (presentation time stamps)
for a/v output and synchronization.
Provide a master clock (system clock reference, scr), possibly provided
by external scr plugins (this can be used if some hardware decoder or network
server dictates the time).
pts/vpts values are given in 1/90000 sec units. pts values in mpeg streams
may wrap (that is, return to zero or any other value without further notice),
can be missing on some frames or (for broken streams) may "dance" around
the correct values. Metronom therefore has some heuristics built-in to generate
clean vpts values which can then be used in the output layers to schedule audio/video
output.
The heuristics used in metronom have always been a field of research. The current
metronom implementation tries to stick to the pts values as reported by the demuxers;
that is, vpts may be obtained by the simple operation vpts = pts + vpts_offset,
where vpts_offset takes into account any wraps. Whenever pts is zero,
metronom will estimate vpts based on previous values. If a difference is found between
the estimated vpts and the value calculated by the formula above, it will be smoothed
out by applying a "drift correction".
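A toy version of that calculation may help (grossly simplified; the real metronom also detects wraps and smooths the drift, and all names below are invented):

```c
#include <stdint.h>

/* all values in 1/90000 s units; names and logic are illustrative only */
typedef struct {
  int64_t vpts_offset;    /* accounts for stream start and pts wraps */
  int64_t last_vpts;      /* used to extrapolate when pts is missing */
  int64_t frame_duration; /* expected spacing between frames */
} toy_metronom_t;

int64_t toy_got_pts(toy_metronom_t *m, int64_t pts) {
  int64_t vpts;
  if (pts == 0)
    vpts = m->last_vpts + m->frame_duration; /* missing pts: estimate */
  else
    vpts = pts + m->vpts_offset;             /* the normal case */
  m->last_vpts = vpts;
  return vpts;
}
```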
How does xine synchronize audio and video?
Every image frame or audio buffer leaving a decoder is tagged by metronom with
vpts information, which tells the video_out and audio_out threads when that
data should be presented. Usually there isn't a significant delay associated
with the video driver, so we expect a frame to reach the screen at about the time
it is delivered for drawing. Unfortunately, the same isn't true for audio: all sound
systems implement some amount of buffering (a fifo), so any data sent to them
now will only get played some time in the future. To achieve perfect A-V sync, the
audio_out thread must take this into account by asking the audio driver for its
current latency.
Some audio drivers can't tell the current delay introduced in playback. This is
especially true for most sound servers like ESD or aRts, and explains why the
sync is far from perfect in such cases.
Another problem xine must handle is sound card clock drift. vpts values are
compared to the system clock (or even to a different clock provided by an scr plugin)
for presentation, but the sound card samples audio with its own clocking
mechanism, so a small drift may occur. As playback goes on, this
error accumulates, possibly resulting in audio gaps or drops. To avoid that
annoying effect, two countermeasures are available (switchable with the xine config
option audio.synchronization.av_sync_method):
The small sound card errors are fed back to metronom. The details
are given in the audio_out.c comments:
/* By adding gap errors (difference between reported and expected
* sound card clock) into metronom's vpts_offset we can use its
* smoothing algorithms to correct sound card clock drifts.
* obs: previously this error was added to xine scr.
*
* audio buf ---> metronom --> audio fifo --> (buf->vpts - hw_vpts)
* (vpts_offset + error) gap
* <---------- control --------------|
*
* Unfortunately audio fifo adds a large delay to our closed loop.
*
* These are designed to avoid updating the metronom too fast.
* - it will only be updated 1 time per second (so it has a chance of
* distributing the error for several frames).
* - it will only be updated 2 times for the whole audio fifo size
* length (so the control will wait to see the feedback effect)
* - each update will be of gap/SYNC_GAP_RATE.
*
* Sound card clock correction can only provide smooth playback for
* errors < 1% nominal rate. For bigger errors (bad streams) audio
* buffers may be dropped or gaps filled with silence.
*/
The audio is stretched or squeezed slightly by resampling, thus compensating
for the drift, as the next comment in audio_out.c explains:
/* Alternative for metronom feedback: fix sound card clock drift
* by resampling all audio data, so that the sound card keeps in
* sync with the system clock. This may help, if one uses a DXR3/H+
* decoder board. Those have their own clock (which serves as xine's
* master clock) and can only operate at fixed frame rates (if you
* want smooth playback). Resampling then avoids A/V sync problems,
* gaps filled with 0-frames and jerky video playback due to different
* clock speeds of the sound card and DXR3/H+.
*/
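The rate-limited feedback described in the first comment can be illustrated with a toy model (SYNC_GAP_RATE is a real constant in audio_out.c, but its value and the surrounding structure here are invented):

```c
#include <stdint.h>

#define SYNC_GAP_RATE   4     /* scale-down factor; illustrative value */
#define UPDATE_INTERVAL 90000 /* one second in 1/90000 s units */

typedef struct {
  int64_t vpts_offset;      /* stands in for metronom's vpts_offset */
  int64_t last_update_vpts; /* when we last fed an error back */
} toy_feedback_t;

/* feed a measured gap (reported minus expected sound card clock) back,
 * rate-limited to one update per second and scaled by 1/SYNC_GAP_RATE */
int toy_feed_gap(toy_feedback_t *t, int64_t now_vpts, int64_t gap) {
  if (now_vpts - t->last_update_vpts < UPDATE_INTERVAL)
    return 0; /* too soon: let the previous correction take effect */
  t->vpts_offset += gap / SYNC_GAP_RATE;
  t->last_update_vpts = now_vpts;
  return 1;
}
```

Only a fraction of each measured gap is applied, and never more often than the chosen interval, so metronom's smoothing can distribute the correction over many frames.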
Overlays and OSD
The roots of xine's overlay capabilities are DVD subpicture and subtitle support
(also known as 'spu'). DVD subtitles are encoded in an RLE (Run Length Encoding - the
simplest compression technique) format, with a palette of colors and transparency
levels. You probably thought that subtitles were just simple text saved onto DVDs, right?
Wrong, they are bitmaps.
In order to optimize for the most common case, xine's internal format for screen overlays
is a representation similar to the 'spu' data. This brings not only a performance
benefit (since blending functions may skip large image areas thanks to RLE) but also
compatibility: it's possible to re-encode any xine overlay into the original spu format
for display with mpeg hardware decoders like the DXR3.
Displaying subtitles requires the ability to sync them to the video stream. This
is done using the same pts/vpts machinery as the a-v sync code. DVD subtitles,
for example, may request: show this spu at pts1 and hide it at pts2. This leads to the
concept of the 'video overlay manager', an event-driven module for managing the
showing and hiding of overlays.
The drawback of using the internal RLE format is the difficulty of manipulating it
as graphics. To overcome that we created the 'OSD renderer', where OSD stands
for On Screen Display, just like in TV sets. The OSD renderer is a module
providing simple graphic primitives (lines, rectangles, text drawing etc.) over
a "virtual" bitmap area. Every time we want to show that bitmap, it is
RLE-encoded and sent to the overlay manager for display.
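The RLE step itself is straightforward. A minimal encoder over one scanline of palette indices might look like this (the rle_elem_t layout mirrors the idea, not necessarily the exact struct used by xine):

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
  uint16_t len;   /* run length in pixels */
  uint16_t color; /* palette index */
} rle_elem_t;

/* encode one scanline of palette indices; returns the number of runs
 * written to out (out must have room for up to width elements) */
size_t rle_encode_line(const uint8_t *pixels, size_t width, rle_elem_t *out) {
  size_t n = 0, i = 0;
  while (i < width) {
    size_t start = i;
    uint8_t c = pixels[i];
    while (i < width && pixels[i] == c)
      i++; /* extend the run while the color repeats */
    out[n].len = (uint16_t)(i - start);
    out[n].color = c;
    n++;
  }
  return n;
}
```

Large single-color areas (the common case for subtitles and OSD text backgrounds) collapse into one element each, which is exactly why the blending functions can skip them quickly.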
overlays architecture
Overlay Manager
The overlay manager interface is available to any xine plugin. It's a bit unlikely
to be used directly; anyway, here's a code snippet for enqueueing an overlay for
display:
video_overlay_event_t event;
event.object.handle = this->video_overlay->get_handle(this->video_overlay,0);
event.object.overlay = calloc(1, sizeof(vo_overlay_t)); /* zeroed overlay */
/* set position and size for this overlay */
event.object.overlay->x = 0;
event.object.overlay->y = 0;
event.object.overlay->width = 100;
event.object.overlay->height = 100;
/* clipping region is mostly used by dvd menus for highlighting buttons */
event.object.overlay->clip_top = 0;
event.object.overlay->clip_bottom = image_height;
event.object.overlay->clip_left = 0;
event.object.overlay->clip_right = image_width;
/* the hard part: provide a RLE image */
event.object.overlay->rle = your_rle;
event.object.overlay->data_size = your_size;
event.object.overlay->num_rle = your_rle_count;
/* palette must contain YUV values for each color index */
memcpy(event.object.overlay->color, color, sizeof(color));
/* this table contains transparency levels for each color index:
   0 = completely transparent, 15 = completely opaque */
memcpy(event.object.overlay->trans, trans, sizeof(trans));
/* set the event type and time for displaying */
event.event_type = EVENT_SHOW_SPU;
event.vpts = 0; /* zero is a special vpts value, it means 'now' */
video_overlay->add_event(video_overlay, &event);
OSD Renderer
OSD is a general API for rendering stuff over playing video. It's available both
to xine plugins and to frontends.
The first thing you need to do is allocate an OSD object for drawing from the
renderer. The code below allocates a 300x200 area. This size can't be changed
during the lifetime of an OSD object, but it's possible to place it anywhere
over the image.
osd_object_t *osd;
osd = this->osd_renderer->new_object(this->osd_renderer, 300, 200);
Now we may want to set a font and colors for text rendering. Although we refer
to fonts throughout this document, the OSD can in fact hold any kind of bitmap. Font
files are searched for and loaded during initialization from
$prefix/share/xine/fonts/ and ~/.xine/fonts.
There's a sample utility to convert truetype fonts at
xine-lib/misc/xine-fontconv.c. The palette may be manipulated directly,
but most of the time it's more convenient to use the pre-defined text palettes.
/* set sans serif 24 font */
osd_renderer->set_font(osd, "sans", 24);
/* copy pre-defined colors for white text, black border and transparent background
   to the index range used by the first text palette */
osd_renderer->set_text_palette(osd, TEXTPALETTE_WHITE_BLACK_TRANSPARENT, OSD_TEXT1);
/* copy pre-defined colors for white text, no border and translucent background
   to the index range used by the second text palette */
osd_renderer->set_text_palette(osd, TEXTPALETTE_WHITE_NONE_TRANSLUCID, OSD_TEXT2);
Now render the text and show it:
osd_renderer->render_text(osd, 0, 0, "white text, black border", OSD_TEXT1);
osd_renderer->render_text(osd, 0, 30, "white text, no border", OSD_TEXT2);
osd_renderer->show(osd, 0); /* 0 stands for 'now' */
There's a 1:1 mapping between OSD objects and overlays, so the
second time you send an OSD object for display it will actually replace
the first image. Using the set_position() function we can move the overlay
over the video.
int i;
for (i = 0; i < 100; i += 10) {
osd_renderer->set_position(osd, i, i );
osd_renderer->show(osd, 0);
sleep(1);
}
osd_renderer->hide(osd, 0);
For additional functions please check osd.h or the public header.
OSD palette notes
The palette functions demand some additional explanation; skip this if you
just want to write text fast without worrying about details! :)
We have a 256-entry palette, each entry defining yuv and transparency levels.
Although xine fonts are bitmaps and may use any indices they want, we have
defined a small convention:
/*
Palette entries as used by osd fonts:
0: not used by font, always transparent
1: font background, usually transparent, may be used to implement
translucent boxes where the font will be printed.
2-5: transition between background and border (usually only alpha
value changes).
6: font border. if the font is to be displayed without border this
will probably be adjusted to font background or near.
7-9: transition between border and foreground
10: font color (foreground)
*/
The so-called 'transitions' are used to implement font anti-aliasing. This
convention requires that any font file use only the colors from 1 to 10.
When we use the set_text_palette() function we are just copying 11 palette
entries to the specified base index.
That base index is the same one we pass to the render_text() function to select
the text palette. With this scheme it is possible to have several different text
colors at the same time and also to draw fonts over a custom background.
/* obtains size the text will occupy */
renderer->get_text_size(osd, text, &width, &height);
/* draws a box using the font background color (translucent) */
renderer->filled_rect(osd, x1, y1, x1+width, y1+height, OSD_TEXT2 + 1);
/* render text */
renderer->render_text(osd, x1, y1, text, OSD_TEXT2);
OSD text and palette FAQ
Q: What is the format of the color palette entries?
A: It's the same as used by overlay blending code (YUV).
Q: What is the relation between a text palette and a palette
I set with xine_osd_set_palette?
A: xine_osd_set_palette will set the entire 256 color palette
to be used when we blend the osd image.
"text palette" is a sequence of 11 colors from the palette to be
used to render text. That is, calling osd_render_text()
with color_base=100 will render text using colors 100-110.
Q: Can I render text with colors in my own palette?
A: Sure. Just pass your color_base to osd_render_text().
Q: Does changing a text palette affect already drawn text?
A: osd_set_text_palette() will overwrite some colors in the palette
with pre-defined ones. So yes, it will change the color
of already drawn text (if you do it before calling osd_show,
of course).
If you don't want to change the colors of drawn text, just
use different color_base values.
Q: What about the shadows of osd-objects? Can I turn them off
or are they hardcoded?
A: osd objects have no shadows by themselves, but fonts use 11
colors to produce an anti-aliased effect.
If you set a "text palette" with entries 0-9 being transparent
and 10 being foreground, you will get rid of any borders or
anti-aliasing.
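Building such a "hard" transparency table is simple (the 11-entry layout follows the font convention above; the helper name is invented):

```c
#include <stdint.h>

/* build a "hard" text palette as described in the answer above:
 * entries 0-9 fully transparent, entry 10 fully opaque foreground */
void make_hard_text_palette(uint8_t trans[11]) {
  int i;
  for (i = 0; i < 10; i++)
    trans[i] = 0; /* 0 = completely transparent */
  trans[10] = 15; /* 15 = completely opaque */
}
```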
MRLs
This section defines a draft for a syntactic specification of MRLs as
used by xine-lib. The language of MRLs is designed to be a true subset
of the language of URIs as given in RFC2396. A type 2 grammar for the
language of MRLs is given in EBNF below.
Semantically, MRLs consist of two distinct parts that are evaluated by
different components of the xine architecture. The first part,
derivable from the symbol <input_source> in the given grammar, is
completely handed to the input plugins, with input plugins signaling
if they can handle the MRL.
The second part, derivable from <stream_setup> and delimited from the
first by a crosshatch ('#'), contains parameters that modify the
initialization and playback behaviour of the stream to which the MRL
is passed. The possible parameters are mentioned in the xine-ui manpage.
The following definition should be regarded as a guideline only.
Of course any given input plugin only understands a subset of all
possible MRLs. On the other hand, invalid MRLs according to this
definition might be understood for convenience reasons.
Some user awareness is required at this point.
EBNF grammar for MRLs:
<mrl> ::= <input_source>[#<stream_setup>]
<input_source> ::= (<absolute_mrl>|<relative_mrl>)
<absolute_mrl> ::= <input>:(<hierarch_part>|<opaque_part>)
<hierarch_part> ::= (<net_path>|<abs_path>)[?<query>]
<opaque_part> ::= (<unreserved>|<escaped>|;|?|:|@|&|=|+|$|,){<mrl_char>}
<relative_mrl> ::= (<abs_path>|<rel_path>)
<net_path> ::= //<authority>[<abs_path>]
<abs_path> ::= /<path_segments>
<rel_path> ::= <rel_segment>[<abs_path>]
<rel_segment> ::= <rel_char>{<rel_char>}
<rel_char> ::= (<unreserved>|<escaped>|;|@|&|=|+|$|,)
<input> ::= <alpha>{(<alpha>|<digit>|+|-|.)}
<authority> ::= (<server>|<reg_name>)
<server> ::= [[<userinfo>@]<host>[:<port>]]
<userinfo> ::= {(<unreserved>|<escaped>|;|:|&|=|+|$|,)}
<host> ::= (<hostname>|<ipv4_address>|<ipv6_reference>)
<hostname> ::= {<domainlabel>.}<toplabel>[.]
<domainlabel> ::= (<alphanum>|<alphanum>{(<alphanum>|-)}<alphanum>)
<toplabel> ::= (<alpha>|<alpha>{(<alphanum>|-)}<alphanum>)
<ipv4_address> ::= <digit>{<digit>}.<digit>{<digit>}.<digit>{<digit>}.<digit>{<digit>}
<port> ::= {<digit>}
<reg_name> ::= <reg_char>{<reg_char>}
<reg_char> ::= (<unreserved>|<escaped>|;|:|@|&|=|+|$|,)
<path_segments> ::= <segment>{/<segment>}
<segment> ::= {<path_char>}{;<param>}
<param> ::= {<path_char>}
<path_char> ::= (<unreserved>|<escaped>|:|@|&|=|+|$|,)
<query> ::= {<mrl_char>}
<stream_setup> ::= <stream_option>;{<stream_option>}
<stream_option> ::= (<configoption>|<engine_option>|novideo|noaudio|nospu)
<configoption> ::= <configentry>:<configvalue>
<configentry> ::= <unreserved>{<unreserved>}
<configvalue> ::= <stream_char>{<stream_char>}
<engine_option> ::= <unreserved>{<unreserved>}:<stream_char>{<stream_char>}
<stream_char> ::= (<unreserved>|<escaped>|:|@|&|=|+|$|,)
<mrl_char> ::= (<reserved>|<unreserved>|<escaped>)
<reserved> ::= (;|/|?|:|@|&|=|+|$|,|[|])
<unreserved> ::= (<alphanum>|<mark>)
<mark> ::= (-|_|.|!|~|*|'|(|))
<escaped> ::= %<hex><hex>
<hex> ::= (<digit>|A|B|C|D|E|F|a|b|c|d|e|f)
<alphanum> ::= (<alpha>|<digit>)
<alpha> ::= (<lowalpha>|<upalpha>)
<lowalpha> ::= (a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z)
<upalpha> ::= (A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z)
<digit> ::= (0|1|2|3|4|5|6|7|8|9)
With <ipv6_reference> being an IPv6 address enclosed in [ and ] as defined in RFC2732.
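As a practical illustration of the grammar's top-level split, a minimal helper (not part of libxine; real input plugins may treat '#' inside the source part differently) that separates <input_source> from <stream_setup>:

```c
#include <string.h>

/* return the <stream_setup> part of an MRL (the text after the first '#'),
 * or NULL when the MRL carries no stream setup */
const char *mrl_stream_setup(const char *mrl) {
  const char *hash = strchr(mrl, '#');
  return hash ? hash + 1 : NULL;
}
```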