Motion JPEG on SGI Systems

By Michael Portuesi.

Thanks to Chris Pirazzi, Eric Bloch, Angela Lai, the SGI patch database, and a cast of thousands for providing various tidbits of obscure information and verifying its correctness.

Introduction

The following is a discussion of the development of Motion JPEG compression hardware and software at Silicon Graphics. This document was created to help you understand the technical limitations and workarounds that have developed with each release, so that you can create software and movie files which are backward compatible with earlier IRIX releases.

Hardware Requirements for Motion JPEG Compatibility

For the original Cosmo-1 board, a movie file must have the following attributes in order to be processed by the board in real time. The same restrictions apply when recording movie files:

Typically, an NTSC full-frame movie created with the Cosmo hardware is either 480 or 496 lines high. A full-frame NTSC image usually contains 486 lines of useful information. If the 480 height is chosen, some lines of information are lost. If the 496 height is chosen, all lines of useful information are retained, but the additional ten lines contain random noise.

The height of PAL video happens to meet the hardware's multiple-of-8 requirement, so no discordance exists between the height of the actual video frame and the height of the image track in the movie file.
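The 480/496 choice follows from the per-field rounding implied above: the hardware works on fields, each field must be a multiple of 8 lines tall, and an NTSC frame carries 486 useful lines (243 per field). A minimal arithmetic sketch (not an SGI API; the function name is invented for illustration):

```python
def legal_frame_heights(useful_lines, multiple=8):
    """Given the number of useful picture lines in an interlaced frame,
    return the (round-down, round-up) frame heights when each of the two
    fields must be a multiple of `multiple` lines tall."""
    field = useful_lines // 2                 # lines per field (486 -> 243)
    down = (field // multiple) * multiple     # round down (243 -> 240)
    up = -(-field // multiple) * multiple     # round up   (243 -> 248)
    return down * 2, up * 2

# NTSC: 486 useful lines -> 480 (drops lines) or 496 (10 lines of noise)
print(legal_frame_heights(486))   # (480, 496)
# PAL: 576 lines -> 288 per field, already a multiple of 8
print(legal_frame_heights(576))   # (576, 576)
```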

For the O2 built-in JPEG compression hardware, the constraints are the same as Cosmo. The constraints were retained for compatibility between the two boards.

The IMPACT Compression board will accept motion JPEG movies recorded on either Cosmo or O2. Some of the width and height restrictions have also been relaxed, though the exact limits are not known to the author as of this writing.

Split-fields Frame Arrangement

Prior to IRIX 6.3, the finest level of granularity supported by the Movie Library API was the frame. Fields, as encountered in interlaced video, were an unknown concept to the Movie Library.

The original Cosmo JPEG hardware recorded and compressed fields. To store these into a movie file, the compressed data for each field in a frame was butted together to form a field pair, and the field pair was written as a single frame into the movie file. The following diagram depicts how each frame was stored on disk:

This is referred to as "split fields" by dminfo.

An application can retrieve the data corresponding to a particular frame. However, there is no way to know the start point of the compressed data for the second field without searching the data.

Field-based Movies

Beginning with IRIX 6.3, the Movie Library API supports field information in the image track of a movie file. For more information about video fields, consult the Lurker's Guide document: Fields: Why Video is Crucially Different from Graphics.

As before, the two fields are stored together as a field pair, but now there may be an optional gap between the fields. The new API contains a call, mvSetTrackDataFieldInfo(), to allow an application to specify where the data for the second field starts. The application specifies the size of the first compressed field in bytes, the size of the gap between the two fields, and the size of the second field.

The Movie Library has a call, mvTrackDataHasFieldInfo(), which can be used in a program to determine if a data chunk in a movie track contains field information. To retrieve the field offset information if it exists, use mvGetTrackDataFieldInfo(). Each of these calls has a manual page.

Note that even though you can read and write only whole frames of data, the field calls allow you to find the start of the data for the second field in a buffer of compressed data. The gap is not present in the data buffer returned by the libmovie read/write calls, for the sake of compatibility. Therefore, the offset to the start of the data for the second field is simply the size of the first field.

If you are looking at the file offsets for the data chunks in a movie file, the gap does become important. Then, the file offset for the start of the second field in a data chunk equals the file offset for the start of the chunk plus the size of the first field, plus the size of the gap, as in this diagram:
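The two calculations can be summarized as plain arithmetic. This sketch is not a libmovie call; the function and variable names are invented for illustration:

```python
def second_field_offset_in_buffer(field1_size):
    # The gap is stripped from buffers returned by the libmovie read
    # calls, so in memory the second field starts right after the first.
    return field1_size

def second_field_offset_in_file(chunk_offset, field1_size, gap_size):
    # In the file itself the gap is present, so it must be added in.
    return chunk_offset + field1_size + gap_size

print(second_field_offset_in_buffer(12000))            # 12000
print(second_field_offset_in_file(100000, 12000, 64))  # 112064
```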

Movie applications based on IRIX 6.2 or earlier cannot open movies created with field information. This is because the 6.2 Movie Library cannot parse the new QuickTime atoms which provide the field information, and will thus return an error.

Field duplication

If one or more video fields were dropped during recording, the recording software is expected to correct for this deficiency by substituting a copy of a previously recorded field into the portion of the current frame which lacks data. dmrecord contains a simple algorithm which performs this substitution.

Mediarecorder contains a more sophisticated algorithm for substituting dropped fields. It uses two disciplines for choosing replacement fields from earlier in time. With temporal duplication, mediarecorder will substitute the most recently captured field in place of the missing field. This discipline is most suitable for a live video signal. With spatial duplication, mediarecorder will substitute the most recently captured field of the same type (F1 or F2) in place of the missing field. This discipline is most suitable for video captures from the workstation screen. Mediarecorder saves a token in the movie file denoting which discipline was used to handle any dropped fields, as a hint to movie playback software.
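The two disciplines can be illustrated with a simplified sketch. This is not mediarecorder's actual implementation; the function and data layout are invented for the example, with captured fields held as a chronological list of (field type, data) pairs:

```python
def pick_replacement(captured, missing_type, discipline):
    """Choose a replacement for a dropped field.
    Temporal duplication: the most recently captured field of any type.
    Spatial duplication: the most recently captured field of the same type."""
    if discipline == "temporal":
        return captured[-1]
    if discipline == "spatial":
        for field in reversed(captured):
            if field[0] == missing_type:
                return field
        return None
    raise ValueError("unknown discipline")

history = [("F1", "a"), ("F2", "b"), ("F1", "c")]
print(pick_replacement(history, "F2", "temporal"))  # ('F1', 'c') - newest field
print(pick_replacement(history, "F2", "spatial"))   # ('F2', 'b') - newest F2
```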

If the output movie is not to contain field information, Mediarecorder simply copies the data for lost field(s) into the new frame. However, if the output movie does contain fields, Mediarecorder can refer to earlier fields in the file without duplicating the data. If the missing field is an F1, Mediarecorder uses an unusually large gap between the F1 which occurs earlier in the file and the very recent F2. If the missing field is an F2, Mediarecorder uses a negative gap to refer to an F2 which starts before an F1.

Here is a diagram showing how the data chunks are laid out in the movie file when a F1 field is lost, and must be substituted with an F1 from earlier in the file:

Here is the same diagram, showing how the chunks are laid out in the event an F2 is lost. Note the negative gap which refers to an F2 occurring before the F1 in the file:
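Both duplication cases reuse the same file-offset arithmetic as a normal frame; only the sign and magnitude of the gap change. A sketch with illustrative (invented) byte values:

```python
def field2_file_offset(chunk_offset, field1_size, gap_size):
    # Same formula as for a normal frame; the gap may be unusually
    # large (reusing an earlier F1) or negative (reusing an earlier F2).
    return chunk_offset + field1_size + gap_size

# Missing F1: the chunk's first field is an F1 recorded earlier in the
# file, so a large gap skips the intervening data to reach the fresh F2.
print(field2_file_offset(50_000, 10_000, 140_000))   # 200000

# Missing F2: a negative gap points back to an F2 whose data starts
# before the F1 in the file.
print(field2_file_offset(200_000, 10_000, -60_000))  # 150000
```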

For more information on spatial and temporal field duplication, see the Lurker's Guide documents Fields: Why Video is Crucially Different from Graphics and Fields, F1/F2, Interleave, Field Dominance And More.

Pixel Aspect

Pixel aspect is a floating point number defined in the dmconvert manual page as "the vertical extent of a pixel divided by its horizontal extent." The pixel aspect is not to be confused with the aspect ratio of an entire video frame.

Video recorded with square pixels has a pixel aspect of 1.0.

Prior to IRIX 6.3, dmrecord would always set the pixel aspect ratio for a movie's image track to 1.0, regardless of the video signal it was recorded from. In IRIX 6.3, dmrecord sets the pixel aspect ratio correctly for all forms of digital video. See the Lurker's Guide document Square and Non-Square Pixels for the precise aspect ratios.

Most unfortunately, in IRIX 6.3 Media Recorder defines the pixel aspect as the horizontal extent divided by the vertical, in conflict with dmrecord, dmconvert, and dminfo. So movies recorded with Media Recorder will not agree with those recorded by dmrecord.

It is unfortunate that the pixel aspect is not stored as a genuine fraction. And given the inconsistencies with pixel aspect ratio in movie files to date, it seems best not to rely upon them. A more reliable method to determine pixel aspect is to look at the width and height of the image, and to infer the pixel aspect ratio from the image dimensions. A 720 pixel wide image that is 480 or 496 pixels high is likely to be Rec. 601 525 video, and a 720 pixel wide image that is 576 pixels high is likely Rec. 601 625 video.
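The inference described above can be sketched as a small classifier. The function name and return strings are invented for the example; only the dimension rules come from the text:

```python
def infer_video_standard(width, height):
    """Guess the video standard from image-track dimensions rather than
    trusting the pixel aspect value stored in the movie file."""
    if width == 720 and height in (480, 486, 496):
        return "Rec. 601 525-line"
    if width == 720 and height == 576:
        return "Rec. 601 625-line"
    return "unknown"

print(infer_video_standard(720, 496))  # Rec. 601 525-line
print(infer_video_standard(720, 576))  # Rec. 601 625-line
```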

Audio/Video Interleaving

Audio/video interleaving refers to the manner in which data chunks for audio are physically alternated, or "interleaved," with the video data chunks in the movie file.

In IRIX 6.2, the dmconvert utility performed this interleaving backwards from the manner in which the Movie Library playback engine expects to see it; it placed each data chunk containing video before the corresponding audio chunk. This problem was corrected in dmconvert from IRIX 6.3 onward.

The Media Recorder application is not guaranteed to perform uniform interleaving of audio and video data chunks if it is recording uncompressed data, or data compressed via a motion JPEG codec. This is because in these situations, it schedules the writes to disk opportunistically, as internal buffers from each audio and video source are ready and as availability of disk bandwidth permits.

32 and 64-bit offsets

Prior to IRIX 6.3, the largest data offset which could be expressed in a QuickTime movie was 32 bits in length. Unfortunately, this restricts a QuickTime movie to less than 2 GB in length, a serious restriction for realistic amounts of video, and especially for uncompressed video.
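To see how quickly 2 GB runs out, here is a back-of-the-envelope calculation for uncompressed Rec. 601 525 video, assuming 720x486 frames, 2 bytes per pixel (8-bit 4:2:2), and approximately 30 frames per second; the figures are illustrative assumptions, not measurements:

```python
MAX_OFFSET = 2**31 - 1   # largest signed 32-bit file offset

def seconds_until_limit(width, height, bytes_per_pixel, fps):
    """Seconds of uncompressed video that fit under a 32-bit offset."""
    bytes_per_second = width * height * bytes_per_pixel * fps
    return MAX_OFFSET / bytes_per_second

# Uncompressed 4:2:2 Rec. 601 525 video: about 20 MB/s,
# so the 2 GB ceiling is hit in under two minutes.
print(round(seconds_until_limit(720, 486, 2, 30)))  # 102
```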

In IRIX 6.3, SGI introduced a new scheme for storing 64-bit offsets into QuickTime movies. A new atom for describing chunk offsets was introduced, which is used preferentially to the standard chunk offset descriptor. This new chunk offset atom holds a 64-bit offset index.