UST and Graphics

By Chris Pirazzi. Thanks to Wiltse Carpenter for several ideas.

There is no UST or UST/MSC support for graphics (OpenGL and IRIS GL) yet. This document presents some probabilistic hacks you can use to get a reasonable UST for a graphics vertical retrace on a swapbuffer boundary today. Then it discusses graphics UST/MSC.

Getting Retrace USTs

You can do a reasonable job of getting a UST for vertical retrace (one as accurate as we would guess most people need right now; see below) by simply doing:

(single-buffered IRIS GL app)

  dmGetUST(&ust1);
  gsync();
  finish();
  dmGetUST(&ust2);

(double-buffered IRIS GL app)

  dmGetUST(&ust1);
  swapbuffers();
  finish();
  dmGetUST(&ust2);

(either, OpenGL app)

  dmGetUST(&ust1);
  glXWaitVideoSyncSGI(...);
  dmGetUST(&ust2);

These techniques give you an estimated UST for vertical retrace with one nice advantage over the "wait" call alone: you get an error bound. The retrace must fall between ust1 and ust2. In the second case, the bracketed retrace will be on a swap interval boundary (see swapinterval(3G)).
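Once you have a bracketing pair, the midpoint is a natural point estimate for the retrace UST, and half the spread is a worst-case error bound. A minimal sketch (the helper name and signed UST typedef are ours, not an SGI API):

```c
#include <assert.h>

typedef long long ust_t;  /* USTs are 64-bit nanosecond counts;
                           * signed here to keep the arithmetic simple */

/* Given a bracketing pair (ust1 <= retrace <= ust2), return the
 * midpoint as the estimate and half the spread as the +/- error. */
static ust_t retrace_estimate(ust_t ust1, ust_t ust2, ust_t *error)
{
    if (error)
        *error = (ust2 - ust1) / 2;   /* worst-case error bound */
    return ust1 + (ust2 - ust1) / 2;  /* midpoint of the bracket */
}
```

If the error bound comes back too large for your application, throw the measurement away and try again.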

The finish() call in the IRIS GL case is necessary because on some boards, gsync() and swapbuffers() do not actually block until the next retrace or swap. Instead, the next graphics call after gsync() and swapbuffers() blocks. The finish() call is a nice harmless graphics call to use (you could also draw a point or clear the screen). If you were to port the IRIS GL cases to OpenGL, you'd need to issue some kind of graphics call for the same reason. It is not clear to us if glFinish() would do the trick.

You can get an even tighter error bound by doing this:

  dmGetUST(&oldust);
  glXGetVideoSyncSGI(&oldcount);
  while (1)
    {
      dmGetUST(&thisust);
      glXGetVideoSyncSGI(&thiscount);
      if (thiscount != oldcount)
        {
          dmGetUST(&afterust);
          /* retrace falls between oldust and afterust
           * (thisust contains no useful information)
           */
          break;
        }
      oldcount = thiscount;
      oldust = thisust;
      /* perhaps sginap() or nanosleep() here if you cannot
       * hard-spin, but this may limit your accuracy.
       */
    }

The probability that afterust-oldust will be large is even lower than the probability that ust2-ust1 will be large for the techniques above. And again you get an error bound.

With any of these techniques, you can repeat the test until you get a measurement that is within some required error bound. You would therefore be trading off probabilistic execution time for accuracy. Not the best, but it works ok for a lot of people right now.
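That retry idea can be sketched generically. Here "measure" stands in for one bracketed retrace measurement, such as the dmGetUST()/glXWaitVideoSyncSGI()/dmGetUST() sequence above; all names in this sketch are ours, and the fake_measure callback exists only so the sketch is self-contained:

```c
typedef long long ust_t;  /* 64-bit nanosecond UST, signed for arithmetic */

/* One bracketed measurement: fills in ust1 <= retrace <= ust2. */
typedef void (*measure_fn)(ust_t *ust1, ust_t *ust2, void *ctx);

/* Repeat the measurement until the bracket is tight enough, or give
 * up after max_tries attempts (execution time is probabilistic). */
static int retrace_within_bound(measure_fn measure, void *ctx,
                                ust_t max_error, int max_tries,
                                ust_t *ust1, ust_t *ust2)
{
    int i;
    for (i = 0; i < max_tries; i++) {
        measure(ust1, ust2, ctx);
        if (*ust2 - *ust1 <= max_error)
            return 1;  /* bracket is tight enough */
    }
    return 0;  /* gave up: caller must accept a looser bound */
}

/* Stand-in measurement for illustration only: each call halves the
 * spread, mimicking retries that eventually land a tight bracket. */
static void fake_measure(ust_t *ust1, ust_t *ust2, void *ctx)
{
    ust_t *spread = (ust_t *)ctx;
    *ust1 = 5000000;
    *ust2 = 5000000 + *spread;
    *spread /= 2;
}
```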

UST/MSC for Graphics

SGI should clearly provide graphics UST/MSC support. It is an interesting test of your understanding of UST/MSC to figure out what else SGI needs to provide in order for graphics UST/MSC to be useful.

Double-buffered graphics appears to fit quite nicely into the UST/MSC model. Compare a double-buffered OpenGL context (or IRIS GL "context") with a memory to video VLPath: the back buffer acts like a VL buffer of capacity 1, swapping buffers acts like enqueueing a frame, and each swap interval acts like a frame time of video output.

To complete the UST/MSC picture, we would need a solid definition of MSC, an operation to get a UST/MSC pair, and an operation to get a frontier MSC.

Defining MSC and providing a UST/MSC pair is doable and SGI may do this. It would make hacks like those above unnecessary.

The frontier MSC operation for double-buffered visuals is where you have to be a little careful. The glXGetVideoSyncSGI() call is not it; that call returns the count of the frame that is being scanned out the output jack "now," not the count of the frame you are about to enqueue. The main reason why graphics UST/MSC has not been in high demand historically is that a graphics frontier MSC operation would not be very useful given the capacity of the graphics buffer (1), the reliability of single-processor IRIX process scheduling, and the lack of rendering speed guarantees offered by graphics hardware.

To understand this, consider what would happen if a VL app created a memory to video, VL_CAPTURE_INTERLEAVED (frame-based) VL path with a VL buffer of capacity 1. Since there is so little buffering between the app and the device, the app would have to run every frame time, guaranteed, in order to produce smooth output. Although you can easily run every 33 or 40ms on average on any SGI system, IRIX 6.2 and IRIX 6.3 still do not give you a solid guarantee that you will run that often all the time on single-processor systems (see Seizing Higher Scheduling Priority for more information).

A frontier MSC operation is useful when you can use it in combination with an adequately sized buffer to ride over process scheduling hiccups and still produce smooth, synchronized output. If your buffer is not as big as your worst-case scheduling holdoff, then a frontier MSC operation provides you with no more functionality than this hackish code:

  buf = vlGetNextFree();

  /* these are supposed to be atomic, but won't really be */
  dmGetUST(&now); /* get the time "now" */
  nfilled = vlGetFilled(); /* see how many things are buffered "now" */
  /* render "the right" frame */
  render_frame(buf, now + nfilled * ust_per_frame);

  /* enqueue that frame */
  vlPutValid(buf);
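The arithmetic in that hack is worth making explicit: the frame rendered "now" will reach the output nfilled frame times later, because nfilled frames are already queued ahead of it. A minimal sketch (the helper name is ours):

```c
typedef long long ust_t;  /* 64-bit nanosecond UST, signed for arithmetic */

/* UST at which the frame we are about to enqueue should appear:
 * "now" plus one frame time per frame already buffered ahead of it.
 * For 25 Hz PAL output, ust_per_frame would be 1000000000 / 25 ns. */
static ust_t target_output_ust(ust_t now, int nfilled, ust_t ust_per_frame)
{
    return now + (ust_t)nfilled * ust_per_frame;
}
```

The weakness the text describes is exactly this arithmetic: if you get held off between reading "now"/nfilled and enqueueing, the computed target UST is stale.
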

Graphics just makes this situation even worse since its frame rates are typically much higher than those of video (thus stressing the IRIX process scheduler even more), and since (unlike a video device) there is no guarantee that the graphics hardware will be able to render the primitives you send it in one frame time.

SGI supports process scheduling guarantees in the microsecond range on multiprocessor systems. But the lack of rendering speed guarantees still applies on these platforms. Practically, though, many applications (such as movie players!) never come close to exhausting the capability of the graphics engine, so this is not an issue for them. A graphics UST/MSC would therefore be useful on these platforms today.

In the future, SGI may provide additional process scheduling guarantees, or SGI may provide more-than-double-buffering for graphics. Either of these would make graphics UST/MSC useful on a wide variety of platforms.

For single-buffered graphics, there is no buffer in the UST/MSC sense. The application is totally dependent on sub-frame scheduling for smooth output. You could say that the frontier MSC does not exist, or define it to be the same as glXGetVideoSyncSGI() so that you could use it with a UST/MSC pair operation to get a UST for the "current" frame.
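If a UST/MSC pair operation existed, turning the glXGetVideoSyncSGI() count into a UST would be simple extrapolation. A hypothetical sketch (no such SGI pair operation exists today; all names here are ours):

```c
typedef long long ust_t;  /* 64-bit nanosecond UST, signed for arithmetic */
typedef long long msc_t;  /* media stream count (e.g. retrace count)    */

/* Given one known (pair_ust, pair_msc) correspondence and the frame
 * period, extrapolate the UST of any nearby MSC, such as the count
 * returned by glXGetVideoSyncSGI() for the "current" frame. */
static ust_t ust_for_msc(ust_t pair_ust, msc_t pair_msc,
                         msc_t msc, ust_t ust_per_frame)
{
    return pair_ust + (msc - pair_msc) * ust_per_frame;
}
```

Extrapolating backwards (msc < pair_msc) works too, as long as the output timing has not been interrupted between the two counts.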

If you have an application for which graphics UST/MSC would be useful today, even without the scheduling guarantees, let us know!