Seizing Higher Scheduling Priority

By Chris Pirazzi.

This page describes a simple but extremely useful feature of IRIX that gives your process probabilistically better real-time response.

Context

Some video applications need to respond to incoming events (video data, audio data, serial bytes, etc.) with a corresponding outgoing event within a small amount of time. We call these low-latency applications. Such applications need to run more often on the CPU than applications which can buffer up larger amounts of incoming data before producing any output data.

For some applications (such as Sony-Style VTR emulation on RS-422 serial ports), the latency requirement is very tight compared to what IRIX normally delivers to a process (the requirement is 2-9ms in the case of the VTR emulation example), and the application cannot succeed unless the latency requirement is always met. Such an application thus requires latency guarantees. Currently SGI offers such scheduling guarantees on suitably configured multiprocessor systems using the REACT/Pro software.

For other applications, all that is needed is a probabilistic improvement in latency. This page describes one quick schedctl(2) hack that can give you that probabilistic improvement. This technique has made the difference between acceptable and unacceptable response for many audio and video applications on SGI.

IRIX Scheduling

A normal UNIX process competes for CPU time with all other processes. Its priority relative to the other processes, though influenced by its UNIX "nice" value, is really determined by an I/O vs. CPU usage heuristic buried in the UNIX kernel. This heuristic was designed in the time of teletypes and is not always appropriate for modern audio and video applications. Furthermore, when a process of higher priority than the currently running process becomes runnable, UNIX may let the currently running process continue for a while (say 10ms) before actually switching to the other process.

IRIX allows you to rip a process out of the standard UNIX priority scheme, and give it a "non-degrading" priority relative to other processes. You use schedctl(NDPRI,...) to set this non-degrading priority level.

The normal UNIX processes (called "timeshare" processes) occupy a band of process priorities called the "normal" band. You can raise your process priority to a band of priorities called the "high" band, which means that whenever your process is runnable, it will run instead of any normal process. You set this priority using:

  int pri = a priority between NDPHIMAX and NDPHIMIN (inclusive);
  schedctl(NDPRI, 0, pri);

A process with non-degrading priority in the "high" band gets other kinds of special treatment too. When such a process becomes the highest-priority runnable process, IRIX will switch to it immediately rather than let the currently running process run until the next 10ms scheduler tick. On some systems, non-degrading high-priority processes are also granted finer timer resolution (1ms instead of 10ms). On the other systems, all processes always have 1ms timer and sleep resolution.

You must have superuser permissions in order to raise your priority level to the high band (see below for a hack which makes this more palatable).

On IRIX 6.3 and earlier OSes, the high band consists of priority 30 (the most favored priority) through 39 (the least favored of the high-band priorities). The normal band starts at 40 and goes up. Several kernel daemons such as bdflush run at NDPRI 39. You should choose a priority more favored than 39, such as 35 or 30. It may be a good idea to avoid 30 so that a user can run another process at higher priority if he or she desires.
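The advice above can be sketched in C roughly as follows. This is only an illustration under stated assumptions: the helper names pick_high_band_priority() and seize_priority() and the two HI_* constants are hypothetical (on IRIX you would normally use NDPHIMAX and NDPHIMIN), the header list is what we assume schedctl(2) needs, and the real call is guarded by #ifdef __sgi so the file still compiles on other systems.

```c
#include <assert.h>
#include <stdio.h>

#ifdef __sgi
#include <sys/types.h>
#include <sys/prctl.h>
#include <sys/schedctl.h>   /* assumed home of schedctl(), NDPRI */
#endif

/* High-band limits as described in the text for IRIX 6.3 and earlier.
 * Hypothetical names; on IRIX prefer NDPHIMAX and NDPHIMIN. */
#define HI_MOST_FAVORED   30
#define HI_LEAST_FAVORED  39

/* Clamp a requested priority into the high band, staying more favored
 * than 39 (where kernel daemons such as bdflush run). */
int pick_high_band_priority(int requested)
{
    if (requested < HI_MOST_FAVORED)
        return HI_MOST_FAVORED;
    if (requested >= HI_LEAST_FAVORED)
        return HI_LEAST_FAVORED - 1;
    return requested;
}

/* Request the priority for the calling process (pid 0).
 * Returns 0 on success, -1 on failure or when not built for IRIX. */
int seize_priority(int pri)
{
    assert(pri >= HI_MOST_FAVORED && pri <= HI_LEAST_FAVORED);
#ifdef __sgi
    if (schedctl(NDPRI, 0, pri) < 0) {
        perror("schedctl(NDPRI)");   /* typically: not superuser */
        return -1;
    }
    return 0;
#else
    (void)pri;
    fprintf(stderr, "schedctl() is IRIX-only; nothing changed\n");
    return -1;
#endif
}
```

A caller might write seize_priority(pick_high_band_priority(35)) and fall back to normal timeshare operation if the call fails, for example because the process lacks superuser permissions.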

Dealing With the Superuser Thing

Some users have complained that the need for root permissions to get the high non-degrading scheduling priority is a problem. They want the root user to be able to grant scheduling permission to an application without also granting all the other UNIX superuser permissions such as file permissions. There is a trick to work around this which is acceptable for many users. The trick is to make your application setuid root (chown root.sys foo; chmod 4755 foo), and then to add this code sequence to the very beginning of your program (first line of main):

#include <unistd.h>

int main(int argc, char *argv[])
{
  /* Absolutely the first thing you do! */
  setreuid(getuid(), getuid());
  
  /* ... this part executes without root euid ... */
  return 0;
}

When you need root privileges to affect run priority, do this:

{
  /* ... initially not running with root euid ... */

  setreuid(getuid(), 0);

  /* do what you need to do */
  schedctl(NDPRI, 0, pri);
  /* ... */

  /* Get rid of root privileges */
  setreuid(getuid(), getuid());
  
  /* ... again this part does not have root euid ... */
}

With this hack, the only part of your program that runs with root permissions (root effective user id) is the part within the setreuid() pair.
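One way to keep the privileged window small and checked is to wrap the setreuid() pair in a helper. The sketch below is not part of the original technique: drop_root_euid() and with_root_euid() are hypothetical names, and the code assumes the same setreuid() behavior the snippets above rely on (that a setuid-root program can switch its effective uid back to 0 after dropping it, which is not true on every UNIX).

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 600   /* exposes setreuid() on some systems */
#endif
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Set the effective uid back to the real uid.
 * Returns 0 on success, -1 if the switch did not take effect. */
int drop_root_euid(void)
{
    if (setreuid(getuid(), getuid()) != 0) {
        perror("setreuid (drop)");
        return -1;
    }
    return (geteuid() == getuid()) ? 0 : -1;
}

/* Run fn(arg) with root euid, then drop root again.
 * Returns fn's result, or -1 if root could not be acquired or dropped.
 * A real program should probably exit if the drop fails. */
int with_root_euid(int (*fn)(void *), void *arg)
{
    int result;

    if (setreuid(getuid(), 0) != 0) {
        perror("setreuid (raise)");   /* e.g. binary is not setuid root */
        return -1;
    }
    result = fn(arg);
    if (drop_root_euid() != 0)
        return -1;
    return result;
}
```

Here main() would call drop_root_euid() as its very first statement, and the schedctl() call would live in a small function passed to with_root_euid(). Checking the return values matters: a drop that silently fails leaves the rest of the program running as root.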

If you are very worried about security at your site, don't use this trick. Because it involves making the application setuid, there are hundreds of security holes you suddenly have to be very careful about. Only use this trick if you want to give trusted users a convenient way to run your app high-priority, but have all file permissions be those of the user. A future IRIX release may make this possible in a robust and secure way via a capabilities mechanism.

Here is an example of the tricky security issues this brings up: if you are using C++, some of your code (constructors for statically declared objects in global or function scope) will get called before main(). That code will run with root euid. Whether you're in C or C++, library initialization routines for the libraries with which you link will also get called before main, with root euid. You can work around this by creating your own init routine and making sure it runs first. As you can see, it gets tricky.

Another trick is to change the kernel tuneable variable ndpri_hilim, which determines the non-degrading priorities a non-superuser process is allowed to request. The default is 128 (super-lame priority). If you change it to a value between 30 and 39 like this:

become root
type: systune -i
type: ndpri_hilim 35
answer yes to the "are you sure?" question
type: quit
reboot your system

Then any normal process will be able to make itself non-degrading high priority up (numerically, down) to the value you specify. Again this hack is for systems with trusted users: a user who can run ndpri hi processes can deadlock the system, as we describe below.

Careful: Priority Can Be Dangerous

Don't forget that some of the timeshare processes, such as the X server, must have a chance to run occasionally in order for your system to be useful. A non-degrading high-priority process which does not yield at least some of the time will cause a single processor system's user interface to hang.

The telltale sign of this is that all GUI activity other than the mouse cursor freezes, and then after a certain amount of mouse movement, even the mouse cursor freezes. The X server is actually frozen from the moment that the GUI freezes. The mouse cursor keeps moving for a while because the mouse driver (which executes at interrupt level in the kernel) is the one that moves the mouse cursor. The mouse driver continues to update the mouse cursor until the queue of mouse events between the mouse driver and the X server fills up, at which time the mouse driver begins discarding mouse events and not moving the cursor. Your system is dead.

You need to make sure that your app periodically stops running, so that the timeshare processes get a chance to run. Sometimes your app has a natural way to do this. For example, video apps which go to sleep on a VL file descriptor, and wake up and do small amounts (i.e., way less than one field/frame time) of processing every time a video buffer arrives, need not be changed at all to run with non-degrading high priority.

Here is a trick for CPU-bound applications which are able to curtail their usage of the CPU dynamically: a process which has the highest non-degrading priority will run whenever it is runnable. Therefore, if the process wants to use P percent of the CPU, then the process should periodically measure (using a simple timer call such as dmGetUST(3dm)) the amount of time T it has been running since it last went to sleep, and then explicitly sleep for T*(100-P)/P (using select(2), nanosleep(2), sginap(2), or similar call).
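The sleep-time formula comes from requiring T/(T+S) = P/100 and solving for the sleep time S. A minimal sketch of the throttle follows, assuming times are kept in nanoseconds and that POSIX nanosleep() is available. The function names sleep_ns_for_share() and yield_for_share() are hypothetical, and the caller is expected to measure T itself with whatever timer it trusts (dmGetUST() on IRIX, for instance).

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* exposes nanosleep() on some systems */
#endif
#include <assert.h>
#include <time.h>

/* Given T = nanoseconds run since the last sleep and P = target CPU share
 * in percent, return the sleep time T*(100-P)/P in nanoseconds. */
long long sleep_ns_for_share(long long ran_ns, int percent)
{
    assert(ran_ns >= 0);
    assert(percent > 0 && percent <= 100);
    return ran_ns * (100 - percent) / percent;
}

/* Sleep long enough to bring this process back to the target share. */
void yield_for_share(long long ran_ns, int percent)
{
    long long ns = sleep_ns_for_share(ran_ns, percent);
    struct timespec ts;

    if (ns <= 0)
        return;
    ts.tv_sec  = (time_t)(ns / 1000000000LL);
    ts.tv_nsec = (long)(ns % 1000000000LL);
    nanosleep(&ts, NULL);   /* an early wakeup on a signal is ignored here */
}
```

For example, a process that has run for 2ms and wants 25 percent of the CPU would sleep for 2*(100-25)/25 = 6ms. Calling yield_for_share() often (every few milliseconds of work) keeps each individual burst short enough for the X server and other timeshare processes to stay responsive.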

Another trick that can greatly ease debugging of non-degrading high-priority processes is to connect a serial terminal of some kind (a dumb terminal or another computer) to the machine's serial port, and run a login on that terminal at an even higher non-degrading priority than the application being debugged. If the application goes haywire, then you can still kill the application from the serial port even though the system's GUI is frozen. When you kill the errant application, the system is back to normal.

Before you try out the schedctl() hack, make sure that your problem is one of latency and not bandwidth. If your application performance is limited by the CPU's computational ability rather than the latency with which your application can react to input events, and you cannot dynamically curtail the amount of CPU you need, then enabling this scheduling feature will almost certainly not help you. In fact, it may cause your process to hang the system as described above.

The Catch

The following information applies to any processor on an SGI system which has not been dedicated to real-time tasks using the REACT/Pro software. So it applies most strongly to single processor machines.

We've been very careful to say that schedctl() gives you "probabilistic" latency improvements. This is because in the CPU priority scheme of all IRIX releases up to IRIX 6.3, there are many pieces of code in the IRIX kernel that execute in exclusion of all IRIX processes, regardless of their priority. For example, an interrupt handler or a system call handler which takes a long time will hold off all user processes. Typically a non-degrading high-priority IRIX process will have no problem getting scheduled, say, every 5 milliseconds. But occasionally it will experience one of these holdoffs. The holdoffs can range anywhere from 5ms all the way to hundreds of milliseconds. They can occur every couple of seconds or they can be tied to some user action such as window movements. The larger holdoffs are identified as bugs and we fix them regularly. But currently, there is no official, supported upper bound for process holdoffs. Currently we only offer guarantees of when a process will run on multiprocessor systems with REACT/Pro.
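If you want a rough feel for the holdoffs on your own machine, one simple approach is to sleep for a short fixed period many times and record the worst lateness of any wakeup. The sketch below is an assumption-laden illustration, not an SGI tool: worst_wakeup_delay_ns() is a hypothetical name, and it uses the POSIX clock_gettime(CLOCK_MONOTONIC) and nanosleep() calls as stand-ins for whatever timer and sleep calls your IRIX release provides. Its results also include the ordinary sleep granularity of the system, so treat the number as an upper-level hint rather than a precise holdoff measurement.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* exposes clock_gettime() and nanosleep() */
#endif
#include <time.h>

/* Current monotonic time in nanoseconds. */
static long long now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Sleep period_ns at a time, `iterations` times, and return the largest
 * amount (in ns) by which a wakeup came later than requested. */
long long worst_wakeup_delay_ns(int iterations, long period_ns)
{
    struct timespec req;
    long long worst = 0;
    int i;

    req.tv_sec  = period_ns / 1000000000L;
    req.tv_nsec = period_ns % 1000000000L;

    for (i = 0; i < iterations; i++) {
        long long start = now_ns();
        long long late;

        nanosleep(&req, NULL);
        late = (now_ns() - start) - period_ns;
        if (late > worst)
            worst = late;
    }
    return worst;
}
```

Running something like worst_wakeup_delay_ns(10000, 1000000L) once as a timeshare process and once at a non-degrading high priority, while moving windows around or generating disk activity, should give a sense of how much the priority change helps and how large the remaining holdoffs are.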

Don't be surprised if this changes in a future IRIX release.