Techniques for Implementing UNIX-Style Pipelines on the 1100/2200

Introduction

This presentation is aimed at three types of 1100/2200 professional. First, programmers and technical support personnel who develop or support batch applications. Second, system software designers and programmers who work for Unisys or independent vendors. Third, anyone who writes utility processors such as those distributed by the UNITE Program Library Interchange (UPLI).

Unix provides a feature whereby the standard output of one program can be piped directly into the standard input of another. (MS-DOS provides a limited version of this feature.) The vertical bar ('|') is the pipe operator. For example, if I want to show only those files in my current directory whose names contain the string 'sjm', I can type:

        ls  |  grep sjm

The 'ls' command lists files in a directory (like the MS-DOS 'dir' command). I've piped its output directly into 'grep', which is a powerful utility that searches for strings specified by regular expressions. The resulting output from 'grep' will be a list of those files whose names contain the string 'sjm'.

You can see what a powerful capability this is. Since OS 1100 provides no similar mechanism, this presentation examines how we might implement pipelines in user batch jobs. We will look at a couple of examples of how pipelining techniques can be used to help write smarter batch jobs.

Both of our examples will involve passing data between programs via a temporary file containing ASCII data images in the form of Stream Generation Statements (SGS), to be processed by the Symbolic Stream Generator (SSG).

Example 1: Disk Space Availability

Consider the following operational problem. At times my site's available fixed disk space is very limited, depending on what batch jobs are running. I have a batch job that needs 50,000 tracks for a very short period of time, but I don't want it to cause files to roll out. Instead, I would like this job to determine how much fixed disk space is available. If there are fewer than, say, 100,000 tracks available, the job should wait a while and try again. Sooner or later some other batch job will finish and release enough space.

Traditionally, disk space has been monitored using small utility processors that were written in assembly language and passed around from site to site. There are many of these, with names such as @DISCS, @MS, @AVAIL, @DISQUE, etc. All of these share a couple of common features. They read OS 1100's Logical Device Access Table and produce a printed report showing the names and status of configured disk drives. And they require a person to read and interpret their output.

The printed output from these processors has column headings, spacing and page breaks. While these make the report pleasant for a person to read, they make it difficult for another program to read. What another program needs is simply the data free from embellishments that aid human readers.

A common solution to this problem is to breakpoint the print to a file and then edit this file with a text editor, such as @IPF or @ED. A text editor macro could be used to massage the print file into a format that can be read by a program. This can work well but has three problems. First, writing the text editor macro is very time consuming and rather boring. Second, you're never really sure that the text editor macro is bullet-proof. After months of working correctly it may fail because a rarely seen warning message shows up in the print and disrupts the macro. Third, healthy paranoia tells you that the next release of the processor may slightly modify the print file format and your text editor macro might have to be re-written.

Another solution is to write a program specifically to solve your problem. We might choose to write an assembly language program to check for total available fixed disk space. The main drawback to this approach is that writing single-purpose programs means we are constantly re-inventing the wheel. We would rather have a set of general-purpose utilities that can be used as building blocks for a wide variety of applications.

So a better solution would be to have a general-purpose disk space utility processor produce a standard System Data Format (SDF) file containing just the selected data in a simple, program-readable format (i.e., ASCII). This file could then be piped into another program in a manner analogous to pipelining in Unix.

@DISKPL is such a program. Written in assembly language (MASM), it is approximately 300 lines long. The only thing it does is read the Logical Device Access Table and write a record for each configured disk drive. @DISKPL writes these records to a temporary file called PL$$DISKPL. It does this by calling a general-purpose MASM subroutine, PIPELINE.

PIPELINE is a 300-line assembly language subroutine that provides three entry points for pipelining--open the pipeline, write the pipeline, close the pipeline. This subroutine is collected with the calling program.

As part of its initialization code, @DISKPL calls PIPELINE to initialize the pipeline. PIPELINE assigns a temporary SDF file called PL$$xxxxxx (where 'xxxxxx' is the processor name), attaches an @USE name of PL$$ to it, and writes the SDF header. @DISKPL then calls PIPELINE to write the RELATION SGSs to the pipeline (see below). Then, for each disk drive found, @DISKPL calls PIPELINE to write an SGS describing that device. As part of its termination logic, @DISKPL calls PIPELINE to write the SDF terminator.
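The open/write/close calling sequence described above can be illustrated with a minimal Python sketch. This is an analogy only, not the MASM subroutine: the class name `Pipeline`, its method names, and the in-memory buffer are all hypothetical, and the SDF header and terminator are deliberately left out because their format is not reproduced here. Writing a null field as a backslash is an assumption based on the sample in Figure 1.

```python
import io


class Pipeline:
    """Hypothetical stand-in for the three PIPELINE entry points."""

    def __init__(self, processor_name):
        # The text says the temporary file is named PL$$ plus the processor name.
        self.file_name = "PL$$" + processor_name
        self._out = None

    def open(self):
        # An in-memory buffer stands in for the temporary SDF file.
        self._out = io.StringIO()

    def write(self, *fields):
        # One SGS image per call; None becomes a backslash (assumed null-field marker).
        image = "  ".join("\\" if f is None else str(f) for f in fields)
        self._out.write(image + "\n")

    def close(self):
        # Return what was written so a downstream step can read it.
        contents = self._out.getvalue()
        self._out.close()
        self._out = None
        return contents


if __name__ == "__main__":
    pipe = Pipeline("DISKPL")
    pipe.open()
    pipe.write("RELATION", "DISK", 2, "DISK_DEVNAM")
    pipe.write("DISK", 1, "DA0", "MDISK", "UP", "FIX101", 112, 25680, "FIX", None)
    print(pipe.close(), end="")
```

The point of the sketch is the division of labour: the calling program decides what to write, while the shared routine owns the file name and the record format.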

After @DISKPL terminates, PL$$ is available just like any other temporary file. Its format makes it ideal input for an SSG skeleton, but it can be input to any other program capable of reading SDF files, including ED and COBOL (the COBOL 'UNSTRING' verb would come in handy). Figure 1 shows a sample PL$$ file produced by @DISKPL.

Figure 1 Sample SGSs from @DISKPL

  RELATION   DISK  1  DISK_LDATX        . LDAT INDEX
  RELATION   DISK  2  DISK_DEVNAM       . DEVICE NAME
  RELATION   DISK  3  DISK_EQUIP        . EQUIPMENT MNEMONIC
  RELATION   DISK  4  DISK_STATUS       . STATUS (UP,DN,SU,RV)
  RELATION   DISK  5  DISK_PACKID       . LOGICAL PACK-ID
  RELATION   DISK  6  DISK_PREP         . PREP FACTOR
  RELATION   DISK  7  DISK_TRKAVL       . # OF TRACKS AVAILABLE
  RELATION   DISK  8  DISK_FIXREM       . PACK TYPE (FIX,REM)
  RELATION   DISK  9  DISK_ASGCNT       . ASSIGN COUNT
  .
  DISK  1  DA0   MDISK  UP  FIX101  112  25680  FIX   \  .
  DISK  2  DA1   MDISK  UP  FIX102  112  48501  FIX   \  .
  DISK  3  DA2   MDISK  UP  REM101  112  16451  REM   22 .
  DISK  4  DA3   MDISK  UP  REM102  112  26550  REM   22 .
  DISK  5  DA4   MDISK  UP  REM103  112  28671  REM   22 .
  DISK  6  DA5   MDISK  UP  REM104  112  91397  REM   6  .
  DISK  7  DA6   MDISK  UP  REM105  112  46216  REM   1  .
  DISK  8  DA7   MDISK  DN  \       \    \      \     \  .

The pipeline (PL$$) file contains SGSs with two different labels. The DISK SGSs describe the disk drives configured on the system. There are fields describing device name, pack name, device type, fixed or removable usage, etc.

The RELATION SGSs describe the fields on the DISK SGSs. If we think of the DISK SGSs as a set of third normal form relations describing disk drives, then we can think of the RELATION SGSs as the relational catalog. Thus, for example, the first RELATION SGS specifies that the first field on each DISK SGS is called 'DISK_LDATX'. The RELATION SGSs provide symbolic names that can be used in SSG skeletons to refer to fields on the DISK SGSs. By using symbolic names we make our skeletons easier to read and maintain. In addition, symbolic names allow the designer of the upstream program (e.g., @DISKPL) to make future changes in the order of the DISK SGSs without adversely affecting downstream programs.
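For readers who want to consume a PL$$ file from a language other than SSG, the catalog idea can be sketched in Python. This is a rough illustration under stated assumptions, not a full SGS parser: it assumes fields are separated by blanks, that a free-standing period starts a comment, and that a backslash marks a null field, as the sample in Figure 1 suggests. The function names are hypothetical.

```python
FIGURE_1 = r"""
RELATION   DISK  1  DISK_LDATX        . LDAT INDEX
RELATION   DISK  2  DISK_DEVNAM       . DEVICE NAME
RELATION   DISK  3  DISK_EQUIP        . EQUIPMENT MNEMONIC
RELATION   DISK  4  DISK_STATUS       . STATUS (UP,DN,SU,RV)
RELATION   DISK  5  DISK_PACKID       . LOGICAL PACK-ID
RELATION   DISK  6  DISK_PREP         . PREP FACTOR
RELATION   DISK  7  DISK_TRKAVL       . # OF TRACKS AVAILABLE
RELATION   DISK  8  DISK_FIXREM       . PACK TYPE (FIX,REM)
RELATION   DISK  9  DISK_ASGCNT       . ASSIGN COUNT
.
DISK  1  DA0   MDISK  UP  FIX101  112  25680  FIX   \  .
DISK  3  DA2   MDISK  UP  REM101  112  16451  REM   22 .
DISK  8  DA7   MDISK  DN  \       \    \      \     \  .
"""


def parse_sgs(text):
    """Group SGS images by label: {label: [[field, ...], ...]}."""
    records = {}
    for line in text.splitlines():
        fields = []
        for token in line.split():
            if token == ".":          # assumed comment introducer
                break
            fields.append(None if token == "\\" else token)
        if fields:
            records.setdefault(fields[0], []).append(fields[1:])
    return records


def field_names(records, label):
    """Build {symbolic name: 0-based position} from the RELATION SGSs."""
    return {
        name: int(position) - 1
        for rel_label, position, name in records.get("RELATION", [])
        if rel_label == label
    }


def named_rows(records, label):
    """Return each SGS with the given label as a dict keyed by symbolic name."""
    names = field_names(records, label)
    return [
        {name: row[pos] for name, pos in names.items()}
        for row in records.get(label, [])
    ]


if __name__ == "__main__":
    for disk in named_rows(parse_sgs(FIGURE_1), "DISK"):
        print(disk["DISK_DEVNAM"], disk["DISK_STATUS"], disk["DISK_TRKAVL"])
```

Because the downstream code looks fields up by name, reordering the DISK fields upstream would only require the RELATION SGSs to change with them.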

Figure 2 shows part of a runstream that checks that there are at least 100,000 tracks of available fixed disk space. It invokes @DISKPL and then processes the SGSs in the resulting PL$$ file with an SSG skeleton, FIXEDSKEL, shown in Figure 3.

Figure 2 Batch Job That Waits for Fixed Disk Space

  @RUN    BIGJOB,,SJM
  @ .
  @DISKPL
  @SSG    SKELFILE.FIXEDSKEL,PL$$.
  SGS
  MINFIX    100000  . minimum acceptable fixed disk tracks
  WAITTIME  2       . wait time (minutes) before trying again
  @EOF
  @EOF
  @ .
  @ .  We now have lots of fixed mass storage, so . . .
  @ .
  @ASG,T  BIGFILE.,F/25000//50000
  etc.

Figure 3 SSG Skeleton "SJM*SKELFILE.FIXEDSKEL"

  1: *INCREMENT R_X TO [RELATION]
  2: *SET [RELATION,R_X,3,1]  =  [RELATION,R_X,2,1]
  3: *LOOP R_X
  4: *.
  5: *CLEAR TOT_FIX_TRKS           . Total fixed tracks available
  6: *.
  7: *INCREMENT D TO [DISK]        . For each disk drive
  8: *IF [DISK,D,DISK_FIXREM,1] = FIX  AND  ;
  9:     [DISK,D,DISK_STATUS,1] = UP
 10: *SET  TOT_FIX_TRKS = TOT_FIX_TRKS + [DISK,D,DISK_TRKAVL,1]
 11: *ENDIF
 12: *LOOP . D
 13: *.
 14: *IF +TOT_FIX_TRKS  <  +[MINFIX,1,1,1]
 15: *DISPLAY,O 'Only [*TOT_FIX_TRKS] tracks available';
 16:            'fixed disk.'
 17: *DISPLAY,O 'I''ll wait [WAITTIME,1,1,1] minutes.'
 18: *WAIT 60*[WAITTIME,1,1,1]
 19: #DISKPL
 20: #SSG    [SOURCE$,1,1,1],PL$$.
 21: SGS
 22: MINFIX    [MINFIX,1,1,1]
 23: WAITTIME  [WAITTIME,1,1,1]
 24: #EOF
 25: #EOF
 26: *ENDIF

Lines 1-3 of FIXEDSKEL create global numeric variables from the RELATION SGSs (see Figure 1). These variables allow the skeleton to use symbolic field references, rather than numeric field references. This makes the skeleton more readable.

Lines 5-12 calculate the total number of tracks available on fixed mass storage. Line 14 tests whether this total falls short of the minimum acceptable. If it does, the skeleton notifies the computer operator (lines 15-17), waits the required number of minutes (line 18), and then re-invokes @DISKPL and FIXEDSKEL (lines 19-25). This loops until the required minimum number of tracks of fixed mass storage is available; the job then proceeds to the next ECL statement.
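The same sum-test-wait logic can be sketched in Python for comparison. This is an analogy rather than a translation of FIXEDSKEL: the function names, the dictionary keys, and the `probe` callback (which stands in for re-running @DISKPL) are all hypothetical. The sample data is the fixed and removable drive information from Figure 1.

```python
import time

# Status, pack type and tracks available, taken from Figure 1.
FIGURE_1_DISKS = [
    {"status": "UP", "fixrem": "FIX", "trkavl": 25680},
    {"status": "UP", "fixrem": "FIX", "trkavl": 48501},
    {"status": "UP", "fixrem": "REM", "trkavl": 16451},
    {"status": "DN", "fixrem": None, "trkavl": None},
]


def fixed_tracks_available(disks):
    """Total tracks available on drives that are both fixed and up."""
    return sum(
        d["trkavl"]
        for d in disks
        if d["fixrem"] == "FIX" and d["status"] == "UP"
    )


def wait_for_fixed_space(probe, min_tracks, wait_minutes, sleep=time.sleep):
    """Call probe() until enough fixed space is free; return (total, attempts)."""
    attempts = 0
    while True:
        attempts += 1
        total = fixed_tracks_available(probe())
        if total >= min_tracks:
            return total, attempts
        print(f"Only {total} tracks available on fixed disk.")
        print(f"I'll wait {wait_minutes} minutes.")
        sleep(60 * wait_minutes)


if __name__ == "__main__":
    # With the Figure 1 data only 74181 fixed tracks are free.
    print(fixed_tracks_available(FIGURE_1_DISKS))
```

One difference is worth noting: the skeleton loops by re-adding @DISKPL and itself to the runstream, whereas this sketch uses an ordinary loop inside one process.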

There are many other uses to which @DISKPL may be put. For example, a related processor, called @DISK, writes a report suitable for human readers. It does this by invoking @DISKPL, then reading the PL$$ file that @DISKPL produces. @DISK is written in SSG.

Example 2: Selecting Files from the MFD with MSAR

The example of piping disk space availability data was used above merely for illustrative purposes. There are numerous other applications of pipelines that we could implement.

Consider this problem. I want to read the Master File Directory (MFD) and select all files on our system with qualifier 'SJM' that have not been referenced for at least one year. I then want to change the security attributes for all files selected: in particular, I want to attach an Access Control Record.

Rather than develop homegrown code, we will use a commercially available product--the Mass Storage Analysis and Retention (MSAR) utility. MSAR is developed and supported by TeamQuest Corporation, and jointly marketed with Unisys. Beginning with release 5R1, MSAR can select files from the MFD and write a set of SGSs. These SGSs can then be input to a user-written SSG skeleton. (And, of course, we could process the SGSs with a text editor or a high-level language.)

Figure 4 contains the MSAR commands that will select all files with qualifier 'SJM' that have not been referenced in at least one year. MSAR writes an SGS for each file selected into a file supplied by the user. This file may be temporary or catalogued. In Figure 4, I provided a temporary file called PL$$. The 'SSG_ADD' command directs MSAR to invoke a user-supplied SSG skeleton. In my example, I put the skeleton in 'TPF$.ACRSKEL'.

Figure 4 ECL to Select and Change Files

  @ELT,IQ  TPF$.ACRSKEL   . Define my skeleton
  #SIMAN,B                . B = SIMAN batch mode
  *INCREMENT M TO [MFDF]  . For each file selected
  UPD FIL = [MFDF,M,1,1]*[MFDF,M,2,1].
    FIL_ACC = ACR_CON
    ATT_ACR = ARC001  ACR_OWN = SECOFF ;
  *LOOP . M
  #EOF
  @EOF  . End of skeleton
  @ .
  @ASG,T       PL$$.,F    . Temp file for SGSs
  @MFDRPT,I    PL$$.
  QUALIFIER    SJM
  REF_DAYS_GT  365
  SSG_ADD      'TPF$.ACRSKEL'
  @EOF

Figure 5 shows a sample set of SGSs produced by these MSAR commands. MSAR would write these SGSs into the file I supplied (PL$$, in my example in Figure 4).

Note that MSAR creates one SGS for each file selected. The SGS has the label MFDF and contains various data about the file. My SSG skeleton uses the MFDF SGSs to generate Site Management Complex (SIMAN) commands to attach ACR 'ARC001' to the selected files.

Figure 5 SGSs Produced by MSAR

  MFDF SJM          OLDFILE      2 SJM          TECHOPS ;
  FIXED 0 2000 '''' '''' ''VP'' ''F'' '''' FAS021
  MFDF SJM          SAVEDATA     1 PREP         TECHOPS ;
  FIXED 0 2000 '''' '''' ''VP'' ''F'' '''' FAS034
  MFDF SJM          FISCAL92     1 PREP         TECHOPS ;
  FIXED 0 2000 '''' '''' ''VP'' ''F'' '''' FAS002
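As noted earlier, the SGSs could equally be processed with a high-level language. The Python sketch below shows one way that might look, generating the same SIMAN commands as the skeleton in Figure 4. It rests on assumptions: that a trailing semicolon continues an SGS onto the next line, that fields are separated by blanks, and that the first two fields are qualifier and file name (as the skeleton's use of fields 1 and 2 implies). The function names are hypothetical.

```python
FIGURE_5 = """\
MFDF SJM          OLDFILE      2 SJM          TECHOPS ;
FIXED 0 2000 '''' '''' ''VP'' ''F'' '''' FAS021
MFDF SJM          SAVEDATA     1 PREP         TECHOPS ;
FIXED 0 2000 '''' '''' ''VP'' ''F'' '''' FAS034
MFDF SJM          FISCAL92     1 PREP         TECHOPS ;
FIXED 0 2000 '''' '''' ''VP'' ''F'' '''' FAS002
"""


def logical_lines(text):
    """Join physical lines that end in ';' with the line that follows."""
    pending = ""
    for line in text.splitlines():
        line = line.strip()
        if line.endswith(";"):
            pending += line[:-1] + " "
        else:
            yield (pending + line).strip()
            pending = ""


def siman_commands(text, acr="ARC001", owner="SECOFF"):
    """Build one SIMAN update command per MFDF SGS."""
    commands = []
    for line in logical_lines(text):
        tokens = line.split()
        if len(tokens) >= 3 and tokens[0] == "MFDF":
            qualifier, filename = tokens[1], tokens[2]
            commands.append(
                f"UPD FIL = {qualifier}*{filename}.\n"
                f"  FIL_ACC = ACR_CON\n"
                f"  ATT_ACR = {acr}  ACR_OWN = {owner} ;"
            )
    return commands


if __name__ == "__main__":
    print("\n".join(siman_commands(FIGURE_5)))
```

The sketch only formats the commands; feeding them to SIMAN is still a job for the runstream.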

In addition to querying the MFD, MSAR can write another set of SGSs that provide enough information so that you can write an SSG skeleton that will re-create your TIP File Directory. You can periodically run an MSAR job to save your TIP File Directory as a set of SGSs. After a TIP initialization boot, you can then run another job to read these SGSs and re-create your TIP file directory (using FREIPS or TREG/TFUR).

When selecting files from the MFD via MSAR, you can sometimes generate more MFDF SGSs than current levels of SSG can handle. SSG's DBANK size has been limited to 262,000 words. This scaling limit is removed in SSG 23R1, due to be released with System Base 5R3.

Conclusion

The methods described in this presentation provide several important benefits:
  1. Code that is tightly coupled with OS 1100 is isolated in a small set of data extraction routines, such as DISKPL and MSAR. It is important to minimize the amount of such code because it usually must read Unisys proprietary data structures that are subject to change without notice. In addition, this code is often written in a proprietary language (such as MASM or PLUS).
  2. Because OS-specific data extraction code is isolated from reporting and decision-making logic, there is almost no danger that, for example, changes to report formats will adversely affect the data extraction code.
  3. The logic needed to process the selected data can be written in whatever language suits you. I think SSG is ideal, but other possibilities are ED, IPF, COBOL, Fortran, MASM, and C.
  4. General-purpose data extraction routines, such as DISKPL and MSAR, can support a large number of possible applications, limited only by your imagination. This provides a "building block" approach to system management.
  5. With system data available in program-readable form, you can write smarter batch jobs. Rather than die without warning due to lack of disk space, missing files, etc., your batch jobs can diagnose the problems they're most likely to face. If a problem is found, the job can take whatever action you deem appropriate (wait and try again, notify the computer operator, etc.). I don't know how you're going to achieve automated operations without this kind of capability, although vendors of automation 'solutions' never seem to mention it.
  6. Pipelining techniques increase the productivity of support personnel, who often must perform ad hoc tasks at a demand terminal.

I encourage designers of 1100/2200 software to consider adding pipelining to their products. Almost any program that has a 'list' or 'select' function is a candidate for piping its output.

@DISKPL, @DISK, the PIPELINE subroutine, plus a host of other pipelining programs, are part of the Group W Toolset, written and supported (so far as is feasible) by Tom Nelson, Bill Toner, and me. You can download it from this Web site.

Acknowledgement

This paper was originally presented at the Spring 1993 UNITE Conference in Nashville, Tennessee, in April 1993. Minor changes have been made in this Web version.

Revised 1998-05-19