Geoffrey A. Charters MSc (Hons) PhD (Pathology)

Molecular oncopathologist

Permanent position

Geoffrey was given complete control over the software development of the partially implemented Weddel Tomoana Freezing Works mutton processing data acquisition and marshalling system, with a brief to complete basic implementation within a year. Progress was rapid, and well within that period reliability and functionality of the system were dramatically improved, and at year's end, an offer of continuing employment was made to him, which he accepted.

Refinement and extension

Bug hunt!

At one time, the stability of the Weddel process control software was quite poor. Operator interfaces would freeze, processors would crash either silently or after triggering the operating system crash dump routine, and most bizarrely, the main system clock would stop for a time, and then spontaneously restart, and always after just less than 22 minutes.

From very early on, these problems were suspected to lie in a device driver that had been written to implement inter-process and inter-processor message passing. This was the suspect because it was the only code segment not written by Digital Equipment Corporation that had the necessary level of privilege to interfere with core operating system function. This highlights one of the many excellent features of the PDP-11 architecture: it had multiple states of operating privilege, and a protected memory addressing system, so rogue user-state processes simply could not address memory used by the executive, let alone corrupt it. However, a device driver operated as an extension of the executive, and could. Nevertheless, no amount of desk-checking revealed a coding error, nor was it clear how a coding error could lead to rare, intermittent faults.

A test system was configured comprising two computers doing nothing more than sending messages to each other as rapidly as possible, and using this, a debugging milestone was achieved: the system could be made to fail reliably, with the test process operating on one or other of the computers hanging awaiting I/O completion after a short period of operation. Analysis revealed a loop in the normally linearly linked pool of temporary memory available to the executive, and the finger was pointed squarely at the device driver, it having committed the cardinal sin of deallocating its current I/O packet twice.
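How a double deallocation can turn a linear pool into a loop may be sketched as follows. This is a minimal illustration only, assuming a simple last-in, first-out free list; the names `Block`, `Pool`, and `has_loop` are hypothetical, and the real RSX-11 executive pool was managed differently in detail.

```python
class Block:
    """One block in a hypothetical executive memory pool."""
    def __init__(self, name):
        self.name = name
        self.next = None


class Pool:
    """Singly linked free list; deallocate() pushes a block onto the head."""
    def __init__(self, blocks):
        self.head = None
        for block in reversed(blocks):
            self.deallocate(block)

    def allocate(self):
        # Unlink and return the first free block (None if the pool is empty).
        block = self.head
        if block is not None:
            self.head = block.next
            block.next = None
        return block

    def deallocate(self, block):
        # No check that the block is already free: this is the weakness.
        block.next = self.head
        self.head = block


def has_loop(pool):
    """Floyd's two-pointer test for a cycle in the free list."""
    slow = fast = pool.head
    while fast is not None and fast.next is not None:
        slow = slow.next
        fast = fast.next.next
        if slow is fast:
            return True
    return False


pool = Pool([Block("A"), Block("B"), Block("C")])
packet = pool.allocate()      # the driver's current I/O packet
pool.deallocate(packet)       # correct release
pool.deallocate(packet)       # second release: the packet now points at itself
print(has_loop(pool))
```

In this sketch the second release makes the packet its own successor, so every block that followed it in the list becomes unreachable, which is the kind of damage the analysis uncovered.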

Still, inspection of the driver code did not reveal how this could be occurring; the logic appeared impeccable. Also puzzling was the rapid failure under test, when compared with the relatively low failure rate observed in normal operation. The crucial realisation was that while the driver was designed and seemingly coded to operate in full-duplex mode, due to the inherently sequential operation of the physical processes of the plant, it was rarely if ever actually called upon to do so.

A careful study of the I/O termination routines of the driver showed that one section was used for completion of both receive and transmit traffic, and that it was not reentrant, using a single local memory structure for two processes that could occur asynchronously. Superficially, the code looked fine: if following the flow of a transmit, it worked; if following the flow of a receive, it worked. It only failed if both events occurred essentially simultaneously. The solution was really quite simple, providing separate structures for the receive and transmit sides of the driver, and this was implemented and verified to resolve the issue using the test system.
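The failure mode can be imitated with a deliberately simplified model. In this sketch, which is an assumption-laden illustration rather than the driver's actual logic, each completion parks its packet in a working structure (`begin`) and later releases whatever that structure holds (`finish`); `Completion` and `run` are hypothetical names.

```python
class Completion:
    """Two-step I/O completion that parks its packet in a working structure."""
    def __init__(self, context, released):
        self.context = context      # dict standing in for the local memory structure
        self.released = released    # log of packets handed back to the pool

    def begin(self, packet):
        self.context["packet"] = packet

    def finish(self):
        self.released.append(self.context["packet"])


def run(shared, overlap):
    """Complete one transmit and one receive; return the packets released."""
    released = []
    rx_context = {}
    tx_context = rx_context if shared else {}   # the flaw: one structure for both
    rx = Completion(rx_context, released)
    tx = Completion(tx_context, released)
    if overlap:
        # Full duplex: a receive completes in the middle of a transmit completion.
        tx.begin("tx-packet")
        rx.begin("rx-packet")
        rx.finish()
        tx.finish()
    else:
        # Sequential traffic, as the plant normally produced.
        tx.begin("tx-packet")
        tx.finish()
        rx.begin("rx-packet")
        rx.finish()
    return released


print(run(shared=True, overlap=False))   # each packet released once
print(run(shared=True, overlap=True))    # receive packet released twice
print(run(shared=False, overlap=True))   # separate structures: each released once
```

With a shared structure and overlapping events, the receive packet is released twice and the transmit packet never, which matches the double deallocation found in the pool; separate structures remove the problem without changing either flow.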

Further thought about the consequences of a loop in the memory pool produced some startling conclusions. First, any task could hang awaiting I/O completion of any type, not just I/O on the flawed driver. This explained many of the unexpected process hangs observed.

Second, the active task list was maintained in the memory pool, and the introduction of a loop could logically remove some entries meaning that the task dispatcher was unaware these processes were active, and never allocated them any processor time. This accounted for further unexplained process hang events.

Third, clock interrupt handling involved decrementing a counter with the initial value of 1 for each interrupt received, and beginning to execute the clock process when that value became 0. The clock process itself executed its main functions, then incremented the counter. If it became 1 as a result, the process exited, but if not, a further clock interrupt must have occurred in the interim, so the clock process repeated its operations for this interrupt, incremented and tested the counter again, and kept doing so until eventually the counter became 1 and all pending clock interrupts had been accounted for. However, since the clock process was also in the active task list, it was also susceptible to being orphaned by a loop in the memory pool, thus it could stop processing pending interrupts. Any process waiting on a timed event would hang, and the system clock would stop, as had been observed. But why did it start again? The clock interrupt handler was still busy decrementing its counter, which became increasingly negative. The counter was a single-precision 16-bit word, and after 65,536 decrement operations, it had cycled all the way back to 0, triggering the clock process anew and restarting the clock. With clock interrupts arriving at 50 Hz, the time it took to do this was 65,536 × 1/50 s, or about 1,311 seconds: 21.85 minutes. The clock stopping and starting was also explained!
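The arithmetic can be checked with a short simulation. This is a sketch only: it models nothing but a 16-bit counter being decremented once per interrupt with no clock process incrementing it, and the names `TICK_HZ`, `WORD_MASK`, and `ticks_until_retrigger` are illustrative.

```python
TICK_HZ = 50          # clock interrupt rate given in the text
WORD_MASK = 0xFFFF    # a 16-bit word wraps modulo 65,536


def ticks_until_retrigger(counter=0):
    """Count interrupts until a stalled 16-bit counter wraps back to 0."""
    ticks = 0
    while True:
        counter = (counter - 1) & WORD_MASK   # decrement with 16-bit wrap-around
        ticks += 1
        if counter == 0:
            return ticks


ticks = ticks_until_retrigger()
stall_seconds = ticks / TICK_HZ
stall_minutes = stall_seconds / 60
print(f"{ticks} ticks = {stall_seconds:.2f} s = {stall_minutes:.2f} min")
```

Starting from the point at which the counter last read 0, the loop needs 65,536 decrements to reach 0 again, which at 50 interrupts per second is the "just less than 22 minutes" that had been observed.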

There was more. The same flawed design for the PDP-11/34A RSX-11M driver was used for the corresponding RSX-11S drivers on the LSI-11/03 and LSI-11/23 systems, so they were also susceptible to the same faults. The intermittent controller hangs were also accounted for.

In one moment of insight, the cause of a host of seemingly unrelated system-wide faults became clear, and the solution obvious. Once remedied, system stability of a very high order was achieved.

Further improvements to the system were made. Large tracts of code were rewritten, greatly improving efficiency, supportability, cross-subsystem consistency, data integrity, and end-user experience. A link to the administration WANG VS computer system was implemented allowing the direct transfer of data rather than rekeying of reports. Since the only programming languages available on the VS were COBOL and what was essentially IBM 360 assembler, it was in the implementation of this that Geoffrey wrote the only COBOL program of his professional career, one more than he had hoped would ever be required.

Perhaps the most satisfying event during this period was the resolution of a set of obscure and relatively infrequent system problems, with symptoms ranging from hanging processes and crashing processors to, most bizarrely, a system real-time clock that would occasionally stop for almost exactly 22 minutes, and then restart spontaneously! These had been vexatious issues for more than two years, and the relevant program listings and crash dump output would be scrutinised every month or so in the hope that there would be some new insight. Usually after a few days, the material was filed again in frustration. The advice of RSX-11 operating system experts from DEC was sought, but to no avail. Eventually though, a crack in the obscurity developed, and the pieces rapidly fell into place. Amazingly, these problems, affecting multiple processors of different types, and running different operating system variants, all had a single cause, and when it was identified and corrected, the implications for system-wide stability were profound. (Readers with a technical bent can find a description of how this issue was resolved in the "Bug hunt!" box.)

He managed and implemented the transition of the process control software to newer hardware, a more standard Pascal compiler, and a more advanced operating system. The parameters make strange reading now: kilowatt-consuming PDP-11/34As at the limit of their capacity with 248 kbytes of RAM and twin RK05 disk cartridge drives holding just 5 Mbytes each being replaced with LSI-11/73s with a massive 8 Mbytes of RAM and 71 Mbyte RD53 hard disks - two such systems in less than a tenth of the space used by the 11/34As!

In its final form, the process control system software comprised some 100,000 lines of program code in Pascal, Macro-11 assembler, and Aerial, the cross-assembler used for the Motorola microprocessor systems. Of this, proprietary DEC operating system code aside, perhaps 80% was Geoffrey's work.

Planning for the future

Conscious of the vulnerability that Weddel would have should he become unavailable, he implemented a staff training program that saw over 100 employees introduced to the fundamentals of computing and at least ten advance to simple programming. Two were to form his staff: one as application programmer, and one as a support person for small stand-alone systems, including PCs, which by then were becoming more common in the workplace. In part, this was due to the early recognition by Geoffrey of the trend to decentralised computing and networking.

Incredible though it may seem today, the public perception of computers in the early 1980s was that a mathematical genius was required to operate one. Since many office workers expected to find themselves lacking what they thought were necessary skills, they felt threatened by the prospect of computerisation, and strongly resisted it. Geoffrey saw an opportunity to combat this by convincing the company to buy an Apple Macintosh computer soon after they were released. He introduced members of the secretarial pool to it and they were soon revelling in the multiple type-faces and font-sizes offered by MacWrite (Mac Wright, incidentally, worked in the Tomoana Stores Department). This non-threatening introduction to the intuitive interface offered by the Macintosh platform led to a growing understanding that things might be better, rather than worse, with the advent of small computers.

At a time when ethernet was still vying with token-ring for the position of dominant networking architecture, and when it ran only at 10 Mbit/s on nearly inflexible 9.5 mm coaxial cables (10BASE5), Geoffrey strenuously advocated its adoption, and the making of ethernet compatibility mandatory for all new data processing equipment. He forced the installation of an ethernet backbone cable linking the main office with the beef processing plant, and the introduction of a fibre-optic link to the payroll department. In this latter case he was not entirely successful as the protocol used was multiplexed serial data, not ethernet; nevertheless, he was able to ensure that an upgrade to an ethernet link would be possible with just a change of transceivers.

Tragedy and crisis

During these Tomoana years, an event occurred that was to have a huge impact on Geoffrey's future. In early 1984, his elder sister Ruth, a prominent, vocal, and energetic feminist lawyer in Wellington, was diagnosed with bowel cancer. This was treated surgically, and the prognosis appeared fair, but very near the five-year survival milestone, it was found that she had developed liver metastases. From there, debilitating chemotherapy, and palliative treatment of pain were all that could be offered. She died on March 26, 1988, at the age of 33. This was the second cancer death in the immediate family, Geoffrey's father having succumbed to leukaemia in 1975.

These events were to have a major influence in 1994 when Geoffrey unexpectedly found himself at a decision point. The precipitating factor was the financial collapse of the Weddel Group with the loss of over 1,600 jobs at the Tomoana plant, including his. It has to be said that a degree of bitterness surrounded this closure. The management of the company did not act decisively while it still had the capacity to meet its contractual obligations to make redundancy payments, and no employee received anything whatsoever. The strongly held belief was that this failure was not due to any sudden unforeseen crisis, nor managerial incompetence, but was a deliberate strategy. It was not just the employees who were targeted this way. Farmer suppliers, contractors, and all other unsecured creditors went unpaid, while the overseas owners, ultimately the Vestey Group, simply washed their hands of the affair. Estimates of the total value of unfulfilled financial obligations ranged as high as $NZ200 million, and the impact on the Hawke's Bay community and economy was devastating, with echoes reverberating to the present day.

One asset had been protected, the Staff Pension Fund, and disbursements were made from this as its assets were liquidated. These, and savings, allowed Geoffrey an opportunity to contemplate the question "What now?".

Decision

Throughout these Tomoana years Geoffrey had maintained an interest in molecular biology and had contemplated returning to University for further study, the impediment to this being the very financially comfortable position he enjoyed at Tomoana. That impediment had suddenly disappeared, there were funds in the bank to support the student life, and his desire to learn more about the nature of cancer and contribute to the fight against it was understandably keen. At the very least, he wished to determine if he was capable of making such a contribution. Even to try, but fail, would have been better than forever to wonder if a potentially worthwhile contribution had not been made just because he had not applied himself to the task.

With this in mind, he began the process of returning to Auckland and enrolling for a Master of Science degree, some fourteen years after having completed his BSc, and with that second major in Cell Biology about to come into play.