Cal Poly and Hewlett-Packard have been working together to generate a project plan for the internal staff training, machine setup and configuration, migration, and user transfer from the old AIX system to the new HP system. This plan has been finalized and is reflected in the rough timeline referenced below.
The current target from Cal Poly's perspective is to have the mail server and systems management system up by mid fall quarter. Further detail is available under the "Chronology of Events" below.
The rough timeline for the project is also available.
Training for ITS support staff has been ongoing since July 8th, 1996, with several classes scheduled and several more to be scheduled. As these classes are taken, the knowledge gained in them will allow more information to be distributed to the users via these pages.
All of the UNIX systems will be housed in the machine room in the Computer Science Building (Building 14).

Figure 1: Computer Science - Frank E. Pilling Building (Building 14)
Along with the six K420 machines, we are also going to receive 40 HP X-Terminals. Some of these X-Terminals will be housed on the second floor of the old Air Conditioning Engineering Building (Building 12).

Figure 2: Air Conditioning Engineering Building
Others are planned for the Reserve Room in the Kennedy Library and a location for the rest has yet to be determined. Some of these X-Terminals should be available by the beginning of Winter quarter 1997.
The replacement of the UNIX mail function on the AIX site oboe with the HP-UX Mail Server.
Implementation of the additional machines.
The following are meaningful dates of events which have occurred during the process of acquisition of the HP systems.
December 11, 1995: Request for Proposal (RFP) issued to vendors.
January 26, 1996: Conceptual proposals due from vendors.
March 18, 1996: Final Response due from vendors.
May 31, 1996: Bid is awarded to Hewlett-Packard. Contract negotiations begin.
July 11, 1996: Contract is signed.
July 1996: Most of the Cal Poly staff involved in the project receive training on HP-UX administration for UNIX administrators, Logical Volume Manager, Operations Center, and Admin Center during four weeks of on-campus training.
Late July 1996: Ten of the 40 X-Terminals arrive on campus.
August 2, 1996: Equipment starts arriving.
August 6, 1996: The second machine arrives in the machine room.
August 7, 1996: The third system is brought into the machine room along with a large quantity of boxes and unpacking begins.
August 8, 1996: The fourth system arrived in the machine room. With this system we have a fileserver, session server, management server, and mail server.
Mid August, 1996: One of the System Support group members receives training on Service Guard at an off-campus HP site.
August 19, 1996: HP's project coordinator was on campus today to identify both short and long range goals for the new clusters of machines. Short term goals have a due date of September 13th while long range goals are spread out over the next several quarters.
August 21, 1996: Based on the available time and personnel, it has been decided to concentrate on bringing up the mail server and the systems management machines first. This will relieve oboe of its sendmail functions and allow it to work solely on Information Server functions.
September 3-15, 1996: Conversion process in moving the UNIX email function from oboe to the new HP-UX machine that will take over the function.
A second set of two 2 GB internal disk drives was also installed in each of the machines, bringing their internal space up to 8 GB to allow mirroring of the root volume group.
September 16, 1996: A major problem was uncovered once the mail had been transferred from AIX to the new HP-UX Mail Server.
While simple file manipulation (e.g., copying) worked, complex NFS file manipulation between the Mail Server and the mail clients (AIX machines) failed with the user session hanging. This would mean that no users would be able to access their mail from the AIX cluster, which is an unacceptable situation.
The problem was found to be a bug in the NFS implementation under AIX version 3.2.5.0. An AIX 3.2.5.1 machine was successfully tested as an NFS client. The next phase will be to see a) if we can get the fix tapes here for all seven of the production AIX machines and b) whether the AIX cluster can exist with mixed versions of the AIX operating system in general and mixed versions of NFS in particular. If intermixed versions are workable, the cluster will be upgraded in parts over the next couple of weeks.
In the meantime, the mail will be migrated back to the AIX cluster, with service expected to be back up tomorrow morning. While this is being done, other staff will be checking the interoperability of the different AIX versions and ordering the tapes to upgrade the cluster to AIX 3.2.5.1.
NO mail should be lost in the process of this double migration. During the movement, all processes that would normally receive mail are set to ignore connection attempts. The sending machines will then queue the message for up to three to five days depending on their configurations.
When the tests are completed and evaluated, the next steps will be posted.
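The reasoning in the September 16 entry, that no mail is lost as long as the receiving processes are back before the sending machines give up, can be illustrated with a small check. This is only a sketch: the function name and the outage times below are hypothetical, and the three-day figure is the shorter end of the "three to five days" range quoted above.

```python
from datetime import datetime, timedelta


def mail_at_risk(outage_start, outage_end, sender_queue_lifetime):
    """Return True if a message queued at outage_start would expire
    on the sending machine before the receiver comes back."""
    return (outage_end - outage_start) > sender_queue_lifetime


# Hypothetical outage window, for illustration only (not taken from the log).
outage_start = datetime(1996, 9, 15, 18, 0)
outage_end = datetime(1996, 9, 17, 8, 0)

# Most conservative sender configuration mentioned above: three days.
shortest_lifetime = timedelta(days=3)

at_risk = mail_at_risk(outage_start, outage_end, shortest_lifetime)
print("mail at risk:", at_risk)
```

With these assumed times the outage lasts 38 hours, well inside a three-day queue lifetime, so queued messages would still be retried after the mail function is restored.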
September 18, 1996: The fifth HP 9000/K420 arrives. This machine will be used for OpenMail testing before OpenMail and OpenTime are migrated to the Mail Server. Evaluations of the prior weekend's events are still ongoing, and tapes for each of the IBM AIX systems have been ordered to bring them up to an NFS fix level compatible with the HP machines.
September 22 - October 13, 1996: Installation of applications and testing of the AIX upgrade that should eliminate the NFS incompatibility.
September 30 - October 4, 1996: One PC-LAN Support staff member and one Instructional Applications Support staff member take off-campus training on the installation and administration of HP's OpenMail product.
October 13, 1996: Install last of the AIX Upgrades and migrate mail from AIX to HP-UX.
The migration was successful with a few hitches. Mail is now running on the Service Guard package "polymail" site. The new site seems to be handling the load well as it starts to catch up on the twenty-four hours of mail that had been queued on other sites for delivery to the cluster. Part of the problem involved changing several of the kernel parameters on the new HP-UX system to allow for the process-intensive nature of electronic mail.
System aliases were causing a problem in delivery, and alias owners were mailed instructions on how to correct the problem by issuing two commands from their accounts. The problem appears to have cleared once the owners ran the two commands.
There have been some problems with misaddressed mail from sites outside the cluster, and that is being researched at this time. This problem usually manifests itself as these messages stalling in the mail queue. To work around this problem, these messages are removed from the queue and delivered manually when time is available. This problem has caused a few days of delayed delivery, but once the offending message is recognized and moved, the system seems to be able to catch up within four to eight hours, depending on the backlog, during peak load.
November 13, 1996: Mail Server running well after one month.
After one month of operation, the mail server is processing roughly twice as many messages a day as oboe did and is not backlogging messages. This is a remarkable improvement over the previous service. Prior to October 13, we saw a rate of around 45,000 messages per day on the old hardware. Now, after several patches and tuning sessions, we are getting reports of around 90,000 messages a day.
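To put the two daily figures in perspective, the sketch below converts them to average per-second rates. The helper name is hypothetical, and these are daily averages only; the log does not give peak-hour numbers, which would be higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def per_second(messages_per_day):
    """Average message rate per second for a given daily total."""
    return messages_per_day / SECONDS_PER_DAY


old_daily = 45_000  # approximate rate on oboe before October 13
new_daily = 90_000  # approximate rate on the HP-UX mail server after one month

ratio = new_daily / old_daily
print(f"old: {per_second(old_daily):.2f} msg/s")
print(f"new: {per_second(new_daily):.2f} msg/s")
print(f"improvement: {ratio:.1f}x")
```

On average this works out to roughly one message every two seconds on the old hardware and roughly one per second on the new server.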
December 9, 1996: Final HP K420 machine arrives along with 30 more X-Terminals.
The final HP K420 system arrived today from the warehouse. The warehouse also received the final 30 X-Terminals.
December 10, 1996: Final HP K420 is installed in the machine room.
The final K420 was moved into the machine room and set up along with the other five machines which had arrived previously.
March 2, 1997: Mail services migrated to temporary site to free up production mail server for upgrades.
March 10, 1997: Events occurring since the March 2 migration to the temporary site.
March 18, 1997: March 16 migration of mail services back to production server postponed.
January 3, 1997: Project time line changed to reflect rearrangement of phases to provide better stability for the systems as they are installed.
This means that the installation of the Service Guard clusters has been moved up to provide failover capability. Installation of HP-UX 10.20 has also been moved up in order to support the 100BaseT cards sooner. Both of these events affect the project rough timeline.
February 11, 1997: The systems group coordinator put out an update on the project status.
March 26, 1997: End-of-quarter archives and migration of all user filesystems from the RISC/System 6000 sites to the HP file server.
Tuesday, March 25 at 6:00 PM, all AIX sites and the HP-UX sites in production were taken down and end-of-quarter archives were performed. Once the archives were completed on March 26, user filesystem migration was started. This involved moving and combining filesystems from the AIX systems to the new HP-UX file server.
March 31, 1997: Results of the filesystem migration performed during quarter break.
April 3, 1997: Change in the downtime schedule for April 6, 1997.
April 27, 1997: Update on the events that occurred during the April 27 downtime.
June 21-22, 1997: The scheduled downtime was used to migrate mail back to the production mail server and re-enable quotas on the fileserver.
Most problems experienced in the first couple of days following the outage were due to a system patch that the vendor asked Cal Poly to install on the system. Within two days of the outage, the vendor requested that the patch be removed, and the majority of the system problems went away. Problems included malfunctioning system aliases and slow performance.
July 1997: Redefinition of the cluster and machine uses.
Because of performance problems with OpenMail and the original bid proposal from HP, the cluster has been redefined. What was to have been the compute server is becoming the faculty and staff OpenMail and OpenTime machine. The current email server will become the student OpenMail machine.
August 25, 1997: New user dot files are announced.
In anticipation of the new session server replacing the AIX machines on September 7, 1997, new user dot files were announced that provided compatibility between the two platforms.
August 28, 1997: All user dot files are updated.
User dot files are replaced throughout the system unless the user explicitly requested to be passed over on the update.
September 3, 1997: The new session server is announced and made available for logins.
The session server is made available for logins so users can have some advanced access to the new environment.
September 6-7, 1997: The system is unavailable during session server and ulib filesystem migration.
During the September 6-7, 1997 downtime, all systems receive an end-of-quarter archive and all user sessions are migrated to the new session server from the old AIX session servers. All of the old AIX session servers are pulled off-line except one, which is renamed to rodin and has full access to all of the old AIX compilers.
The /ulib file system, which is the home of the departmental libraries, is also migrated to the mail server to promote high-speed access to the system alias datafiles stored in those accounts.
September 15, 1997: The mail server "polymail" experiences a dual hard disk failure.
At approximately 11 PM on September 15, 1997, both the primary and mirrored copies of the root volume group start generating hardware diagnostics and the system crashes, resulting in total unavailability of the UNIX mail server the following day. The system is up on the evening of September 16 with some password file problems, which are repaired on the morning of the 17th.
September 25, 1997: The mail server "polymail" experiences a problem causing a temporary outage.
The UNIX Systems Support Group is continuing to tune the system. While increasing the number of NFS daemons on the mail server, NFS starts hanging and the system must be rebooted. Additional changes for NFS tuning are scheduled to be added on Sunday, September 28, 1997, when the systems are down for scheduled maintenance.
October 8, 1997: A condition started appearing on the session server "polylog1" which generated high load.
After investigating the problem, which seemed to coincide with users having problems with their incoming mailboxes, it was decided to reboot the machine each morning until October 14, 1997, at which time we hoped to have the situation corrected.
October 14, 1997: Patches have been applied to the session server and the sendmail server.
Several performance related patches have been applied to the session server (polylog1 on October 11, 1997) and the sendmail server (polymail on October 14, 1997) in an effort to correct the performance problems we have been experiencing.
October 17, 1997: The detective work continues on the performance problems.
Several more machines receive the performance patches. The session server is on its second full production day since a reboot when it is noticed that the performance problem is still there. The symptoms are increased system utilization of the CPU with less idle and user utilization. As a result, half the number of users are generating over twice the load that we would see fully loaded immediately after a reboot.
A new symptom is discovered. After a reboot with a full load, 1.5 GB of memory is free. On a second production day with fewer users, only 0.6 GB of memory is free. This is highly suspicious and similar to a problem noted on another machine. We are currently looking into this as a strong clue to what is happening within the cluster.
October 23, 1997: Hewlett-Packard sends a performance engineer to Cal Poly.
After the Hewlett-Packard engineer observed the system for a couple of hours, he recommended that we lower the maximum amount of memory reserved for dynamic file buffers from 30% of total memory to 10%. It turns out that NFS file caching is handled in a single-threaded manner and that the system could not keep up with the large number of buffers present with the larger memory configuration. Lowering the maximum size will decrease the cache hit rate, but will allow the system to keep up with the I/O rate required to communicate with the file server and the mail server.
We will continue to monitor the system after this change scheduled for October 26, 1997.
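The effect of the recommended change on the size of the dynamic buffer pool can be sketched with simple arithmetic. The log does not state how much memory the session server has, so the 2048 MB total below is purely an assumption for illustration, as is the helper name. On HP-UX 10.x this ceiling is normally set through the dbc_max_pct kernel tunable, although the entry above does not name the parameter.

```python
def buffer_ceiling_mb(total_memory_mb, max_pct):
    """Largest amount of memory (in whole MB) the dynamic file buffers
    may occupy when capped at max_pct percent of total memory."""
    return total_memory_mb * max_pct // 100


# ASSUMPTION: total memory is not given in the log; 2048 MB is illustrative.
total_memory_mb = 2048

before_mb = buffer_ceiling_mb(total_memory_mb, 30)  # previous setting
after_mb = buffer_ceiling_mb(total_memory_mb, 10)   # recommended setting

print(f"ceiling at 30%: {before_mb} MB")
print(f"ceiling at 10%: {after_mb} MB")
```

Whatever the real memory size, the change cuts the upper bound on buffers to one third of its former value, which is the trade described above: fewer buffers for the single-threaded NFS caching code to manage, at the cost of a lower cache hit rate.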
October 26, 1997: The change is made to the maximum amount of memory that can be allocated to dynamic buffers.
October 28, 1997: The system has been up for over 48 hours and system load remains at a low level.
November 16-19, 1997: Systems Acceptance Testing on General Session Server.
NOTE: These pages are very dynamic at this point and changes may occur daily.
Revised by: George Westlund (gwestlun@calpoly.edu)