A few weeks ago, I wanted to use one of the vintage IBM 1401 mainframe computers at the Computer History Museum, but the computer wasn't working.1 This article describes the multi-week repair process to get the computer working again.
The problem started when the machine was powered up at the same time someone shut down the main power, apparently causing some sort of destructive power transient. The computer's core memory completely stopped working, making the computer unusable. To fix this we had to delve into the depths of the computer's core memory circuitry and the power supplies.
Debugging the core memory
The IBM 1401 was a popular business computer of the early 1960s. It had 4000 characters of internal core memory, with an additional 12000 characters in an external expansion box.2 Core memory was a popular form of storage in this era because it was relatively fast and inexpensive. Each bit is stored in a tiny magnetized ferrite ring called a core. (If you've ever heard of a "core dump", this is what the term originally referred to.) The photo below is a magnified view of the cores, along with the red wires used to select, read and write the cores.4 The cores are wired in an X-Y grid; to access a particular address, one of the X lines is pulsed and one of the Y lines is pulsed, selecting the core where they intersect.3
In the 1401, there are 4000 cores in each grid, forming a core plane that stores 4000 bits. Planes are then stacked up, one for each bit in the word, to form the complete core module, as shown below.
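To make the X-Y selection concrete, here is a minimal Python sketch of the simplified model described in note 4: a core flips only when the current through it reaches the full value I, while half current (I/2) leaves it unchanged. It uses the 50 × 80 line counts given in the notes. It also assumes, as is usual for coincident-current stacks but not stated in this article, that the same X and Y wires thread every plane, so one (x, y) position is selected in each bit plane at once. The function names are illustrative, not anything from IBM's documentation.

```python
# Simplified coincident-current model: full current I switches a core,
# half current I/2 does not. Currents are expressed as fractions of I.
FULL = 1.0
HALF = FULL / 2

X_LINES, Y_LINES = 50, 80  # 50 x 80 = 4000 cores per plane


def cores_switched(x_sel: int, y_sel: int) -> list[tuple[int, int]]:
    """Return the cores in one plane that see enough current to switch."""
    switched = []
    for x in range(X_LINES):
        for y in range(Y_LINES):
            # Each core sums the current from its X wire and its Y wire.
            current = (HALF if x == x_sel else 0.0) + (HALF if y == y_sel else 0.0)
            if current >= FULL:
                switched.append((x, y))
    return switched


def half_selected_count(x_sel: int, y_sel: int) -> int:
    """Count cores on exactly one energized wire: they see only I/2 and stay put."""
    return sum(
        1
        for x in range(X_LINES)
        for y in range(Y_LINES)
        if (x == x_sel) != (y == y_sel)
    )


def select_in_module(x_sel: int, y_sel: int, n_planes: int) -> list[list[tuple[int, int]]]:
    """ASSUMPTION: every plane shares the same X and Y wires, so each bit
    plane selects the same (x, y) core, giving one bit per plane."""
    return [cores_switched(x_sel, y_sel) for _ in range(n_planes)]


print(cores_switched(12, 34))       # [(12, 34)]
print(half_selected_count(12, 34))  # 128 (49 on the Y wire + 79 on the X wire)
```

The plane count passed to `select_in_module` is only illustrative. For example, seven planes would match the six character bits plus word mark mentioned in note 2.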
To diagnose the memory problem, the team started probing the 1401 with an oscilloscope. They checked the signals that select the core module, the memory control signals, the incoming addresses, the clock signals and so forth, but everything looked okay.
The next step was to see if the X and Y select signals were being generated properly. These pulses are generated by two boards called "matrix switches", one for the X pulse and one for the Y pulse.5 Some address lines are decoded and fed into the X matrix switch, while the other address lines are decoded and fed into the Y matrix switch. The matrix switches then create pulses on the appropriate X and Y select lines to access the desired address in the core planes.
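The decoding arithmetic behind the matrix switches (detailed in note 5) can be sketched in Python. The 5 × 10 split of the 50 X lines comes from the notes. The 8 × 10 split of the 80 Y lines and the address-to-(x, y) mapping in the hypothetical `split_address` are assumptions chosen only so the numbers work out, because the article doesn't describe the 1401's actual decimal address decoding.

```python
X_ROWS, X_COLS = 5, 10  # X matrix switch inputs (note 5): 5 x 10 = 50 X lines
Y_ROWS, Y_COLS = 8, 10  # ASSUMPTION: Y split as 8 x 10 = 80 Y lines


def split_address(addr: int) -> tuple[int, int]:
    """HYPOTHETICAL mapping of an address 0-3999 onto an X line and a Y line.
    The real 1401 decodes its decimal addresses differently."""
    if not 0 <= addr < 4000:
        raise ValueError(f"address {addr} out of range")
    return addr % 50, addr // 50


def matrix_switch_inputs(line: int, cols: int) -> tuple[int, int]:
    """Return the (row, column) inputs to pulse so that the matrix-switch core
    for `line` gets full current and emits a pulse on that output line."""
    return divmod(line, cols)


def drive_address(addr: int) -> dict[str, tuple[int, int]]:
    x, y = split_address(addr)
    return {
        "x_inputs": matrix_switch_inputs(x, X_COLS),
        "y_inputs": matrix_switch_inputs(y, Y_COLS),
    }


# Drivers needed: one per select line vs. one per matrix-switch input.
direct_drivers = X_ROWS * X_COLS + Y_ROWS * Y_COLS      # 50 + 80 = 130
matrix_drivers = (X_ROWS + X_COLS) + (Y_ROWS + Y_COLS)  # 15 + 18 = 33

print(drive_address(1234))             # {'x_inputs': (3, 4), 'y_inputs': (2, 4)}
print(direct_drivers, matrix_drivers)  # 130 33
```

Each output line corresponds to exactly one (row, column) pair, which is why pulsing one row input and one column input picks out a single line.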
The photo below shows the core memory module and its supporting circuitry inside the 1401. The core memory module itself is at the bottom, with the two matrix switch boards mounted on it. Above it, three rows of circuit boards (each the size of a playing card) provide the electronics. The top row consists of inhibit drivers (used for writing memory) and the current source and current driver boards (providing current to the matrix switches). The middle row has 17 boards to decode the memory addresses. At the bottom, 19 sense amplifier boards read the data signals from the cores. As you can see, core memory requires a lot of supporting electronics and wiring. Also note the heat sinks on most of these boards, needed because of the high currents required by core memory.
After some oscilloscope measurements, we found that one of the matrix switches wasn't generating pulses, which explained why the memory wasn't working. We started checking the signals going into the matrix switch and found that one matrix switch input line showed some ringing, apparently enough to keep the matrix switch from functioning.
Since the CHM has two 1401 computers, we decided to swap cards with the good machine to track down the fault. First we tried swapping the thermal switch board (below). One problem with core memory is that the properties of ferrite cores change with temperature. Some computers avoid this problem by heating the core memory to a constant temperature in air (as in the IBM 1620 computer) or an oil bath (as in the IBM 7090). The 1401, on the other hand, uses temperature-controlled switches to adjust the current based on the ambient temperature. We swapped the "AKB" thermal switch board and the associated "AKC" resistor board, with no effect.
Next we tried swapping the "AQW" current source boards that control current through the matrix switches.6 With the boards from the good machine installed, the 1401's memory started working. Replacing the original boards one at a time, we found the bad board, shown below.
I examined the bad board and tested its components with a multimeter. There were two 1.2mH inductors on the board (the large green cylinders). I measured 3 ohms across one and 3 megaohms across the other, indicating that the second inductor had failed open. With an open inductor, the board would provide only half the current. This explained why the matrix switch wasn't generating pulses, and thus why the core memory didn't work.
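Combining this with the simplified core model from note 4 shows why losing half of one board's current stops the matrix switch. This Python sketch rests on several simplifying assumptions: a matrix-switch core fires only when its total current reaches the full value I, each current source normally supplies one matrix-switch input with I/2, and the board's two inductor paths each carry an equal share. The function names are hypothetical.

```python
FULL = 1.0  # full switching current I; all currents are fractions of I


def matrix_core_fires(row_current: float, col_current: float) -> bool:
    # ASSUMPTION: the selected matrix-switch core needs at least the full current I.
    return row_current + col_current >= FULL


def board_output(nominal: float, good_paths: int, total_paths: int = 2) -> float:
    # ASSUMPTION: each inductor path carries an equal share of the board's
    # current; an open inductor contributes nothing.
    return nominal * good_paths / total_paths


healthy = matrix_core_fires(board_output(0.5, good_paths=2), 0.5)
faulty = matrix_core_fires(board_output(0.5, good_paths=1), 0.5)
print(healthy, faulty)  # True False  (1.0 I vs 0.75 I at the selected core)
```

Under these assumptions, the selected core sees only three quarters of the current it needs, so no pulse comes out of the matrix switch.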
I gave the bad inductor to Robert Baruch of Project 5474 for analysis. He found that the connection between the lead and the inductor wire was intermittent. He dissolved the inductor's package in acid and took photographs of the winding inside the inductor.7
We looked in the spare board cabinet for an AQW board to replace the bad one and found several. However, the replacement boards were different from the original—they had one power transistor instead of two. (Compare the photo below with the photo of the failed card from the computer.)
Despite misgivings from some team members, the bad AQW card was replaced with a one-transistor AQW card and we attempted to power the system back up. Relays clicked and fans spun, but the computer refused to power up. We put the old card back (after replacing the inductor), and the computer still wouldn't start. So now we had a bigger problem. Apparently something had gone wrong with the computer's power supplies, so the debugging effort switched focus.
Diagnosing the power supply problem
The power supply system for the IBM 1401 is more complex than you might expect. Curiously, the main power supplies for the system are inside the card reader; a 1250W ferro-resonant transformer in the card reader regulates the line input AC to 130V AC, which is fed to the 1401 computer itself through a thick cable under the floor. Smaller power supplies inside the 1401 then produce the necessary voltages.
Since it was built before switching power supplies became popular, the IBM 1401 uses bulky linear power supplies. The photo below shows (left to right) the +30V, -6V, +6V and -12V supplies.8 In the lower left, under the +30V supply, you can see eight relays for power sequencing. The circuit board to the right of the relays is one of the "sense cards" that checks for proper voltages. Under the +6V supply is a small "+18V differential" supply for the core memory. Foreshadowing: these components will all be important later.9
After measuring voltages on the multiple power supplies, the team concluded that the -6V power supply wasn't working right. This was a bit puzzling because the AQW card (the one we replaced) only uses +12 and +30 volts. Since it doesn't use -6 volts at all, I didn't see how it could mess up the -6 volt supply.
The team removed the -6V supply and took it to the lab. In the photo above, you can see the heavy AC transformer and large electrolytic capacitors inside the power supply. Measuring the output transistors, they found one bad transistor and some weak transistors and decided to replace all six transistors. In the photo below, you can see the new transistors, mounted on the power supply's large heat sink. These are germanium power transistors; the whole computer is pre-silicon.
The -6V power supply tested okay in the lab with the new transistors, so it was installed back in the 1401. We hit the "Power On" button on the console and... it still didn't work. We still weren't getting -6V and the computer wouldn't power up.
In the next repair session, we tried to determine why the computer wasn't powering up. Recall the eight relays mentioned earlier; these relays provide AC power to the power supplies in sequence to ensure that the supplies start up in the right order. If there is a problem with a voltage, the next relay in the sequence won't close and the power-up process will be blocked. We looked at which relays were closing and which weren't, and measured the voltages from the various power supplies. Eventually we determined that about halfway through the power-up process, relay #1 was not closing when it should, stopping the power-up sequence.
Relay #1 was driven by the +30V supply and was activated by a "sense card" that checked the +6V supply. But the +30V and +6V supplies were powering up fine and the sense card was switching on properly. Thus, the problem seemed to be a failure of the relay itself. Just before we pulled out the relay for testing, someone found an updated schematic showing that the relay didn't use the regular +30V supply but instead obtained its 30 volts through the "18V differential supply".11 And the schematic for the 18V differential supply had a pencilled-in fuse.10
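The relay sequencing behaves like a simple dependency walk, which makes the diagnosis easier to follow. In this Python sketch, each stage closes only if its sense condition is met and its relay coil has power. The details of relay #1 come from the text: it senses +6V, its coil gets 30 volts through the 18V differential supply, and it sequences the -6V supply. The later stage and its supply name are hypothetical placeholders, not the 1401's real sequence.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Stage:
    relay: str
    senses: str                                # supply the sense card must see
    coil_powered: Callable[[set[str]], bool]   # can the relay coil be energized?
    turns_on: list[str] = field(default_factory=list)


def power_up(stages: list[Stage], initially_on: set[str]) -> tuple[set[str], Optional[str]]:
    """Walk the sequence; return the supplies that came up and the relay
    that blocked the sequence (None if it completed)."""
    on = set(initially_on)
    for stage in stages:
        if stage.senses not in on or not stage.coil_powered(on):
            return on, stage.relay
        on.update(stage.turns_on)
    return on, None


def make_sequence(diff_fuse_ok: bool) -> list[Stage]:
    return [
        # Relay #1: sensed on +6V; coil fed from +30V through the
        # 18V differential supply, which contains the fuse.
        Stage("relay #1", "+6V",
              coil_powered=lambda on: "+30V" in on and diff_fuse_ok,
              turns_on=["-6V"]),
        # HYPOTHETICAL later stage standing in for the rest of the sequence.
        Stage("later relay", "-6V",
              coil_powered=lambda on: True,
              turns_on=["later supplies"]),
    ]


start = {"+30V", "+6V"}  # these supplies came up fine during the repair
print(power_up(make_sequence(diff_fuse_ok=False), start)[1])        # relay #1
print(sorted(power_up(make_sequence(diff_fuse_ok=True), start)[0]))
# ['+30V', '+6V', '-6V', 'later supplies']
```

With the fuse blown, the walk stops at relay #1 and -6V never comes up, even though the +30V and +6V supplies and the sense condition are all fine.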
Could the power problem be as simple as a burnt-out fuse? We opened up the 18V differential supply, and sure enough, there was a fuse and it was burnt out. After replacing the fuse, the system powered up fine and we were back in business.
With the computer operational, I could finally run my program. After a few bug fixes, my program used the computer's reader/punch to punch a card with a special hole pattern:
Happy holidays, everyone!12
Conclusion
After all this debugging, what was the root cause of the problems? As far as we can tell, the original problem was the inductor failure, and it's just a coincidence that the problem occurred after the power loss during system startup. The new AQW card must have caused the fuse to blow, although we don't have a smoking gun.13 The -6V power supply wasn't showing any voltage because it was sequenced by relay #1, which didn't close because of the blown fuse. The bad transistors in the -6V power supply were apparently a pre-existing, non-critical problem; the good transistors handled enough load to keep the power supply working. The moral of all this is that keeping an old computer running is challenging and takes a talented team.
Thanks to Robert Baruch for the inductor photos. Thanks to Carl Claunch for providing analysis. The Computer History Museum in Mountain View runs demonstrations of the IBM 1401 on Wednesdays and Saturdays, so check it out if you're in the area; the demo schedule is here.
Follow me on Twitter or RSS to find out about my latest blog posts.
Notes and references
1. Although there are two IBM 1401 computers at the CHM, only one of them has the "column binary punch" feature that I needed. "Column binary" lets you punch arbitrary patterns on a punch card (to store binary data) rather than being limited to the standard punch card character set of 64 characters.
2. Note that the 1401 has 4000 characters of memory, not 4096, because it is a decimal machine. Also, the memory stores 6-bit characters plus a word mark (metadata), not bytes.
3. If you want to know more about the 1401's core memory, I've written in detail about core memory and described a core memory fix.
4. The trick that makes core memory work is that the cores have extremely nonlinear magnetic characteristics. If you pass a current (call it I) through a wire threaded through a core, the core will become magnetized in that direction. But if you pass a smaller current (I/2) through the wire, the core doesn't change magnetization at all. The result is that you can put cores on a grid of X and Y wires. If you put current I/2 through an X wire and current I/2 through a Y wire, the core at their intersection will get enough current to change state, while the rest of the cores will remain unchanged. Thus, individual cores can be selected.
5. The matrix switch is another set of cores in a grid, but it is used to generate pulses rather than store data. The 1401's memory has 50 X lines and 80 Y lines (yielding 4000 addresses), so generating the X and Y pulses directly with transistors would require 50 + 80 expensive, high-current transistors. The X matrix switch has 5 row inputs, 10 column inputs, and 50 outputs, one from each core. The address is decoded to generate the current pulses for these 15 inputs. Thus, instead of using transistor circuits to decode and drive 50 lines, just 15 lines need to be decoded and driven, and the matrix switch generates the final 50 lines from them. The Y lines work similarly, using a second matrix switch to drive the 80 Y lines.
6. Each matrix switch has two current inputs (for the row select and the column select), so there are four current source boards and four current driver boards in total.
7. Strangely, half the inductor is nicely wound while the winding in the other half is kind of a mess.
[Photo: The faulty inductor from the IBM 1401.]
8. The 1401 has more power supplies that aren't visible in the picture. They are behind the power supplies in the photo and slide out from the side for maintenance.
9. If you want to see the original schematics and diagrams of the 1401's power supplies, you can find them here. Core memory schematics are here.
10. The pencilled-in fuse on the schematic also had a note about an IBM "engineering change". In IBM lingo, an engineering change is a modification to the design to fix a problem. Thus, it appears that the 1401 originally didn't have the fuse and that it was added later. Perhaps we weren't the first installation to have this problem, and the fuse was added to prevent more serious damage.
11. The 18V differential supply provides 12 volts. This seemed contradictory, but there's an explanation. The core memory circuitry is referenced to +30 volts and needs a supply 18 volts lower, which is provided by the 18V differential supply. Thus, its output is 12V above ground. Unlike the regular +12V power supply, however, the differential power supply's output will move with any changes to the +30V supply, keeping the difference at a steady 18 volts.
12. The "Merry Xmas" card was inspired by a tweet from @rrragan. (I had also designed a card with a menorah, but unfortunately encountered keypunch problems and couldn't get it completed in time. Maybe next year.) Punch cards normally encode characters by punching up to three holes per column. Since this decorative card required many holes per column, I needed to use the 1401's column binary feature, which allows arbitrary binary data to be punched. I ended up punching the card upside down to simplify the program:
[Photo: Front of my "Merry Xmas" punch card.]
13. After carefully examining the AQW boards, we determined that one- and two-transistor cards should be compatible. The two-transistor board had the two transistors in parallel, probably because earlier transistors couldn't handle as much current. It's possible that the filter capacitor between +30V and ground was shorted in the replacement AQW board, blowing the fuse.