When you have too much memory for SheepShaver

When I first got my 133MHz BeBox (not new, sadly), it had "only" 32MB of memory and it had four more SIMM slots to fill. While Be only officially supported 256MB of RAM, I was blissfully ignorant of that, bought an additional 256MB of memory in four equally sized 72-pin SIMMs and installed it for 288MB of RAM. (It can actually take up to 1GB, I later learned.) Nice, I said! And then SheepShaver never worked again.

SheepShaver is a desperate pun and an unusual emulator: much like Classic on PowerPC Mac OS X, on big-endian PowerPC most of the MacOS and its applications run natively on the processor, in a form analogous to KVM-PR. In fact, SheepShaver on Leopard is pretty much the best way to run Classic applications on Power Macs that must run Leopard, though it also runs on Tiger and presents certain advantages there as well. It existed first on BeOS as a paid product before becoming open source, though multiple later forks fix various problems on modern platforms.

My original theory was that I had somehow broken something in the update or some other installation, and so I never did much with it (especially since I have plenty of real Power Macs around here). But while I was doing other work on the machine, after a game of BeOS Doom I accidentally double clicked on its icon on the desktop and ... it started up! What could have restored it, I feverishly wondered? Did something monkey around with the memory map? (Foreshadowing music plays here.) It only ran the one time, however, and I spent hours trying to retrace my steps to see if I could make it work again and I never could.

But this at least told me that the install was fine and the problem lay elsewhere. I had never closely looked at it in a debugger. Perhaps it was time.

The BeOS debugger isn't gdb, but you get the idea. The offending instruction was an stbu (store byte with update), but the effective address was ... really weird. It looks like it's wrapped around the entire addressing space back to 0! How did this program even work?

In the source code, for all supported platforms, SheepShaver (and Basilisk II, a 68K emulator it shares substantial code with) has a SIGSEGV handler for trapping segmentation faults; here is BeOS's. My initial thought was that somehow the handler wasn't being installed, but a couple debug printfs in the handler showed that not only was the handler being triggered, it was actually passing the segfault along to the system handler apparently on purpose.

A partial explanation appears in the Darwin (Mac OS X) port:

Under Mach there is very little assumed about the memory map of object files. It is the job of the loader to create the initial memory map of an executable. In a Mach-O executable there will be numerous loader commands that the loader must process. Some of these will create the initial memory map used by the executable. Under Darwin the static object file linker, ld, automatically adds the __PAGEZERO segment to all executables. The default size of this segment is the page size of the target system and the initial and maximum permissions are set to allow no access. This is so that all programs fault on a NULL pointer dereference. Arguably this is incorrect and the maximum permissions shoould be rwx so that programs can change this default behavior. Then programs could be written that assume a null string at the null address, which was the convention on some systems. In our case we need to have 8K mapped at zero for the low memory globals and this program modifies the segment load command in the basiliskII [sic] executable so that it can be used for data.

So, the handler expects to have actual memory mapped indeed at an effective address of zero for the MacOS's low memory globals, a holdover from the 68K days (and if I'd read the Basilisk technical notes, I would have realized that sooner). Since such a fault should never have gotten to the handler in the first place, it just passes it along and crashes. That kind of significant address space remapping clearly could not come from a user-level executable on BeOS; there had to be some sort of system component doing that remapping.

Turns out SheepShaver did in fact install a couple system extensions:


$ find /boot/home/config/add-ons -name 'sheep*' -print
/boot/home/config/add-ons/kernel/drivers/bin/sheep
/boot/home/config/add-ons/kernel/drivers/dev/sheep
$ find /boot/beos/system/add-ons -name 'sheep*' -print
/boot/beos/system/add-ons/net_server/sheep_net

The last one is used for tunneling emulated networking through the host machine; the sheep driver is the one we want (the two sheep drivers are actually the same file; the dev/ one is a symlink to the actual file in bin/). After a little digging in the source tree, I found the C source for it. It became rapidly obvious after a cursory readthrough that it manipulates the PowerPC page tables.

On PowerPC (prior to POWER9 which introduces a higher-performance radix MMU), the mapping between virtual addresses and physical addresses is maintained by a set of hashed page tables, divided into page table entry groups, or PTEGs. (There is an alternate pathway using block address translation "BAT" registers but I'm going to ignore that for the purposes of this discussion.) The low memory globals region is 8K in size, so (with 32-bit PowerPC) we need two 4K memory pages to map to 0x0000 and 0x1000, which needn't be contiguous in real memory since we'll set up mappings for each page individually. The driver allocates three pages with malloc() and takes a page-aligned slice of two pages within it, then tries to find where in physical memory those pages got mapped to using get_memory_map(). Now we want to make those pages' effective address mapping in SheepShaver point to 0x0000 and 0x1000 instead.

To find a real address in 32-bit PowerPC, the top four bits of the effective address select one of 16 segment registers mapping each 256MB effective address block. The segment register's low 24 bits (the Virtual Segment ID) is combined with the 16-bit effective address' page number and 12-bit byte number within that page to generate a 52-bit virtual address. The VSID and the page number then get hashed and combined with the storage description register SDR1 to yield the address of the PTEG, the correct PTE is found within it, and the real page number within it then becomes the upper 20 bits of the resulting real page address. We're going to work this in a similar fashion to find the PTEG that would contain the mapping for these lowest page addresses.

Traditionally the number of PTEGs is optimally half the number of real pages to be accessed, and since the next highest power of two in a 288MB BeBox is 512MB, that means 2²⁹ addressable bytes in (divided by 4K, or 2¹²) 2¹⁷ pages. Halving that yields 2¹⁶, or 65536, 64-byte PTEGs to equal a total size of 4MB. BeOS has a specific memory area for this, appropriately named pte_table, that we can look up with find_area() (thus giving us the effective address of the page table pointed to by SDR1). We find the relevant PTEG for each page by doing the same hashing steps the processor would do to resolve the address. In that PTEG, each PTE's highest bit is whether it's valid, followed by the 24-bit VSID, one bit for the hash type flag, five bits of the effective address called the Abbreviated Page Index, the 20-bit Real Page Number, and protection and access control fields.

We won't know the VSID without looking at the segment registers, but we can just walk the entire page table instead since we only have to set this mapping up once. When we find a valid PTE that matches the API, then we know this is a candidate PTEG and derive the VSID from that. We can then either directly modify an existing PTE within it or take advantage of the fact that each PTEG essentially offers up to eight hash collision resolution slots to add a PTE of our own. If we do this to the first place the CPU will look, we will take over that memory mapping for the life of the process.

The memory mapper conveniently has debug logging support for a simple tool called PortLogger that I patched up for BeOS R5. I compiled it with debugging on, restarted, ran PortLogger, started SheepShaver (it crashed, of course) and looked at the output:


$ ./PortLogger 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    PortLogger version 0.4.1
   Cameron Kaiser    - 14/02/21
   Simon Thornington - 14/02/97
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
init_hardware()
init_driver(3)
control(10000) data 0xfd001bb8, len 00000000
3 pages malloc()ed at 0x0202b228
Address aligned to 0x0202c000
Memory locked
get_memory_map returned 0
PTE table seems to be at 0x30000000
PTE table size: 4096KB
Found page 0  PtePos 58b84 V1 VSID c70 H0 API 08 RPN b11a R0 C0 WIMG2 PP0 
Found page 1d PtePos 58ba4 V1 VSID c70 H0 API 08 RPN b11b R0 C0 WIMG2 PP0 
Found page 1d PtePos 178580 V1 VSID d37 H0 API 2c RPN b11b R1 C1 WIMG2 PP0 
Found page 0  PtePos 1785a0 V1 VSID d37 H0 API 2c RPN b11a R1 C1 WIMG2 PP0 
Trying to map EA 0x00000000 -> RA 0x0b11a000
PTEG1 at 0x30034dc0, PTEG2 at 0x303cb200
 found 80069b80 00000010
 existing PTE found (PTEG1)
 written 80069b80 0b11a012 to PTE
Trying to map EA 0x00001000 -> RA 0x0b11b000
PTEG1 at 0x30034d80, PTEG2 at 0x303cb240
 found 80069b80 00001010
 existing PTE found (PTEG1)
 written 80069b80 0b11b012 to PTE

The driver seemed to properly reserve memory and find the real address (and thus real page number) for its mapping, and was able to resolve and walk the page table. But one problem jumped out immediately: we only have two pages (here 0 and 1d). Why is it that it found four? Notice that the "fraternal twin" pages have matching RPNs, but the VSIDs are different and we don't know which VSID is right. Did our algorithm effectively cause its own hash collision?

Continuing on, when we look at the existing PTE we found, the RPN is the first through fifth hex digits in the second word and both effective addresses match their real ones (80069b80 00000010 and 80069b80 00001010). That seems hinky.

My first thought was maybe we had a stale TLB and our PTE change didn't stick, because on the PowerPC 603 and 603e the code doesn't do a tlbsync to synchronize the translation lookaside buffer (which caches all this work) and this BeBox has two 603e CPUs. However, despite the code and Metrowerks saying it's 604-only, tlbsync is listed as a valid instruction in my copy of the 603e User's Manual Appendix A. I forced it to do a tlbsync by commenting out the check, compiled it again, restarted, ran PortLogger and started SheepShaver. Unfortunately, while it didn't do anything worse, it didn't work either.

My next guess was to see if maybe we were working on the wrong "twin." Assuming we really did have two sets of colliding hashes, what if we used the other one? A line of code to stop the search at the first page pair rather than the second was added and I tried again:


$ ./PortLogger 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    PortLogger version 0.4.1
   Cameron Kaiser    - 14/02/21
   Simon Thornington - 14/02/97
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
init_driver(3)
control(10000) data 0xfd001bb8, len 00000000
3 pages malloc()ed at 0x01587bb8
Address aligned to 0x01588000
Memory locked
get_memory_map returned 0
PTE table seems to be at 0x30000000
PTE table size: 4096KB
Found page 0  PtePos 33f04 V1 VSID c70 H0 API 05 RPN 45a8 R1 C1 WIMG2 PP0 
Found page 1f PtePos 33f24 V1 VSID c70 H0 API 05 RPN 45cd R0 C0 WIMG2 PP0 
Trying to map EA 0x00000000 -> RA 0x045a8000
PTEG1 at 0x30031c00, PTEG2 at 0x303ce3c0
 found 80069b80 00147010
 found 80076280 082b5190
 found 00000000 00000000
 free PTE found (PTEG1)
 written 80063800 045a8012 to PTE
Trying to map EA 0x00001000 -> RA 0x045cd000
PTEG1 at 0x30031c40, PTEG2 at 0x303ce380
 found 80069b80 00146010
 found 80076280 082b4190
 found 00000000 00000000
 free PTE found (PTEG1)
 written 80063800 045cd012 to PTE

Success! Now we actually have a free PTE, instead of modifying a questionable one, and we alter that. The mapping now takes precedence over anything else for that effective address and SheepShaver starts and runs normally. It also fixed Basilisk II, which would not run for the same reason, though SheepShaver seems to run 68K applications rather better than Basilisk II does.

Why was this never noticed? Well, like I say, Be never advertised support for more than 256MB in the BeBox, and in 1997 that would have been a significant amount of memory (my Power Mac 7300 in 2000 had 192MB and I thought that was a lot). Most PowerPC systems running BeOS probably had substantially less. Like many other bugs due to clock speed and RAM, no one ever dreamed future users would have such a surfeit of them.

The patched binary and source code are on Be-Power.