I noticed a PostgreSQL 9.1 streaming replication slave had stopped replicating. I dug in a bit, and... well, this is a new one:
08:57:57 CDT FATAL: could not receive data from WAL stream: SSL error: sslv3 alert unexpected message
08:59:51 CDT LOG: invalid magic number 0000 in log file 144, segment 203, offset 13156352
08:59:53 CDT LOG: streaming replication successfully connected to primary
09:03:01 CDT FATAL: could not send data to WAL stream: SSL error: sslv3 alert unexpected message
09:06:21 CDT LOG: unexpected pageaddr 90/CFFE0000 in log file 144, segment 234, offset 16646144
09:06:22 CDT LOG: streaming replication successfully connected to primary
09:15:17 CDT FATAL: could not receive data from WAL stream: SSL connection has been closed unexpectedly
09:20:42 CDT LOG: unexpected pageaddr 91/1320000 in log file 145, segment 37, offset 3276800
09:20:42 CDT LOG: streaming replication successfully connected to primary
09:20:42 CDT FATAL: could not receive data from WAL stream: FATAL: requested WAL segment 000000040000009100000025 has already been removed
... (lots more messages complaining that the WAL segment is gone)
Huh.
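For reference, the segment name in that last FATAL line ties straight back to the "log file 145, segment 37" message just before it. Here's a rough sketch of the mapping, assuming the pre-9.3 naming scheme of three 8-digit hex fields (timeline, log file, segment). The function name is my own invention, and the timeline of 4 is simply read off the file name in the log:

```python
def wal_segment_name(timeline: int, log_id: int, segment: int) -> str:
    """Format a WAL segment file name as three 8-digit uppercase hex fields."""
    return "{:08X}{:08X}{:08X}".format(timeline, log_id, segment)


# Timeline 4, log file 145 (0x91), segment 37 (0x25), as in the log above.
print(wal_segment_name(4, 145, 37))  # 000000040000009100000025
```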
I think I'm looking at a problem in the memory system of this slave server. Two other streaming replication slaves received the same WAL segments without incident, strongly implying the master is sending valid data, and there's a whole pile of TCP/IP checksums and SSL HMACs which should collectively rule out any network-level issues. Somehow, data got garbled after decryption.
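To illustrate that reasoning, here's a toy sketch using Python's standard hmac module. It is not how OpenSSL or PostgreSQL actually do it; the key and payload are made up. The point is only that a MAC catches a bit flipped in transit, but has nothing to say about a bit flipped in memory after the check has already passed:

```python
import hashlib
import hmac


def mac(key: bytes, payload: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over the payload."""
    return hmac.new(key, payload, hashlib.sha256).digest()


def verify(key: bytes, payload: bytes, tag: bytes) -> bool:
    """Check the payload against its tag using a constant-time comparison."""
    return hmac.compare_digest(mac(key, payload), tag)


key = b"hypothetical-session-key"  # made-up key, for illustration only
payload = b"pretend this is a WAL page"
tag = mac(key, payload)

# A single bit flipped on the wire: the receiver's check fails.
damaged = bytes([payload[0] ^ 0x01]) + payload[1:]
print(verify(key, payload, tag))  # True
print(verify(key, damaged, tag))  # False

# The same bit flipped after the check has passed: nothing notices.
received = bytearray(payload)
checked_ok = verify(key, bytes(received), tag)
received[0] ^= 0x01  # simulated memory corruption after verification
print(checked_ok, bytes(received) == payload)  # True False
```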
Running pg_basebackup again from the slave worked fine and brought the world back into sync. It's happily streaming away right now, and I'm leaving it running mostly out of curiosity, to see if this issue pops up again.
No matter. This whole cluster is due to be shut down and migrated to PostgreSQL 9.3 in a few days, which, among other things, brings page-level data checksums.