Three mysterious hardware problems
29 June 2015
Posted in Miscellaneous
Last month three completely unrelated, yet equally mysterious, hardware problems kept me entertained at home. Each did have an obvious explanation in the end. Getting from symptoms to diagnosis, though, required both guesswork and luck. Sure, I'm more of a software guy, but I thought that I had a decent understanding of “how stuff works”. Well, looks like things have gotten pretty complex now.
Problem #1: our trusty PS3. Symptoms: We had just bought a new TV. Excited to see how games would look on it, I powered up the PlayStation, and was greeted with a big blank screen of nothing. The TV was definitely set to display the correct input source, but, still, there was no picture. Swapping the HDMI cable to a different port on the TV, with the PS3 running, fixed the problem. It definitely left an uneasy feeling, though.
Things got even more confusing when I tried to play again another day and was again greeted with a blank screen. Now swapping back to the HDMI port that I had originally used gave me a picture. How could that be?
Solution: It turns out that the culprit was the HDMI cable. Of course, the cable we had didn't suddenly break when we got the new TV, but it is the wrong kind of HDMI cable for connecting a PS3 to the new TV. When establishing the HDMI connection the PS3 discovers that the TV is 3D capable (which our old one wasn't), or that it speaks a certain version of the HDMI protocol, or both. Either way, even when not displaying 3D pictures, the TV and the PS3 seem to negotiate a version of the HDMI protocol that requires a so-called high-speed HDMI cable. Using a regular HDMI cable in this setup results in a blank screen. With a high-speed cable everything works.
Honestly, I still don't fully understand why swapping HDMI ports had the effect it had, and I can't even reproduce it reliably. My guess is that the port-swapping somehow interfered with the protocol negotiation and that in these cases the PS3 and the TV negotiated a protocol version that was compatible with the regular HDMI cable.
By the way, I've never been a fan of 3D for movies but with the active shutter glasses that came with the TV some PS3 games like Super Stardust HD and Wipeout HD actually look pretty compelling in 3D.
Problem #2: the device next to the PS3, our Home Theatre PC (HTPC). This is a standard PC running Windows 8.1, operated with a Bluetooth mouse and connected to the TV with an HDMI cable.
Symptoms: When watching movies or TV series, either from local storage or streaming from Netflix or Amazon, after a while the picture would freeze. Curiously, the computer didn't lock up completely; it was possible to move the mouse pointer and sometimes the taskbar appeared when moving the mouse pointer to the bottom of the screen. However, that was it, nothing else worked in these cases, not even bringing up the task manager. After rebooting everything worked again, for a while.
Solution: a firmware bug in an SSD. When I built the computer I had re-used a small SSD that I had lying around after upgrading another computer. That SSD only holds the operating system and the applications; all the media content is stored on a software RAID on spinning disks. The SSD wasn't really broken: the computer booted, there was no data corruption, and the SMART status was okay, too. However, when I ran smartmontools it printed a warning about a known issue with this drive and the firmware version I had, and the tool helpfully pointed me to an article on Tom's Hardware.
Almost unbelievably, the firmware in the SSD had a bug that caused the SSD to become unresponsive when queried for a SMART counter in a certain way, but only after 5,184 hours of total power-on time and then only after one hour of operation. Stated differently, after the initial 5,184 hours have passed, the drive locks up about one hour after booting the computer; and it seems that Windows can't do much beyond moving the mouse pointer when the system disk simply doesn't respond. Crucial, the manufacturer of the SSD, provides a firmware update. Installing it was completely painless and fixed the problems.
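For anyone who wants to check whether a drive has crossed that threshold, here is a minimal Python sketch. It reads the Power_On_Hours attribute (ID 9) from text in the table layout that `smartctl -A` prints for ATA drives, and compares it with the 5,184 hours mentioned above. The sample text, its numbers, and the function names are made up for illustration, and treating exactly 5,184 hours as "past the threshold" is an assumption. Some drives report the raw value in other formats, so take this as a rough check rather than a diagnostic tool.

```python
import re

# Threshold taken from the post: the bug only shows up after this many
# hours of total power-on time. 5,184 hours is exactly 216 days.
BUG_THRESHOLD_HOURS = 5184

# Hypothetical sample in the column layout of `smartctl -A` output.
# The values are invented for illustration, not taken from a real drive.
SAMPLE_SMARTCTL_OUTPUT = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   000    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       5300
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       812
"""


def power_on_hours(smartctl_text):
    """Return the raw Power_On_Hours value, or None if it is not listed."""
    for line in smartctl_text.splitlines():
        fields = line.split()
        # Attribute rows have ten columns; the raw value is the last one.
        if len(fields) >= 10 and fields[0] == "9" and fields[1] == "Power_On_Hours":
            # Keep only the leading digits in case the raw value carries
            # a suffix on some drives.
            match = re.match(r"\d+", fields[9])
            if match:
                return int(match.group())
    return None


def past_bug_threshold(hours):
    """Assumption: the bug can trigger once the counter reaches the threshold."""
    return hours >= BUG_THRESHOLD_HOURS


hours = power_on_hours(SAMPLE_SMARTCTL_OUTPUT)
print("Power-on time:", hours, "hours, about", hours // 24, "days")
print("Past the threshold from the post:", past_bug_threshold(hours))
```

To use it on a real drive, save the output of `smartctl -A` for that drive to a file and pass the file's contents to `power_on_hours` instead of the sample string.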
Problem #3: my Hackintosh. This is basically a normal PC built from components. The twist is that it not only runs Windows but also OS X. If you are interested in the details, I wrote an article about it.
Symptoms: When waking the computer from sleep after it had been asleep for a few hours it wouldn't return to where I had left it but, instead, it would perform a cold boot, BIOS screen and all. Worse, it would reset several times during the BIOS phase of booting, sometimes even warning that the BIOS had become corrupted. In such cases the “Dual BIOS” functionality on my Gigabyte board would kick in, boot from a secondary BIOS, restore the main BIOS, and then reboot. In either case, once the computer did manage to boot it ran without any issues.
I confirmed on the Hackintosh forums that other people weren't seeing similar issues following a recent software update from Apple. (After all, a patch is needed on almost all Hackintoshes to prevent an Apple-provided kernel extension from writing into the CMOS of the PC, and there had been issues when, in response to a new European law, a so-called auto power-off functionality was introduced by Apple.) Next, I started turning the computer off, instead of putting it to sleep. The problems with multiple resets and BIOS corruptions persisted. To make completely sure it didn't have anything to do with OS X, I started to unplug the computer from the mains for a minute, then boot into Windows and shut down from Windows. Still the same problem.
Solution: a faulty power supply, shown in a close-up photo with this post. It was incredibly hard finding useful information on the web. So, after having, more or less, confirmed that this was a hardware problem I started to isolate it. I confirmed that the memory was fine, and that the graphics card had no impact, thus leaving the mainboard and power supply as the likely candidates. Temporarily using the power supply from the HTPC (see above) actually fixed the issue, confirming that something weird was wrong with the power supply. I still have no idea what.
In hindsight I suspect that the “corrupted BIOS” was a spurious error message from the board. I guess the logic on the board either wasn't able to detect the subtle cold-boot power issues or is not programmed to do so, and in cases of several failed boot attempts simply defaults to assuming that the BIOS must be corrupted.
On a more positive note, Seasonic, the manufacturer of my power supply, offers a seven-year warranty on the series of power supplies I have, and they replaced my faulty power supply with a brand new one in a matter of days.
Postscript: I'm happy to report that no further hardware issues have plagued me since, touch wood.