Dirty Tricks of Computer Hardware: What You Don't Know Will (Probably Not) Kill You (2013)
Ever wonder what you don’t know about how your computer hardware really works? Do you tire of lying to your relatives that “gremlins” are the cause of intermittent data loss and blue screens, and not just a car from the 1970s? Let’s take a journey into the wonderful world of wonky hardware and find out what can be done about it!
Speaker: Darrick Wong
The OS's job is to keep user programs reasonably separate from each other and to work around hardware bugs.
Slide: an image of the kernel as a graph of hugely interconnected source files (kernel/ directory only).
Zoom out: lots of similar components. Zoom out further: a massive, very, very complex kernel.
[thought: could we map bug fixes over time to this?]
Dirty Computer Tricks
Overview:
Problem 1: Electronics
Problem 2: Caching
Problem 3: Hardware losing its brains
Problem 4: Application issues
Recommendations
Narrowed to storage issues
Classic model (a Turing machine): storage, a storage cursor, state registers, and a transition function. Later on, instructions and data were both stored and modified on this "tape". Let's look at problems with this "tape".
Each storage layer has different performance characteristics; the layers are laid out outward from the CPU by speed. Each layer has its own copy of the data, and some layers also buffer outbound writes. How do we coerce stable storage into failing to store what we want, where we want it?
Cosmic rays from space!
Silent, unpredictable bit flips in memory, and memory buffers exist everywhere. About 8% of DIMMs experience some error event each year (Google, 2009). [woo, citation]
Results: memory buffer corruption, data written to the wrong location on disk, I/O device command corruption. Fixable by ECC: single-bit errors are corrected; multi-bit (typically double-bit) errors are detected and stop the machine. (Why is this not more widely deployed?)
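To make "single-bit errors fixable, multi-bit detected" concrete, here is a minimal sketch of a SECDED (single-error-correct, double-error-detect) code: Hamming(7,4) plus one overall parity bit, protecting a single 4-bit nibble. Real ECC DIMMs use a wider code over 64-bit words; the function names here are hypothetical.

```c
#include <stdint.h>

/* Hypothetical illustration of SECDED on a 4-bit value. */
enum ecc_status { ECC_OK, ECC_CORRECTED, ECC_UNCORRECTABLE };

/* Codeword bit i (1..7) is Hamming position i; bit 0 is the overall parity.
 * Positions 1, 2, 4 hold check bits; positions 3, 5, 6, 7 hold data. */
static int bit(uint8_t w, int i) { return (w >> i) & 1; }

uint8_t secded_encode(uint8_t nibble)
{
    int d0 = nibble & 1, d1 = (nibble >> 1) & 1;
    int d2 = (nibble >> 2) & 1, d3 = (nibble >> 3) & 1;
    int p1 = d0 ^ d1 ^ d3;   /* covers positions 3, 5, 7 */
    int p2 = d0 ^ d2 ^ d3;   /* covers positions 3, 6, 7 */
    int p4 = d1 ^ d2 ^ d3;   /* covers positions 5, 6, 7 */
    uint8_t w = (uint8_t)(p1 << 1 | p2 << 2 | d0 << 3 | p4 << 4 |
                          d1 << 5 | d2 << 6 | d3 << 7);
    int parity = 0;          /* make the XOR of all 8 bits zero */
    for (int i = 1; i < 8; i++) parity ^= bit(w, i);
    return (uint8_t)(w | parity);
}

enum ecc_status secded_decode(uint8_t w, uint8_t *nibble)
{
    /* Syndrome = XOR of the positions of all set bits (0 for a clean word). */
    int syndrome = 0, parity = 0;
    for (int i = 0; i < 8; i++) {
        if (bit(w, i)) {
            parity ^= 1;
            if (i > 0) syndrome ^= i;
        }
    }
    enum ecc_status st = ECC_OK;
    if (syndrome != 0 && parity == 0)
        return ECC_UNCORRECTABLE;        /* two flips: detect, don't "fix" */
    if (parity == 1) {
        /* Odd number of flips: assume one, at `syndrome` (0 = parity bit). */
        w ^= (uint8_t)(1u << syndrome);
        st = ECC_CORRECTED;
    }
    *nibble = (uint8_t)(bit(w, 3) | bit(w, 5) << 1 | bit(w, 6) << 2 | bit(w, 7) << 3);
    return st;
}
```

Every single-bit flip, including one in the parity bit, decodes back to the original nibble; every double-bit flip is reported as uncorrectable, which is where a real machine would halt.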
Example: a laptop had bits flipped; in files installed by the Debian package manager, some characters had been flipped to upper case (in ASCII, upper and lower case differ by a single bit).
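Why one flipped bit is enough: ASCII lower-case and upper-case letters differ only in bit 5 (value 0x20). A tiny sketch of a simulated single-bit memory error; `flip_bit` is a hypothetical helper.

```c
#include <stddef.h>

/* Hypothetical helper: simulate a single-bit memory error by XOR-ing one
 * bit of one byte. Flipping bit 5 of an ASCII letter toggles its case. */
void flip_bit(unsigned char *buf, size_t byte_index, unsigned bit)
{
    buf[byte_index] ^= (unsigned char)(1u << bit);
}
```

For example, applying `flip_bit(buf, 1, 5)` to the bytes of "debian" yields "dEbian".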
I/O command corruption: most bus protocols have CRCs and resend on error to fix this.
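A CRC lets the receiver detect a corrupted frame and request a resend. As an illustration only (not necessarily the CRC any particular storage bus uses), here is the common reflected CRC-32 (polynomial 0xEDB88320, as used by Ethernet and zlib), computed bit by bit. `frame_ok` is a hypothetical receiver-side check.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32: reflected polynomial 0xEDB88320, init and final XOR 0xFFFFFFFF. */
uint32_t crc32_ieee(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= p[i];
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Hypothetical receiver check: accept the frame only if the CRC it carries
 * matches; otherwise the protocol would ask the sender to resend. */
int frame_ok(const void *payload, size_t len, uint32_t received_crc)
{
    return crc32_ieee(payload, len) == received_crc;
}
```

The standard check value for this CRC is `crc32_ieee("123456789", 9) == 0xCBF43926`. Any single-bit error in the frame changes the CRC, so it is always detected.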
Filesystems don't like bit flips. Metadata is high-value information [lol NSA]. The destruction ranges from minor (bad timestamps) to major (massively cross-linked files). It is detectable, but not entirely fixable. Btrfs can RAID1 and checksum metadata; ext4 can checksum metadata. A checksum only tells you about corruption; it doesn't help you repair the damage. Filesystem ECC, anyone?
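A checksum alone can only say "this block is bad"; paired with a second copy (as with Btrfs RAID1), it can also say which copy is good. A minimal sketch of that read path, assuming two in-memory mirrors and using FNV-1a as a stand-in checksum (Btrfs itself defaults to crc32c). The struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical illustration of checksum-verified reads from a 2-way mirror. */
#define BLOCK_SIZE 64

struct block {
    uint8_t  data[BLOCK_SIZE];
    uint32_t csum;               /* checksum stored alongside the data */
};

/* FNV-1a: a simple stand-in for a real filesystem checksum. */
static uint32_t fnv1a(const uint8_t *p, size_t n)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

void block_seal(struct block *b) { b->csum = fnv1a(b->data, BLOCK_SIZE); }

static int block_valid(const struct block *b)
{
    return fnv1a(b->data, BLOCK_SIZE) == b->csum;
}

/* Returns 0 with `out` filled (repairing a bad copy from the good one), or
 * -1 if both copies fail verification: detected, but not repairable. */
int mirror_read(struct block *m0, struct block *m1, uint8_t out[BLOCK_SIZE])
{
    struct block *good = block_valid(m0) ? m0 : block_valid(m1) ? m1 : NULL;
    if (good == NULL)
        return -1;
    struct block *other = (good == m0) ? m1 : m0;
    if (!block_valid(other))
        *other = *good;          /* self-heal the damaged mirror */
    memcpy(out, good->data, BLOCK_SIZE);
    return 0;
}
```

With only one copy, the best this code could do is return -1; the second copy is what turns detection into repair.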
[side issue, noisy chairs]
Problem 2: cache con- (currency?). Normal I/O goes through the page cache in buffered mode. The OS uses main memory as a gigantic disk cache because memory is fast and semi-plentiful; the cache holds "dirty" data that has not yet been written to disk. What about the write-back policy?
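A toy model of write-back caching, to show why "written" does not mean "on disk": a buffered write only marks a cached page dirty, and the backing store sees nothing until a flush. All sizes and names here are hypothetical; the real Linux page cache is far more involved.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical toy write-back cache. */
#define NPAGES    8
#define PAGE_SIZE 16

static char disk[NPAGES][PAGE_SIZE];   /* stands in for stable storage */

struct cached_page {
    char data[PAGE_SIZE];
    bool dirty;                        /* modified since last write-back */
};
static struct cached_page cache[NPAGES];

/* Buffered write of one full page: copy into the cache and mark it dirty.
 * The disk is not touched. */
void cache_write(int page, const char *buf)
{
    memcpy(cache[page].data, buf, PAGE_SIZE);
    cache[page].dirty = true;
}

/* Write back every dirty page (roughly what sync asks the OS to do).
 * Returns the number of pages written. */
int cache_flush(void)
{
    int written = 0;
    for (int i = 0; i < NPAGES; i++) {
        if (cache[i].dirty) {
            memcpy(disk[i], cache[i].data, PAGE_SIZE);
            cache[i].dirty = false;
            written++;
        }
    }
    return written;
}

/* Simulated power loss: anything not yet written back is gone. */
void power_loss(void) { memset(cache, 0, sizeof cache); }
```

A write followed by `power_loss()` without an intervening `cache_flush()` leaves the disk unchanged, which is the window the fixes below try to close.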
Disk write cache: committing data to the hardware media is a slow, contemplative process, made "faster" by inserting… more buffers! On spinning rust, the cache amortizes the cost of writes; on an SSD, it amortizes erasing/writing flash blocks and updating the FTL. How do you ensure the dirty write-cache contents actually get written?
Fix: cache flushing. Explicitly force dirty buffers to disk via the sync/syncfs/fsync calls, which tell the OS to commit dirty data and make sure the hardware does too. O_SYNC mode: makes everything slow. O_DIRECT? It bypasses the page cache but does NOT imply a disk cache flush (you still need fsync).
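A minimal POSIX sketch of the "write, then fsync" pattern: write the whole buffer (handling short writes and EINTR), then call fsync so the OS commits the dirty pages and asks the drive to flush its cache. The function name is hypothetical. Per the fsync(2) man page, a newly created file may also need an fsync on its containing directory so that the directory entry itself is durable.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes to path and make them durable.
 * Returns 0 on success, -1 on error (errno set). */
int durable_write(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            goto fail;
        }
        p += n;                    /* short write: continue with the rest */
        len -= (size_t)n;
    }

    /* So far the data may only be in the page cache. fsync commits it and
     * asks the drive to flush its write cache (if the drive honors that). */
    if (fsync(fd) < 0)
        goto fail;
    return close(fd);

fail: {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
}
```

Opening with O_DIRECT instead would not remove the need for the fsync call.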
Low-flush disk drives: certain firmware treats cache flushes as a no-op "for speed". To spot this, look for the absence of a giant speed hit when you write+flush every time. Fix: buy better hardware.
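One way to look for the missing speed hit is to time a loop of small write+fsync pairs. On a spinning disk that really flushes, each iteration should cost on the order of milliseconds (the platter has to come around); results dramatically faster than the hardware can physically manage suggest flushes are being ignored (or absorbed by a battery-backed cache). A rough sketch; the function name, file path and iteration count are hypothetical, and the number still needs interpreting against the hardware.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical probe: average seconds per (4 KiB pwrite + fsync) over
 * `iters` iterations, or -1.0 on error. */
double avg_flush_latency(const char *path, int iters)
{
    static char buf[4096];
    struct timespec t0, t1;
    if (iters <= 0)
        return -1.0;
    int fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd < 0)
        return -1.0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < iters; i++) {
        /* Rewrite the same block so every fsync has dirty data to flush. */
        if (pwrite(fd, buf, sizeof buf, 0) != (ssize_t)sizeof buf || fsync(fd) != 0) {
            close(fd);
            return -1.0;
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    close(fd);

    double elapsed = (double)(t1.tv_sec - t0.tv_sec)
                   + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    return elapsed / iters;
}
```

Point it at a file on the drive under suspicion (not a RAM-backed filesystem such as tmpfs, where fsync is nearly free) and compare the result with the drive's rotational latency.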
(Aside: the new ATA spec has a "really flush", to actually flush. Hilarious.)
Flash Translation Layer: the FTL maps LBAs (logical block addresses) to locations on the flash chips. Every time you write to the SSD, the FTL must be updated. Flash has limited rewrite capability: 5000 or so writes (program/erase cycles) and that's it! The SSD controller tries to batch updates, which is why you see about 1 MB of cache per GB of flash.
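A toy page-mapping FTL, to illustrate why every write touches the mapping: flash pages can't be overwritten in place, so each write of an LBA goes to a fresh physical page, the map is updated, and the old page is marked stale for later erasure. This is a hypothetical sketch with no erase blocks, garbage collection, or wear leveling.

```c
#include <stdint.h>

/* Hypothetical toy FTL. */
#define NUM_LBAS  16
#define NUM_PAGES 64
#define UNMAPPED  UINT32_MAX

enum page_state { PAGE_FREE, PAGE_VALID, PAGE_STALE };

struct ftl {
    uint32_t map[NUM_LBAS];            /* LBA -> physical flash page */
    enum page_state state[NUM_PAGES];
    uint32_t next_free;                /* naive append-only allocator */
};

void ftl_init(struct ftl *f)
{
    for (int i = 0; i < NUM_LBAS; i++) f->map[i] = UNMAPPED;
    for (int i = 0; i < NUM_PAGES; i++) f->state[i] = PAGE_FREE;
    f->next_free = 0;
}

/* Out-of-place write: returns the new physical page, or UNMAPPED if the
 * LBA is invalid or flash is full (a real FTL would garbage-collect here). */
uint32_t ftl_write(struct ftl *f, uint32_t lba)
{
    if (lba >= NUM_LBAS || f->next_free >= NUM_PAGES)
        return UNMAPPED;
    uint32_t old = f->map[lba];
    uint32_t page = f->next_free++;
    /* (the data would be programmed into `page` here) */
    f->state[page] = PAGE_VALID;
    f->map[lba] = page;                /* the FTL update every write needs */
    if (old != UNMAPPED)
        f->state[old] = PAGE_STALE;    /* old copy waits for an erase */
    return page;
}
```

Writing the same LBA twice consumes two physical pages and leaves one stale, which is why controllers batch map updates and why a power cut in the middle of `map[]` changing is dangerous.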
FTL corruption: what happens if the power goes out during an FTL update? At best, you get the old contents or garbage; some SSDs can brick themselves. Fixes? Battery backup; most SSDs have large capacitors on them for this.
70% of SSDs have some kind of failure when power-cycled; about 13% of SSDs straight up die.
Fun with low voltage: hardware is not quite digital. Binary is implemented by voltage thresholds, so what happens if the voltage dips to the threshold level?
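A sketch of how a digital input turns a voltage into a bit. The thresholds below are illustrative values in the style of a 3.3 V logic family (low at or below 0.8 V, high at or above 2.0 V); real parts specify their own in the datasheet. Between the two thresholds the level is undefined, which is exactly where a sagging supply pushes signals. The names are hypothetical.

```c
/* Hypothetical model of a logic input. */
enum logic_level { LOGIC_LOW, LOGIC_HIGH, LOGIC_UNDEFINED };

/* Illustrative thresholds in volts; real parts differ. */
#define V_IL_MAX 0.8   /* at or below: guaranteed to read as 0 */
#define V_IH_MIN 2.0   /* at or above: guaranteed to read as 1 */

enum logic_level read_input(double volts)
{
    if (volts <= V_IL_MAX)
        return LOGIC_LOW;
    if (volts >= V_IH_MIN)
        return LOGIC_HIGH;
    return LOGIC_UNDEFINED;   /* may read as 0 or 1, and can differ per read */
}
```

Under these assumed thresholds, a 3.3 V "1" that sags to 1.4 V lands in the undefined band.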
Drain Bamage (sic): memory is very sensitive to low voltage because it has lots of transistors, which is why power problems can manifest as memory corruption and segfaults. Scenario: the disk controller kept running! What if the power fails during a write?
Application issues. Torn disk writes: overwrite 4k on disk, then read back a combination of old and new data, usually only after a power failure. The OS does not verify writes; can you detect this before it's too late? Does this actually happen? Less often than you'd think.
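One common way to detect a torn write is to stamp the same generation number at both the start and the end of each block; if a power failure lands only part of the write, the two stamps disagree on read-back. A minimal sketch, assuming a hypothetical 4 KiB block layout and that the sectors of a block are written front to back. Devices don't necessarily guarantee that ordering, so real designs often prefer a whole-block checksum.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical on-disk layout: generation stamp at the front and the back. */
#define BLK 4096

struct tornproof_block {
    uint64_t gen_head;
    uint8_t  payload[BLK - 2 * sizeof(uint64_t)];
    uint64_t gen_tail;
};

void block_prepare(struct tornproof_block *b, uint64_t gen,
                   const void *data, size_t len)
{
    b->gen_head = gen;
    b->gen_tail = gen;
    memset(b->payload, 0, sizeof b->payload);
    memcpy(b->payload, data, len < sizeof b->payload ? len : sizeof b->payload);
}

/* Nonzero if the head and tail stamps disagree, i.e. the write was torn. */
int block_is_torn(const struct tornproof_block *b)
{
    return b->gen_head != b->gen_tail;
}
```

If only the first 512-byte sector of a new block reaches the disk, the head carries the new generation and the tail the old one, and `block_is_torn` reports it.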
Fix: journal and verify. Maintain a write journal and verify journal transactions. The journal is a ring buffer that lands writes on stable storage as soon as possible and occasionally flushes them to other parts of the disk.
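A toy version of "write a journal, verify journal transactions": each record in a ring buffer carries a sequence number and a checksum, and recovery replays records in order until the first one that fails verification (torn, stale, or never written). The record layout, sizes, FNV-1a checksum and function names are all hypothetical choices for this sketch; real journals such as ext4's jbd2 differ in many details.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy journal. */
#define JSLOTS 8
#define JDATA  32

struct jrecord {
    uint64_t seq;              /* transaction id; 0 means never written */
    uint8_t  data[JDATA];
    uint32_t csum;             /* covers seq and data */
};

struct journal {
    struct jrecord ring[JSLOTS];
    uint64_t next_seq;
};

static uint32_t fnv1a_step(uint32_t h, const uint8_t *p, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

static uint32_t rec_csum(const struct jrecord *r)
{
    uint32_t h = fnv1a_step(2166136261u, (const uint8_t *)&r->seq, sizeof r->seq);
    return fnv1a_step(h, r->data, JDATA);
}

void journal_init(struct journal *j)
{
    memset(j, 0, sizeof *j);
    j->next_seq = 1;
}

void journal_append(struct journal *j, const void *data, size_t len)
{
    struct jrecord *r = &j->ring[j->next_seq % JSLOTS];   /* wraps around */
    memset(r, 0, sizeof *r);
    r->seq = j->next_seq++;
    memcpy(r->data, data, len < JDATA ? len : JDATA);
    r->csum = rec_csum(r);
    /* A real journal would write and flush this record before acknowledging
     * it, and must not wrap over records not yet written to their home. */
}

/* Recovery: count consecutive valid records starting at first_seq. */
size_t journal_replay(const struct journal *j, uint64_t first_seq)
{
    size_t n = 0;
    for (uint64_t s = first_seq; n < JSLOTS; s++, n++) {
        const struct jrecord *r = &j->ring[s % JSLOTS];
        if (r->seq != s || r->csum != rec_csum(r))
            break;             /* stale, torn, or never written: stop here */
        /* (apply r->data to its home location here) */
    }
    return n;
}
```

A torn final record simply fails its checksum and is dropped, so recovery ends at the last complete transaction instead of applying half of one.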
Lost writes: you write a buffer to sector X, and it gets written to sector Y instead, because of bit flips, software bugs, or LBA overflow. Can you detect this? (Yes; the kernel has some support for this.)
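LBA overflow is easy to reproduce in miniature: if some layer squeezes a 64-bit sector number into a 32-bit field, sector 2^32 + X silently becomes sector X, and the write lands on someone else's data. With 512-byte sectors, 2^32 sectors is 2 TiB, which is where 32-bit sector fields run out. A hypothetical illustration (the command format and function names are invented, not any particular driver's code):

```c
#include <stdint.h>

/* Hypothetical command format with only a 32-bit LBA field.
 * Buggy: the cast silently drops the high bits of the sector number. */
uint32_t encode_lba_buggy(uint64_t lba)
{
    return (uint32_t)lba;
}

/* Safer: refuse sector numbers that don't fit instead of wrapping. */
int encode_lba_checked(uint64_t lba, uint32_t *out)
{
    if (lba > UINT32_MAX)
        return -1;
    *out = (uint32_t)lba;
    return 0;
}
```

`encode_lba_buggy((1ULL << 32) + 7)` returns 7: a write meant for the far end of a large disk lands near the start.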
Fix: checksums? Store a checksum and the block's offset with each block. Maybe your filesystem can do this for you? Btrfs/ZFS.
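Storing the block's own address next to its checksum is what lets a read catch a misdirected write: the data may be internally consistent, yet it claims to belong somewhere else. A sketch with a hypothetical header layout, FNV-1a standing in for a real checksum, and invented names:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical self-describing block. */
#define PAYLOAD 56

struct sd_block {
    uint64_t lba;              /* where this block is supposed to live */
    uint8_t  data[PAYLOAD];
    uint32_t csum;             /* covers lba and data */
};

enum verify_result { BLOCK_OK, BLOCK_CORRUPT, BLOCK_MISDIRECTED };

static uint32_t sd_csum(const struct sd_block *b)
{
    uint32_t h = 2166136261u;
    const uint8_t *p = (const uint8_t *)&b->lba;
    for (size_t i = 0; i < sizeof b->lba; i++) { h ^= p[i]; h *= 16777619u; }
    for (size_t i = 0; i < PAYLOAD; i++) { h ^= b->data[i]; h *= 16777619u; }
    return h;
}

/* Record the intended LBA and checksum before writing the block out. */
void sd_seal(struct sd_block *b, uint64_t lba)
{
    b->lba = lba;
    b->csum = sd_csum(b);
}

/* Verify a block that was read back from `expected_lba`. */
enum verify_result sd_verify(const struct sd_block *b, uint64_t expected_lba)
{
    if (b->csum != sd_csum(b))
        return BLOCK_CORRUPT;      /* bits changed: the checksum catches it */
    if (b->lba != expected_lba)
        return BLOCK_MISDIRECTED;  /* intact, but meant for another sector */
    return BLOCK_OK;
}
```

A checksum alone would accept the misdirected block as valid; the embedded LBA is what exposes it.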
Unstable page writes: memory writes can happen concurrently with DMA reads of the same page, and the results are undefined. How do you prevent this? Do you need to care?
Fix: don't do that? Most programs don't do this anyway. If you care, double-buffer the write (needed for journaling anyway). Linux will redo the write anyway.
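Double-buffering in miniature: snapshot the page into a private bounce buffer, then hand only the snapshot to the write path, so later changes to the live page can't race with I/O in flight. A hedged sketch using pwrite(2); the function name is hypothetical, and because pwrite is synchronous here, the sketch shows the data flow rather than an actual race.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>         /* open(), for callers obtaining the fd */
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define PAGE 4096

/* Hypothetical helper: snapshot `live` into the caller-owned `bounce`
 * buffer, then write only the snapshot. Once memcpy returns, the caller may
 * keep modifying `live`; the I/O never reads from it. The pattern matters
 * most when the write is asynchronous and a device may DMA from the buffer
 * while the request is in flight. */
ssize_t write_stable_copy(int fd, const char *live, char *bounce, off_t offset)
{
    memcpy(bounce, live, PAGE);
    return pwrite(fd, bounce, PAGE, offset);
}
```

The cost is one extra page copy per write, the same copy a journal already has to make.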
SSD TRIM/discard: hints to the SSD that it can stop tracking a range of blocks. There is no guarantee it has any effect for small ranges, and it is really slow on ATA. The SSD controller's wear leveling usually compensates in the absence of trimming, and most SSDs also ship with (forced) fallow areas so that there are always full pages "available" in adverse conditions. Not really recommended (on Linux, anyway) unless you're deleting large portions of the disk (reformat?).
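If you do decide to trim, say after deleting a large portion of the disk, the usual Linux interface for discarding a mounted filesystem's free space in bulk is the FITRIM ioctl, which is what the fstrim(8) utility uses. A minimal sketch; the wrapper name is hypothetical, and the call needs CAP_SYS_ADMIN and a filesystem that supports it.

```c
#include <fcntl.h>
#include <linux/fs.h>      /* FITRIM, struct fstrim_range */
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical wrapper: ask the filesystem mounted at `mountpoint` to
 * discard free extents of at least `minlen` bytes. Returns the number of
 * bytes the kernel reports as trimmed, or -1 on error (errno set, e.g.
 * EPERM without privileges). */
long long trim_filesystem(const char *mountpoint, uint64_t minlen)
{
    int fd = open(mountpoint, O_RDONLY);
    if (fd < 0)
        return -1;

    struct fstrim_range range;
    memset(&range, 0, sizeof range);
    range.start = 0;
    range.len = UINT64_MAX;        /* the whole filesystem */
    range.minlen = minlen;

    int rc = ioctl(fd, FITRIM, &range);
    close(fd);
    return rc < 0 ? -1 : (long long)range.len;   /* kernel updates len */
}
```

Running it once after a large deletion, rather than discarding on every small delete, matches the advice above about small ranges being of little use.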
Overkill: store checksums and offsets with each block, push writes through a journal, etc. (see slide).
TL;DR: make sure you fsync, lobby for better behavior, and buy better hardware (see slide).
Future: disks that act like memory, atomic disk writes, more robust filesystems, checksums, and maybe ECC some day.