Monday: Waded through the Augean Stables that my e-mailbox had become, and put out a number of fires that had cropped up while I was away. There wasn’t actually a whole lot going on.
Tuesday: Found a serious filesystem problem on one of the backup machines. Somehow, it thought that one of its filesystems was 30M larger than the backing device. This showed up as odd problems every so often with no real explanation. Then I finally saw “attempt to access beyond end of device...” in the output from dmesg this morning, and knew I had to do something. Running umount and fsck on the darn thing helped somewhat. However, it was impossible to resize the filesystem, because one file had apparently been put in space that didn’t actually exist. I had a block number for that file, but not an inode number or anything like that.
I fixed this once I found out that you can walk through the blocks of a file and call the FIBMAP ioctl on every single one. For each logical block in the file, that gives you the block number where it actually lives on the filesystem. From that, it was conceptually easy to figure out which file had the problem: run through all the millions of files on that filesystem, find the one that owns the bad block, delete it, then umount the filesystem and resize2fs it. Nothing to it, and there shouldn’t be any more of those weird problems happening with that machine.
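For the curious, here is roughly what that search looks like in Python. This is a minimal sketch and not the script I actually ran: the function names (`physical_blocks`, `find_owner`) and the injectable `ioctl` parameter are made up for illustration. The `FIBMAP` and `FIGETBSZ` request numbers come from `<linux/fs.h>`. It assumes the block number you are hunting for is in filesystem-block units (the same units FIBMAP reports), and it has to run as root, since FIBMAP needs CAP_SYS_RAWIO.

```python
import fcntl
import os
import stat
import struct

FIBMAP = 1    # <linux/fs.h>: map a logical file block to an on-disk block
FIGETBSZ = 2  # <linux/fs.h>: get the filesystem block size


def physical_blocks(fd, ioctl=fcntl.ioctl):
    """Yield (logical_index, physical_block) for each block of an open file.

    `ioctl` is injectable only so the walk can be exercised without root.
    """
    block_size = struct.unpack("i", ioctl(fd, FIGETBSZ, struct.pack("i", 0)))[0]
    size = os.fstat(fd).st_size
    nblocks = (size + block_size - 1) // block_size  # round up
    for logical in range(nblocks):
        # FIBMAP takes the logical index in an int and overwrites it with
        # the physical block number (0 means a hole / unmapped block).
        reply = ioctl(fd, FIBMAP, struct.pack("i", logical))
        yield logical, struct.unpack("i", reply)[0]


def find_owner(root, target_block, ioctl=fcntl.ioctl):
    """Return the first regular file under `root` that uses `target_block`."""
    root_dev = os.stat(root).st_dev
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
                # Skip symlinks, devices, and files on other filesystems.
                if not stat.S_ISREG(st.st_mode) or st.st_dev != root_dev:
                    continue
                fd = os.open(path, os.O_RDONLY)
            except OSError:
                continue  # vanished or unreadable; keep going
            try:
                for _logical, phys in physical_blocks(fd, ioctl):
                    if phys == target_block:
                        return path
            except OSError:
                pass  # FIBMAP unsupported or refused for this file
            finally:
                os.close(fd)
    return None
```

You would call it as something like `find_owner("/mnt/backup", 123456789)`, where both the mount point and the block number are placeholders. With millions of files this is one ioctl per block per file, so it is slow, but it only has to find one file.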
It’s a very good question how that filesystem ended up like that in the first place. But it’s fixed now. I don’t know that anyone else on the team could’ve solved this without doing drastic things like zorching the filesystem, remaking it, and restoring from backups.
And I’m going to be the only person from my team who’s here Thursday and Friday. Let’s hope the end of the week is quiet. . . .



