Monday had a lot more to do with this JPEG than it really should've. Apparently, somebody changed something somewhere in the network config months ago, and as a result, some third-party people won't be able to get at the stuff they need in time to meet their deadline. Of course, I don't have the access I'd need to fix this. Of course, various people in the management chain know it's nothing I've done, but rather a problem with the networking infrastructure. Of course, since I touched this problem, someone will blame me for it, even though all the parts I actually have root access to are working just fine, and have been since 30 minutes after I learned about the problem.
If the right people had been told about this a month ago, it'd probably all be working fine. However, none of the right people knew about it until Jan. 20. This points to a few problems with the way we handle these situations, but I don't know how to fix them. Part of it is that there are multiple pieces involved, each managed by different people, some of whom work for entirely different companies. That diffusion of responsibility leads to more "I can't do that! Call Bob, who can!" than to actually fixing what's wrong. (I think this is endemic to large systems, and no amount of meetings or mail will ever fix it.) I just hope it doesn't have terrible effects on things down the road.
That was the main bad part of the day. At least the other stuff I was working on looks like it worked: only 20 jobs were stuck in the job queue at 11am. Last week, thousands of jobs were stuck in the queue at that hour, so this is a big improvement. We'd like to get to the point where nothing's ever stuck, but that may be asking too much.
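For what it's worth, "stuck" here just means a job that's been sitting in the queue longer than it should. The sketch below shows one way to count those, assuming a hypothetical SQLite-backed queue with a "jobs" table, a "queued" state, and a one-hour threshold; none of those names or numbers come from the real system, they're just stand-ins for illustration.

    import sqlite3
    import time

    # Hypothetical schema: a "jobs" table with a state column and the time
    # each job was enqueued. Anything still queued after an hour is counted
    # as stuck. The table, column names, and threshold are all assumptions.
    STUCK_AFTER_SECONDS = 60 * 60

    def count_stuck_jobs(db_path="queue.db"):
        cutoff = time.time() - STUCK_AFTER_SECONDS
        conn = sqlite3.connect(db_path)
        try:
            row = conn.execute(
                "SELECT COUNT(*) FROM jobs "
                "WHERE state = 'queued' AND enqueued_at < ?",
                (cutoff,),
            ).fetchone()
            return row[0]
        finally:
            conn.close()

    if __name__ == "__main__":
        print("stuck jobs:", count_stuck_jobs())

Run something like that at 11am every day and you get exactly the kind of before/after comparison above: thousands last week, 20 today.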