Moodle’s MySQL Keeps Going Down
If you’re not interested in Moodle or troubleshooting MySQL problems, than you probably don’t want to read this post. I however wanted to write some notes for myself and need it to be portable. Since that’s technically the purpose of this blog, I figured I should put it here.
Recently Nagios started informing me that HTTPS was going down infrequently (but started to become more frequent).
***** Nagios *****
Notification Type: PROBLEM
Service: HTTPS
Host: Moodle Server
Address: 123.456.78.912
State: CRITICAL
Date/Time: Fri Aug 28 01:06:50 EDT 2009
Additional Info:
CRITICAL – Socket timeout after 10 seconds
What was strange was that HTTP wasn’t going down. I felt like Nagios was lying to me so I figured I would have to wake up late at night (early in the morning?) and catch it lying to me. To do this I decided I’d try to figure out if my cellphone had an email to associate to text messaging. Mine wasn’t working so I had to talk to Sprint about it.
In the mean time though, it was still going down. Luckily last night at 4am I randomly woke up. I ran downstairs (imagine kid at Christmas…but like a reverse Christmas) and fired up the laptop. Sure enough when I went to the Moodle homepage I saw this:
Error: Database connection failed.
It is possible that the database is overloaded or otherwise not running properly.
The site administrator should also check that the database details have been correctly specified in config.php
Now in hindsight, there were so many things I could have done at this point. But it was 4am and my house was freezing so I made some mistakes. The first thing I did was restart Apache. It appeared to be running and the restart didn’t throw any errors, but it didn’t fix the problem either. Next I “turned off SSL”. I was able to get to the error page without the HTTPS so I determined at this point that Nagios clearly was lying to me, something else was wrong (p.s. i am thankful to Nagios for telling me that something was wrong). So then I decided that maybe it was a database problem. I went into MySQL and was able to access the Moodle database and do a “show tables;”. So I thought at that point I should restart MySQL, in hindsight there are so many other things I should have done first, but I will document those below. A restart of MySQL fixed the issue.
So what will I do next time (and yes…there will be a next time):
1) run top on the server: See if MySQL is hogging the CPU. This could lead you to a number of options.
2) check the mysql-slow-queries.log: I set up the mysql option log-slow-queries. this in conjunction with the long-query-time option will log any query that takes over a certain amount of time (default is 10s).
3) show mysql processlist: If you haven’t set up the log slow queries option this might also be helpful in determining what mysql is doing. Here is a brief explaination of show processlist and its companion of kill <processid>. I also found this handy script that you could set up as part of a cron to kill any runaway processes.
4) query is poorly written: I’ve found a number of reports that MySQL queries are poorly written in Moodle and that these can cause the CPU usage to top out at %100 which causes Moodle to be unusable. Below are a few Moodle Tracker reports of this:
To see if this is the case with your Moodle use the slow query logging (or just the plain old query logging) with a backup of your database to see if you can replicate the problem somewhere else. If you can the problem is probably the query and maybe you can figure out a way to rewrite the query (or get it rewritten) to see if that fixes the problem.
5) check your persistent db connection: Check to see if you have persistent database connections set up in the Moodle config.php file and in the php.ini file. If you do, turn it off and see if this solves the problem. it may be that your connections aren’t closing and they are killing your database regularly.
6) check your max connections: Are the maximum number of Apache connections greater than your maximum number of MySQL connections? If so that might be a problem. Basically users could get to the website but then when they go to request something in the database they can’t because the # of connections are taken up. I have a feeling though that this may be a completely different error than what I was experiencing (people…including myself…really need to take screenshots or document errors better).
7) check your MySQL performance: Perhaps you haven’t configured Moodle right. Take a look at these performance tuning tips to see if you can get it working better.
8) check your database tables: Stop mysqld and run a check on your Moodle tables. See if some of the tables are corrupted, if they are fix them.
So if you’re wondering what I’m doing at 4am tomorrow, there’s the list. Wish me luck.