Why Moodle Keeps Going Down
For those of you that aren’t Systems Administrators, here is the reason you don’t want to be. At 12:30am when things go down, you have to wake up and figure out why. And here I am writing a blog post about it. Luckily its not 4am. That was totally worse (I say this now…let’s see how long it takes me to figure this stuff out).
So I got my text message at 12:30am. It confused me but then I woke up a bit. It was a good thing I wrote the notes I did in my previous post, because trust me, I could have never come up with that stuff on my own right now.
First thing I did was run top. CPU was at 100.1% (way to work hard server) for the mysqld process. Second thing I did was check the slow query log. There were actually a few slow queries, the first ones are still puzzling me. Basically these were checking a table that had a whole lot of rows of data, the data didn’t need to be stored in those tables, i removed the data, and I stopped logging that data in the tables. I probably won’t see those queries ever again. But this doesn’t really matter, those queries weren’t the ones hanging up the system, they were only take 11 t0 17 seconds to run.
I did see the last query that ran though, and this seems to be part of the problem, but not the query that was actually causing mysqld to hang. This query was:
INSERT INTO stats_daily (stattype, timeend, courseid, roleid, stat1, stat2)
SELECT ‘enrolments’, timeend, courseid, roleid, COUNT(DISTINCT userid), 0
FROM (
SELECT 1251691200 AS timeend, pl.courseid, pl.roleid, pl.userid
FROM (
SELECT DISTINCT ra.roleid, ra.userid, c.id as courseid
FROM role_assignments ra JOIN context ctx
ON ctx.id = ra.contextid
CROSS JOIN course c
JOIN role_capabilities rc
ON rc.roleid = ra.roleid LEFT OUTER JOIN course_categories cc1
ON cc1.id = c.category
WHERE ((rc.capability = ‘moodle/course:view’ )
AND rc.permission = 1 AND rc.contextid = 1
AND (ctx.contextlevel = 10
OR (c.id = ctx.instanceid AND ctx.contextlevel = 50) OR (cc1.id = ctx.instanceid AND ctx.contextlevel = 40)))
) pl
) inline_view
GROUP BY timeend, courseid, roleid;
This query took a total of 41 seconds to execute. I’m assuming that the next query after this was the one that brought the house down. I saw the beginning of this when I went into mysql and did a show processlist:
+——-+————+———–+——–+———+——+————–+——————————————————————————————————+
| Id | User | Host | db | Command | Time | State | Info |
+——-+————+———–+——–+———+——+————–+——————————————————————————————————+
| 45654 | moodleuser | localhost | moodle | Query | 1776 | Sending data | UPDATE stats_daily
SET stat2 = (SELECT COUNT(DISTINCT ra.userid)
|
The full query finally showed up in my query log once I killed the process. Here it is in case you’re curious:
090902 0:01:29 45654 Query UPDATE stats_daily
SET stat2 = (SELECT COUNT(DISTINCT ra.userid)
FROM role_assignments ra JOIN context ctx
ON ctx.id = ra.contextid
CROSS JOIN course c
JOIN role_capabilities rc
ON rc.roleid = ra.roleid LEFT OUTER JOIN course_categories cc1
ON cc1.id = c.category
WHERE ra.roleid = stats_daily.roleid AND
c.id = stats_daily.courseid AND
((rc.capability = ‘moodle/course:view’ )
AND rc.permission = 1 AND rc.contextid = 1
AND (ctx.contextlevel = 10
OR (c.id = ctx.instanceid AND ctx.contextlevel = 50) OR (cc1.id = ctx.instanceid AND ctx.contextlevel = 40))) AND
EXISTS (SELECT ‘x’
FROM log l
WHERE l.course = stats_daily.courseid AND
l.userid = ra.userid AND l.time >= 1251604800 AND l.time < 1251691200))
WHERE stats_daily.stattype = ‘enrolments’ AND
stats_daily.timeend = 1251691200 AND
stats_daily.courseid IN
(SELECT DISTINCT l.course
FROM log l
WHERE l.time >= 1251604800 AND l.time < 1251691200)
I also saw a whole bunch of other queries locked, waiting for this query to run. I thought that perhaps switching the tables over to InnoDB rather than MyISAM might help alleviate the problem, but the truth is this appears to be a “the query is just ridiculous” problem rather than a “InnoDB is better at performance” problem.
So at anyrate I looked around and it seems this is a problem with the stats that Moodle keeps about itself. Apparently in version 1.6 people had this problem a lot (because stats was new). Basically I enabled stats because someone thought it was a good idea and I didn’t bother to consider how resource intensive this process could be (note to self: be a better sys admin and read documentation when someone thinks something might be useful). I changed some settings in the stats configuration page and hope that this reduces the load and processing time of the stats query. We’ll see if this fixes my problem.
Also, the amount of time I spent on this right now totally makes up for leaving work an hour and a half early to go meet my brother in law the electrician so he could replace outlets and light fixtures (my $12 sconces from lowes look quite lovely).