Profiling PL/pgsql functions

I spent my whole week trying to fix PL/pgsql functions. More to the point, I tried to make them perform better. My customer complains that they aren't fast enough, and I agree with him: some of his PL/pgsql functions are really slow. Unfortunately, he uses a lot of PL/pgsql functions. One function may call many other functions, and it's hard to tell which one is the culprit (if there is only one).

My first step was to use pgFouine on a three-days log to know which queries are really slow on his system. Three queries stood out as being the real offenders. I tried to EXPLAIN ANALYZE them, but found that these queries involved lots of views and functions. Not really easy queries to fix. So I set track_functions to pl, so that I can see which functions were called by one query, and which functions are the really offending ones. Here is the result I got:

b1=# SELECT funcname, calls, total_time, self_time
b1-# FROM pg_stat_user_functions;
 funcname  | calls  | total_time | self_time 
-----------+--------+------------+-----------
 f0        |      1 |     328486 |      1375
 f1        |      1 |        210 |       210
 f2        |      1 |        123 |       123
 f3        |      1 |      20310 |     20310
 f4        |      1 |        763 |       763
 f51       |      1 |      33486 |     33486
 f52       |      1 |     250363 |    239703
 f53       |      1 |      19139 |     19139
 f6        |      1 |        399 |       399
 f7        |      1 |       1770 |      1770
 f8        |      1 |        543 |       543
 f_compute | 110588 |      10660 |     10660
(12 rows)

Some context first, I only executed f0(), which itself called all the other functions. So what do I get from this? f_compute() is executed 110588 times. That's a lot, but it doesn't consume much of f0() execution time (3%). f52() is only executed once, but took 76% of the total execution time of f0(). If I can make it run faster, I would probably gain a lot.

But there's more. f52() executes other functions, but its own execution took most of the time (95%). So what I really need to know is how long each of its statements last. There's not many way to know that actually. You can modify the function source code to add RAISE LOG statements in this function (you can find many examples on this depesz's post). It takes a lot of time to do that, you may do it wrong, you cannot always do it. You can also use the plprofiler from EnterpriseDB. That's the first thing I did. Due to the number of statements in each of the functions the query called, my query went from 5.4 minutes to more than two hours of execution time. I canceled it because I couldn't wait any longer. The only thing left was to modify the function source code, but I really hate this idea.

So I read the plprofiler code, and found a better way to do it. The plprofiler is a shared library that uses hooks inside the PL/pgsql interpreter to grab lots of information for each statement of each executed function. I only had to change the code to make it less a burden on my system. All I wanted was to get a line in the PostgreSQL log files when a function starts, when it ends, and when a function's statement is executed. It turns out that it's actually quite easy to do. You can find the code on the log_functions github page. After compilation, installation, and configuration of this tool, and then the execution of the query, I got this log (somewhat edited to make it fit better on this blog post):

10:56:24 LOG:  log_functions, DECLARE, f52
10:56:24 LOG:  log_functions, BEGIN, f52
10:56:24 LOG:  log_functions, STMT, line 7, type 14
10:56:24 LOG:  log_functions, STMT, line 14, type 14
10:56:24 LOG:  log_functions, STMT, line 23, type 14
10:56:24 LOG:  log_functions, STMT, line 32, type 14
10:56:24 LOG:  log_functions, STMT, line 41, type 14
10:56:24 LOG:  log_functions, STMT, line 52, type 14
10:56:24 LOG:  log_functions, STMT, line 75, type 14
11:00:14 LOG:  log_functions, STMT, line 99, type 14
11:00:14 LOG:  log_functions, STMT, line 110, type 14
11:00:14 LOG:  log_functions, STMT, line 122, type 14
11:00:14 LOG:  log_functions, STMT, line 133, type 14
11:00:14 LOG:  log_functions, STMT, line 145, type 14
11:00:14 LOG:  log_functions, STMT, line 156, type 14
11:00:14 LOG:  log_functions, STMT, line 168, type 14
11:00:14 LOG:  log_functions, STMT, line 179, type 14
11:00:14 LOG:  log_functions, STMT, line 191, type 14
11:00:14 LOG:  log_functions, STMT, line 202, type 14
11:00:14 LOG:  log_functions, STMT, line 214, type 14
11:00:14 LOG:  log_functions, STMT, line 225, type 14
11:00:14 LOG:  log_functions, STMT, line 237, type 14
11:00:14 LOG:  log_functions, STMT, line 248, type 14
11:00:14 LOG:  log_functions, STMT, line 260, type 14
11:00:14 LOG:  log_functions, STMT, line 271, type 14
11:00:14 LOG:  log_functions, STMT, line 283, type 14
11:00:14 LOG:  log_functions, STMT, line 294, type 14
11:00:14 LOG:  log_functions, STMT, line 304, type 14
11:00:14 LOG:  log_functions, STMT, line 306, type 13
11:00:14 LOG:  log_functions, STMT, line 308, type 10
11:00:14 LOG:  log_functions, END, f52

I also got lucky here. All the time spent on this function is spent on the statement in line 75. So what I really need to know is why it takes so long to do this query. The search of performance for this query is not really relevant for this blog post. I only wanted to show how you can profile a PL/pgsql function easily without changing its source code.

As I think it may be a useful tool for many people, it's available on my github page, and on pgxn. Hope it can help people profiling their PL/pgsql functions.

To end this blog post, know that there are many things one can do with hooks in PostgreSQL. This is just one of them. I already explained how you can use another one to log only superuser queries. This new one will totally fit in with my talk on hooks at PGCon 2012 :)

Ajouter un commentaire

Les commentaires peuvent être formatés en utilisant une syntaxe wiki simplifiée.

Fil des commentaires de ce billet