Monthly Archives: February 2019

Finding the 100% CPU culprit in multi-threaded applications

Wow, after many years with Linux I just stubled upon some ultra useful functionality and now I feel kind of stupid because I didn’t know about it all that time, lol. :)

I have a multi-threaded application – namely Asterisk, a software PBX – that was always at 250% CPU for many months without a visible reason. Googling for Asterisk + high CPU brought me to this site: https://moythreads.com/wordpress/2009/05/06/why-does-asterisk-consume-100-cpu

Basically everything is already explained there but I’ll give another example. Here sample output from top:

PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
16780 asterisk -11   0 4681m 115m  10m S 207.5  3.0 649:34.98 asterisk

So, Asterisk going berserk at 207% CPU. Let’s install pstack:

yum install gdb

Let’s look at all the threads of Asterisk, PID 16780, using the magic parameters -LlFm to ps which will show all threads of that process:

ps -LlFm 16780

The output will be something like the following. Look for the C column which means CPU usage:

F S UID        PID  PPID   LWP  C NLWP PRI  NI ADDR SZ WCHAN    RSS PSR STIME TTY        TIME CMD
[...]
1 S asterisk     -     - 32498  0    -  49   - -     - futex_     -   7 04:15 -        00:00:00 -
1 S asterisk     -     - 32499  0    -  49   - -     - poll_s     -   1 04:15 -        00:00:00 -
1 S asterisk     -     -   418  0    -  49   - -     - inotif     -   4 04:33 -        00:00:00 -
1 R asterisk     -     -  3967 99    -  49   - -     - -          -   0 05:59 -        12:14:44 -
1 R asterisk     -     -  4367 99    -  49   - -     - -          -   1 06:05 -        12:08:27 -
1 S asterisk     -     - 22668  0    -  49   - -     - poll_s     -   4 16:40 -        00:00:19 -
1 S asterisk     -     - 23627  0    -  49   - -     - poll_s     -   7 17:20 -        00:00:12 -
1 S asterisk     -     - 23641  0    -  49   - -     - poll_s     -   2 17:20 -        00:00:11 -
[...]

Notice those two entries with 99 CPU: LWP 3967 and 4367! Let’s look at these little f*ckers in more detail:

pstack 16780 > /tmp/asterisk.stack.txt

Let’s look into /tmp/asterisk.stack.txt and search for our two LWPs 3967 and 4367…

Thread 35 (Thread 0x7ff211a9b700 (LWP 3967)):
#0  0x00007ff248cb96ec in recv () from /lib64/libc.so.6
#1  0x00007ff23c07b8b1 in ooSocketRecv () from /usr/lib64/asterisk/modules/chan_ooh323.so
#2  0x00007ff23c06461f in ooH2250Receive () from /usr/lib64/asterisk/modules/chan_ooh323.so
#3  0x00007ff23c064fba in ooProcessCallFDSETsAndTimers () from /usr/lib64/asterisk/modules/chan_ooh323.so
#4  0x00007ff23c06518e in ooMonitorCallChannels () from /usr/lib64/asterisk/modules/chan_ooh323.so
#5  0x00007ff23c14ed95 in ooh323c_call_thread () from /usr/lib64/asterisk/modules/chan_ooh323.so
#6  0x000000000057a1a8 in dummy_start ()
#7  0x00007ff2476eaaa1 in start_thread () from /lib64/libpthread.so.0
#8  0x00007ff248cb893d in clone () from /lib64/libc.so.6
Thread 34 (Thread 0x7ff21195e700 (LWP 4367)):
#0  0x00007ff248cb96ec in recv () from /lib64/libc.so.6
#1  0x00007ff23c07b8b1 in ooSocketRecv () from /usr/lib64/asterisk/modules/chan_ooh323.so
#2  0x00007ff23c06461f in ooH2250Receive () from /usr/lib64/asterisk/modules/chan_ooh323.so
#3  0x00007ff23c064fba in ooProcessCallFDSETsAndTimers () from /usr/lib64/asterisk/modules/chan_ooh323.so
#4  0x00007ff23c06518e in ooMonitorCallChannels () from /usr/lib64/asterisk/modules/chan_ooh323.so
#5  0x00007ff23c14ed95 in ooh323c_call_thread () from /usr/lib64/asterisk/modules/chan_ooh323.so
#6  0x000000000057a1a8 in dummy_start ()
#7  0x00007ff2476eaaa1 in start_thread () from /lib64/libpthread.so.0
#8  0x00007ff248cb893d in clone () from /lib64/libc.so.6

Aha! Something about ooh323 which is a module responsible for H.323 which we really don’t need in 2019. Let’s deactivate the culprit and put this into /etc/asterisk/modules.conf:

noload => chan_ooh323.so

Restart asterisk and voila:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
25893 asterisk -11   0 4601m  64m  10m S 14.3  1.7   3:16.92 asterisk

14% CPU is about what Asterisk really consumes given the load. Problem solved. pstack and ps to the rescue!