How to find what is pegging the CPU in the Linux kernel

Problem :

The nodes of our Hadoop cluster are running Red Hat 5.3 with kernel 2.6.18-194.17.4 (an old kernel version). We found that some hosts are at 100% CPU utilization, and in particular all CPU cores are at nearly 100% sy (system time).

top - 20:56:21 up 340 days, 22:28,  1 user,  load average: 2297.16, 2298.69, 2298.88
Tasks: 17923 total, 132 running, 17753 sleeping,   0 stopped,  38 zombie
Cpu(s):  0.2%us, 99.7%sy,  0.1%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  35840000k total, 33995836k used,  1844164k free,  2432312k buffers
Swap:        0k total,        0k used,        0k free, 12193444k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                                                                                 
 3362 eo        18   0     0    0    0 Z 10.0  0.0 101:32.83 java <defunct>                                                                                                                                                          
12818 eo        22   0 3896m 1.2g  18m S  7.4  3.6 728:05.05 java                                                                                                                                                                    
21396 qifei     19   0 26240  13m  812 R  6.1  0.0   9:48.80 top                                                                                                                                                                     
 1425 eo        18   0 3632m 1.0g  26m D  4.2  3.0  42:11.92 java                                                                                                                                                                    
 1398 eo        15   0     0    0    0 Z  4.2  0.0  41:09.95 java <defunct>                                                                                                                                                          
 1595 eo        18   0     0    0    0 Z  3.8  0.0  41:11.94 java <defunct>                                                                                                                                                          
 6079 root      25   0 93744  19m 3004 R  3.7  0.1  20:34.63 apolloHostComma                                                                                                                                                         
 6254 root      25   0  8068  456  380 R  3.7  0.0  20:28.19 date                                                                                                                                                                    
 2671 root      25   0 25004 3996 1404 R  2.5  0.0 265:33.27 apolloHostComma                                                                                                                                                         
 4573 root      25   0 23420 2352 1376 R  2.5  0.0  20:10.33 apolloHostComma                                                                                                                                                         
 4710 root      25   0 25400 4436 1404 R  2.5  0.0  19:50.97 apolloHostComma                                                                                                                                                         
 5047 root      25   0  174m  17m 5852 R  2.5  0.1  19:19.46 yum                                                                                                                                                                     
 5568 root      25   0 25136 4104 1404 R  2.5  0.0  19:36.23 apolloHostComma                                                                                                                                                         
 5649 root      25   0 24344 3296 1400 R  2.5  0.0  19:54.40 apolloHostComma                                                                                                                                                         
 6132 root      25   0 25004 4056 1404 R  2.5  0.0  19:26.55 apolloHostComma                                                                                                                                                         
 7084 snitch    25   0  8708  252  112 R  2.5  0.0  20:06.13 sh                                                                                                                                                                      
 7201 root      25   0  8368  716  584 R  2.5  0.0  19:27.99 ps                                                                                                                                                                      
 7749 root      25   0 27808 2840 1484 R  2.5  0.0  19:58.13 auth-sync.pl                                                                                                                                                            
 7975 root      25   0 31168 4000 1548 R  2.5  0.0  20:04.87 report                                                                                                                                                                  
 7977 root      25   0  9772  772  476 R  2.5  0.0  19:55.76 apollo-polling-                                                                                                                                                         
 8174 snitch    25   0  8708  708  588 R  2.5  0.0  19:52.57 sh                                                                                                                                                                      
 8307 eo        25   0 26008 3000 1480 R  2.5  0.0  19:49.94 perl                                                                                                                                                                    
 8583 root      25   0 25268 4296 1404 R  2.5  0.0  19:05.10 apolloHostComma                                                                                                                                                         
 9832 eo        18   0     0    0    0 Z  2.5  0.0  18:08.24 java <defunct>                                                                                                                                                          
 9856 eo        18   0 3454m  12m 7572 D  2.5  0.0  18:08.24 java                                                                                                                                                                    
 9882 eo        18   0     0    0    0 Z  2.5  0.0  18:24.09 java <defunct>                                                                                                                                                          
  666 root      25   0  174m  17m 5876 R  2.5  0.1  12:47.36 yum                                                                                                                                                                     
 1343 root      25   0 74820 1240  592 R  2.5  0.0 277:03.67 crond                                                                                                                                                                   
 1571 eo        18   0 3649m 563m  26m D  2.5  1.6  20:27.40 java                                                                                                                                                                    
 1601 eo        18   0     0    0    0 Z  2.5  0.0  21:15.44 java <defunct>                                                                                                                                                          
 2858 root      25   0 24872 3944 1404 R  2.5  0.0  20:30.74 apolloHostComma                                                                                                                                                         
 2881 root      25   0 53016  15m 1852 R  2.5  0.0  19:25.97 apolloHostComma                                                                                                                                                         
 3166 root      25   0 29396 4340 1452 R  2.5  0.0 264:38.79 RotateLogFiles.                                                                                                                                                         
 4392 root      25   0 29988 6980 1520 R  2.5  0.0  20:59.13 apolloHostComma                                                                                                                                                         
 4608 root      25   0 55224  15m 1804 R  2.5  0.0  20:46.56 apolloHostComma                                                                                                                                                         
 4624 root      25   0 24740 3808 1404 R  2.5  0.0  20:46.17 apolloHostComma                                                                                                                                                         
 4637 root      25   0 25004 4036 1404 R  2.5  0.0  20:46.43 apolloHostComma                                                                                                                                                         
 4681 root      25   0 28736 3608 1452 R  2.5  0.0  20:55.49 RotateLogFiles.                                                                                                                                                         
 4760 eo        18   0     0    0    0 Z  2.5  0.0  20:04.55 java <defunct>                                                                                                                                                          
 4979 root      25   0 74820  860  212 R  2.5  0.0  19:58.63 crond                                                                                                                                                                   
 5023 root      25   0 25484 2492 1472 R  2.5  0.0  19:41.18 auth-sync.pl                                                                                                                                                            
 5460 eo        25   0 23288 2220 1272 R  2.5  0.0  19:37.19 cron-babysit                                                                                                                                                            
 5551 eo        25   0 31916 6912 1608 R  2.5  0.0  19:36.55 cron-babysit                                                                                                                                                            
 5560 root      25   0 22496  696  532 R  2.5  0.0  20:42.10 report                                                                                                                                                                  
 5564 root      25   0  8708  244   92 R  2.5  0.0  19:36.86 SnitchAgentCont                                                                                                                      

From the first several rows of the top output, it is not obvious how the CPU is being consumed.

Sometimes we did see kswapd0 in the top rows; this could be because we have no swap space.

It is impossible to print the command line of the java process with top, ps, or /proc//cmdline, because the console will hang if we do so.

My question is: how can we find out what is pegging the CPU in the kernel?

Solution :

The system has 17923 processes, of which 132 are in the running state.
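As a way to reproduce these counts without ps, here is a minimal Python sketch that tallies process states from /proc/<pid>/stat. It is an illustration only, not something from the original answer: the function names are made up for this example, and whether reading the stat files avoids the console hang described in the question is an untested assumption. Linux load averages count tasks in uninterruptible sleep (D) as well as runnable ones (R), so the D count is worth reading alongside the R count.

```python
import os
from collections import Counter


def parse_state(stat_line):
    """Return the one-letter state (R, S, D, Z, ...) from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so cut at the LAST ')' instead of splitting on spaces.
    after_comm = stat_line[stat_line.rfind(")") + 1:]
    return after_comm.split()[0]


def count_states(stat_lines):
    """Tally how many processes are in each state."""
    return Counter(parse_state(line) for line in stat_lines)


def read_stat_lines(proc_root="/proc"):
    """Yield the contents of every /proc/<pid>/stat file that can be read."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            path = os.path.join(proc_root, entry, "stat")
            with open(path, errors="replace") as f:
                yield f.read()
        except OSError:
            continue  # the process exited between listdir() and open()
```

Calling `count_states(read_stat_lines()).most_common()` on an affected host would list the states ordered by how many processes are in each.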

The rate at which the running processes are scheduled is high enough to yield steady load averages of almost 2300. Scheduling itself, and more generally managing the entire process list and the resources those processes use, likely accounts for the bulk of your 99.7% sy value, using far more CPU than actually executing the running processes (the remaining 0.3% in us and ni combined).
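The us/sy/ni split that top reports can also be derived directly from the aggregate `cpu` line of /proc/stat, which may be handy when top itself is sluggish. The following is a rough sketch under stated assumptions: the helper names are hypothetical, only the first eight counters (user, nice, system, idle, iowait, irq, softirq, steal) are used, and the labels are chosen to match top's column names.

```python
import time

# Order of the counters on the "cpu" line of /proc/stat, named after
# the matching top columns.
FIELDS = ("us", "ni", "sy", "id", "wa", "hi", "si", "st")


def parse_cpu_line(line):
    """Turn the aggregate 'cpu ...' line into a {label: ticks} dict."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    return dict(zip(FIELDS, values))


def cpu_percentages(before, after):
    """Percentage of CPU time spent in each category between two samples."""
    a, b = parse_cpu_line(before), parse_cpu_line(after)
    deltas = {k: b[k] - a[k] for k in b if k in a}
    total = sum(deltas.values())
    if total <= 0:
        return {k: 0.0 for k in deltas}
    return {k: 100.0 * d / total for k, d in deltas.items()}


def sample_proc_stat(interval=1.0, path="/proc/stat"):
    """Read the 'cpu' line twice, `interval` seconds apart, and compare."""
    def first_line():
        with open(path) as f:
            return f.readline()
    before = first_line()
    time.sleep(interval)
    return cpu_percentages(before, first_line())
```

On a host in the state shown in the question, `sample_proc_stat()` should report a `sy` share close to what top displays; per-core lines (`cpu0`, `cpu1`, ...) are deliberately ignored here.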

I also see several zombies around. They might indicate some misbehaving programs, but they could also indicate that the system is so busy that it can’t even find time to clean up defunct processes (which would also fall in the sy category, BTW).

You need to clean up a large portion of those processes if you want to get any decent level of performance out of this machine.
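A zombie stays in the process table until its parent reaps it, so grouping zombies by parent PID is one way to tell a single misbehaving parent from a system-wide backlog. Here is a small Python sketch of that idea. The helper names are hypothetical, it relies on the state and parent PID being the first two fields after the command name in /proc/<pid>/stat, and it has not been tried on a machine in the condition described above.

```python
import os
from collections import Counter


def parse_stat(stat_line):
    """Extract (pid, comm, state, ppid) from a /proc/<pid>/stat line."""
    open_paren = stat_line.index("(")
    close_paren = stat_line.rfind(")")  # comm may contain ')' itself
    pid = int(stat_line[:open_paren])
    comm = stat_line[open_paren + 1:close_paren]
    rest = stat_line[close_paren + 1:].split()
    return pid, comm, rest[0], int(rest[1])


def zombies_by_parent(stat_lines):
    """Count zombie (state Z) processes per parent PID."""
    counts = Counter()
    for line in stat_lines:
        _pid, _comm, state, ppid = parse_stat(line)
        if state == "Z":
            counts[ppid] += 1
    return counts


def read_stat_lines(proc_root="/proc"):
    """Yield the contents of every readable /proc/<pid>/stat file."""
    for entry in os.listdir(proc_root):
        if entry.isdigit():
            try:
                path = os.path.join(proc_root, entry, "stat")
                with open(path, errors="replace") as f:
                    yield f.read()
            except OSError:
                pass  # process vanished while we were scanning
```

`zombies_by_parent(read_stat_lines()).most_common(5)` would then show the parents holding the most unreaped children.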
