Tuesday, February 24, 2009

Performance Optimization

Optimization is the process of modifying an application to minimize its use of computer resources such as CPU, memory, network, disk space and power. This article covers two aspects of optimizing an application: CPU usage and memory usage. There can be a tradeoff between the two, so it is important to decide which one matters for the system at hand and to focus on it. This article is meant to be used as a manual for performance optimization. The process of optimization can be specific to the programming language, compiler, hardware or operating system. This article focuses on general aspects and gives a few system-specific ones as examples.
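The CPU/memory tradeoff mentioned above can be sketched with a small bit-counting routine. This is only an illustration: the function names are made up for this example, and whether the table version is actually faster depends on the hardware and on cache behavior.

```c
#include <stdint.h>

/* Light on memory, heavier on CPU: walks over the bits on every call. */
unsigned count_bits_loop(uint32_t value)
{
    unsigned count = 0;
    while (value != 0) {
        count += value & 1u;
        value >>= 1;
    }
    return count;
}

/* Heavier on memory, lighter on CPU: a 256-entry table that is filled
 * once and then answers each call with four lookups. */
static unsigned char bits_in_byte[256];
static int table_ready = 0;

static void build_table(void)
{
    for (unsigned i = 0; i < 256; i++) {
        bits_in_byte[i] = (unsigned char)count_bits_loop(i);
    }
    table_ready = 1;
}

unsigned count_bits_table(uint32_t value)
{
    unsigned total = 0;
    if (!table_ready) {
        build_table();   /* lazy initialization; not thread-safe */
    }
    total += bits_in_byte[value & 0xffu];
    total += bits_in_byte[(value >> 8) & 0xffu];
    total += bits_in_byte[(value >> 16) & 0xffu];
    total += bits_in_byte[(value >> 24) & 0xffu];
    return total;
}
```

Both functions return the same answer. The second spends 256 bytes of memory to do less work per call, which may or may not pay off on a memory-constrained device.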
Optimizing design:
When developing a system design, thought should be given to the resources available in the system. A given system can have many limitations; a few indicative examples are given in the table below.
System | Possible design consideration
Application/OS for laptops | Power. Laptops sometimes run on battery power, so the application should minimize power usage.
Applications for handheld/battery-operated devices | Power
Embedded devices such as mobile phones/MP3 players | Memory/processing power
Network-based applications | Network bandwidth

Table: Showing the possible system limitations

Writing Optimized code.
When writing code, the focus should be on making the code readable. Readability reduces the bug count and hence the development time. Writing optimized code from the start is not a good idea, because optimization reduces the readability of the code and it takes more time to code the same functionality. Moreover, while writing the code one does not yet know which areas need attention, so most of the effort would be wasted. Writing optimized code throughout is therefore a poor way to optimize.
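As a rough illustration of the readability cost, here is the same array sum written twice. The function names are hypothetical, and the hand-unrolled version is only a sketch of the kind of manual optimization being discouraged; a compiler at a suitable optimization level may well produce similar code from the readable version.

```c
#include <stddef.h>

/* Readable version: the intent is obvious at a glance. */
long sum_readable(const int *values, size_t count)
{
    long total = 0;
    for (size_t i = 0; i < count; i++) {
        total += values[i];
    }
    return total;
}

/* Hand-unrolled version: same result, but longer, and the remainder
 * loop at the end is easy to get wrong. */
long sum_unrolled(const int *values, size_t count)
{
    long total = 0;
    size_t i = 0;
    for (; i + 4 <= count; i += 4) {
        total += values[i];
        total += values[i + 1];
        total += values[i + 2];
        total += values[i + 3];
    }
    for (; i < count; i++) {   /* leftover elements when count % 4 != 0 */
        total += values[i];
    }
    return total;
}
```

The second function takes about twice as many lines to say the same thing, and nothing at this point shows that summing the array is a bottleneck at all.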

Compiler Optimization
All modern compilers have many ways to generate object code, and they can be instructed to optimize the object code they generate. Generally, a compiler run with no optimization generates inefficient code (in exchange for a shorter compilation time). Compiler optimization is the easiest way to start optimizing. As an example of how to go about it, I use GCC as my compiler.
GCC has various optimization switches and optimization levels. An optimization level is simply a collection of optimization switches. The optimization levels available in GCC are none (-O0), -O1, -O2, -O3 and -Os.
Optimization level | Features | Optimization type
None or -O0 | No optimization. This is the default level and generates inefficient code. However, compilation time is the shortest and debugging the code with tools like GDB is easiest. Optimization levels should be applied only to production software. | Easy debugging (no optimization), short compilation time.
-O or -O1 | Optimizes without increasing the image size. | Minimizing CPU usage (speed improvement) and image size.
-O2 | Nearly all speed optimizations that do not trade size for speed; aggressive function inlining is left to -O3. | Minimizing CPU usage (speed improvement) and image size.
-O3 | All possible optimizations (with respect to CPU usage), including inlining of smaller functions. The binary/image generated at this level is usually the largest of all levels and debugging is extremely difficult. This level often gives only a marginal improvement over -O2. | Minimizing CPU usage (at the cost of debugging and image/binary size).
-Os | Optimizes for the size of the image/binary. | Minimizing image size (may trade off against speed).

Table: Showing GCC Optimization levels

The optimization levels in GCC are provided as a quick tool; individual optimization switches, or a combination of switches and levels, should be used to customize the compiler optimization. For more details, please see the GCC man page.
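To see which kind of build a given set of switches produced, one option is to check the macros GCC predefines: `__OPTIMIZE__` in optimizing compilations and `__OPTIMIZE_SIZE__` when optimizing for size. The sketch below assumes a file called opt_level.c; the file name, the enum and the function names are made up for this example.

```c
/*
 * Example builds (opt_level.c is a hypothetical file name):
 *   gcc -O0 -c opt_level.c
 *   gcc -O2 -c opt_level.c
 *   gcc -Os -c opt_level.c
 *   gcc -O2 -funroll-loops -c opt_level.c    (a level plus one extra switch)
 *   gcc -Q -O2 --help=optimizers             (list the switches -O2 turns on)
 */

enum build_kind { BUILD_UNOPTIMIZED, BUILD_SPEED, BUILD_SIZE };

/* Reports how this translation unit was compiled, based on the
 * macros GCC predefines for optimizing builds. */
enum build_kind build_optimization(void)
{
#if defined(__OPTIMIZE_SIZE__)
    return BUILD_SIZE;          /* e.g. -Os */
#elif defined(__OPTIMIZE__)
    return BUILD_SPEED;         /* e.g. -O1, -O2, -O3 */
#else
    return BUILD_UNOPTIMIZED;   /* e.g. -O0 or no -O switch */
#endif
}

/* Human-readable name for logging or a startup banner. */
const char *build_optimization_name(void)
{
    switch (build_optimization()) {
    case BUILD_SIZE:  return "optimized for size";
    case BUILD_SPEED: return "optimized for speed";
    default:          return "not optimized";
    }
}
```

Printing `build_optimization_name()` at startup is a cheap way to confirm that the production image was not accidentally built without optimization.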

So what to do if compiler optimization does not help:

It is time to learn a new software rule of thumb: 90% of the CPU time is taken by 10% of the code. Yes, only a relatively small amount of code matters. So the job of optimization becomes easy if one can locate this 10% of the code (the bottleneck). One might be tempted to read through the code and guess which parts take the most CPU, but this almost never works. So how does one find the bottlenecks? There are tools available to find them. These tools can generate a table listing the functions in the code and the percentage of CPU time taken by each function. They are called CPU profilers. CPU profiling tools are generally system dependent. (However, if a profiling tool is not available for a particular system, a simulated application that reuses the application code can be developed, so that the optimization is performed on another system where such tools are available.)
There is a large number of such profiling tools. The table below lists a few of them. These tools also depend on the programming language and compiler.

System | Tool
VxWorks | RTI ScopeTools (trademark of RTI), a bundle of CPU and memory profilers
Intel CPU | VTune (trademark of Intel)
AMD (Windows and Linux) | CodeAnalyst (trademark of AMD)
Linux | gprof
Table: Showing some CPU profiling tools


These and similar tools can be chosen depending on the environment, availability and affordability. Using them requires running the application, and the application should be run in every environment where optimization might be required.
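As a minimal sketch of what a profiler run looks like with gprof, consider the toy workload below. The file name and function names are invented for this example; one function is deliberately expensive so that it stands out in the profile.

```c
/*
 * Hypothetical file name: workload.c. After adding a main() that calls
 * run_workload() with a large n, a typical gprof session would be:
 *   gcc -pg workload.c -o workload
 *   ./workload                    (writes gmon.out on exit)
 *   gprof ./workload gmon.out     (prints the per-function profile)
 */

/* Deliberately expensive: counts pairs one at a time, O(n^2). */
unsigned long long slow_pair_count(unsigned n)
{
    unsigned long long pairs = 0;
    for (unsigned i = 0; i < n; i++) {
        for (unsigned j = i + 1; j < n; j++) {
            pairs++;
        }
    }
    return pairs;
}

/* Cheap: the closed form n * (n - 1) / 2. */
unsigned long long fast_pair_count(unsigned n)
{
    if (n < 2) {
        return 0;
    }
    return (unsigned long long)n * (n - 1) / 2;
}

/* Calls both functions; the profile should attribute nearly all of
 * the time to slow_pair_count. */
unsigned long long run_workload(unsigned n)
{
    unsigned long long checksum = 0;
    checksum += slow_pair_count(n);
    checksum += fast_pair_count(n);
    return checksum;
}
```

In the flat profile that gprof prints, `slow_pair_count` would be expected to account for almost all of the time, which is the kind of bottleneck the 90/10 rule describes. Note that higher optimization levels can inline or simplify functions, so the rows in the profile may not match the source one to one.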

1 comment:

Anuradha said...

Yes, I think the way we approach optimization problems is not so helpful. We must find the bottlenecks after we have written the code.