On a recent project we rolled our our profiling tools.

Poor Man's Profiling

The first rule of optimization is don't do optimization. The second rule is to measure first. If you don't measure then you cannot possibly know if you've made an improvement or (in the worst case) made things worse.

On a recent project we had suspicions that certain parts of our application were running more slowly than we'd like. Rather than mess with expensive profiling tools we wrote a small utility class that would help us to gather statistics and make decisions about the best place to make changes.

We used log4net to record statistics as we gathered them so that we could parse the log file at a later time in different ways (and keep data from several runs for comparison). We created a separate "Profiling" logger so that it could be turned on and off independently of the rest of the applications logging. Here is our config

<configSections>
  <section name="log4net" type="log4net.Config.Log4NetConfigurationSectionHandler, log4net" />
</configSections>

<log4net>

  <root>
    <level value="OFF" />
  </root>

  <logger name="Profiling">
    <level value="ALL" />
    <appender-ref ref="ProfilerAppender" />
  </logger>

  <appender name="ProfilerAppender" type="log4net.Appender.RollingFileAppender,log4net">
    <param name="File" value="Profile.Log" />
    <param name="AppendToFile" value="true" />
    <rollingStyle value="Date" />
    <datePattern value=".yyyy.MM.dd" />
    <preserveLogFileNameExtension value="true" />
    <layout type="log4net.Layout.PatternLayout">
      <conversionPattern value="%date{HH:mm:ss,ff} [%-2thread] - %message%newline" />
    </layout>
  </appender>

</log4net>

Next up we need a class that can record how long something takes. To do this we want to start a StopWatch when the activity starts and log the time elapsed to our Profiling logger. We're going to do this with an IDisposable so that we can take advantage of the using keyword in C# but we don't want to do anything if the Profiling Log is turned off. Here's the full code:

class ProfileRegion : IDisposable
{
    private static readonly ILog Log = LogManager.GetLogger("Profiling");
    private static readonly IDisposable Dummy = new DummyRegion();

    private readonly Stopwatch _stopwatch = new Stopwatch();
    private readonly string _tag;

    private ProfileRegion(string tag)
    {
        _tag = tag;
        _stopwatch.Start();
    }

    public void Dispose()
    {
        _stopwatch.Stop();
        Log.InfoFormat("Exited {0}. Took {1} ms", _tag, _stopwatch.ElapsedMilliseconds);
    }

    public static IDisposable Named(string name)
    {
        if (Log.IsInfoEnabled)
            return new ProfileRegion(name);
        return Dummy;
    }

    private class DummyRegion : IDisposable
    {
        void IDisposable.Dispose() {  }
    }
}

When you start an operation that you would want timed you call ProfileRegion.Named() and pass in a unique name. The object you get back represents the lifetime of the operation being performed. As soon as the operation completes you can dispose of this lifetime object and the timing information will get saved out to the profiling logger. You can also stop this from happening by adjusting the level of the Profiling logger to be lower than INFO. If you do that the rest of your code may contain these profile region calls but effectively nothing will happen so there will be minimal overhead. Here is a quick (silly) sample of its usage:

        foreach (var i in Enumerable.Range(1, 1000))
        {
            using(ProfileRegion.Named("SomeOperation"))
            {
                LongOperation();
            }
        }

And the contents of the Profiling Log file after running this is as follows:

10:25:12,14 [1 ] - Exited SomeOperation. Took 500 ms
10:25:12,64 [1 ] - Exited SomeOperation. Took 500 ms
10:25:13,14 [1 ] - Exited SomeOperation. Took 499 ms
10:25:13,64 [1 ] - Exited SomeOperation. Took 499 ms
10:25:14,14 [1 ] - Exited SomeOperation. Took 499 ms
10:25:14,64 [1 ] - Exited SomeOperation. Took 499 ms
10:25:15,14 [1 ] - Exited SomeOperation. Took 499 ms
10:25:15,64 [1 ] - Exited SomeOperation. Took 499 ms
10:25:16,14 [1 ] - Exited SomeOperation. Took 499 ms
10:25:16,64 [1 ] - Exited SomeOperation. Took 499 ms
10:25:17,14 [1 ] - Exited SomeOperation. Took 499 ms
... elided for sanity reasons

Now we have some readings we can gather some insight. We do this with a LINQPad script that gathers each of the lines, aggregates the result based on the tag and determines the low, average and high values, along with the standard deviation and a count.

from line in File.ReadLines(@"C:\Work\Sandbox\Profiling\ProfilingSample\ProfilingSample\bin\Debug\Profile.Log")
let match = Regex.Match(line, @"Exited (.*)\. Took (\d+) ms$")
where match.Success
let tag = match.Result("$1")
let time = Int32.Parse(match.Result("$2"))
group time by tag into g
let avg = g.Average()
let stdDev = Math.Round(Math.Sqrt(g.Select (x => Math.Pow(x - avg, 2)).Average ()) / 1000d, 2)
let min = g.Min()
let max = g.Max()
select new
{
    Tag = g.Key, 
    MinimumTime = Math.Truncate((decimal)min)/1000, 
    AverageTime = Math.Truncate((decimal)avg)/1000, 
    MaximumTime = Math.Truncate((decimal)max)/1000,
    StdDev = stdDev, 
    TimesExecuted = g.Count(), 
    // To uncomment this line you need the sparklines extension found here: https://gist.github.com/wolfbyte/4671548
    //Sparkline = g.Select(x => x / 1000d).AsSparkLines(width: 1000, height: 100, low: min / 1000, high: max / 1000)
}

For an added benefit, if you grab the AsSparkLines extension method found at MyExtensions.linq, you can uncomment that last line and get some nice pretty graphs.

Armed with this information you can make some decisions. Are the averages on a particular region too high? Target that region for optimization. Is the max a very large outlier and if so is it the first execution (the sparklines are great for determining this)? If so then it may be worth performing a dummy "priming" run of the operation or delaying it's first execution. Are the sparklines jaggy (i.e. the execution time varies wildly based on input) or are they flat (you can use std dev for this too)?

Being able to capture this information at any time (including turning it on in production to capture data) has been an invaluable tool in our diagnostic arsenal. In our environment all of our remote calls go through a central pipeline and that pipeline records timing information for each remote operation on each end. This allows us to see which remote calls need to be improved upon (or are not meeting our SLAs). Speaking of SLAs it would be very easy to add timing limit information to ProfileRegions and have it issue Warning or Errors if they are breached. You can then configure log4net to email these to you or write to the event log or whatever you need.

If you find this useful, drop a comment below and tell me how you used/adapted it.

Posted by: Mike Minutillo
Last revised: 10 Dec, 2013 03:05 AM History

Comments

No new comments are allowed on this post.