Saturday, June 8, 2013

Async logging performance with log4j

We do it all the time:
log.debug("Log me");
This particular line does three things:
1. Saves the message to some storage,
2. Obfuscates the business logic,
3. Slows the execution down.
However, there are moments when the performance is crucial. Can async logging help us in that matter?
Traditional sync appenders are defined as follows:
<appender name="sync-file-appender" class="org.apache.log4j.DailyRollingFileAppender">
        <param name="file" value="F:/test-harness.log" />
        <param name="DatePattern" value="'.'yyyy-MM-dd" />
        <layout class="org.apache.log4j.PatternLayout">
            <param name="ConversionPattern" value="[%t] %d{HH:mm:ss,SSS} %-5p %l - %m%n" />
        </layout>
</appender
Async ones can be defined:
<appender name="async-appender" class="org.apache.log4j.AsyncAppender">
        <param name="BufferSize" value="500000"/>
        <appender-ref ref="sync-file-appender"/>
</appender>
How do they work? In order to clear this out let me explain it step by step.

Sync logging:
1. Thread1 (request handling thread) is picked from the pool.
2. Thread1 goes through the business logic.
3. Thread1 enters log.debug("Log me");.
4. Thread1 waits until the message is flushed.
5. Thread1 goes out from log.debug("Log me");.

Async logging:
1. Thread1 (request handling thread) is picked from the pool.
2. Thread1 goes through the business logic.
3. Thread1 enters log.debug("Log me");.
4. Thread2 (logging thread) is picked and it lives its own life. It is up to Thread2 to flush the message.
5. Thread1 goes out from log.debug("Log me");.
I think that points 4 and 5 can permuted - it is up to thread scheduler.

Sync appender blocks the thread until message is flushed and it uses the same thread, whereas async logger uses separate daemon thread and does not block. OK, to be more precise, async appenders can block as well but only if the in-memory buffer overflows. However, we can specify:
<param name="Blocking" value="false"/>
and then the messages which are redundant are simply dropped and not logged. How is it in practice? I checked the performance of logging on two different storage devices:
1. HDD: RAID0 with two old Seagate Barracudas 9.
2. SSD: OCZ Vertex 4 128GB, firmware: v.1.5.

with the code:
public static void main(String[] args) {
  Logger log = Logger.getLogger("sync-logger"); // and 
                                                // Logger log = Logger.getLogger("async-logger");
  
  long start = System.currentTimeMillis();
  for (int i = 0; i < 900_000; i++) {
   log.debug(longText);   
  }
  System.out.println(System.currentTimeMillis() - start);
}
The results:
'sync-logger'
SSD: 18420 ms
HDD: 22520 ms

'async logger'
SSD: 592 ms
HDD: 596 ms

The first execution with 'sync-logger' resulted in ~2.5GB log file. Async logger produced much smaller file. It is due to the fact that async logging thread is daemon. JVM exits when all of the non-daemon threads are dead. When it comes to server applications there is no such problem because there is usually some thread pool and JVM lives a long life.
Finally, the time consumed by logging will be almost the same in both approaches, but the client does not wait for flushing the log message, because it is handled by another thread in the case of async logging, thus the application can be more responsive. Async logging may be helpful when we deal with massive logging, e.g. we expect some big data burst from JMS queue and we want to log the incoming messages.

1 comment :