ERights Home elib 
Back to: Living with a Legacy On to: Java/ELib Coding Conventions

The Tracing
Interface


When tracing, you post trace messages with a particular level that refer to a particular subsystem. Tracing messages have two possible destinations: the log or the transient buffer. The same message can go to both the log and the trace buffer.

At present, the log goes to standard error. In the future, it will go to a disk file by default. (You will still be able to send it to standard error.)

The transient buffer is normally stored invisibly. You can dump it to a disk file or standard error. Unlike the log, older messages in the transient buffer are thrown away to make room for new ones.


Priority levels
Subsystems

Information for programmers

Controlling tracing at launch time

Output format

Controlling tracing through the Microcosm log window

(There will be a more complete javadoc for the Trace class someday.)


Priority levels

Here are the priority levels. The log and the transient buffer have separate priority thresholds. Trace messages below the threshold are ignored. The threshold can be set at program launch time. It can also be changed at runtime.

  • Error messages report on some internal error. They don't necessarily lead to the system stopping, but they might. Error messages always appear in both the log and the transient buffer - you can't set either threshold above error.
  • Warning messages are not as serious as errors, but they're signs of something odd.
  • World messages track the state of the world as a whole. They are the sort of things world operators ask for specifically, such as "can you tell me when someone connects." They should appear only occasionally, much less often than once per second. By default, messages of this priority (and higher) are stored in the on-disk log.

    If the code being traced is infrastructure code, it may be unaware of the "world" that uses it. Such code should use messages at this level when the programmer wants them always to go into a permanent log. So there shouldn't be too many of them, and they should record events of interest at some time arbitrarily distant.
  • Usage messages are used to answer the question "who did what up to the point the bug appeared?" ("Spock entered Azturf. Spock started trading with Kyra. Kyra gave Spock a diamond in exchange for a lump of coal. Kyra spoke.") They are also used to collect higher-level usability information. By default, messages of this priority (and higher) are stored in the transient buffer.

    Infrastructure code should use this level for messages that allow the programmer to figure out roughly how the program got to some point where it broke when someone else - probably an end user - ran it. These messages might make the difference between a non-reproducible bug you can't figure out and one you can.
  • Event messages describe the major actions the system takes in response to user actions. The distinction between this category and debug is fuzzy, especially since debug is already used for many messages of this type. However, event can be used to log specific user gestures for Ellen, and to log information for testers. I've found it useful to set the transient buffer's threshold to event and log enough information there to be able to figure out generally what the code was doing before it broke. If I need more detail, I rerun the program with the threshold set to debug.

    Programmers of infrastructure code use this level when they're running some of their code, want a bit more detail about what it's doing, but don't want to see all the nitty-gritty that debug level provides. Key difference between this level and usage is that the developer is setting this level to get messages that are useful in a debugging situation, or one where he or she is just trying the code out. (Testers might also run at the event level, since they expect to find bugs. usage level is for people who don't, but want to be able to provide info if they do.)
  • Debug messages provide more detail for people who want to delve into what's going on, probably to figure out a bug.
  • Verbose messages provide even more detail than debug. They're probably mainly used when first getting the code to work.

Subsystems

Some subsystems are predefined. They are: agent, comm, console, dgc (distributed garbage collector), dynamics, eruntime, file, fonts, gui, io, persist (persistence subsystem), plruntime (Pluribus runtime), repository, scene, sound (VoxChat, DirectSound), startup, timers, trace, ui (the classes involved in going from unum to framework and back), una, vat. You can also create your own subsystems at runtime.

Each subsystem can have its own priority threshold for both the log and the transient buffer.


Adding tracing statements to code

Using a predefined subsystem

To log a debug-level message about the Fonts subsystem, you would write:

if (Trace.fonts.debug) { Trace.fonts.debugm("Your message here"); }

The test prevents an expensive call when the threshold is higher than "debug". Notice that each priority level has its own tracing message: errorm(), debugm(), worldm(), and so on. The trailing 'm' (for "method") is there because the E compiler once had problems when fields and methods had the same name. Although those problems seem to have gone away, Danfuzz prudently suggested avoiding this edge case.

(For backwards compatibility, the "$" message is also supported - see below - but don't use it in this context. It is deprecated.)

If you're adding tracing statements to code that you know cannot be executed frequently, you can reduce clutter by writing:

Trace.fonts.warningm("Take a look at these: " + obj1 + ", " + obj2);

All of the tracing methods check the appropriate variable themselves, so the message will only be posted if warning is true. Note that every call will pay the overhead of constructing a string from the two strings and two objects, which is ridiculously expensive.

Creating your own subsystem

To create your own subsystem, write:

static private Trace subsystemTrace = new Trace("subsystem");

As with predefined subsystems, the threshold is defined at program launch time. Be careful to make the trace variable static; you don't need one per instance.

After creating it, you use your private Trace object just like a predefined one:

if (subsystemTrace.world) { subsystemTrace.worldm("Your message here"); }

Tracing a class

The Trace constructor can also take no arguments:

static private Trace tr = new Trace();

The resulting Trace object treats the current class as a subsystem. This was the only behavior of the previous version of Trace. In old code, you'll see calls like:

if (tr.error) { tr.$("Your message here"); }

"$()" is a synonym for "debugm()". (The difference between "debugm()" and "errorm()" matters slightly; see below.)

Note: this version of the trace constructor is currently not available. I'm leaving the documentation here in hopes that it will someday be re-enabled.

Timing support

In addition to the variables described above, there is a timing variable that can be used to log messages used in performance tuning. You use it the same way as other variables:

if (tr.timingm) { tr.timingm("Your message here"); }

Note that the variable has an associated timingm method that identifies the message as a timing message.

The timing variable is entirely independent of the other variables (error, etc.).

Tracing objects, including exceptions

All of the trace messages can take a second argument, an arbitrary Java Object. Most objects are printed with toString. Throwables are handled specially. They are printed as a stack traceback. Nested exceptions are handled (as with EthreadGroup.reportException).

You can achieve the same effect with calls of the form tr.levelReportException(Throwable t, String msg), where level is verbose, debug, event, usage, world, warning, or error. (Sorry about the inconsistent order of arguments: the perils of simultaneous development.)

We recommend that the code that disposes of an exception should log it. That way, exceptions are always logged exactly once. So any code that catches and does not rethrow an exception should look like this:

	try {
        } catch (Exception e) {
	    myTrace.errorm("Here's the stack trace for the bug: ", e);
	    ...
        }

(Note: this code is going to have to decide whether the exception represents a bug or not. An errorm() isn't appropriate if it's not a bug.)

If you catch an exception and throw a different exception, you should either log the discarded exception or throw a NestedThrowable with the original exception included. Either way, the exception will end up in the log.

Controlling tracing at runtime

The trace subsystem is controlled by giving it properties. The interface is TraceController.setProperty(String property, String value). The properties are given in the next major section.

Reporting bugs to users

error-level trace messages are always placed in the log. They do no good there unless someone looks at them. Your code can use two pairs of methods to direct a user's attention to the log.

The first pair will notify a user who has volunteered to look at bugs.

TraceController.notifyOptional();
tr.notifyOptional();

notifyOptional first checks if the user has volunteered. If so, it pops up a non-modal dialog box directing her to the location of the log.

The second pair unilaterally informs a user of a fatal error.

TraceController.notifyFatal();
tr.notifyFatal();

notifyFatal pops up a modal dialog box telling the user that a fatal bug has been discovered, informing her of the location of the log, and telling her that the product is about to exit. When she exits the dialog box, the product exits. This routine never returns.

What code should call these methods?

  • Any code that throws away an exception-representing-a-bug has by definition resolved (somehow) a nonfatal error. It should invoke notifyOptional():
	try {
        } catch (Exception e) {
	    myTrace.errorm("Here's the stack trace for the bug: ", e);
	    Trace.notifyOptional();
        }
  • The exception-catcher of last resort will call one of the two methods. Most likely, the default will be to call notifyOptional(), but it may call notifyFatal() for particular kinds of exceptions.
  • A method that finds a bug in itself and immediately fixes it should send a notifyOptional():
	if (myHashtable == null) { 
	   // darn, I wonder why that happened.
	   // doesn't matter, an empty one will work well enough.
	   myHashTable = new Hashtable();
	   // Log my state for debugging.
	   myTrace.errorm("null hash table?", this);
	   Trace.notifyOptional();
	}  
  • Code that knows it has found a fatal bug can call notifyFatal(), but it's probably better to just throw some guaranteed-to-be-fatal exception type. That way you won't have to write code like:
	int foo() {
	    ...
	    if (somethingFatallyHorribleHappened) {
		Trace.notifyFatal();
		return 0; // shut up the compiler.
	    }

You also give a caller a chance to dump its state to the trace log.

Using tracing effectively

It's common for in-house testers and other programmers to stumble across a bug and wonder what they did to provoke it. The result is all too often an unreproducible bug, an inexact description of the circumstances, and a log that doesn't contain much detail. And the programmer's stuck knowing there's a bug somewhere, but unable to fix it. (The Inspector is wonderful, but sometimes you need to know how you got to the hosed state it lets you inspect.)

The situation is worse with beta testers or customers. They are notorious for providing useless bug reports (when they provide any at all).

With the internal log, people be able to notice a bug, dump a better trace of what they did to get to that point, and give much more detail to the programmer. Except that right now (October 97), the internal log doesn't show much. For example, you can't tell from it that an avatar walked between regions.

So you can help your fellow programmers by:

  1. Adding usage level trace messages that log major (once per user action) events.
  2. Adding event level messages that act as "signposts", so that's you'll be able to tell from the trace roughly what path your code is taking. (Testers may often run with event-level tracing turned on for specific subsystems.)
  3. Being tolerant if testers ask you to upgrade, downgrade, or add tracing statements.
  4. Converting EStdio or System.out messages to trace messages.
  5. Converting old-style no-argument Trace constructors to use subsystem names, so that it's easier to change the tracing level for big chunks of code at once.

You can also help yourself more directly. In the past, I've had good luck with similar systems when I followed this discipline:

  • I got a bug report.
  • I spent way too much time finding the problem.
  • I added debugging code that would make similar problems obvious next time. This could be trace code, assertions, inspector-like hooks, whatever.

Over time, the code grew into something that practically shouted out its bugs.

Trace is not speedy because it makes us painfully reconstruct at runtime information readily available at compile time. (Think __LINE__ and __FILE__ in C++.) But it can put 385 messages/second into the buffer on a standard development machine, which should be sufficient.

This is important. I once had dinner with some people in a roughly similar product space, and they described their testing/build environment. Their Gordie-equivalent in charge of all infrastructure code described their tracing system with the loving tone of English nobility describing a devoted servant.


Launch time control of priority thresholds

Priority thresholds are first set at launch time by properties (either on the command line or in property files).

The default threshold applies to all subsystems that have not been given their own thresholds. To set the default threshold for the transient buffer and log (respectively), use:

TraceBuffer_default=level
TraceLog_default=level

Where level is one of "error", "warning", "usage", etc.

To set the threshold for a particular subsystem, use:

TraceBuffer_subsystem=level
TraceLog_subsystem=level

The subsystem name is not sensitive to case. Subsystem thresholds always take priority over the default threshold, whether they're higher or lower than it.

You can cause the subsystem to once again track the default threshold with:

TraceBuffer_subsystem=default
TraceLog_subsystem=default

Since a class is a subsystem, you can set the threshold with a property like this:

TraceBuffer_java.lang.String=debug
TraceLog_java.lang.String=debug

Note that the fully-qualified classname is required. (In this version, Pluribus and E classnames are not demangled. In a later version, they will be.)

Launch time control of timing

Timing (performance) messages can be turned on at launch time with properties of this form:

TraceBuffer_subsystem_timing=on
TraceLog_subsystem_timing=on

You can use "true" as a synonym for "on". Timing messages can be turned off with:

TraceBuffer_subsystem_timing=off
TraceLog_subsystem_timing=off

You can use "false" as a synonym for "off".

There is no way to turn timing on or off for all subsystems at once. That doesn't seem useful.

Controlling the log at launch time

By default, the log appears on standard error. To turn it off, use:

TraceLog_write=false

At some point (when more of Microcosm uses the tracing system), the default log will be a file. The default logfile is "ECLog.yyyymmddhhmmss.txt" in the current directory. (Note: the current directory can change while Microcosm is running. When there's some official source of default directory, the trace code will be changed to use it.) The date field records the time Microcosm started (strictly, the time when the tracing system was initialized). You can override the directory with:

TraceLog_dir=pathname

To change the log's tag from "ECLog" to something else, use:

TraceLog_tag=new

You can completely override the log name (to give you something like "myfoo.log.bar") with:

TraceLog_name=name

The directory still applies, unless name is an absolute pathname.

To write to standard error, use

TraceLog_name=-

(That's a dash.)

Note: during the interim while standard error is the default destination, use "TraceLog_tag=ECLog" to use the "ECLog.yyyymmddhhmmss.txt" logfile described above.

By default, logs have a limited size of 500,000 characters. Past that size, the log will, by default, be closed, renamed to ECLog.yyyymmddhhmmss.sequence.txt, and a new log will be opened. The sequence number is 0 if no file named ECLog.yyyymmddhhmmss.0.txt exists. Otherwise, it is N+1, where N is the largest sequence number that already exists. If you picked a different file format (using TraceLog_name), the sequence number is appended.

If, when tracing is started, a matching log exists (which should only happen if TraceLog_name was used), a new one is created and the old one is renamed as above. Whenever a log is renamed, the old and new names are logged as a world level message, so they will probably appear in the new log.

This behavior can be modified as follows:

TraceLog_size=new-maximum-size

// size is in characters.

TraceLog_size=default

// back to the default

TraceLog_size=unlimited

// the log never fills up until the file system does.

TraceLog_backups=1
TraceLog_backups=one

// there will be at most one backup file. It will have sequence number 0. If the backup file already exists, it will be overwritten.

TraceLog_backups=many

// return to default behavior, always backing up to a file with an unused sequence number.

 

Controlling the transient buffer at launch time

The buffer by default stores at most 500 messages. (Note: messages, not bytes.) When the 501'th message arrives, the first is discarded. The buffer size can be modified with:

TraceBuffer_size=new-maximum-size
TraceBuffer_size=default
TraceBuffer_size=unlimited

Normally, trace messages in the buffer are not displayed. Use the Microcosm log window to display them.


Output format

Log output is verbose. Here is what log output looks like:

=== 1997/09/22 13:52:09 (DirAndTag.main:DirAndTag.java:26) WLD
comm: That's all, folks - end of test.

Reading from left to right (not the way you usually scan logs), this means that:

  1. The message was posted at 13:52 on September 22, 1997. Dates are in local time. Note that the format is year/month/date. That makes logs sortable by date, and avoids confusion about whether 07/08/97 is a date in July or August.
  2. The trace message came from the method main in the class DirAndTag. The particular call was on line 26 of DirAndTag.java.
  3. The message was posted with priority world.
  4. The message is about the subsystem comm.
  5. The actual message is " That's all, folks - end of test."

If the trace call had been through a trace object created for a class, it would have looked like:

=== 1997/09/22 13:52:09 (DirAndTag.main:DirAndTag.java:26) WLD
ec.e.run.DirAndTag: That's all, folks - end of test.

Note that the classname-as-subsystem is fully qualified, but the method name is not (to save space).

For reference, the level abbreviations are:

ERR

Error

WRN

Warning

WLD

World

USE

Usage

EVN

Event

DBG

Debug

VRB

Verbose

 

TIM

Timing

 


Controlling tracing through the Microcosm log window

The Microcosm log window can control the trace subsystem. For a brief summary of commands, use

trace help

Changing the priority thresholds that control which messages are traced

Priority thresholds for the log are changed with commands of this form:

tracelog subsystem threshold

You can put a comma between the subsystem and the threshold. For the time being, "trace" is a synonym for "tracelog".

To change the default threshold, use:

tracelog default threshold

(Note: if you change the threshold to "debug", you'll regret it. It produces an enormous quantity of output.)

Thresholds for the buffer are changed with:

tracebuffer subsystem threshold

Timing messages

Timing messages can be turned on with:

tracebuffer subsystem_timing on
tracelog subsystem_timing on

"true" can be used instead of "on". Timing messages can be turned off with:

tracebuffer subsystem_timing off
tracelog subsystem_timing off

"false" can be used instead of "off".

Dumping the transient buffer

To dump the transient buffer's contents to standard error, use:

trace dump

(You can also use "tracebuffer dump".) To dump it to a file, use:

trace dump filename

The filename can be absolute or relative. If relative, it's relative to the current working directory (wherever that is). Don't put spaces in the filename.

Other controls

Any of the properties that can be set at launch time can also be set through the log window. For example to change the number of messages the trace buffer holds, use:

tracebuffer size 5000

Commands that change the name of the logfile take effect the next time it is opened. To force it to be reopened, use:

tracelog reopen true

 

 
Unless stated otherwise, all text on this page which is either unattributed or by Mark S. Miller is hereby placed in the public domain.
ERights Home elib 
Back to: Living with a Legacy On to: Java/ELib Coding Conventions
Download    FAQ    API    Mail Archive    Donate

report bug (including invalid html)

Golden Key Campaign Blue Ribbon Campaign