Instrumentation by Composition

Heroku provides a wide range of instrumentation for your app out of the box through our new Heroku developer experience.

We have open-sourced some of the tools used to instrument Heroku apps, but today’s focus will be on instruments, a Go library that allows you to collect metrics over discrete time intervals.

What is instruments?

Instrumentation is the art and science of measuring and controlling process variables within a production system. Instruments attached to a system may provide signals that either operate the system, as a circuit breaker does, or alert a human operator.

The instruments library lets you collect and report metrics from inside your application: for example, the requests made along with their latency, or the number of elements in a data structure.

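// Count events with a Rate, and time each one with a Timer whose sample
// holds up to 1024 values.
rate := instruments.NewRate()
latencies := instruments.NewTimer(1024)
latencies.Time(func() {
  rate.Update(1)
  // Stand-in for real work: copy a random permutation into a new slice.
  copy(make([]int, 10), rand.Perm(10))
})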

To achieve this, it provides the following base instruments:

  • Counter holds a counter that can be incremented or decremented.
  • Rate measures the rate of events over time.
  • Reservoir measures the distribution of values in a stream of data.
  • Gauge returns the last value recorded.
  • Derive measures the rate of events over time, based on the delta from the previously recorded value.
  • Timer measures the distribution of the duration of events.
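To make the difference between Rate and Derive concrete, here is a minimal, standard-library-only sketch. It is not the library's code: the `rate` and `derive` types, the `snapshotAt` method and the per-second unit are assumptions, and the clock is passed in explicitly so the arithmetic is easy to follow. Because Derive works from the delta with the previously recorded value, it suits cumulative counters such as total bytes sent.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// epoch is a fixed start time so the demo's arithmetic is deterministic.
var epoch = time.Unix(0, 0)

// rate counts events and reports them per second over the window since the
// previous snapshot. Hypothetical sketch, not the instruments implementation.
type rate struct {
	mu    sync.Mutex
	count int64
	start time.Time
}

func newRate(now time.Time) *rate { return &rate{start: now} }

// Update records v events in the current window.
func (r *rate) Update(v int64) {
	r.mu.Lock()
	r.count += v
	r.mu.Unlock()
}

// snapshotAt returns events per second between the window start and now,
// then starts a new window at now.
func (r *rate) snapshotAt(now time.Time) float64 {
	r.mu.Lock()
	defer r.mu.Unlock()
	var perSec float64
	if elapsed := now.Sub(r.start).Seconds(); elapsed > 0 {
		perSec = float64(r.count) / elapsed
	}
	r.count, r.start = 0, now
	return perSec
}

// derive (hypothetical) is composed on top of rate: it is fed cumulative
// values and passes only the delta from the previous value to the rate.
type derive struct {
	mu   sync.Mutex
	last int64
	r    *rate
}

func newDerive(initial int64, now time.Time) *derive {
	return &derive{last: initial, r: newRate(now)}
}

// Update records a new cumulative value.
func (d *derive) Update(v int64) {
	d.mu.Lock()
	delta := v - d.last
	d.last = v
	d.mu.Unlock()
	d.r.Update(delta)
}

func (d *derive) snapshotAt(now time.Time) float64 { return d.r.snapshotAt(now) }

func main() {
	// A cumulative "bytes sent" counter that reads 1000 when we start.
	d := newDerive(1000, epoch)
	d.Update(1200)
	d.Update(1500)
	// 500 new bytes over 10 seconds.
	fmt.Println(d.snapshotAt(epoch.Add(10 * time.Second))) // 50
}
```

Feeding the raw cumulative values (1200, then 1500) into a plain Rate would count 2700 events. Derive's delta step is what turns them into the 500 bytes actually sent during the window.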

These instruments collect metrics over a time window and expect a single reader to request the current value of a metric. The value you obtain reflects only the observations made during the last window, not earlier ones. Because of this, instruments can be reset between windows, which makes them a good fit for measuring performance characteristics.
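The single-reader, per-window behaviour can be approximated with an atomic swap: many goroutines update, and each read returns the window's total and starts a new window. This is a hedged sketch that assumes reading a value is what closes a window; the `windowCounter` type is hypothetical and not part of instruments.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// windowCounter is a hypothetical counter whose Snapshot returns the total
// for the current window and atomically resets it to zero, so each read
// only covers observations made since the previous read.
type windowCounter struct{ n int64 }

// Update adds v to the current window; safe for concurrent writers.
func (c *windowCounter) Update(v int64) { atomic.AddInt64(&c.n, v) }

// Snapshot returns the window total and starts a new window.
func (c *windowCounter) Snapshot() int64 { return atomic.SwapInt64(&c.n, 0) }

func main() {
	var c windowCounter
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ { // many writers...
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.Update(1)
		}()
	}
	wg.Wait()
	fmt.Println(c.Snapshot()) // ...one reader: 100
	fmt.Println(c.Snapshot()) // new, empty window: 0
}
```

With two readers, each would see only part of the window, which is why a single reader is expected.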

Composability

An instrument can be composed on top of other instruments, as long as they implement one of two interfaces for collecting metrics: one for discrete values and one for sampled values.

// Discrete represents a single value instrument.
type Discrete interface {
  Snapshot() int64
}

// Sample represents a sample instrument.
type Sample interface {
  Snapshot() []int64
}

These two simple interfaces let you create your own instruments on top of the built-in ones.
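As a small illustration, the sketch below builds a Discrete instrument on top of any Sample. It redeclares the two interfaces locally so it compiles without the library; `staticSample` and `maxOf` are hypothetical names, not instruments types.

```go
package main

import "fmt"

// Discrete and Sample mirror the two interfaces shown above.
type Discrete interface{ Snapshot() int64 }
type Sample interface{ Snapshot() []int64 }

// staticSample is a hypothetical stand-in Sample that returns fixed values.
type staticSample []int64

func (s staticSample) Snapshot() []int64 { return append([]int64(nil), s...) }

// maxOf is a hypothetical Discrete instrument composed on top of any Sample:
// it reports the largest value in the sample (0 when the sample is empty).
type maxOf struct{ s Sample }

func (m maxOf) Snapshot() int64 {
	var largest int64
	for i, v := range m.s.Snapshot() {
		if i == 0 || v > largest {
			largest = v
		}
	}
	return largest
}

// Compile-time checks that the types satisfy the interfaces.
var (
	_ Sample   = staticSample(nil)
	_ Discrete = maxOf{}
)

func main() {
	var d Discrete = maxOf{s: staticSample{120, 4096, 512}}
	fmt.Println(d.Snapshot()) // 4096
}
```

Since maxOf only depends on the Sample interface, it could wrap a Reservoir, a Timer, or another custom sampled instrument.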

As an example, let's collect the sizes of HTTP requests made to an upload endpoint. To do that, we will create a custom instrument that acts as an http.Handler and records request sizes in a Reservoir instrument.

type RequestSizes struct {
  r *instruments.Reservoir
  http.Handler
}

func NewRequestSizeHandler(h http.Handler) *RequestSizes {
  return &RequestSizes{
    r:       instruments.NewReservoir(0),
    Handler: h,
  }
}

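func (rs *RequestSizes) Update(r *http.Request) {
  // ContentLength is -1 when the size is unknown (e.g. chunked uploads),
  // so only record known sizes.
  if r.ContentLength >= 0 {
    rs.r.Update(r.ContentLength)
  }
}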

func (rs *RequestSizes) ServeHTTP(w http.ResponseWriter, r *http.Request) {
  rs.Update(r)
  rs.Handler.ServeHTTP(w, r)
}

func (rs *RequestSizes) Snapshot() []int64 {
  return rs.r.Snapshot()
}

We can now wrap any http.Handler and collect the size of the requests it receives:

func upload(w http.ResponseWriter, r *http.Request) {
  w.WriteHeader(http.StatusOK)
}

func main() {
  // Wrap the upload handler so every request's size is recorded.
  sizes := NewRequestSizeHandler(http.HandlerFunc(upload))
  http.Handle("/upload", sizes)
  log.Fatal(http.ListenAndServe(":8080", nil))
}

The base instruments take advantage of composition too: Derive is built on top of Rate, and Timer is built on top of Reservoir.
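A Timer-on-Reservoir composition can be sketched with the standard library. This version uses uniform reservoir sampling (Vitter's Algorithm R) to keep memory bounded. That is a plausible choice for a fixed-size reservoir but not necessarily what instruments does internally, and the `reservoir` and `timer` types are hypothetical.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
	"time"
)

// reservoir (hypothetical) keeps a uniform random sample of at most size
// values seen in the current window, using Algorithm R.
type reservoir struct {
	mu     sync.Mutex
	size   int
	seen   int64
	values []int64
	rng    *rand.Rand
}

func newReservoir(size int) *reservoir {
	return &reservoir{size: size, values: make([]int64, 0, size), rng: rand.New(rand.NewSource(1))}
}

func (r *reservoir) Update(v int64) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.seen++
	if len(r.values) < r.size {
		r.values = append(r.values, v)
		return
	}
	// Keep v with probability size/seen by overwriting a random slot.
	if j := r.rng.Int63n(r.seen); j < int64(r.size) {
		r.values[j] = v
	}
}

// Snapshot returns the current sample and starts a new window.
func (r *reservoir) Snapshot() []int64 {
	r.mu.Lock()
	defer r.mu.Unlock()
	out := r.values
	r.values = make([]int64, 0, r.size)
	r.seen = 0
	return out
}

// timer (hypothetical) is composed on top of reservoir: it stores durations
// as nanoseconds and adds nothing but time measurement.
type timer struct{ r *reservoir }

func newTimer(size int) *timer { return &timer{r: newReservoir(size)} }

func (t *timer) Update(d time.Duration) { t.r.Update(int64(d)) }

// Time runs f and records how long it took.
func (t *timer) Time(f func()) {
	start := time.Now()
	f()
	t.Update(time.Since(start))
}

func (t *timer) Snapshot() []int64 { return t.r.Snapshot() }

func main() {
	latencies := newTimer(1024)
	for i := 0; i < 3; i++ {
		latencies.Time(func() { time.Sleep(time.Millisecond) })
	}
	fmt.Println(len(latencies.Snapshot())) // 3
}
```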

Reporting metrics

Each instrument returns raw values, and utility functions are provided to extract quantiles, means, and variances from them.

// Retrieve raw values.
s := sizes.Snapshot()
// 95th percentile of the request sizes.
q95 := instruments.Quantile(s, 0.95)
// Obtain the mean value.
m := instruments.Mean(s)
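For readers curious about what such helpers compute, here is a standard-library sketch of a quantile, mean, and variance over a snapshot. It uses the nearest-rank definition of a quantile and the population variance; instruments may interpolate or define these differently, and the lowercase function names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// quantile returns the nearest-rank q-quantile (0 < q <= 1) of values,
// sorting a copy so the caller's snapshot is left untouched.
func quantile(values []int64, q float64) int64 {
	if len(values) == 0 {
		return 0
	}
	sorted := append([]int64(nil), values...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	rank := int(math.Ceil(q * float64(len(sorted)))) // 1-based rank
	if rank < 1 {
		rank = 1
	}
	if rank > len(sorted) {
		rank = len(sorted)
	}
	return sorted[rank-1]
}

// mean returns the arithmetic mean, or 0 for an empty snapshot.
func mean(values []int64) float64 {
	if len(values) == 0 {
		return 0
	}
	var sum float64
	for _, v := range values {
		sum += float64(v)
	}
	return sum / float64(len(values))
}

// variance returns the population variance (divides by n, not n-1).
func variance(values []int64) float64 {
	if len(values) == 0 {
		return 0
	}
	m := mean(values)
	var acc float64
	for _, v := range values {
		d := float64(v) - m
		acc += d * d
	}
	return acc / float64(len(values))
}

func main() {
	s := []int64{7, 3, 10, 1, 5, 9, 2, 8, 4, 6} // 1..10, unsorted
	fmt.Println(quantile(s, 0.95), quantile(s, 0.5), mean(s), variance(s))
	// 10 5 5.5 8.25
}
```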

The library also provides an optional registry, which makes it easier to collect multiple metrics, and a built-in logfmt reporter:

registry := reporter.NewRegistry()
registry.Register("requests-size", sizes)
// Report in logfmt format every minute.
go reporter.Log("dyno.1", registry, time.Minute)
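A registry is essentially a concurrency-safe map from names to instruments. The sketch below shows that idea together with a logfmt-style line builder. The `registry` type, the `logfmtLine` helper, the `source=` key and the sorted key order are assumptions for illustration, not the format reporter.Log actually emits.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// Discrete mirrors the library's single-value interface.
type Discrete interface{ Snapshot() int64 }

// fixed is a stand-in Discrete instrument that always reports one value.
type fixed int64

func (f fixed) Snapshot() int64 { return int64(f) }

// registry is a hypothetical, mutex-guarded name -> instrument map.
type registry struct {
	mu sync.Mutex
	m  map[string]interface{}
}

func newRegistry() *registry { return &registry{m: make(map[string]interface{})} }

func (r *registry) Register(name string, v interface{}) {
	r.mu.Lock()
	r.m[name] = v
	r.mu.Unlock()
}

// Instruments returns a copy so callers can iterate without holding the lock.
func (r *registry) Instruments() map[string]interface{} {
	r.mu.Lock()
	defer r.mu.Unlock()
	out := make(map[string]interface{}, len(r.m))
	for k, v := range r.m {
		out[k] = v
	}
	return out
}

// logfmtLine snapshots every Discrete instrument and renders key=value
// pairs, sorted by name so the output is stable. Other types are skipped.
func logfmtLine(source string, r *registry) string {
	all := r.Instruments()
	names := make([]string, 0, len(all))
	for k := range all {
		names = append(names, k)
	}
	sort.Strings(names)
	parts := []string{"source=" + source}
	for _, k := range names {
		if d, ok := all[k].(Discrete); ok {
			parts = append(parts, fmt.Sprintf("%s=%d", k, d.Snapshot()))
		}
	}
	return strings.Join(parts, " ")
}

func main() {
	r := newRegistry()
	r.Register("requests", fixed(42))
	r.Register("errors", fixed(3))
	fmt.Println(logfmtLine("dyno.1", r)) // source=dyno.1 errors=3 requests=42
}
```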

However, instruments is not opinionated about how metrics should be reported. You can even skip reporting altogether and use the values directly to trigger a circuit breaker, a health check, and so on.
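As an example of using values directly, a health check could fail once the 95th-percentile latency in the current window exceeds a limit. This is a hedged sketch: the `healthy` and `healthHandler` helpers, the `fakeLatencies` stand-in and the 250 ms limit are hypothetical, the Sample interface is redeclared locally, and the percentile uses the nearest-rank definition.

```go
package main

import (
	"fmt"
	"math"
	"net/http"
	"net/http/httptest"
	"sort"
	"time"
)

// Sample mirrors the library's sampled-value interface.
type Sample interface{ Snapshot() []int64 }

// p95 returns the nearest-rank 95th percentile of values (0 if empty).
func p95(values []int64) int64 {
	if len(values) == 0 {
		return 0
	}
	sorted := append([]int64(nil), values...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	rank := int(math.Ceil(0.95 * float64(len(sorted))))
	return sorted[rank-1]
}

// healthy (hypothetical) reports whether the window's p95 latency, in
// nanoseconds, is within limit.
func healthy(latencies Sample, limit time.Duration) bool {
	return p95(latencies.Snapshot()) <= int64(limit)
}

// healthHandler (hypothetical) answers 200 when healthy and 503 otherwise.
func healthHandler(latencies Sample, limit time.Duration) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if healthy(latencies, limit) {
			w.WriteHeader(http.StatusOK)
			return
		}
		w.WriteHeader(http.StatusServiceUnavailable)
	}
}

// fakeLatencies is a stand-in Sample for the demo.
type fakeLatencies []int64

func (f fakeLatencies) Snapshot() []int64 { return f }

func main() {
	lat := fakeLatencies{int64(20 * time.Millisecond), int64(400 * time.Millisecond)}
	h := healthHandler(lat, 250*time.Millisecond)
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/health", nil))
	fmt.Println(rec.Code) // 503
}
```

Because instruments expect a single reader, a check like this should read an instrument of its own rather than one that a reporter is also draining.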

You also keep the flexibility to choose which metrics you send, and how, to a backend such as Graphite, Librato, InfluxDB, or similar.

You can choose which instruments, and which of their values, you send, to avoid overwhelming the underlying system with useless data or to keep costs under control:

// Iterate over the registry created above; report is your own function
// that sends a named value to your metrics backend.
for k, m := range registry.Instruments() {
  switch i := m.(type) {
  case instruments.Discrete:
    s := i.Snapshot()
    report(k, s)
  case instruments.Sample:
    s := i.Snapshot()
    p95 := instruments.Quantile(s, 0.95)
    report(fmt.Sprintf("%s.p95", k), p95)
    p99 := instruments.Quantile(s, 0.99)
    report(fmt.Sprintf("%s.p99", k), p99)
  }
}

If you want to try it today, contribute, or report bugs, check out the project on GitHub.
