
Metrics first when writing Go services

Classifying failure cases up front in metrics will help you quickly write clean code. This isn't Test Driven Development so much as it is Failure Driven Development.

This post has been archived.

Go metrics. Go fast

When writing a server in Go, it can help to start by thinking about what you’re going to measure, i.e. metrics. Of course you’re going to measure failures and successes, but there’s only one way to succeed and probably half a dozen ways to fail. Classifying the failures up front in metrics will help you quickly write clean code. You’ll also be able to test it more readily, because each failure class becomes a case to test.

Points of Failure

Assuming the TCP/HTTP connection is established, and the request reaches your server, what are the further points of failure when handling a request?

Failure to deserialize the request.
Failure to validate the deserialized data.
Failure to establish a connection to the database.
- Connection timeout.
- Bad credentials.
- Database not up.
Failure to insert the data into the database.
- Index duplicate.
- Improperly formatted data.
- Catch-all Database error.
Failure to write to the response.
Failure to find an existing connection (dropped by the client).

This list is not exhaustive, but it captures the important failures. Some error cases are subsets of others, which helps us decide which will be handled at the resource, service, and DAO levels. It also helps us decide how much detail to expose to the clients of our server. We’ll capture metrics at both levels of detail, the specific cause and the broader category, but a client doesn’t care about the specific reason for a failure. For example, from a client’s perspective it doesn’t matter why the database was down, just that it was.

Writing Metrics

Every failure produced at a given layer should directly map to a failure at the level above it. With this in mind, writing our methods for each layer becomes much simpler, and pairs well with how Go encourages error handling: methods/layers/services that can fail should return a value-error pair.

Diagram of layers in a REST application. Layer to layer, time moving down.

Resource

Inbound: Failure to deserialize the request.
Outbound: Failure to write to the response.
Outbound: Failure to find an existing connection / connection dropped by client.
Outbound: Service failure.

Service

Inbound: Failure to validate the deserialized data.
Outbound: DAO failure.

DAO

Inbound: Failure to establish a connection to the database.
Inbound: Failure to insert the data into the database (constraints, etc.).
Outbound: Failure to scan data returned from database.
Outbound: Database failure.

Each layer now has a clearly defined contract with the next. We can write code that handles every failure and returns the proper response, while recording the failure layer by layer. In this example I’m using crude metric names, but you get the idea.

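// Resource: translates failures into HTTP statuses for the client.
// recordMetricAndReturn, writeSuccessfulReturn, and the validate helpers
// are assumed to be defined elsewhere.
http.HandleFunc("/user", func(w http.ResponseWriter, r *http.Request) {
      id := r.URL.Query().Get("id")
      if err := validateUserId(id); err != nil {
            // Inbound failure: the request itself is bad.
            recordMetricAndReturn(w, "UserResource.user", http.StatusBadRequest)
            return
      }
      user, err := userService.GetUser(id)
      if err != nil {
            // Outbound failure: the service could not produce a user.
            recordMetricAndReturn(w, "UserResource.user", http.StatusServiceUnavailable)
            return
      }
      if err := validateUser(user); err != nil {
            recordMetricAndReturn(w, "UserResource.user", http.StatusNotFound)
            return
      }
      // http.ResponseWriter is an interface, so pass it by value.
      writeSuccessfulReturn(w, user)
})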

// Service
func (userService UserService) GetUser(id string) (User, error) {
      user, err := userService.userDAO.GetUser(id)
      if err != nil {
            return recordMetricAndServiceReturn("UserService.GetUser", service_status.Error)
      }
      err = validateUser(user)
      if err != nil {
            return recordMetricAndServiceReturn("UserService.GetUser", service_status.NotFound)
      }
      return user, nil
}

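// DAO: assumes userDAO.DB is a *sql.DB (database/sql) and that
// FindUserByIdSQLQuery takes the id as its only parameter.
func (userDAO UserDAO) GetUser(id string) (User, error) {
      var user User
      // QueryRow defers any query error until Scan is called.
      row := userDAO.DB.QueryRow(FindUserByIdSQLQuery, id)
      err := row.Scan(&user.Id, &user.FirstName, &user.FamilyName, &user.CreateDate, &user.LastModifiedDate)
      if errors.Is(err, sql.ErrNoRows) {
            return recordMetricAndDAOReturn("UserDAO.GetUser", dao_status.NotFound)
      }
      if err != nil {
            // Anything else is a connection, query, or scan failure.
            return recordMetricAndDAOReturn("UserDAO.GetUser", dao_status.DBError)
      }
      return user, nil
}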

Does this code look too simple? Mission accomplished!

If this seems like a roundabout way to write code, it is! But it’s an exercise that forces you to think more clearly about what the purpose of a layer is. If you need to add features to a resource, service, or DAO - and you will! - it’s easier to see the contracts with the other layers. If you find yourself writing code that produces a new type of failure that you didn’t have before, odds are you’re writing code in the wrong place! Define a layer’s purpose, and you’ll define the boundaries where it breaks, and that’s where you’ll record metrics.

2018-04-20