Errata

Background

Users interact with software via interfaces; these interfaces can be graphical, network-based, function calls, etc. When we design software systems, we need not only to make the software work but also to handle error conditions, both expected and unexpected. Clear, complete error handling can delight and enable a user; poor error handling can deeply frustrate or impede them.

Users don't expect software to work all the time, but when it breaks they need a clear explanation of what went wrong and what they can do about it.


Let's consider an example:

We have an eCommerce store for musical instruments. A user, Anathi (who is based in South Africa and speaks isiXhosa), would like to buy a pair of drumsticks. In this store's database, the record for the drumsticks Anathi is interested in was entered incorrectly, which causes the "add to cart" action to fail.

Here's how a typical interaction occurs in modern software:

  1. Anathi clicks "Add to cart" on her chosen product
  2. the system generates a NullPointerException due to a null value for this product's shipping_notes field
  3. the exception is caught, and the API server responds with a 500 Internal Server Error
  4. Anathi sees the following message:
500 Internal Server Error

java.lang.NullPointerException: Cannot read field "shipping" because "this.product.notes" is null
  at com.store.product.Cart.addItem(Cart.java:37)
  5. Anathi has no idea what this means, and calls Customer Support
  6. Customer Support is also unsure what this error means, and contacts one of the engineers
  7. the engineer cannot track down this particular error in the logs, and gives up
  8. Customer Support informs Anathi that there's nothing they can do and apologises
  9. Anathi is frustrated, and goes to another online store for the sticks

Anathi really wanted this product, and we don't want to lose the sale; our values are completely aligned. Error handling can make or break a great user experience.

As software engineers, we can and should do better than this for our users and business.

Solution

errata's goal is to make errors easier to define, raise, handle, and most importantly understand.

Let's take the scenario defined above in the Background, and see how we might improve it:

  1. Anathi clicks "Add to cart" on her chosen product
  2. the system generates a NullPointerException due to a null value for this product's shipping_notes field
  3. the exception is caught, and a new exception AddToCartException is thrown with this product's SKU and wraps the original exception
  4. the exception is logged, along with a UUID reference "beafdad1-3770-48a1-bba7-887126ccf504":
com.store.AddToCartException: Failed to add product "drumstick-1": ref "beafdad1-3770-48a1-bba7-887126ccf504"
    at Program.main(Program.java:9)
Caused by: java.lang.NullPointerException: Cannot read field "shipping" because "this.product.notes" is null
    at com.store.product.Cart.addItem(Cart.java:37)
  5. the API server responds with a 500 Internal Server Error, and a friendly error message in the locale of her browser
  6. Anathi sees the following message:
Code: add_to_cart_error (500 Internal Server Error)
Reference: beafdad1-3770-48a1-bba7-887126ccf504

Message (English):
There was a problem adding your item to your cart
Please contact Customer Support and provide this error code: "beafdad1-3770-48a1-bba7-887126ccf504"

Message (isiXhosa):
Bekukho ingxaki ekufakeni into yakho kwinqwelo yakho
Nceda uqhagamshelane neNkxaso yoMthengi kwaye unikeze le khowudi yempazamo: "beafdad1-3770-48a1-bba7-887126ccf504"
  7. Anathi contacts Customer Support with the error code
  8. Customer Support looks up this error using the errata Web UI, and determines that they need to hand this off to an engineer
  9. an engineer locates the error log, and knows exactly which product has an issue
  10. an engineer fixes the product and notifies Customer Support
  11. Customer Support notifies Anathi, who is then able to buy the product

We saved the sale and delighted the customer! Errors are inevitable; providing a clear means to solving these errors is crucial in the software we build.

Let's dive deeper into the core ideas described above, as well as some additional features of errata:

  • error definition and usage
  • error handling
  • error tracking
  • internationalisation (i18n)
  • search & self-service

Error Definition & Usage

errata uses a YAML file in which an application's errors are defined. The format is straightforward, and makes very few assumptions about, or demands of, your application.

errata comes with a CLI tool called eish (errata interactive shell, pronounced "eɪʃ") which generates code based on the error definitions.

There is a sample application available which demonstrates this.

The errors are defined in an errata.yml file, from which eish generates code. Let's look at a simple example, the definition of invalid_email:

...
errors:
  ...
  invalid_email:
    message: Please provide a valid email address
    cause: Given email address is invalid
    categories: [ login ]
    labels:
      http_response_code: 400
      shell_exit_code: 1

The code that gets generated is:

...

const (
	...
	ErrInvalidEmail        = "invalid_email"
	...
)

var list = map[string]Error{
	...
	
	ErrInvalidEmail: {
		Code:       ErrInvalidEmail,
		Message:    "Please provide a valid email address",
		Cause:      "Given email address is invalid",
		Solution:   "",
		Categories: []string{"login"},
		Labels: map[string]string{
			"http_response_code": "400",
			"shell_exit_code":    "1",
		},

		translations: map[string]Error{},
	},
	
	...
}

func NewInvalidEmailErr(wrapped error) Error {
	return NewFromCode(ErrInvalidEmail, wrapped)
}

...
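As a rough illustration of how the generated constructor might be used, here is a minimal, self-contained sketch. The Error type below is a simplified stand-in for the generated one, and the wrapped field, the Error() and Unwrap() methods, and the validateEmail helper are assumptions made for this example rather than the actual generated code.

```go
package main

import (
	"errors"
	"fmt"
	"net/mail"
)

// Error is a simplified stand-in for the generated type; the generated
// struct shown above has more fields (Cause, Solution, Categories, Labels).
type Error struct {
	Code    string
	Message string
	wrapped error // assumed field name: holds the original error
}

// Error implements the standard error interface (assumed for this sketch).
func (e Error) Error() string { return fmt.Sprintf("%s: %s", e.Code, e.Message) }

// Unwrap exposes the original error to errors.As and errors.Unwrap
// (assumed for this sketch).
func (e Error) Unwrap() error { return e.wrapped }

const ErrInvalidEmail = "invalid_email"

// NewInvalidEmailErr mirrors the shape of the generated constructor.
func NewInvalidEmailErr(wrapped error) Error {
	return Error{
		Code:    ErrInvalidEmail,
		Message: "Please provide a valid email address",
		wrapped: wrapped,
	}
}

// validateEmail is a hypothetical caller: it wraps the parser's error
// instead of discarding it.
func validateEmail(addr string) error {
	if _, err := mail.ParseAddress(addr); err != nil {
		return NewInvalidEmailErr(err)
	}
	return nil
}

func main() {
	err := validateEmail("not-an-email")

	var e Error
	if errors.As(err, &e) {
		fmt.Println("code:", e.Code)
		fmt.Println("message:", e.Message)
		fmt.Println("original:", errors.Unwrap(e))
	}
}
```

The caller only ever refers to the constant and the constructor, so the user-facing message lives in one place (the YAML definition) while the original parser error stays available for logging.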

Error Handling

When an application error occurs, a lot of valuable context is generated (file, line number, stack trace, etc.). This context should not be discarded, as it's essential for debugging. All errata errors allow the original error to be "wrapped", providing access to it for logging or inspection.

Here's an example from the sample application sample/http/server.go:

	r, err := json.Marshal(&s)
	if err != nil {
		return "", errata.NewResponseFormattingErr(err)
	}

Here is the corresponding definition of the error:

  response_formatting:
    message: Failed to format response body
    categories: [ internal ]
    labels:
      http_response_code: 500

Notice the http_response_code label? We can add arbitrary values in the labels field, and use them in our application.
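One way an application might consume such a label is sketched below. The Error type is a simplified stand-in for the generated one, and httpStatus is a hypothetical helper, not part of errata; the fallback to 500 is a design choice made for this example.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
)

// Error is a simplified stand-in for the generated type.
type Error struct {
	Code    string
	Message string
	Labels  map[string]string
}

// httpStatus is a hypothetical helper: it reads the http_response_code
// label and falls back to 500 when the label is missing or is not a
// status code known to net/http.
func httpStatus(e Error) int {
	code, err := strconv.Atoi(e.Labels["http_response_code"])
	if err != nil || http.StatusText(code) == "" {
		return http.StatusInternalServerError
	}
	return code
}

func main() {
	invalidEmail := Error{
		Code:    "invalid_email",
		Message: "Please provide a valid email address",
		Labels: map[string]string{
			"http_response_code": "400",
			"shell_exit_code":    "1",
		},
	}
	unlabelled := Error{Code: "unlabelled", Message: "Something went wrong"}

	fmt.Println(httpStatus(invalidEmail)) // 400
	fmt.Println(httpStatus(unlabelled))   // 500
}
```

Because labels are stored as strings in the generated map, the application is responsible for parsing and validating them; defaulting to 500 keeps a malformed label from producing an invalid response.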

Error Tracking

Errors may need to be tracked for effective debugging. In a large multi-tenant application, it may become difficult or even impractical to determine the cause of an error without a unique code associated with each error occurrence.

The generated code contains a UUID() function on the Error type; its value can be logged alongside the error itself and shared with the user as a reference for tracking purposes.
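The sketch below shows how a per-occurrence reference could tie a server-side log entry to the message a user sees. Only the UUID() method name comes from the generated code; the newError constructor, the uuid and wrapped fields, and the newUUID helper are assumptions for illustration.

```go
package main

import (
	"crypto/rand"
	"errors"
	"fmt"
	"log"
)

// newUUID returns a random (version 4) UUID string. It stands in for
// whatever the generated code uses internally.
func newUUID() string {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		panic(err) // no usable source of randomness
	}
	b[6] = (b[6] & 0x0f) | 0x40 // set version 4
	b[8] = (b[8] & 0x3f) | 0x80 // set RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16])
}

// Error is a simplified stand-in for the generated type.
type Error struct {
	Code    string
	Message string
	uuid    string // assumed field: assigned once per occurrence
	wrapped error  // assumed field: the original error
}

// newError is a hypothetical constructor that assigns a fresh reference.
func newError(code, message string, wrapped error) Error {
	return Error{Code: code, Message: message, uuid: newUUID(), wrapped: wrapped}
}

// UUID returns the reference for this occurrence.
func (e Error) UUID() string { return e.uuid }

func main() {
	cause := errors.New(`shipping notes missing for product "drumstick-1"`)
	e := newError("add_to_cart_error", "There was a problem adding your item to your cart", cause)

	// Server side: log the full detail, keyed by the reference.
	log.Printf("ref=%s code=%s cause=%v", e.UUID(), e.Code, e.wrapped)

	// Client side: show only the friendly message and the reference.
	fmt.Printf("%s\nReference: %s\n", e.Message, e.UUID())
}
```

An engineer can then search the logs for the reference a customer quotes, without the customer ever seeing the underlying cause.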

Internationalisation (i18n)

Applications that need to service many languages and cultures will degrade their UX by only displaying errors in a single language. errata supports i18n by allowing you to configure per-locale overrides of the message, cause and solution fields of error definitions.
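To make the override idea concrete, here is a minimal sketch of per-locale resolution with fallback to the default definition. The translations map mirrors the field in the generated code shown earlier, but the Translate method and its fallback behaviour are assumptions for this example, not the documented errata API.

```go
package main

import "fmt"

// Error is a simplified stand-in for the generated type.
type Error struct {
	Code         string
	Message      string
	Cause        string
	Solution     string
	translations map[string]Error // per-locale overrides, keyed by locale
}

// Translate is a hypothetical method. It returns a copy of the error with
// any non-empty message, cause or solution from the locale's override
// applied. Unknown locales, and fields without an override, keep the
// default text.
func (e Error) Translate(locale string) Error {
	t, ok := e.translations[locale]
	if !ok {
		return e
	}
	if t.Message != "" {
		e.Message = t.Message
	}
	if t.Cause != "" {
		e.Cause = t.Cause
	}
	if t.Solution != "" {
		e.Solution = t.Solution
	}
	return e
}

func main() {
	e := Error{
		Code:    "add_to_cart_error",
		Message: "There was a problem adding your item to your cart",
		translations: map[string]Error{
			"xh": {Message: "Bekukho ingxaki ekufakeni into yakho kwinqwelo yakho"},
		},
	}

	fmt.Println(e.Translate("xh").Message) // isiXhosa override
	fmt.Println(e.Translate("fr").Message) // no override: default message
}
```

Falling back field by field means a translator can override just the message without having to restate the cause or solution.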

Search & Self-service

errata provides a Web UI which can be used to search errors and explore additional details of each error.

This is crucially important for users' self-service, and many of the world's largest APIs already provide this:

See also