Of Routes and Resources

Understanding URLs and web resources in yada

by Malcolm Sparks

Published 2017-03-20

This post is about one of the libraries in JUXT's Clojure stack. If you'd like to receive training in our libraries from their authors, we run a dedicated 'full-stack' training course. The next one is on 7th September 2017.

Routes

In 1992, the Uniform Resource Locator (URL) was invented, building on foundations that had already been laid.

The Internet Protocol (IP) established the means of addressing hosts, while the Domain Name System (DNS) established a human-friendly federated registry of these addresses.

Upon these foundations the URL added the means to address individual pieces of information, such as text documents, images and other data.

Just as the international postal address system broke down barriers to postal communication, the URL broke down barriers to worldwide information access by computers and, by extension, their users.

What differentiated the URL from previous addressing notations was that a URL could be written as a short string of characters: short enough to be used within text as a link, and often short enough to be remembered. This proved to be the critical property, as it allowed the creation of hypermedia systems: documents that can link to other documents and media.

Combining this system with an application that could navigate these links, and display the documents and media retrieved from them, was the key piece that ushered in the World Wide Web we know today.

Requests

Once we have the means of addressing resources with URLs, we can send requests to them. The protocol is declared at the start of the URL itself; on the web, HTTP (and HTTPS) is the most common.

With HTTP, a URL alone doesn't provide sufficient information to create a request. A request also contains other details which give it meaning:

  • The request's method
  • The representations the requestor is capable of accepting
  • Security credentials
  • Identification
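For instance, a complete HTTP request built from the URL https://juxt.pro/blog/rss.xml might look something like this (the credentials and user agent shown are hypothetical, included only to show where such details are carried):

GET /blog/rss.xml HTTP/1.1
Host: juxt.pro
Accept: application/rss+xml, application/xml;q=0.9
Authorization: Bearer some-token
User-Agent: ExampleClient/1.0

The method, the Accept header, the credentials and the identifying User-Agent each contribute meaning beyond what the URL alone provides.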

Resources

The target of an HTTP request is called a "resource" … Each resource is identified by a Uniform Resource Identifier (URI)

RFC 7231, section 2

A resource is an individual entity on the web, addressed with a URL.

But what is a resource really? Resources are merely some state combined with some properties, such as age, version, classification, confidentiality and representations.
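To make that concrete, here is a hypothetical sketch of a resource expressed as plain Clojure data: some state, together with properties and the representations it can offer:

;; Purely illustrative: a resource is some state plus properties.
{:state "Hello World!"
 :properties {:last-modified #inst "2017-03-20"
              :version 1
              :classification :public}
 :representations #{{:media-type "text/html"}
                    {:media-type "text/plain"}}}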

Identification versus Semantics

... One design goal of HTTP is to separate resource identification from request semantics

RFC 7231, section 2

The URL identifies resources, rather than determining how a resource should respond to a request (semantics).

When it comes to designing web libraries and frameworks, I think it is important to respect this separation between identification and semantics.

Many web libraries choose to blur the lines between the URL and the request.

For example, in Clojure, developers using the popular Compojure routing library commonly use the request's method to determine which handler is reached:

(ns hello-world.core
  (:require [compojure.core :refer :all]
            [compojure.route :as route]))

(defroutes app
  (GET "/" [] "<h1>Hello World</h1>")
  (route/not-found "<h1>Page not found</h1>"))

This design steers us towards thinking about the web as a set of operations rather than a set of resources.

I feel this operational view of the web is something of a relic of the way we thought about network applications before the web arrived. Many of those who were experienced in the distributed computing of 'remote' procedure calls, 'distributed' objects, CORBA or SOAP saw the web as yet another canvas on which to project the same distributed applications.

The problem with operations is that we have to define, a priori, the semantics of each individual operation. We cannot tell, unless we know beforehand, whether an operation is safe to call multiple times, whether its result will be the same every time we call it, or whether the result can be cached, and if so for how long. Many years of building distributed applications have taught us that these questions are critical for scalability. The insight of Roy Fielding was to reduce the types of operation that could be performed on the web to a minimal set with universally established semantics.
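We can sketch those universal semantics as Clojure data. Per RFC 7231, each of the core methods has well-known answers to the questions above (POST responses are cacheable only when explicitly marked as such, so POST is shown as false here):

;; Safe: no state change expected. Idempotent: repeated calls have the
;; same effect as one. Cacheable: responses may be stored and reused.
(def method-semantics
  {:get    {:safe? true  :idempotent? true  :cacheable? true}
   :head   {:safe? true  :idempotent? true  :cacheable? true}
   :put    {:safe? false :idempotent? true  :cacheable? false}
   :delete {:safe? false :idempotent? true  :cacheable? false}
   :post   {:safe? false :idempotent? false :cacheable? false}})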

The web is a huge, surviving, growing proof that the architecture on which it is built is sound and scales. These principles are not academic but at the very core of how the web works today: reverse proxies, caches and CDNs all contribute to making the user experience of the web what it is.

Restoring cohesion for web resources

If we are to fully exploit the advantages of the web for our own applications, it's necessary to re-align our coding models to its architecture.

One change is to model a web resource as a single cohesive stateful entity, with multiple methods.

Wikipedia defines cohesion like this:

In computer programming, cohesion refers to the degree to which the elements of a module belong together.

There are numerous reasons to restore cohesion to the concept of a resource in our web applications.

One reason is that a resource's state can change with a POST or a PUT method. These methods obviously have a consequential effect on any subsequent GET requests: for one thing, the Last-Modified and/or ETag response headers will be different, quite apart from the response body itself.

If we break apart a resource into separate operations, handled by separate parts of our application, we must either code for this explicitly (and encode implicit tight couplings across the code-base), or simply ignore the job of producing reliable response headers in the first place (as many programmers do). Either our code hygiene suffers, or the web does.

That's why I consider it harmful to code web applications as a set of operations and why we need to bring back cohesion to our models of resources in our programs. That's the reason yada models resources as records, rather than the Ring approach of modelling web handlers as functions. Functions are a great building block for writing programs, but on their own are an inadequate approach for coding web resources.
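As a minimal sketch of what this looks like (assuming yada's request context exposes the processed request body under :body, as in its documentation examples), a single yada resource can declare both GET and PUT over the same state, leaving yada to derive consistent headers across methods:

(require '[yada.yada :as yada])

;; The resource's state, shared by every method.
(def greeting (atom "Hello World!"))

(def greeting-resource
  (yada/resource
   {:id :greeting
    :methods
    {:get {:produces "text/plain"
           :response (fn [ctx] @greeting)}
     :put {:consumes "text/plain"
           :response (fn [ctx] (reset! greeting (:body ctx)))}}}))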

Routing

In RFC 1738, URLs consist of the protocol (scheme), optional user credentials, host, port and url-path.

It is up to the application how to treat the url-path of a URL, and whether or not to encode any information in it.

For the practical purposes of developing applications, it often helps to encode some kind of hierarchical structure into a URL's url-path, particularly when the application has to serve a number of URLs. This supports the good coding practice of writing modular code composed of small units of functionality.

There is then the task of parsing a request's URL path and handing off to the handler (the code tasked with processing the request and creating the response). We often call this task 'routing'.

Many web libraries seek to make this task easy and convenient by letting the developer specify segments of the URL's path in their routing code.

Let's see this in Spring (from https://spring.io/guides/gs/rest-service/)

@RestController
public class GreetingController {

    private static final String template = "Hello, %s!";
    private final AtomicLong counter = new AtomicLong();

    @RequestMapping("/greeting")
    public Greeting greeting(@RequestParam(value="name", defaultValue="World") String name) {
        return new Greeting(counter.incrementAndGet(),
                            String.format(template, name));
    }
}

The "/greeting" path is declared in an annotation on the handler. It's not all bad, because we can process annotations in Java back into a single data structure, but we're stuck with making the controller classes the authoritative source of the URL structure rather than a more function independent data structure.

Elixir's Phoenix framework takes a similar approach and encourages the path of a URI to be entangled with the code that will serve it:

scope "/admin", as: :admin do
  resources "/reviews", HelloPhoenix.Admin.ReviewController
end

But the problem is, once the URL paths have been deconstructed and embedded into code, they are difficult to change (an implicit commitment is made to keep the path hierarchy fixed).

Another important concern is that the path hierarchy information cannot easily be used by other parts of the application for other purposes.

Instead, if we first define a single data structure for all the paths in our URI hierarchy, we can utilize that data in a host of different ways.

While one of these might fulfill the classic role of routing to a code handler, another might walk the routing structure and create a report, another might use it to form URLs (reverse routing), another might use it to generate a Swagger spec, yet another might use it to post-process resources to add security.

It is better to have 100 functions operate on one data structure than 10 functions on 10 data structures.

Alan Perlis

Reciprocal routing

Reciprocal routing is where the program can use the data structure to create URLs to other resources it might want to embed in the response.

This is particularly useful in the creation of hyperlinks in HTML responses and when developing Hypermedia APIs which use URLs as links between resources, allowing machines to navigate the web in the same way as humans do with browsers.

<!-- Human readable link -->
<a href="/foo/zip">Here is a link</a>

<!-- Machine readable link -->
<link title="RSS feed" href="https://juxt.pro/blog/rss.xml"/>

bidi

Let's finish up with a discussion of JUXT's bidi library, which provides routing, reciprocal routing, and other functions on one data structure, a simple tree.

For example:

["/foo" [["/bar" :bar] ["/zip" :zip]]]

The rules for creating this tree structure are few and fairly easy to remember:

  • A route is a pair.
  • The first item of the pair is a pattern to match against the URL's path.
  • The second item of the pair can be a terminal (such as a code handler, or in this case a simple keyword).
  • The second item can also be a collection of route structures, recursively.
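For instance, here is a sketch of a slightly larger tree (the route names are hypothetical), nesting pairs recursively and capturing a path parameter, :id, in one of its patterns:

;; A nested tree: pairs within pairs, with a parameterised pattern.
(def blog-routes
  ["/blog"
   [["/index.html" :blog-index]
    [["/articles/" :id "/article.html"] :blog-article]]])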

The route structure's design is basic, so it can sometimes be cumbersome to author by hand (for this, bidi has a verbose, author-friendly mode designed by another long-time member of the JUXT crew, Stathis Sideris).

Yet the basic nature of bidi's structure is deliberate as it is designed to simplify the task of writing functions that operate over it.

bidi comes with a function, match-route, that takes a URL path and returns the matched target:

(require '[bidi.bidi :as bidi])

(bidi/match-route routes "/foo/bar") => {:handler :bar}

Or reciprocally, path-for takes a target (and some route parameters if necessary) and returns a URL path.

(bidi/path-for routes :bar) => "/foo/bar"
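Route parameters flow in both directions too. Using the hypothetical blog-routes sketch from above:

(bidi/match-route blog-routes "/blog/articles/123/article.html")
=> {:handler :blog-article, :route-params {:id "123"}}

(bidi/path-for blog-routes :blog-article :id 123)
=> "/blog/articles/123/article.html"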

With yada, we can identify resources with the :id entry, and use yada's href-for and related functions to form URIs to those resources.

;; Requires [yada.yada :as yada] and [hiccup.core :refer [html]]
{:id :bar
 :methods
 {:get
  {:response
   (fn [ctx]
     (html
      [:a {:href (yada/href-for ctx :zip)} "Go to zip!"]))}}}

With another function, bidi can also walk a route structure and extract all the possible routes into a sequence (route-seq).
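For instance (a sketch; the exact shape of the Route records returned may vary between bidi versions):

(map :handler (bidi/route-seq routes))
=> (:bar :zip)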

With bidi, we can define our URL structures in data, avoiding any entanglement with request semantics. At the same time we can use yada to create resources that model the web properly, creating hyperlinks between resources in our web applications and APIs.

Conclusion

bidi and yada continue to evolve as separate libraries in order to maintain the separation in HTTP of resource identification from request semantics.

As is so often the case, when we keep things separate, things stay simple.

(If you'd like to learn more about bidi and yada, why not book a place on our training course next week!)
