Package puri

PURI - Portable URI Library - is a portable Uniform Resource Identifier library for Common Lisp programs.

About This Package

General
Supported Platforms
Overview
Differences between PURI and NET.URI
The URI API definition
Parsing, escape decoding/encoding and the path
Interning URIs
Implementation notes
Examples

General

Authors
Franz, Inc <http://www.franz.com>
Kevin Rosenberg

Version
The documentation is for version 1.5.5.

Homepage
http://puri.b9.com/

Mailing List
No mailing list available.

Download
puri-1.5.5.tar.gz or puri-1.5.5.zip

Source Code
Browse PURI Download Site http://files.b9.com/puri/
Git Repository http://git.b9.com/?p=puri.git;a=summary

Documentation
This documentation is generated with a fork of the project ATDOC. It contains the content from the official homepage at http://puri.b9.com/ and the references from that homepage. In particular, Franz's unmodified documentation is included as the file uri.html, which is part of the distribution.

License
PURI is free software licensed under the LLGPL.

Dependencies
PURI does not depend on other libraries.

Supported Platforms

  • AllegroCL
  • CLISP
  • CMUCL
  • Lispworks
  • OpenMCL
  • SBCL

Overview

PURI is a portable Uniform Resource Identifier library for Common Lisp programs. It parses URIs according to the RFC 2396 specification. It is based on Franz, Inc's open-source URI package and has been ported to work on other Common Lisp implementations. It is licensed under the LLGPL, which is included in the distribution.

URIs are a superset, in functionality and syntax, of URLs (Uniform Resource Locators) and URNs (Uniform Resource Names). That is, RFC2396 updates and merges RFC1738 and RFC1808 into a single syntax, called the URI. It does exclude some portions of RFC1738 that define specific syntax of individual URL schemes.

In URL slang, the scheme is usually called the "protocol", but it is called scheme in RFC1738. A URL "host" corresponds to the URI "authority". The URL slang "bookmark" or "anchor" is "fragment" in URI lingo.

Broadly, the URI facility creates a Lisp object that represents a URI, and provides setters and accessors to fields in the URI object. The URI object can also be interned, much like symbols in Common Lisp are. This document describes the facility and the related operators.

Aside from the obvious slots which are called out in the RFC, URIs also have a property list. With interning, this is another similarity between URIs and Common Lisp symbols.

A regression suite is included which uses Franz's open-source tester library. This library is ported for use on other Common Lisp implementations. PURI completes 126/126 regression tests successfully.

Franz's unmodified documentation file is included in the file uri.html.

Differences between PURI and NET.URI

  • PURI uses the package PURI while NET.URI uses the package NET.URI
  • To signal an error parsing a URI, PURI uses the condition uri-parse-error while NET.URI uses the condition parse-error. This divergence occurs because Franz's parse-error condition uses :format-control and :format-arguments slots which are not in the ANSI specification for the parse-error condition.
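
As a minimal sketch of how the PURI condition can be handled, the following wraps parse-uri in handler-case. The helper name safe-parse-uri is hypothetical. The example assumes that PURI is installed where ASDF can find it and that, with default settings, a space inside a URI is rejected as an illegal character.

```lisp
;; Assumption: PURI is installed where ASDF can find it.
(require :asdf)
(asdf:load-system :puri)

;; SAFE-PARSE-URI is a hypothetical helper, not part of PURI.
(defun safe-parse-uri (string)
  "Return a uri object for STRING, or NIL if PURI signals a parse error."
  (handler-case (puri:parse-uri string)
    (puri:uri-parse-error () nil)))

;; A well-formed URI yields a uri object.
(safe-parse-uri "http://www.franz.com/")

;; Assumption: the space is treated as an illegal character,
;; so the handler returns NIL here.
(safe-parse-uri "http://www.franz.com/a b")
```

Code written against NET.URI that handles parse-error would need its handler clauses renamed in this way when moved to PURI.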

The URI API definition

Symbols naming objects (functions, variables, etc.) in the uri module are exported from the PURI package.

URIs are represented by CLOS objects of class uri. Their slots are:
  • scheme
  • host
  • port
  • path
  • query
  • fragment
  • plist
The host and port slots together correspond to the authority (see RFC2396). There is an accessor-like function, uri-authority, that can be used to extract the authority from a URI. See RFC2396 for details of all the slots except plist. The plist slot contains a standard Common Lisp property list.
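
The slots described above can be inspected with the exported accessors. The following is a small sketch, assuming PURI is installed where ASDF can find it. The URI string, the variable *u*, the helper uri-slot-summary and the :checked property are made up for illustration.

```lisp
;; Assumption: PURI is installed where ASDF can find it.
(require :asdf)
(asdf:load-system :puri)

;; Hypothetical example URI with an explicit, non-default port.
(defparameter *u*
  (puri:parse-uri "http://www.franz.com:8080/foo/bar.htm?x=1#top"))

;; URI-SLOT-SUMMARY is a hypothetical helper, not part of PURI.
(defun uri-slot-summary (uri)
  "Collect the main slot values of URI into a property list."
  (list :scheme    (puri:uri-scheme uri)
        :host      (puri:uri-host uri)
        :port      (puri:uri-port uri)
        :path      (puri:uri-path uri)
        :query     (puri:uri-query uri)
        :fragment  (puri:uri-fragment uri)
        :authority (puri:uri-authority uri)))

;; The plist slot holds an ordinary property list, so GETF applies.
(setf (getf (puri:uri-plist *u*) :checked) t)
```

Calling (uri-slot-summary *u*) gathers the values in one list, which is convenient at the REPL.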

All symbols are external in the PURI package, unless otherwise noted. Brief descriptions are given in this document, with complete descriptions in the individual pages.

uri, Class
The class of URI objects.

uri-scheme, uri-host, uri-port, uri-path, uri-query, uri-fragment, uri-plist, Generic Functions
Each of these accessor functions returns the value of the associated slot of the uri object.

uri-authority, Function
Returns the authority of the uri object. The authority combines the host and port slots.

uri, Generic Function
Defined methods: if the argument thing is a uri object, return it; otherwise create a uri object from thing if possible and return it, or signal an error if that is not possible.

uri-p, Generic Function
Returns true if thing is an instance of class uri.

copy-uri, Function
Copies the specified uri object. See the description page for information on the keyword arguments.

render-uri, Function
Print to stream the printed representation of uri.

parse-uri, Function
Parse the string thing into a uri object.

merge-uris, Generic Function
Return an absolute URI, based on uri, which can be relative, and base, which must be absolute.

enough-uri, Generic Function
Converts uri into a relative URI using base as the base URI.

uri-parsed-path, Generic Function
Returns the parsed representation of the path of uri.

urn, Class
The class of URN objects.

urn-nid, urn-nss, Generic Functions
Each of these accessor functions returns the value of the associated slot of the urn object.
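
The operators above can be combined as in the following sketch, which assumes PURI is installed where ASDF can find it. The variable names are illustrative only. The merge mirrors the transcript in the Examples section, and no result is claimed for enough-uri beyond it being a uri object.

```lisp
;; Assumption: PURI is installed where ASDF can find it.
(require :asdf)
(asdf:load-system :puri)

(defparameter *base* (puri:parse-uri "http://www.franz.com/foo/bar/"))

;; Resolve a relative reference against an absolute base.
(defparameter *merged*
  (puri:merge-uris (puri:parse-uri "foo.htm") *base*))

;; render-uri prints the URI to a stream; capture it as a string.
(defparameter *text*
  (with-output-to-string (out)
    (puri:render-uri *merged* out)))

;; copy-uri without keyword arguments yields a distinct but equivalent object.
(defparameter *copy* (puri:copy-uri *merged*))

;; enough-uri goes the other way and expresses a URI relative to the base.
(defparameter *relative* (puri:enough-uri *merged* *base*))

;; The uri generic function returns a uri object unchanged.
(defparameter *same* (puri:uri *merged*))
```

Rendering into a string stream keeps the example independent of how each Lisp implementation prints the uri object itself.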

Parsing, escape decoding/encoding and the path

The method uri-path returns the path portion of the URI, in string form. The method uri-parsed-path returns the path portion of the URI, in list form. This list form is discussed below, after a discussion of decoding/encoding.

RFC2396 lays out a method for including reserved characters in URIs: you escape the character. An escaped character is defined like this:
escaped = "%" hex hex
hex = digit | "A" | "B" | "C" | "D" | "E" | "F" | "a" | "b" | "c" | "d" | "e" | "f"    
In addition, the RFC defines excluded characters:
"<" | ">" | "#" | "%" | <"> | "{" | "}" | "|" | "\" | "^" | "[" | "]" | "`"    
The set of reserved characters is:
";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" | "$" | ","     
with the following exceptions:
  • within the authority component, the characters ";", ":", "@", "?", and "/" are reserved.
  • within a path segment, the characters "/", ";", "=", and "?" are reserved.
  • within a query component, the characters ";", "/", "?", ":", "@", "&", "=", "+", ",", and "$" are reserved.
From the RFC, there are two important rules about escaping and unescaping (encoding and decoding):
  • decoding should only happen when the URI is parsed into component parts;
  • encoding can only occur when a URI is made from component parts (i.e., rendered for printing).
The implication of this is that to decode the URI, it must be in a parsed state. That is, you can't convert %2f (the escaped form of "/") until the path has been parsed into its component parts. Another important desire is for the application viewing the component parts to see the decoded values of the components. For example, consider:
http://www.franz.com/calculator/3%2f2    
This might be the implementation of a calculator, and how someone would execute 3/2. Clearly, the application that implements this would want to see path components of "calculator" and "3/2". "3%2f2" would not be useful to the calculator application.

For the reasons given above, a parsed version of the path is available and has the following form:
([:absolute | :relative] component1 [component2...])    
where components are:
element | (element param1 [param2 ...])     
and element is a path element, and the param's are path element parameters. For example, the result of
(uri-parsed-path (parse-uri "foo;10/bar:x;y;z/baz.htm"))     
is
(:relative ("foo" "10") ("bar:x" "y" "z") "baz.htm")     
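
The parsed form can be explored with a short sketch like the one below, assuming PURI is installed where ASDF can find it. The helper path-components is hypothetical. It drops the leading :absolute or :relative marker so that only the decoded components remain.

```lisp
;; Assumption: PURI is installed where ASDF can find it.
(require :asdf)
(asdf:load-system :puri)

;; PATH-COMPONENTS is a hypothetical helper, not part of PURI.
(defun path-components (uri-string)
  "Return the parsed path of URI-STRING without its leading marker."
  (rest (puri:uri-parsed-path (puri:parse-uri uri-string))))

;; The calculator URI from the text: the escaped slash is decoded
;; only after the path has been split into components.
(path-components "http://www.franz.com/calculator/3%2f2")

;; Components with parameters come back as sublists.
(puri:uri-parsed-path (puri:parse-uri "foo;10/bar:x;y;z/baz.htm"))
```

An application such as the calculator would dispatch on the first component and pass the remaining decoded components to its own logic.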
There is a certain amount of canonicalization that occurs when parsing:
  • A path of (:absolute) or (:absolute "") is equivalent to a nil path. That is, http://a/ is parsed with a nil path and printed as http://a.
  • Escaped characters that are not reserved are not escaped upon printing. For example, "foob%61r" is parsed into "foobar" and appears as "foobar" when the URI is printed.
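
Both canonicalization rules can be observed by parsing a string and printing it again. This is a sketch under the assumption that PURI is installed where ASDF can find it. The helper printed-form is hypothetical.

```lisp
;; Assumption: PURI is installed where ASDF can find it.
(require :asdf)
(asdf:load-system :puri)

;; PRINTED-FORM is a hypothetical helper, not part of PURI.
(defun printed-form (uri-string)
  "Parse URI-STRING and return its printed representation as a string."
  (with-output-to-string (out)
    (puri:render-uri (puri:parse-uri uri-string) out)))

;; A bare "/" path is canonicalized away.
(puri:uri-path (puri:parse-uri "http://a/"))
(printed-form "http://a/")

;; %61 is an unreserved character ("a"), so it is not escaped again.
(printed-form "foob%61r")
```

A consequence is that the printed form of a URI is not guaranteed to be character-for-character identical to the string that was parsed.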

Interning URIs

This section describes how to intern URIs. Interning is not mandatory. URIs can be used perfectly well without interning them.

Interned URIs are like symbols. That is, a string representing a URI, when parsed and interned, will always yield an eq object. For example:
(eq (intern-uri "http://www.franz.com")
    (intern-uri "http://www.franz.com"))    
is always true. (Note that two strings with identical contents may or may not be eq in Common Lisp.)
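
The contrast between interned and merely parsed URIs can be sketched as follows, assuming PURI is installed where ASDF can find it. The variable names are illustrative.

```lisp
;; Assumption: PURI is installed where ASDF can find it.
(require :asdf)
(asdf:load-system :puri)

;; Interning the same string twice yields the same object.
(defparameter *a* (puri:intern-uri "http://www.franz.com"))
(defparameter *b* (puri:intern-uri "http://www.franz.com"))

;; Parsing without interning creates a fresh object each time.
(defparameter *c* (puri:parse-uri "http://www.franz.com"))
(defparameter *d* (puri:parse-uri "http://www.franz.com"))

(eq *a* *b*)        ; identity holds for interned URIs
(eq *c* *d*)        ; two separate objects
(puri:uri= *c* *d*) ; but they are still equivalent
```

Interning is therefore useful when URIs serve as keys in eq-based tables, while uri= is sufficient for plain comparisons.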

The functions associated with interning are:

make-uri-space, Function
Make a new hash-table object to contain interned URIs.

uri-space, Function
Returns the object into which URIs are currently being interned.

uri=, Generic Function
Returns true if uri1 and uri2 are equivalent.

intern-uri, Generic Function
Intern the xuri object in the specified uri-space. Methods exist for strings and uri objects.

unintern-uri, Function
Unintern the specified uri object, or all URI objects if uri is t (in uri-space if specified).

do-all-uris, Macro
Bind var to all currently defined URIs (in uri-space if specified) and evaluate forms.
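
A separate URI space keeps one set of interned URIs apart from another. The sketch below assumes PURI is installed where ASDF can find it. All variable names are illustrative, and only the form (unintern-uri t ...) that clears a whole space is exercised.

```lisp
;; Assumption: PURI is installed where ASDF can find it.
(require :asdf)
(asdf:load-system :puri)

(defparameter *space-a* (puri:make-uri-space))
(defparameter *space-b* (puri:make-uri-space))

;; Interning twice in the same space returns the same object.
(defparameter *in-a*
  (puri:intern-uri "http://www.franz.com/foo" *space-a*))
(defparameter *in-a-again*
  (puri:intern-uri "http://www.franz.com/foo" *space-a*))

;; A different space holds its own, distinct object.
(defparameter *in-b*
  (puri:intern-uri "http://www.franz.com/foo" *space-b*))

;; Passing T removes every URI from the given space.
(puri:unintern-uri t *space-a*)
(defparameter *after-clear*
  (puri:intern-uri "http://www.franz.com/foo" *space-a*))
```

After the space is cleared, interning the same string creates a new object, so references held from before the clear are no longer eq to it.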

Implementation notes

  • The following are true:
    (uri= (parse-uri "http://www.franz.com/")
          (parse-uri "http://www.franz.com"))

    (eq (intern-uri "http://www.franz.com/") (intern-uri "http://www.franz.com"))
  • The following is true:
    (eq (intern-uri "http://www.franz.com:80/foo/bar.htm")
        (intern-uri "http://www.franz.com/foo/bar.htm"))        
    (I.e., specifying the default port is the same as specifying no port at all. This is specified in RFC2396.)
  • The scheme and authority are case-insensitive. In Common Lisp, the scheme is a keyword that appears in the normal case for the Lisp in which you are executing.
  • #u"..." is shorthand for (parse-uri "..."), but if a #u dispatch macro definition already exists, it will not be overridden.
  • Setting the scheme, host, port, path, query, or fragment slots of a URI object that has been interned will have very bad and unpredictable results.
  • The printable representation of URIs is cached, for efficiency. This caching is undone when the above slots are changed. That is, when you create a URI the printed representation is cached. When you change one of the above mentioned slots, the printed representation is cleared and calculated when the URI is next printed. For example:
      user(10): (setq u #u"http://foo.bar.com/foo/bar") 
      #<uri http://foo.bar.com/foo/bar> 
      user(11): (setf (puri:uri-host u) "foo.com") 
      "foo.com" 
      user(12): u 
      #<uri http://foo.com/foo/bar> 
      user(13):         
    This allows URI behavior to follow the principle of least surprise.
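
Several of the notes above can be checked with package-qualified PURI calls. This is a sketch assuming PURI is installed where ASDF can find it. The helper printed-form and the variable names are hypothetical, and the URI being modified is deliberately not interned.

```lisp
;; Assumption: PURI is installed where ASDF can find it.
(require :asdf)
(asdf:load-system :puri)

;; PRINTED-FORM is a hypothetical helper, not part of PURI.
(defun printed-form (uri)
  "Return the printed representation of URI as a string."
  (with-output-to-string (out)
    (puri:render-uri uri out)))

;; Changing a slot clears the cached printed representation.
(defparameter *u* (puri:parse-uri "http://foo.bar.com/foo/bar"))
(defparameter *before* (printed-form *u*))
(setf (puri:uri-host *u*) "foo.com")
(defparameter *after* (printed-form *u*))

;; The default port for the scheme is dropped when parsing.
(defparameter *default-port*
  (puri:uri-port (puri:parse-uri "http://www.franz.com:80/foo/bar.htm")))

;; The scheme is case-insensitive and stored as a keyword.
(defparameter *scheme*
  (puri:uri-scheme (puri:parse-uri "HTTP://www.franz.com/foo")))
```

Working on a parsed, non-interned copy avoids the unpredictable results that the note on interned URIs warns about.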

Examples

  uri(10): (use-package :puri)
  t
  uri(11): (parse-uri "foo")
  #<uri foo>
  uri(12): #u"foo"
  #<uri foo>
  uri(13): (setq base (intern-uri "http://www.franz.com/foo/bar/"))
  #<uri http://www.franz.com/foo/bar/>
  uri(14): (merge-uris (parse-uri "foo.htm") base)
  #<uri http://www.franz.com/foo/bar/foo.htm>
  uri(15): (merge-uris (parse-uri "?foo") base)
  #<uri http://www.franz.com/foo/bar/?foo>
  uri(16): (setq base (intern-uri "http://www.franz.com/foo/bar/baz.htm"))
  #<uri http://www.franz.com/foo/bar/baz.htm>
  uri(17): (merge-uris (parse-uri "foo.htm") base)
  #<uri http://www.franz.com/foo/bar/foo.htm>
  uri(18): (merge-uris #u"?foo" base)
  #<uri http://www.franz.com/foo/bar/?foo>
  uri(19): (describe #u"http://www.franz.com")
  #<uri http://www.franz.com> is an instance of #<standard-class puri:uri>:
   The following slots have :instance allocation:
    scheme        :http
    host          "www.franz.com"
    port          nil
    path          nil
    query         nil
    fragment      nil
    plist         nil
    escaped       nil
    string        "http://www.franz.com"
    parsed-path   nil
    hashcode      nil
  uri(20): (describe #u"http://www.franz.com/")
  #<uri http://www.franz.com> is an instance of #<standard-class puri:uri>:
   The following slots have :instance allocation:
    scheme        :http
    host          "www.franz.com"
    port          nil
    path          nil
    query         nil
    fragment      nil
    plist         nil
    escaped       nil
    string        "http://www.franz.com"
    parsed-path   nil
    hashcode      nil
  uri(21): #u"foobar#baz%23xxx"
  #<uri foobar#baz#xxx>    

Exported Symbol Index

*strict-parse*, Variable  (undocumented)
copy-uri, Function
do-all-uris, Macro
enough-uri, Generic Function
intern-uri, Generic Function
make-uri-space, Function
merge-uris, Generic Function
parse-uri, Function
render-uri, Function
unintern-uri, Function
uri, Generic Function
uri, Class
uri-authority, Function
uri-fragment, Generic Function
uri-host, Generic Function
uri-p, Generic Function
uri-parse-error, Condition  (undocumented)
uri-parsed-path, Generic Function
uri-path, Generic Function
uri-plist, Generic Function
uri-port, Generic Function
uri-query, Generic Function
uri-scheme, Generic Function
uri-space, Function
uri=, Generic Function
urn, Class
urn-nid, Generic Function
urn-nss, Generic Function