Ben Clifford Technical Blog: August 2010

29 August, 2010

nominal, ordinal, interval, ratio

Decision by Objectives (DBO) classes applications of numbers into a hierarchy of four different types, each building on the properties of the previous one.

The concepts are not unfamiliar at all to a mathematician, but they are approaching from a management direction rather than from pure mathematics, and I think that is an interesting different perspective.

The first application of numbers is nominal. In this application, numbers are used to name things. DBO gives examples from horse racing: The number on a jockey or a horse is a nominal use. It identifies a particular jockey or horse; but it is meaningless to (for example) add two jockey numbers together, or divide one by another. In Maths, that means it is possible to permute the numbers in any way without losing any information. The numbers, ℕ, are used purely as a set of distinct objects without structure or relation. We could equally well use any other sufficiently large set of symbols (such as letters).

The next application of numbers is ordinal. In this application, numbers are used to order things. In the horse racing example, the finishing order of the horses is an ordinal application. Some horse finishes first, another finishes second, yet another finishes third. In Maths, there is a linear order between the numbers, and we have a relation < or ≤ that makes sense. We can compare numbers in this application, with a notion of bigger or faster or stronger). But it still doesn't make sense to (for example) add or subtract those numbers, or to make statements like "horse X finished 3 places ahead of horse y".

The third application of numbers is interval. It makes sense to subtract interval numbers from each other. The authors are a little vague in DSO about whether they think it makes sense to add interval numbers. I think in some applications it does (how much chocolate did you eat? how much chocolate did I eat? we can compute: how much chocolate did we eat together?) but in some it does not (what is the time now? what time did the meeting start? we can compute: how long has the meeting been going on (subtraction), but adding the time now and the meeting start time doesn't make sense). When it does make sense to add, the numbers form a group with operation +. When it only makes sense to take the difference, then the numbers form something like a torsor - a group without identity.

The final application is ratio. These is use of numbers where ratios make sense - for example, a horse might finish in half the time of another, meaning that the ratio of the time of the first horse to the second horse is 1/2. It makes sense to multiply and divide these numbers; and ratios can also be used to scale interval numbers. Mathematically, we have a (meaningful use of a) field, ℝ or ℚ.

The point of them making these distinctions in DSO was to point out that just because you have some information in your decision making represented as numbers, it doesn't always make sense to do things like taking the mean, or summing, or whatever.

22 August, 2010

rsync --fake-super

Every now and then I discover new options on old utilities.

One I am very happy to have discovered in rsync is the --fake-super option.

Scenario:

I have machine A. I want to back up (some portion of) the file system onto machine B. I want to include permissions and ownership (for example, because I am backing up /home.

I can run rsync on machine A as root from a cron job. OK. But then (traditionally) it needs root access to machine B in order to set permissions and ownersip of the files it creates. I can't have it connect to machine B as some normal user because of that. Additionally, the user-id and group-id name/number spaces on both machines need to match up somewhat so that users on machine B don't get access to files they shouldn't have access to.

--fake-super changes that. When rsync tries to change the permission or ownership of a file and finds that it cannot do that, it instead stores that information in extended attributes on that file. So now access to machine B can be through some normal user account without special privileges.

A downside is that if some user has an account on both sides, they don't get full privilege access to the backups of their own files.

Another use I found for this is on my laptop under OS X, where one of my external hard-drives is now mounted with an option to prevent user and group IDs being changed, or makes them ignored somehow (presumably for a better experience when using the hard-drive on multiple machines).
Incremental rsync backups were experience an inability to change group ownership on files, which mean that instead of being hard-linked (using --link-dest) they were being copied afresh each time. This was fixed by --fake-super too - instead of changing ownership on the external HD filesystem, they're added to the extended attributes.

18 August, 2010

fractions and proof-irrelevance IV - forgetting proofs

Continuing from previous posts on implementing fractions in Agda.

We've tried to force our representations of fractions to have a canonical form, but requiring the inclusion of a proof of that canonical form breaks canonicity.

So we need a different approach.

Data.Rational does it something like this, which I found hard to understand at first. Trying to understand isCoprime here was the motivation for this sequence of posts:

record ℚ : Set where
  field
    numerator     : ℤ
    denominator-1 : ℕ
    isCoprime     : True (coprime? ( ∣ numerator ∣ ) (suc denominator-1))

isComprime is different. Rather than directly expressing a proposition (Coprime numerator (suc denominator-1)) as the type of isCoprime, we instead use a slightly different proposition (coprime?) and wrap it in True.

coprime? has type Decidable Coprime. It is a decidable version of the coprime proposition. A normal proposition type has values that are proofs of that proposition. If I give you a proof, then you know the proposition is true. However, if I don't give you a proof, you don't know that the proposition is false. Perhaps its true but I haven't given you a proof. A decidable proposition type has values that are either proofs that the proposition is true, or proofs that the proposition is false, wrapped in yes or no respectively. coprime? is a function that decide whether two numbers are coprime or not, and return a corresponding proof.

True is a function that maps a decidable proposition type into another type: either ⊤ (when the proposition is true) or ⊥ (when the proposition is false).

⊤ (top) is the unit type, that has only one member value. ⊥ (bottom) is the empty type with no member values at all.

Both of those seem a bit useless to a traditional programmer: if I have a variable of the unit type ⊤, it can only have one value. So then I'll always know what that value is - its pretty much a constant. So why have a variable? And if I have a variable of the empty type ⊥ then I can't ever invent a value. If that variable is an input parameter, I can never even call my program, because I can't give input parameters. But it turns out this weird behaviour is exactly right for Data.Rational.

Say I'm going to construct a half. I'm going to pass numerator = 1 and denominator = 2. We also need to pass an isCoprime parameter.

To start with, what type of parameter is that?

That depends on the value of numerator and denominator (that's what dependent types are). Once we know the numerator and denominator, we can evaluate the expression True (coprime? ( ∣ numerator ∣ ) (suc denominator-1)) to find out the type of isCoprime. 1 and 2 are coprime, so coprime? evaluates to yes _, and so True (yes _) evaluates to ⊤.

So we need to give a value for coprime that is of type ⊤. ⊤ only has one value, so thats the value that we'll put in isCoprime. The resulting structure has no proof of coprimality in it. If two fractions have the same numerator and denominator, then they'll always have the same value of isCoprime. So we've got equality of equivalent fractions!

But now what happens if we try to construct a non-canonical fraction, for example 3/6 ?
Once we have specified a numerator of 3 and a denominator of 6, we can figure out the type of isCoprime. 3 and 6 are not coprime, and True (no _) will return ⊥.
So we need to provide a value of type ⊥. There are no values in ⊥. So there is no way we can call the constructor in this case, because we can't come up with a value for each parameter.

So now the goal is achieved - a data structure for representing rationals, where equal rationals are definitionally equal.

My goal in picking apart Data.Rational like this was two-fold: i) I wanted to use rationals, and wanted to understand what was going on; ii) I want to implement my own canonical forms (for finitely-generated abelian groups) and understanding how its done for rationals seems like a step in the right direction.

fractions and proof-irrelevance III - proofs are too isomorphic to values

Continuing from my earlier post about fractions and proof-irrelevance...

In the previous post, we added a parameter to the rational constructor to require the numerator and denominator of rationals to be coprime, in order that we can only construct canonical rationals, so that we get equality that behaves "right" - when two fractions are the same, they will be equal in the eyes of Agda.

But the way we did it doesn't work quite right:

The Curry-Howard correspondence connects propositions and proofs on one side, and datatypes/sets and values/elements of sets on the other side:

Side A	Side B
	Programmer view	Maths view
Proposition	Data type	Set
Proof	Value	Element

When we added the coprimality test to the rational constructor, we added a parameter of type Coprime numerator denominator. That corresponds to the proposition that the numerator and denominator are coprime. Its a parameter so it will have a value when it is used to construct some rational, and that value will be a proof that the numerator and denominator are co-prime.

So now think about constructing a half. We'll put in numerator 1, and denominator 2, and some proof, pr1, that {1,2} are co-prime.

Now somebody else builds a half. They'll put in numerator 1, and denominator 2, and some proof, pr2, that {1,2} are co-prime.

Now I'm going to compare my half to that other person's half to see if they are equal. They're equal if all the constructor parameters are equal. The numerator, 1, matches. The denominator, 2, matches. But we have a third parameter now, the coprimality proof. Do they match? Does pr1 = pr2 ? In general, no, that isn't the case. There could be many different proofs of the same proposition.

So we've ended up with a canonical form for the numerator and denominator, but we've just pushed the problem into a different field of the data structure, and we still don't have a canonical data structure.

Somehow we want to insist that the coprime proposition is true, but without different proofs of that causing different representations of fractions. That is proof irrelevance.

Read on for part IV, the final part

fractions and proof-irrelevance II - equality of fractions

This continues from the first in a series of posts reviewing the implementation of Data.Rational in Agda.

We've got a datatype for representing fractions, ensuring that the denominator is strictly positive. So now any element of the mathematical set ℚ can be represented by an element in our datatype ℚ; and vice-versa any element of our datatype ℚ represents an element of the mathematical set ℚ.

Sometimes, we want to be able to compare fractions to see if they are the same or not. It's fairly straightforward to define an equivalence relation between two fractions, using integer multiplication, something like this:

p ≃ q  =  P.numerator ℤ* Q.denominator ≡ Q.numerator ℤ* P.denominator

But sometimes we want to use equivalence where we don't get to specify the equivalence relation. One example of that is type-checking: say we want to use a fraction (for some purpose) as part of a type specification. In that case, we need equivalent fractions to be definitionally equal: they must have been constructed with the same constructor, and the parameters to the constructor must be the same.

The numerator, denominator representation described in the previous post breaks down here: we can represent a half as numerator = 1, denominator = 2; but equally well as numerator = 3, denominator = 6. Yet, 1 ≠ 3 and 2 ≠ 6 so the two representations are not definitionally equal.

One way of solving this is by forcing the representations to be in some canonical form. For fractions, one such form is when the numerator and denominator are co-prime so that there are no shared factors. This way, 1/2 is the canonical representation of a half. 3/6 = (3*1) / (3*2) is not, because 3 is a factor of both the numerator and denominator.

So maybe we could demand that when you call the constructor for a fraction, you have to prove that the numerator and denominator are co-prime, by defining something like this:

record ℚ : Set where
  field
    numerator     : ℤ
    denominator-1 : ℕ
    isCoprime     : Coprime ( | numerator | ) (suc denominator-1)

That way, whenever we construct an element of ℚ, we'll know that the numerator and denominator are co-prime, because we give a proof of the proposition "Coprime numerator denominator" (see that we would add one onto denominator-1 to get the denominator).

Read on for part III...

fractions and proof-irrelevance I - sane denominators

This begins a series of posts reviewing the implementation of rationals in Agda.

I'll talk about broken ways of building the rationals ℚ in Agda, how its desirable to be able to compare rationals for equality, how having proofs in the language can get in the way of that, and how to cope with that.

This is all based on the code I found in the Data.Rational standard library module, which I have been trying to understand as I want to use both the module itself, and its techniques, in a scientific units-of-measurement library that I am writing.

Rationals

A rational (or a fraction) can be described as an integer (the numerator) divided by a strictly positive integer (the denominator).

It's not hard to build an Agda data type to represent a fraction as a pair of two integers:

data ℚ : Set where
    field
      numerator : ℤ
      denominator : ℤ

This implemenation doesn't require the denominator to be a positive integer, though. So we can represent "invalid" fractions. Most importantly, we can represent a fraction with denominator 0 this way, which is nonsensical.

So lets store the denominator minus 1 as a natural number (0, 1, 2, ...) like this:

record ℚ : Set where
  field
    numerator     : ℤ
    denominator-1 : ℕ

The denominator minus 1 can be 0 or higher, which means the denominator can be 1 or higher - a positive integer as desired.

On the downside, now every time we want to compute with the denominator, we have to compute it first by adding one to the denominator minus 1.

This is a pattern I found strange (and still find unintuitive) in Agda. Rather than storing something in a format that you want (the denominator), you instead store it in a format that makes correctness work.

Read on for part II...

15 August, 2010

acetate slides

This pdf shows an interesting style of PDF slide generation that seems to be "acetates + digital camera". I guess they were presented on acetates to begin with and scanned in? Not so weird as I first though, which was that they were drawn on whiteboard and photographed... but still novel compared to powerpoint.

08 August, 2010

Erdős number

There's this concept of an Erdős number for an academic author: take the graph of academic authors where two authors are joined together if they have written a paper together; then your Erdős number is the number of hops from you to the currently deceased mathematician Paul Erdős.

I first read about him a decade ago in The Man Who Loved Only Numbers but I've never got round to actually computing my number 'till now.

Borja Sotomayor shares a co-author with me (Ian Foster), so my number is less than or equal to his. He is number 6, so my number is ≤6, through this chain:

Ben Clifford → Ian T. Foster → Patrick H. Worley → Robert S. Schreiber → John Russell Gilbert → Roger C. Entringer → Paul Erdös

(p.s. its hard to type the ő in Erdős. Its not ö, although it looks like it)

Ben Clifford Technical Blog