Polarities (Part 3)

Azimuth 2024-11-05

I’m talking about ‘causal loop diagrams’, which are graphs with edges labeled by ‘polarities’. Often the polarities are simply + and - signs.

But polarities can be elements of any monoid, and last time I argued that things work even better if they’re elements of a rig, so you can not only multiply them but also add them.

In fact, I argued that it’s even better if polarities are elements of a ‘hyperfield’. But what’s a hyperfield, and why are these good?

First, what’s a hyperring? Briefly and roughly, it’s a ring where addition can be multivalued. For example the set

\mathbb{S} = \{+,0,-\}

is a hyperring. We multiply things in the obvious way: e.g.

- \times - = +

because negative times negative is positive. Whenever possible, we also add things in the obvious way. For example,

- \boxplus - = -

because the sum of two negative things is negative. Note that we write addition using the funny symbol \boxplus since we don’t want people to see equations like

+ + + = +

and go insane. But there’s an ambiguous case: adding + and -. When you add a positive thing and a negative thing, it could be positive, negative or zero! Here is where it comes in handy to let addition be multivalued. We write this as

+ \boxplus - = \{+, 0, -\}
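To make these rules concrete, here’s a minimal Python sketch of \mathbb{S}. Encoding the signs as strings and the names `mul` and `hyperadd` are just my choices for illustration. The point is only that multiplication returns a single sign while addition returns a set of signs.

```python
# Elements of the sign hyperring, encoded as strings (an illustrative choice).
S = ("+", "0", "-")

def mul(a, b):
    """Single-valued multiplication of signs."""
    if a == "0" or b == "0":
        return "0"
    return "+" if a == b else "-"

def hyperadd(a, b):
    """Multivalued addition: the set of all possible signs of a sum."""
    if a == "0":
        return {b}
    if b == "0":
        return {a}
    if a == b:
        return {a}
    # a positive thing plus a negative thing could have any sign
    return {"+", "0", "-"}

print(mul("-", "-"))        # negative times negative
print(hyperadd("-", "-"))   # negative plus negative: one possible sign
print(hyperadd("+", "-"))   # the ambiguous case: three possible signs
```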

The idea of an algebraic gadget where addition can be multivalued goes back to Frédéric Marty, who introduced ‘hypergroups’ in 1934. In the 1950s, Marc Krasner developed the theory of ‘hyperrings’ and ‘hyperfields’. They took a while to catch on… but by now people have discovered you can do good math with them! As Oleg Viro wrote in 2010:

Krasner, Marshall, Connes and Consani and the author came to hyperfields for different reasons, motivated by different mathematical problems, but we came to the same conclusion: hyperrings and hyperfields are great, very useful and very underdeveloped in the mathematical literature… Probably, the main obstacle for hyperfields to become a mainstream notion is that a multivalued operation does not fit to the tradition of set-theoretic terminology, which forces to avoid multivalued maps at any cost. I believe the taboo on multivalued maps has no real ground, and eventually will be removed.

I discovered them, and also this quote, through the blog articles of Matt Baker, who has been using them in combinatorics. And I think they can be useful in applied math, for computing with qualitative information!

What’s especially nice is that \mathbb{S} is not just a hyperring: it’s a ‘hyperfield’, meaning that you can also divide by anything except 0. As you’d expect, it works like this:

+ \div + = + \qquad + \div - = - \qquad - \div + = - \qquad - \div - = +

0 \div + = 0 \qquad 0 \div - = 0

Furthermore, any field is a hyperfield, so \mathbb{R} is a hyperfield, and there is a homomorphism of hyperfields

p \colon \mathbb{R} \to \mathbb{S}

defined so that

p(x) = \left\{ \begin{array}{ccc} + & \text{if} & x > 0 \\  0 & \text{if} & x = 0 \\  - & \text{if} & x < 0 \end{array} \right.

This is really cool! Recall that a rational function in several variables is any ratio of polynomials, like

\displaystyle{ \frac{x^3 y - x^2 + 1}{y^4 + 2x y} }

Since the map p \colon \mathbb{R} \to \mathbb{S} preserves addition, subtraction, multiplication and division, we can take any equations involving rational functions of real variables, and turn them into equations between rational functions of variables valued in \mathbb{S}, in a completely systematic way by applying p!

That is, we can take ‘quantitative’ equations involving real numbers and turn them into ‘qualitative’ equations where all we care about is whether the numbers are positive, negative or zero! And this process is quite well-behaved.
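Here’s a small Python sketch of this idea. It checks on a handful of integers (a finite stand-in for the real line) that p respects multiplication and division, and that the sign of a sum always lies in the multivalued sum of the signs. Then it evaluates the expression x^2 y - x using signs alone. The expression, the sample values and the function names are all my own illustrative choices.

```python
from itertools import product

def p(x):
    """The map p: send a real number to its sign."""
    return "+" if x > 0 else "-" if x < 0 else "0"

def mul(a, b):
    if a == "0" or b == "0":
        return "0"
    return "+" if a == b else "-"

def hyperadd(a, b):
    if a == "0":
        return {b}
    if b == "0":
        return {a}
    return {a} if a == b else {"+", "0", "-"}

samples = [-3, -1, 0, 2, 5]   # a small finite stand-in for the real line

for x, y in product(samples, repeat=2):
    assert p(x * y) == mul(p(x), p(y))
    assert p(x + y) in hyperadd(p(x), p(y))
    if y != 0:
        # each nonzero sign is its own inverse, so dividing acts like multiplying
        assert p(x / y) == mul(p(x), p(y))

def qualitative(sx, sy):
    """Possible signs of x^2 y - x, given only the signs of x and y."""
    x2y = mul(mul(sx, sx), sy)
    minus_x = mul("-", sx)
    return hyperadd(x2y, minus_x)

print(qualitative("+", "-"))   # here the sign is forced
print(qualitative("+", "+"))   # here the signs alone cannot decide
```

When x is positive and y is negative, both terms are negative, so the answer is a single sign. When both are positive the two terms compete, and the qualitative answer is the whole set of signs.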

But let me back up a bit and say what hyperrings and hyperfields are.

Hyperrings and hyperfields

To define a ring we usually first define an abelian group, since addition makes any ring into an abelian group. Similarly, to define a hyperring we should first define an ‘abelian hypergroup’. Unfortunately people call this a ‘canonical hypergroup’—I’m not sure why.

I’ll lead up to the definition rather slowly. First, in our canonical hypergroup we want to allow addition to be many-valued… but not undefined. Thus, for any set G let P_\ast(G) be the collection of nonempty subsets of G. Then addition in a canonical hypergroup will be a function

\boxplus \colon G \times G \to P_\ast(G)

But what about expressions like a \boxplus (b \boxplus c)? If b \boxplus c is actually a subset of G, what does it mean to add a to that? Luckily this is no problem: we can always extend addition

\boxplus \colon G \times G \to P_\ast(G)

to an operation on nonempty subsets, which we denote by the same symbol:

\boxplus \colon P_\ast(G) \times P_\ast(G) \to P_\ast(G)

Here’s how we do it! Suppose A and B are nonempty subsets of G. Then we define

\displaystyle{ A \boxplus B = \bigcup_{a \in A, b \in B} a \boxplus b }

That is, A \boxplus B consists of all possible values that we can get from adding an element of A to an element of B. Similarly, we define

\displaystyle {a \boxplus B = \bigcup_{b \in B} a \boxplus b }

and similarly for A \boxplus b.
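In code, this extension is just a union over pairs. The sketch below uses addition of signs (+, 0, -) as a test case; the function names are hypothetical ones I made up for this example.

```python
def hyperadd(a, b):
    """Multivalued addition of signs, used here only as a test case."""
    if a == "0":
        return {b}
    if b == "0":
        return {a}
    return {a} if a == b else {"+", "0", "-"}

def hyperadd_sets(A, B):
    """A [+] B: the union of a [+] b over all a in A and b in B."""
    result = set()
    for a in A:
        for b in B:
            result |= hyperadd(a, b)
    return result

def hyperadd_elem_set(a, B):
    """a [+] B, treating the element a as the one-element set {a}."""
    return hyperadd_sets({a}, B)

# a [+] (b [+] c) now makes sense, even though b [+] c is a set:
print(hyperadd_elem_set("+", hyperadd("+", "-")))
print(hyperadd_sets({"+", "0"}, {"+"}))
```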

Now we’re ready for the definition of canonical hypergroup:

Definition. A canonical hypergroup is a set G with a map

\boxplus \colon G \times G \to P_\ast(G)

obeying:

1) the commutative law: a \boxplus b = b \boxplus a for all a,b \in G.

2) the associative law: (a \boxplus b) \boxplus c = a \boxplus (b \boxplus c) for all a,b,c \in G.

3) the unit law: 0 \boxplus a = \{a\} = a \boxplus 0 for all a \in G.

4) the existence and uniqueness of additive inverses: for every a \in G there exists a unique element b \in G such that 0 \in a \boxplus b and thus 0 \in b \boxplus a. We call this element -a.

5) compatibility of addition and subtraction: a \in b \boxplus c if and only if c \in a \boxplus (-b).
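For a finite set these axioms can be checked by brute force. Here’s a sketch of such a checker, tried out on addition of signs. I read axiom 5 as the multivalued version of ‘a = b + c if and only if c = a - b’, that is, a \in b \boxplus c if and only if c \in a \boxplus (-b). The function `is_canonical_hypergroup` and the deliberately broken `bad_add` are hypothetical names invented for this example.

```python
from itertools import product

def is_canonical_hypergroup(G, add, zero):
    """Brute-force check of axioms 1-5 for a finite set G.

    add(a, b) should return a set of elements of G.
    """
    G = list(G)

    def add_sets(A, B):
        return set().union(*(add(a, b) for a in A for b in B))

    for a, b in product(G, repeat=2):
        s = add(a, b)
        if not s or not s <= set(G):      # sums are nonempty subsets of G
            return False
        if s != add(b, a):                # 1) commutative law
            return False
    for a, b, c in product(G, repeat=3):  # 2) associative law
        if add_sets(add(a, b), {c}) != add_sets({a}, add(b, c)):
            return False
    for a in G:                           # 3) unit law
        if add(zero, a) != {a} or add(a, zero) != {a}:
            return False
    neg = {}
    for a in G:                           # 4) unique additive inverses
        inverses = [b for b in G if zero in add(a, b)]
        if len(inverses) != 1:
            return False
        neg[a] = inverses[0]
    for a, b, c in product(G, repeat=3):  # 5) a in b+c  iff  c in a+(-b)
        if (a in add(b, c)) != (c in add(a, neg[b])):
            return False
    return True

def sign_add(a, b):
    if a == "0":
        return {b}
    if b == "0":
        return {a}
    return {a} if a == b else {"+", "0", "-"}

def bad_add(a, b):
    """Like sign_add, but pretends that + plus - is always 0."""
    if a != "0" and b != "0" and a != b:
        return {"0"}
    return sign_add(a, b)

print(is_canonical_hypergroup("+0-", sign_add, "0"))
print(is_canonical_hypergroup("+0-", bad_add, "0"))
```

The broken version fails associativity: (+ \boxplus +) \boxplus - would be \{0\}, while + \boxplus (+ \boxplus -) would be \{+\}. So if we want to add signs at all, we’re more or less forced to let + \boxplus - be multivalued.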

Notice that while addition is multivalued, taking negatives is not! Similarly, in our definition of ‘hyperring’, only addition will be multivalued. You can imagine being more general, but people don’t do that, and this seems fine for the applications I have in mind.

Definition. A hyperring is a set R with:

a) an addition map \boxplus \colon R \times R \to P_\ast(R)

b) a multiplication map \cdot \colon R \times R \to R

c) elements 0,1 \in R

such that:

1) \boxplus \colon R \times R \to P_\ast(R) makes R into a canonical hypergroup with unit element 0

2) \cdot \colon R \times R \to R makes R into a monoid with unit element 1

3) The distributive laws hold: a \cdot (b \boxplus c) = (a \cdot b) \boxplus (a \cdot c) and (a \boxplus b) \cdot c = (a \cdot c) \boxplus (b \cdot c) for all a,b,c \in R

4) The zero laws hold: a \cdot 0 = 0 = 0 \cdot a for all a \in R

Finally:

Definition. A hyperfield is a hyperring R such that every nonzero element a \in R has a unique element b \in R with ab = 1 = ba. We write this element b as a^{-1}.
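As a sanity check, here’s a Python sketch that verifies the distributive law, the zero laws and the existence of unique inverses for the three signs +, 0, -, where the sign + plays the role of 1. It then builds division as multiplication by the inverse. The helper names (`scale`, `inv`, `div`) are my own and only meant as an illustration.

```python
from itertools import product

S = ("+", "0", "-")

def mul(a, b):
    if a == "0" or b == "0":
        return "0"
    return "+" if a == b else "-"

def hyperadd(a, b):
    if a == "0":
        return {b}
    if b == "0":
        return {a}
    return {a} if a == b else {"+", "0", "-"}

def scale(a, B):
    """a . B: multiply every element of the set B by a."""
    return {mul(a, b) for b in B}

# distributive law (multiplication here is commutative, so one side suffices)
for a, b, c in product(S, repeat=3):
    assert scale(a, hyperadd(b, c)) == hyperadd(mul(a, b), mul(a, c))

# zero laws
for a in S:
    assert mul(a, "0") == "0" == mul("0", a)

# every nonzero element has exactly one inverse; the unit 1 is the sign "+"
inv = {}
for a in S:
    if a != "0":
        candidates = [b for b in S if mul(a, b) == "+"]
        assert len(candidates) == 1
        inv[a] = candidates[0]

def div(a, b):
    """Division as multiplication by the inverse; undefined for b == "0"."""
    return mul(a, inv[b])

print(inv)
print(div("+", "-"), div("-", "-"), div("0", "+"))
```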

Quotient hyperrings and hyperfields

Here is a nice way to get lots of hyperrings. Let R be a ring and let R^\times be its group of units, i.e. the set of elements with multiplicative inverses, made into a group using multiplication. Let G \subseteq R^{\times} be any subgroup such that r \cdot G = G \cdot r for all r \in R. Let R/G be the set of equivalence classes of elements of R, where r \sim s if and only if r = s g for some g \in G. Thanks to the condition r G = G r, we can multiply equivalence classes by

[r] \cdot [s] = [r \cdot s]

On the other hand, we define the sum of equivalence classes [r] and [s] to be the set of all equivalence classes [r' + s'] where r' \sim r and s' \sim s. You can check that this makes R/G into a hyperring called a quotient hyperring of R.

In particular, if R is a field then R/G is a hyperfield.
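Here’s a sketch of this construction for a small finite example of my own choosing: the field \mathbb{Z}/7 and the subgroup G = \{1,2,4\} of nonzero squares. The sum of two classes is computed as the set of all classes of sums of representatives. The helper `quotient_hyperring` is hypothetical, and it trusts the caller that G really is a subgroup of the units.

```python
def quotient_hyperring(n, G):
    """Quotient of the ring Z/n by a subgroup G of its group of units.

    Returns the class map r -> [r], the set of classes, and the two operations.
    Assumes (without checking) that G really is a subgroup of the units.
    """
    def cls(r):
        return frozenset((r * g) % n for g in G)

    classes = {cls(r) for r in range(n)}

    def mul(A, B):
        # any choice of representatives gives the same class
        return cls(min(A) * min(B))

    def hyperadd(A, B):
        # all classes of sums of a representative of A and one of B
        return {cls(a + b) for a in A for b in B}

    return cls, classes, mul, hyperadd

cls, classes, mul, hyperadd = quotient_hyperring(7, (1, 2, 4))
Z, Q, N = cls(0), cls(1), cls(3)   # zero, squares, non-squares

print(len(classes))            # number of elements of the quotient
print(hyperadd(Q, Q) == {Q, N})
print(hyperadd(Q, N) == classes)
print(mul(N, N) == Q)
```

This gives a 3-element hyperfield where the sum of two squares can be a square or a non-square but never zero, while a square plus a non-square can be anything.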

There’s always a map from any ring R to any of its quotient hyperrings R/G. But I forgot to define maps! A map from a hyperring R to a hyperring S is a function

f \colon R \to S

with

f(r \boxplus r') \subseteq f(r) \boxplus f(r')

and

f(r \cdot r') = f(r) \cdot f(r')

for all r, r' \in R. With this definition, it’s easy to see that the function

p \colon R \to R/G

sending any r \in R to its equivalence class [r] \in R/G is a map of hyperrings.
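For finite examples, the two conditions can be checked by brute force. The sketch below does this for a small case I picked for illustration: the ring \mathbb{Z}/5, regarded as a hyperring whose sums are one-element sets, and its quotient by G = \{1,4\}. The checker `is_hyperring_map` and the deliberately wrong function `g` are hypothetical names.

```python
from itertools import product

def is_hyperring_map(f, R, add_R, mul_R, add_S, mul_S):
    """Check f(r [+] r') is contained in f(r) [+] f(r'), and f(r r') = f(r) f(r')."""
    for r, s in product(R, repeat=2):
        image = {f(x) for x in add_R(r, s)}
        if not image <= add_S(f(r), f(s)):
            return False
        if f(mul_R(r, s)) != mul_S(f(r), f(s)):
            return False
    return True

n, G = 5, (1, 4)
R = range(n)

def add_R(r, s):
    return {(r + s) % n}        # ordinary addition, as a one-element set

def mul_R(r, s):
    return (r * s) % n

def f(r):
    """The quotient map: send r to its equivalence class [r]."""
    return frozenset((r * g) % n for g in G)

def add_S(A, B):
    return {f(a + b) for a in A for b in B}

def mul_S(A, B):
    return f(min(A) * min(B))

def g(r):
    """A shifted version of f, which should fail to be a map."""
    return f(r + 1)

print(is_hyperring_map(f, R, add_R, mul_R, add_S, mul_S))
print(is_hyperring_map(g, R, add_R, mul_R, add_S, mul_S))
print(add_S(f(1), f(1)) == {f(0), f(2)})
```

The last line shows why the definition uses \subseteq rather than equality: the class of 1 + 1 is just [2], but [1] \boxplus [1] also contains [0], since 1 + 4 = 0 in \mathbb{Z}/5.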

The Krasner hyperfield

For example, let R be the field \mathbb{R}. If we take G to be the group of all nonzero elements of \mathbb{R} then the quotient hyperring \mathbb{R}/G has just two elements. One is the equivalence class of all nonzero reals, while the other is the equivalence class containing just 0. If we call these two equivalence classes 1 and 0, we get

1 \boxplus 1 = \{0,1\}

because the sum of two nonzero reals can be zero or nonzero.

So, we get a 2-element hyperfield called the Krasner hyperfield

\mathbb{K} = \{0,1\}
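Here’s a quick Python sketch of \mathbb{K}. It writes down the operations directly and then compares the addition table with what we observe by adding a few sample integers, which stand in for real numbers, and recording whether each sum is zero or nonzero. The names `to_K`, `add_K` and `mul_K` are my own.

```python
from itertools import product

def to_K(x):
    """Send a real number to its class in the Krasner hyperfield."""
    return 0 if x == 0 else 1

def add_K(a, b):
    if a == 0:
        return {b}
    if b == 0:
        return {a}
    return {0, 1}          # nonzero plus nonzero can be zero or nonzero

def mul_K(a, b):
    return a * b

samples = [-2, -1, 0, 1, 3]   # a small finite stand-in for the real line

# Record which classes actually occur as sums of the samples.
observed = {}
for x, y in product(samples, repeat=2):
    observed.setdefault((to_K(x), to_K(y)), set()).add(to_K(x + y))
    assert to_K(x * y) == mul_K(to_K(x), to_K(y))

for (a, b), sums in sorted(observed.items()):
    print(a, "boxplus", b, "=", sums, "matches table:", sums == add_K(a, b))
```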

If we use elements of the Krasner hyperfield as polarities, what can these polarities mean? One guess is that 1 means has an effect while 0 means does not have an effect.

But this seems useless for causal loop diagrams, since we can use plain old graphs to convey the same information—right? If we want to indicate that X has an effect on Y, we draw an edge from X to Y. Otherwise we just don’t draw an edge from X to Y.

But wait. Causal loop diagrams using the Krasner hyperfield convey more information than plain old graphs! There’s a difference between an edge labeled by 0 and no edge at all.

So, there must be some better interpretation of the polarities 0 and 1 in the Krasner hyperfield. What is it? There may be more than one. What are they—can you help me here?

The sign hyperfield

Here’s an example I understand better: this one seems very important. Start with the field \mathbb{R} again, but now take G to be the group of all positive elements of \mathbb{R}. Now the quotient hyperring \mathbb{R}/G has three elements. One is the equivalence class of all positive reals, another is the equivalence class of the negative reals, and the third is the equivalence class containing just 0. If we call these equivalence classes +, 0 and - we get the sign hyperfield

\mathbb{S} = \{+, 0, - \}

In this hyperfield, the only case where addition is many-valued is

+ \boxplus - = \{+,0,-\}

because the sum of a positive real and a negative real can be positive, negative or zero. I’ve already mentioned the map

p \colon \mathbb{R} \to \mathbb{R}/G = \mathbb{S}

and now we know, from our general theory, why this is a map of hyperrings!

Next I should say how we use hyperfields (or more general hyperrings) in causal loop diagrams. But not today!