• Coding
  • [Exercise] Implementing Sets with functions

Here's a variation on a exercise I recently came across recently. We're going to work on representing sets of integers. What's a set?

From Wikipedia:
In computer science, a set is an abstract data structure that can store certain values, without any particular order, and no repeated values.
It's worth repeating that in our exercise, we are dealing with sets of integers, to keep it simple. Accordinlgy, by this definition, we can consider {1, 3, 23, -5, 32, 4} as a set. However, not all sets have a finite number of elements. For instance, we can consider the set of all the negative integers. That makes it difficult to represent with our traditional collections.

For this reason we will represent our sets with a function that tests if an element is part of the set or not. Mathematically speaking, this function is called the characterstic of our set. For instance, in Lua, here's how we'd represent our set of negative integers:
set_of_negative_ints = function(x) return x <= 0 end
And here's how we would represent the singleton {5}:
singleton_5 = function(x) return x == 5 end
The goal of the exercise is to implement common operations on sets, using this functional representation.

Basic operations
  • union(setA, setB): returns a set containing all the elements in setA and setB.
  • intersection(setA, setB): returns a set containing all the common elements between setA and setB.
  • diff(setA, setB): returns a set containing all the elements of setA which are not in setB.
  • symmetic_diff(setA, setB): returns a set containing all the elements in setA or setB but not in both.
Working with predicates

A predicate is a function that returns a boolean. In our case we will call p a predicate that accepts one integer as argument. Here are some functions to implement to make sets work with predicates:
  • filter(s, p): returns a set containing all the elements of s where p(element) is true.
  • all(s, p): returns true if applying p on every element of s returns true every time.
  • any(s, p): returns true if there exists at least one element of s such as p(element) evaluates to true.
Note: If you're having problems (I am, so far) implementing all(), you could try something simpler to move on with subsequent exercises. Indeed, the only obvious way to do this is to test p on each element of the set. In order to do so, it could be simpler to set a boud high enough (e.g. 1000) and test p on each element going from -bound to +bound. However you should still be able to implement any() in terms of all().

Bonus question
Using the functions we've already defined, implement the function map(s, f) where f is a function that accepts one integer as argument and returns an integer. The map() function should return a set that is the result of applying f on every element of the set s.

For instance:
s = {1, 3, -8}
f = function(x) return x*2 end

assert(map(s, f) == {2, 6, -16})
Hint
  • Keep in mind that in our representation, a set is a function.
  • The implementation of each of these functions (except maybe all()) is really short, possibly a one-liner.
Nice exercise I wanted to work on most of these functions a while ago, so now is the time. But there is one thing I didn't understand. You mentioned infinite sets in the beginning of your post, for the operations to implement in this exercise the functions have to work on infinite sets too?

For example the way I see it, input sets would be represented as a pair(start and end) and for example for the intersection function set A would be [5,infinity) and set B [7,-50] so the resultant set would be {5,6,7} ? Or is it just on finite sets input into the function as arrays?
Ayman, the representation of sets is already defined. A set is a function against which you can test if a given element is part of the set or not. Re-read my post, the 3rd paragraph starting with "For this reason...".

I was afraid I wasn't being clear enough. Let me try to rephrase this: A set is a function that takes an int as argument and return a boolean. So the Union function takes 2 functions as arguments and returns a function. All these functions have the same signature:
Int => Boolean.
You're not asked to come up with a representation for the sets. This is already decided. Your job is to implement the functions according to the decided representation.

The implicit goal of this exercise is to make you realize how blurry the limit between code and data can be.

I hope I'm being clear enough, but if I'm not, please feel free to ask!
{ma, mb} = {0, 100};
EmptySet[] := False;
SetUnion[s_, r_] := Or[s@#, r@#] &;
SetIntersection[s_, r_] := And[s@#, r@#] &;
SetDifference[s_, r_] := And[s@#, Not[r@#]] &;
SetSymDifference[s_, r_] := Xor[s@#, r@#] &;
SetEvaluate[s_, a_, b_] := Select[Table[i, {i, a, b}], s];

SetFilter[s_, p_] := SetEvaluate[And[s@#, p@#] &];
SetAll[s_, p_] := SetFilter[s, p] == SetEvaluate[s];
SetAny[s_, p_] := SetFilter[s, p] != SetEvaluate[EmptySet[]];
s = Mod[#, 2] == 0 &;
r = Mod[#, 3] == 0 &;

SetEvaluate[s]
SetEvaluate[r]

SetEvaluate[SetUnion[s, r]]
SetEvaluate[SetIntersection[s, r]]
SetEvaluate[SetDifference[s, r]]
SetEvaluate[SetSymDifference[s, r]]

p = Mod[#, 12] == 0 &;

SetFilter[s, p]
SetAll[s, p]
SetAny[s, p]
{0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100}
{0, 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39, 42, 45, 48, 51, 54, 57, 60, 63, 66, 69, 72, 75, 78, 81, 84, 87, 90, 93, 96, 99}

{0, 2, 3, 4, 6, 8, 9, 10, 12, 14, 15, 16, 18, 20, 21, 22, 24, 26, 27, 28, 30, 32, 33, 34, 36, 38, 39, 40, 42, 44, 45, 46, 48, 50, 51, 52, 54, 56, 57, 58, 60, 62, 63, 64, 66, 68, 69, 70, 72, 74, 75, 76, 78, 80, 81, 82, 84, 86, 87, 88, 90, 92, 93, 94, 96, 98, 99, 100}
{0, 6, 12, 18, 24, 30, 36, 42, 48, 54, 60, 66, 72, 78, 84, 90, 96}
{2, 4, 8, 10, 14, 16, 20, 22, 26, 28, 32, 34, 38, 40, 44, 46, 50, 52, 56, 58, 62, 64, 68, 70, 74, 76, 80, 82, 86, 88, 92, 94, 98, 100}
{2, 3, 4, 8, 9, 10, 14, 15, 16, 20, 21, 22, 26, 27, 28, 32, 33, 34, 38, 39, 40, 44, 45, 46, 50, 51, 52, 56, 57, 58, 62, 63, 64, 68, 69, 70, 74, 75, 76, 80, 81, 82, 86, 87, 88, 92, 93, 94, 98, 99, 100}

{0, 12, 24, 36, 48, 60, 72, 84, 96}
False
True
SetMap[s_, f_] := SetEvaluate[Compose[s, FunctionInverse[f]]];
@geek, quick question: what does Select return? How come you can == and != two instances?
Mathematica will always puzzle me.
@arithma: How can you show that any/all is a halting problem? Also, it's fun to implement one in terms of the other.
Filter is the same as Intersection, but it's arguably only an implementation detail in this exercise. From an API point of view they're not the same.

Anyway, show us something for map(). It's the only interesting one.
map(s,f).
What if f: x -> x%2
U = [0, 1, 2, ... 9]
s = x > 5 = [6, 7, 8, 9]
map(s,%2) = [0,1]
If we assume we do have an "any" function, f is a mapping (int->int) and s is a set (int->bool) then:
reverse((int -> bool), int) -> (int -> bool)
reverse(f,x) = (i -> f(i) == x)
map(s,f) = x -> any(and(s,reverse(f,x)))
If the predicate for any/all are indeed computable (which I admit is the only sensible option to begin with), I figure the only problem for implementing any/all are the infinities involved.

You can subvert those two problems by returning, in each case, a function that acts on sets so that computation is deferred until the actual set you need to compute against is available. This would make it more of an "expression builder".
3 months later
@Rahmu, I agree with Arithma that any/all are very probably equivalent to the halting problem. You're asking the computer to determine whether the output of a function will *always*(in the case of all, for any, if it will ever) be true when run against a procedurally generated set of potentially infinite size.

Let me put it this way. If you could implement "any" then your program would have to be able to prove Fermat's last theorem. How so? Consider the following function in python:
def divide(n):
    # divide n into 4 parts
    # We assume len(str(n))%4 == 0, we can generalize to cases that do
    # not have this property but this should make the point well enough
    s = str(n)
    l = len(s)/4
    return map(int, (s[:l], s[l:l*2], s[l*2:l*3], s[l*3:l*4]))  

def f(n):
    a,b,c,exp = divide(n)
    return a**exp+b**exp == c**exp

Example usage:

>>> f(3452) # 3^2+4^2=25==5^2 => return True
True
>>> f(3453) # 3^3+4^3=93!=5^3 => return False
False
>>> 

Now, consider the following function based on f:

def g(n):
    a,b,c,exp = divide(n):
    if exp <= 2:
        return False
    return f(n)

And the following set(basically Z):

def Z(n):
    return True

Now, to have any(Z,g) respond with the correct answer, that is, False, it would have to prove Fermat's last theorem. Somehow, I doubt any of us would be able to write such a program.