• Coding
  • [Exercise] Improving performance through caching

Give this a try; I didn't test it much (I wrote it in the browser):
def cached(f):
    cache = {}
    def _decoed(*args):
        if args in cache:
            return cache[args]
        else:
            cache[args] = f(*args)
            return cache[args]
    return _decoed
It could easily be a one-liner too; I'll change it later.
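If you want to try it quickly, here's a self-contained sketch of how it would be applied (fib is just my example function):

```python
def cached(f):
    cache = {}
    def _decoed(*args):
        if args in cache:
            return cache[args]
        else:
            cache[args] = f(*args)
            return cache[args]
    return _decoed

@cached
def fib(n):
    # naive recursion; the decorator turns it into O(n)
    return n if n <= 1 else fib(n - 1) + fib(n - 2)

print(fib(30))  # 832040, instantly
```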
xterm, your solution is similar to mine with one notable difference:
def cached(fn):
    fn.db = {}
    def cachedfn(n):
        # If the answer is already in the db, return this value.
        if n in fn.db:
            answer = fn.db[n]
        else:
            # Else calculate it and save the value in the db
            answer = fn(n)
            fn.db[n] = answer
        return answer

    return cachedfn
I thought the cache dictionary had to be an attribute of the function object. I tried your solution and it works. I am not sure how your cache dictionary remains persistent throughout every execution of your function. I must be missing something.

Also, your use of *args is better than my stupid single-argument cheapness.
rahmu wrote: I am not sure how your cache dictionary remains persistent throughout every execution of your function. I must be missing something.
Actually, yours may be the proper way of doing it (setting the cache on the function itself). Mine may conflict with other functions if the args are the same!

Try it again on another function (factorial) and see if it conflicts when both run one after the other:
fib(10)
factorial(10)
In terms of how it's persisting, I think it's in globals(); I'll check tonight.
OK, the persistence seems straightforward: the decorator creates the inner function the first time, and that same function is reused every time fib(n) is called; that's why the cache lingers.

I'll update this when I test whether it conflicts.

Edit:

Apparently there's no conflict; each decorated function gets its own returned function with its own scope.
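For the record, here's the kind of test I mean: a self-contained sketch (using the closure-based cached from above) showing that two decorated functions keep separate caches even when called with the same argument:

```python
def cached(f):
    cache = {}  # a fresh dict per call to cached, i.e. per decorated function
    def _decoed(*args):
        if args not in cache:
            cache[args] = f(*args)
        return cache[args]
    return _decoed

@cached
def fib(n):
    return n if n <= 1 else fib(n - 1) + fib(n - 2)

@cached
def factorial(n):
    return 1 if n <= 1 else n * factorial(n - 1)

# Same argument, two separate caches: no conflict.
print(fib(10))        # 55
print(factorial(10))  # 3628800
```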
Excuse the Mathematica, but this is one of the many reasons I love this language:
Fib[0] = 0;
Fib[1] = 1;
Fib[n_ /; n > 1] := Fib[n] = Fib[n - 1] + Fib[n - 2];
By the way, this particular form of caching is called memoization.
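Since we're naming the technique: Python's standard library ships a ready-made memoizing decorator, functools.lru_cache (added in Python 3.2). A minimal sketch:

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # unbounded cache, i.e. plain memoization
def fib(n):
    return n if n <= 1 else fib(n - 1) + fib(n - 2)

print(fib(50))  # 12586269025, instant instead of minutes of naive recursion
```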
@geek: can you please at least explain that piece of code?

Also, thank you for pointing out the name of the technique. It allowed me to find this code, using a decorator class instead of a function.
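I won't paste the linked code, but a decorator class for this looks roughly like the following (my own sketch, not necessarily the code from the link): the instance replaces the function, and the cache lives on the instance as a plain attribute.

```python
class Cached:
    """Memoizing decorator implemented as a class: the instance
    replaces the decorated function and holds the cache."""
    def __init__(self, f):
        self.f = f
        self.cache = {}

    def __call__(self, *args):
        if args not in self.cache:
            self.cache[args] = self.f(*args)
        return self.cache[args]

@Cached
def fib(n):
    return n if n <= 1 else fib(n - 1) + fib(n - 2)

print(fib(10))    # 55
print(fib.cache)  # the cache is an ordinary attribute, easy to inspect
```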
Here we go. It took me a while, but I was finally able to compact it (mind you, I'd prefer the previous version over this one):
def cached(f):
  cache = {}  # per-function dict; a default argument like cache={} would be shared by every decorated function
  return lambda *a: cache[a] if a in cache else cache.update({a: f(*a)}) or cache[a]
This is the same as the previous implementation; the notable differences are:
1- The returned function is no longer named.
2- In Python all mutator functions should return None, so if a parameter isn't in the cache yet, cache.update() updates the dictionary and returns None; None or cache[a] then returns cache[a].
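Point 2 is easy to verify in the REPL; a minimal demonstration of the update()-returns-None trick:

```python
cache = {}

# dict.update mutates the dict in place and returns None...
result = cache.update({('x',): 42})
assert result is None

# ...so `None or cache[a]` short-circuits to the freshly stored value.
a = ('x',)
value = cache.update({a: 42}) or cache[a]
print(value)  # 42
```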
Interesting exercise; indeed, self-caching programs (memoization) are really useful. I did a similar implementation to the ones above, but in JavaScript; I used the decorator pattern too. I'm still trying to think of a way that doesn't necessarily use decorators.
/*Fibonacci calculation function */
function fib(n)
{
	if(n < 2)
		return n;
	else 
		return fib(n-1)+fib(n-2);
}

/* Memoization function */
function memoize(f)
{
	var cache = {};

	/* puts arguments into the cache and forwards the call to f */
	function wrapper(args)
	{
		if(!(args in cache))
		{
			cache[args] = f.call(this,args);
		}
		return cache[args];
	}

	/* method to clear the cache when needed */
	wrapper.clearCache = function()
	{
		cache = {};
	}
	
	return wrapper;
}

/* Usage */
fib = memoize(fib);
console.log(fib(10));
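On the "without decorators" point: one option is to keep the cache inside the function itself. A Python sketch, using the fact that a mutable default argument is created only once and therefore persists across calls (seeded with the two base cases):

```python
def fib(n, cache={0: 0, 1: 1}):
    # The default dict is created once at definition time,
    # so it doubles as a persistent cache; no wrapper needed.
    if n not in cache:
        cache[n] = fib(n - 1) + fib(n - 2)
    return cache[n]

print(fib(10))  # 55
```

It's the same pitfall that usually bites people with mutable defaults, put to deliberate use.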
@rahmu: regarding your Fibonacci function, as far as I know fib(0) = 0 and fib(1) = 1; in yours, fib(0) would return 1 and result in a larger final answer.
Just for fun (it doesn't do the whole job, but it's a nice C++ template + macro combination, VS2010):
#include <iostream>
#include <map>

using namespace std;

template <class T, typename R>
class CacheThis{
	map<R, R> cMap;	
	T* _f;
public:
	CacheThis(T *func){
		_f = func;
	}
	R Call(R a){
		typename map<R, R>::iterator it = cMap.find(a);
		if(it == cMap.end()){
			cMap[a] = (*_f)(a);
		}
		return cMap[a];
	}
};

template<typename R>
struct Function
{
	typedef R (*FPTR)(R);
};
typedef Function<unsigned int>::FPTR UINT_FUN_PTR;


#define CALL_FN(F,val) CacheThis<UINT_FUN_PTR,unsigned int> v(&F); cout<<v.Call(val);


unsigned int fib(unsigned int n)
{
	if(n==0) return 0;
	else if(n==1) return 1;
	else return fib(n-1)+fib(n-2);
}

int main()
{
	UINT_FUN_PTR myfunctionptr = &fib;
	CALL_FN(myfunctionptr,40);
	return 0;
}
@Ayman: Nice catch! I fixed it.

@xterm: The cache variable lingers on because of the creation of the closure. And to the best of my knowledge (and that of the guy who very nicely answered my question) there's no way of accessing the variable from outside the _decoed function.

Also, what does this mean?
2- In python all mutator functions should return None so if the cache of a parameter doesn't exist, cache.update() is updating the dictionary and returning None. None or cache[a] will return cache[a].
rahmu wrote: Also, what does this mean?
It means that in Python, functions that mutate the object they are called on should return None and not another object (or the object itself).

cache.update() is changing cache, so it returns None.
result = cache.update(...)
assert result is None
If update didn't change cache, it would instead return a new, updated object. E.g.:
newcache = cache.update({'a': 1})

assert 'a' in newcache # passes
assert 'a' not in cache # passes
P.S.: This is not a language feature or anything, simply a convention followed when building such functions.

Edit: what you quoted should be reformatted sorry:

2- If a parameter isn't in the cache, cache.update() updates the dictionary and returns None; None or cache[a] then returns cache[a]. (In Python, all mutator functions should return None.)
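The built-ins follow the same convention, which makes it easy to demonstrate: list.sort mutates in place and returns None, while sorted leaves its argument alone and returns a new list.

```python
lst = [3, 1, 2]

# list.sort mutates in place and returns None
print(lst.sort())  # None
print(lst)         # [1, 2, 3]

# sorted returns a new list and leaves the original untouched
original = [3, 1, 2]
print(sorted(original))  # [1, 2, 3]
print(original)          # [3, 1, 2]
```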
This was an interesting exercise. I ended up on the Haskell wiki, and they have this really elegant solution:
memoized_fib :: Int -> Integer
memoized_fib = ((map fib' [0..]) !!)
    where
        fib' 0 = 0
        fib' 1 = 1
        fib' n = memoized_fib (n - 1) + memoized_fib (n - 2)
The list generated by map fib' [0..] is available to the fib' function as a cache. If you ask for memoized_fib n, then by the point fib' n is calculated the list contains [fib' 0, ..., fib' (n - 1)], so that the computation of fib' n is a simple indexing and addition.

Using a list for the cache isn't the most efficient way of doing it, though.
@saeidw: This is indeed an interesting one. What does the !! do?
Is there any way to know how the computation is carried out?
Since this is indexing into a list, will the computation run from 0 up to the indexed number, or will it only compute the necessary numbers?
Imagine another series where:
S[0] = 1
S[1] = .5
S[n] = S[n-2] * 2

If we need to compute S[16], we only need to compute S[i] where i is even and 0 <= i <= 16.
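Memoization gives you exactly that behaviour for free, since only the entries the recursion actually touches get computed. A quick Python sketch of that series, with the cache seeded with the two base cases:

```python
cache = {0: 1, 1: 0.5}

def s(n):
    # Only the indices the recursion actually reaches end up in the cache.
    if n not in cache:
        cache[n] = s(n - 2) * 2
    return cache[n]

print(s(16))          # 256
print(sorted(cache))  # [0, 1, 2, 4, 6, 8, 10, 12, 14, 16] -- no odd index above 1 was computed
```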
@rahmu: Functions That Remember Values They Have Found
@arithma: !! returns the nth element of a list, e.g., `[0 ..] !! n` returns n. Only the necessary _expressions_ are evaluated (lazy evaluation), i.e., only `s i` where i is even and less than 16 for `s 16`.
rahmu wrote: The cache variable lingers on because of the creation of the closure. And to the best of my knowledge (and that of the guy who very nicely answered my question) there's no way of accessing the variable from outside the _decoed function.
Today I found a way to access these variables in Python. Each function object has a __closure__ attribute that holds the variables.
The variable __closure__ is a tuple of cell objects.

Case study
Let's consider the code at hand:
def cached(f):
    cache = {}
    def _decoed(*args):
        if args in cache:
            return cache[args]
        else:
            cache[args] = f(*args)
            return cache[args]
    return _decoed

@cached
def fib(n):
    if n <= 1:
        return n
    else:
        return fib(n-1)+fib(n-2)
Let's read inside the closure
We can access the cache dictionary of the function like this:
>>> fib.__closure__
(<cell at 0x14b5210: dict object at 0x14ad7e0>, <cell at 0x14b5328: function object at 0x7fbc9fa9f628>)
As you can see, the cell 0 holds a dict object. We can inspect the variable it contains using the attribute cell_contents:
>>> fib.__closure__[0].cell_contents
{}
Of course, the dict is still empty. If we call fib(5), we expect to find 6 values inserted:
>>> fib(5)
5
>>> fib.__closure__[0].cell_contents
{(0,): 0, (1,): 1, (2,): 1, (3,): 2, (4,): 3, (5,): 5}


Let's write inside the closure
This one scares me a bit. You can modify the values inside the cells.
EDIT: It turns out you can modify the values inside the dict, because dicts are mutable, but you cannot change the content of a cell itself.
Like this:
>>> fib.__closure__[0].cell_contents[(5,)] = -3 # modifying the value inside the cache dict
>>> fib(5)
-3
>>> fib(6)
0
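A practical consequence: since the dict itself is mutable, you can also reset the cache through the closure without touching the cell. A self-contained sketch (searching the closure for the dict rather than hard-coding cell 0, since I wouldn't rely on the cell order):

```python
def cached(f):
    cache = {}
    def _decoed(*args):
        if args not in cache:
            cache[args] = f(*args)
        return cache[args]
    return _decoed

@cached
def fib(n):
    return n if n <= 1 else fib(n - 1) + fib(n - 2)

fib(5)
# Find the cell holding the dict instead of assuming it is cell 0.
cache = next(c.cell_contents for c in fib.__closure__
             if isinstance(c.cell_contents, dict))
print(len(cache))  # 6 entries: fib(0) through fib(5)
cache.clear()      # empty the dict in place; the cell still points at it
print(fib.__closure__)
```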