Exercise - Team Balancing

xterm · 2011-01-28T11:14:23+00:00

Given - 10 players (Name, Rating) where name is text and rating is a positive integer. Your program should - Split the players into two teams of 5 with th...

MSD

arithma wroteThat's why I ported the 64 bit generator to C#. For a given seed, it generates the exact same sequence of pseudo-random numbers.
So by sharing a seed (and any additional steps to shape the input to the algo), we can avoid having to upload random numbers to the internets.
I checked your code and you're still using the 32bit mersene twister.

random() % 5000 should be the formula you use to get each sequence number.
We'll make sure everything is in place tonight.

Both versions (64 and 32) are in the project but I am using the 64 bit one, and the latest push includes your suggestions of % 5000

 mt19937.sgenrand(100);
.
.
.
_Players[i] = new Player { Name = "Player" + i.ToString(), Rating = (long)(mt19937.genrand() % 5000) };

arithma

Here are the first 10 numbers from the sequence:

123113 - 123114, I got it down partitioned to this in no time. Let's try to find another meaty seed and formula.

I used 10,000 this time instead of 5,000 for the range. The number of elements is 30.
This one proved just a little bit more difficult but ultimately I got to the theoretically optimal solution. Does your algo do that?
The 10 first elements from the sequence generated here are:

MSD

arithma wroteHere are the first 10 numbers from the sequence:
518
1091
42
1965
1638
4596
3829
2824
2712
2614
123113 - 123114, I got it down partitioned to this in no time. Let's try to find another meaty seed and formula.

I used 10,000 this time instead of 5,000 for the range. The number of elements is 30.
This one proved just a little bit more difficult but ultimately I got to the theoretically optimal solution. Does your algo do that?
The 10 first elements from the sequence generated here are:
2559
930
6175
6621
8386
4565
5337
219
6201
719

You could have given me the seed instead of me having to figure it out :)
Anyways, I got a 50 / 50 split in about 20 milliseconds. Seed is 574, range is 10,000, and there are 30 players:

Generated 30 players in 1.9532 milliseconds

100 %
Split 30 players into 2 15 player teams in 20.5078 milliseconds.

and here is the result split:

Team 1: (50.00%)
Name: Player7, Rating: 219
Name: Player9, Rating: 719
Name: Player19, Rating: 998
Name: Player0, Rating: 2559
Name: Player5, Rating: 4565
Name: Player6, Rating: 5337
Name: Player22, Rating: 5847
Name: Player2, Rating: 6175
Name: Player8, Rating: 6201
Name: Player3, Rating: 6621
Name: Player14, Rating: 6857
Name: Player23, Rating: 7746
Name: Player13, Rating: 8012
Name: Player4, Rating: 8386
Name: Player25, Rating: 9811
Team 2: (50.00%)
Name: Player12, Rating: 28
Name: Player1, Rating: 930
Name: Player21, Rating: 2489
Name: Player28, Rating: 3026
Name: Player26, Rating: 3111
Name: Player24, Rating: 3841
Name: Player27, Rating: 4581
Name: Player17, Rating: 4708
Name: Player18, Rating: 5288
Name: Player16, Rating: 6388
Name: Player10, Rating: 7753
Name: Player15, Rating: 9081
Name: Player29, Rating: 9490
Name: Player20, Rating: 9532
Name: Player11, Rating: 9808

How much did it take you?

arithma

Actually I've messed up my code. It's returning around a difference of 37 at best. But I can run it 10 times in 20milliseconds so I have some space for extra hardening.

arithma

Here are my results. The algo does not always return the optimal solution, but it's fast enough nonetheless.

719
930
998
2559
3111
4581
5337
6201
6388
6621
6857
7746
8386
9808
9811
-----
28
219
2489
3026
3841
4565
4708
5288
5847
6175
7753
8012
9081
9490
9532

49055*1000000/2337919 = 20982microseconds

MSD

So, given the evidence, mine is more accurate since it got a best possible solution, and faster by a thousand times?

arithma

Neither. 20982microseconds = 20.982milliseconds. The difference between the two partitions up there is 1 which means it's optimal.

MSD

arithma wroteNeither. 20982microseconds = 20.982milliseconds. The difference between the two partitions up there is 1 which means it's optimal.

I guess I did not notice the unit difference, also my percentages had only 2 decimal points, so my bad. Now, let's test more demanding scenarios, shall we?
How about seed is 574, range is 100,000, and there are 1000000 players?

arithma

Dude, whenever the number of players is larger than the range, it means the probability to find an exact match to balance a folded partition is maximal (very close to 1).

This one is tricky: 50 numbers between 0 and 16777215; seed is 574.
I got it down to 457. I am using an assisted algorithm, so I am hand picking the results, and running it in batches to get to get a feel of the kind of results expected.

By the way, I read a little about this problem, and it's proved to be NP-Complete. Definitely not for the faint of heart.

MSD

arithma wroteDude, whenever the number of players is larger than the range, it means the probability to find an exact match to balance a folded partition is maximal (very close to 1).

This one is tricky: 50 numbers between 0 and 16777215; seed is 574.
I got it down to 457. I am using an assisted algorithm, so I am hand picking the results, and running it in batches to get to get a feel of the kind of results expected.

By the way, I read a little about this problem, and it's proved to be NP-Complete. Definitely not for the faint of heart.

OK, using your parameters:

Generated 50 players in 1.9532 milliseconds

100 %
Split 50 players into 2 25 player teams in 14.6484 milliseconds.

Team 1: (50.00%)
Name: Player41, Rating: 966519
Name: Player46, Rating: 1686585
Name: Player18, Rating: 1803468
Name: Player6, Rating: 1959387
Name: Player34, Rating: 2182364
Name: Player0, Rating: 2598634
Name: Player19, Rating: 3177323
Name: Player43, Rating: 3632802
Name: Player22, Rating: 3861007
Name: Player35, Rating: 4436305
Name: Player8, Rating: 6540726
Name: Player7, Rating: 8330499
Name: Player11, Rating: 8696668
Name: Player21, Rating: 8845574
Name: Player12, Rating: 9612253
Name: Player14, Rating: 10764907
Name: Player3, Rating: 10908201
Name: Player4, Rating: 11330586
Name: Player1, Rating: 12616905
Name: Player5, Rating: 13587405
Name: Player2, Rating: 14496970
Name: Player24, Rating: 14675376
Name: Player15, Rating: 15038036
Name: Player9, Rating: 15388684
Name: Player10, Rating: 16149803
Team 2: (50.00%)
Name: Player30, Rating: 54122
Name: Player26, Rating: 237176
Name: Player20, Rating: 357562
Name: Player16, Rating: 1550413
Name: Player36, Rating: 1648267
Name: Player13, Rating: 4009062
Name: Player40, Rating: 4222232
Name: Player23, Rating: 4438936
Name: Player32, Rating: 5783323
Name: Player49, Rating: 6502405
Name: Player48, Rating: 6508508
Name: Player28, Rating: 7087411
Name: Player25, Rating: 7807041
Name: Player38, Rating: 8156271
Name: Player33, Rating: 8222133
Name: Player29, Rating: 10248995
Name: Player39, Rating: 10606561
Name: Player45, Rating: 11022305
Name: Player37, Rating: 12884937
Name: Player44, Rating: 13238860
Name: Player31, Rating: 13630745
Name: Player17, Rating: 16134598
Name: Player27, Rating: 16157036
Name: Player47, Rating: 16263071
Name: Player42, Rating: 16516868

Could you post those resources about this problem?

Joe

Are you benchmarking on separate computers?

arithma

I guess we are. I am only interested in the notion of performance rather than concrete comparison. As in does it take milliseconds, tens, hundreds of milliseconds, seconds, minutes, hours, does not halt in the foreseeable future.

@MSD: This is the article I was referring to earlier. I hope you'd post the difference between the partitions rather than the percentages. I will post the partition tonight.

As I said before (I think) this algorithm I am using now needs my interference for different problem sizes because I need to manually tune some iterations to get reasonable results. I will post something soon. Do note however that I lost my obsession with this problem, thankfully.

Joe

arithma wroteI guess we are. I am only interested in the notion of performance rather than concrete comparison. As in does it take milliseconds, tens, hundreds of milliseconds, seconds, minutes, hours, does not halt in the foreseeable future.

Then maybe it'd be nice if you could print your specs. Again, not for comparison (you run at the milliseconds level), but to give a general idea. After all there is a wide selection of processors available today, ranging from mobiles ARM or Intel Atom, to 4 core processors like the i7 or the Quad Core.

Knowing your processing power would help us get a better idea of your times.

MSD

Difference is 1840.
@arithma, could you post your results?

arithma

Sure, I'll post it at night.

arithma

Here we go. Got it down to 129.

8222133 237176 8696668 966519 9612253 1648267 10606561 1803468 10908201 2182364 11330586 3177323 12884937 3861007 13587405 4222232 14496970 4438936 15038036 6502405 16134598 6540726 16157036 7807041 16516868 
54122 8330499 357562 8845574 1550413 10248995 1686585 10764907 1959387 11022305 2598634 12616905 3632802 13238860 4009062 13630745 4436305 14675376 5783323 15388684 6508508 16149803 7087411 16263071 8156271 
129
1397*1000000/2337910 = 597microseconds

Note that the really small times are due to the fact that it's an assisted algorithm. I had to run it 9 times to get a decent result.

By the way, the 2337910 corresponds to my CPU frequency. You can get it by calling the Win32 API function: QueryPerformanceFrequency. All the other factors in a PC are not relevant since the problem size does not touch RAM.

MSD

Are these the solution results? Is the first line the first team and the second line the second? I cannot see the 129 difference you mentioned.

arithma

@MSD: Good eye. I was outputting the wrong array. Here is the real thing with better results (difference of 5):

1686585, 1803468, 1959387, 2182364, 3177323, 3632802, 4222232, 4438936, 5783323, 6502405, 6508508, 7087411, 7807041, 8222133, 8696668, 8845574, 9612253, 10248995, 10764907, 12884937, 13630745, 14675376, 16134598, 16263071, 16516868, 
54122, 237176, 357562, 966519, 1550413, 1648267, 2598634, 3861007, 4009062, 4436305, 6540726, 8156271, 8330499, 10606561, 10908201, 11022305, 11330586, 12616905, 13238860, 13587405, 14496970, 15038036, 15388684, 16149803, 16157036, 
5
104391*1000000/2337919 = 44651microseconds

arithma

Here's the code: http://pastebin.com/APUgUrYB
I placed some decent comments on the important sections of the code. The C++ code is unobfuscated (largely by removing the bigger iterator infested arithmetic formulas where I used the * operator for multiplication and dereferencing an iterator) now. You can still skim and understand what's happening, I hope.

Getting 144421 for 10 numbers with the 574 seed. This could very well be a nonoptimal result and verifiable to be not using a combination generator (something like your first suggestion).

Joe

Here's a oneliner that solves the problem. It takes a list of scores/ranking as an input and returns a 2-tuple with balanced scores.

Also, it's not really a oneliner when you're using itertools.permutation() :P

def oneline(l):
    tsz = len(l)/2 # the size of each team, half of the total array size

    # Behold the mighty one liner
    return sorted((((x[tsz:], x[:tsz]), abs(sum(x[:tsz]) - sum(x[tsz:])))
                   for x in prmt(l)),key=iget(1))[0][0]

« Previous Page Next Page »