[Exercise] Finding the pair

Joe · Jan 26, 2011

Now let's go back to more simple, "Google-style" questions. This question has actually been asked in a Google interview.

You have an array of numbers, and a sum. Find how many pairs in the array sum to x!

Choose your language. As usual, the more exotic the better.
Yalla!

Padre · Jan 27, 2011

OK here is my solutions, it's not the best, but it works.

void sort(int *ary, int size)
{
	int i;
	int j;
	for (i=0;i<size;i++)
		for (j=i;j<size;j++)
		{
			if (ary[i]<ary[j])
			{
				int tmp=ary[i];
				ary[i]=ary[j];
				ary[j]=tmp;
			}
		}
}


void findPairs(int *ary, int Size, int sum)
{
	int i=0;
	int j=Size-1;
	while (i<j)
	{
		if (ary[i]+ary[j]==sum) printf("PAIRS: {%i,%i}\n",ary[i],ary[j]);
		if (ary[i]+ary[j]<sum)
		{
			j--;
		}else
		{
			i++;
		}
		
	}
}
int main()
{

	int ary[]={8,2,3,7,5,5,14};
	sort(ary,7);
	for (int i=0;i<7;i++)
		printf("%i ",ary[i]);

	printf("\nFinding Pairs for sum of: %i \n",10);
	findPairs(ary,7,10);
	return 1;
}

Padre · Jan 27, 2011

i did use bubble sort because it's not a sorting contest :P

result

14 8 7 5 5 3 2
Finding Pairs for sum of: 10
PAIRS: {8,2}
PAIRS: {7,3}
PAIRS: {5,5}

Zusynoid-x · Jan 27, 2011

it's very "brute force" i mean it works but i doubt its the most optimal ... still working on mine

Jjsaade · Jan 27, 2011

@Padre: why did you sort?

<?php
$ar = array(8,2,3,7,5,5,14);
$sum = 10;
$l = count($ar);
for($i =0; $i<$l; ++$i){
	if($ar[$i] > $sum) continue;// cannot be paired with anything into sum
	for($j =$i+1; $j<$l; ++$j){
		if($ar[$j] > $sum) continue;// cannot be paired with anything into sum
		if($ar[$i] + $ar[$j] == $sum) echo "Found Pair: ".$ar[$i]." , ".$ar[$j]."<br/>";
	}
}
?>

Padre · Jan 27, 2011

well, sorting algos can be really optimized, and you can actually sort while inserting and what have you, and taking it from there is quite easy and goes in one pass. otherwise, i think u'll have to test them all against each other. didn't want to do that.

i'll try to find another solution on lunch break :)

xterm · Jan 27, 2011

Me no care about optimization. Here it is in groovy, you can copy the code and paste it here and execute. You can replace '10' with the sum.

(1..10).subsequences().findAll { it.size() == 2 && it[0] + it[1] == 10 }

Exotic enough?

arithma · Jan 27, 2011

Is the absolute optimal solution exotic enough?

arithma · Jan 27, 2011

Here's a little solution that took me a while to cook. It's pretty tricky to get right, but I do believe this was good effort towards a highly optimized machine for solution (C++0x, compiles well with Visual Studio 2010, I bet gcc would breathe through as well):

PS: It's long because I painstakingly decided to write up something close to asymptotic optimal performance.
The basic idea is to start a binary search for the the first number equal to or larger than the sum divided by two.
- If that number is found (which if multiplied by two gives the sum), it will be handled as a special case.
- Otherwise
a - do a binary search for the complement of that number. (If found, add the counts by multiplication and some additional binary searches for counting).
b - If not found, seed a new value for a "higher" value, using the current "lower" value.
Repeat till all numbers are considered.

There's still a bug with negative numbers, I think. [edit]Just confirmed that it does not mess up with negatives[/edit]

#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;

int count_pairs_equal_to_sum(vector<int> v, int sum){
	int count = 0;
	sort(v.begin(), v.end());

	auto upper = lower_bound(v.begin(), v.end(), sum / 2);
	if(*upper * 2 == sum){
		auto upper_next = upper_bound(upper, v.end(), *upper);
		auto n = upper_next - upper;
		count += n*(n-1)/2;
		upper = upper_next;
	}
	auto bottom = upper;

	while(upper != v.end()){
		auto bottom_prev = bottom;
		bottom = lower_bound(v.begin(), bottom, sum - *upper);
		if(*upper + *bottom == sum){
			count += (upper_bound(bottom, bottom_prev, *bottom) - bottom) * (upper_bound(upper, v.end(), *upper) - upper);
		}
		if(bottom == v.begin()){
			return count;
		}
		upper = lower_bound(upper, v.end(), sum - *(bottom-1));
	}
	return count;
}

int main(){
	int arr[] = {0, 1, 2, 2, 3, 4, 5, 6, 7, 8, 9, 10};
	vector<int> v(arr, arr+sizeof(arr)/sizeof(int));
	cout << count_pairs_equal_to_sum(v, 8) << endl;
	return 0;
}

Note: There's still some ironing in progress. I found some boogies in the results. [edit2]Ok, corrected. Forgot to safe guard with a return if we never enter into the while loop.[/edit2]

arithma · Jan 28, 2011

@ZeRaW: Your solution assumes that there are no negative numbers. It's not stated in the problem statement though that all integers are positive.
@Padre: Your if statment (i + j < sum) is interesting. Quite keen. The main drawback however is the fact that it has to go linearly through the array instead of split each range (top and bottom) at every step. I kind of removed tracking the bottom range to an extent, and relied solely on the top.

We have some very different implementations in this thread.
This exercise is a success.

Ayman · Jan 28, 2011

Here is my 2 minute classic solution

       int[] nums = {1,8,7,5,9,3,2};
		int target = 10;  
		
		for(int i = 0; i < nums.length; i++)
		{
			for(int j = (i+1); j <nums.length;j++)
			{
				if(nums[i] + nums[j] == target)
					System.out.println(nums[i] + " and " + nums[j]); 
			}
		}

Not exotic at all but it just works, I don't have enough time to think of something better just wanted to contribute to the thread. I really loved arithma's creative solution and xterm's super exotic solution, groovy seems really cool btw, I have to give it a try.

xterm · Jan 28, 2011

AymanFarhat wrotegroovy seems really cool btw, I have to give it a try.

You already do know groovy :-)

Take any java program you have, rename the file to .groovy and run it.

P.S.: On a more serious note, I don't mind giving you a run through groovy. You live close by, we can schedule a session. (Everyone else is more than welcome as well)

MSD · Jan 28, 2011

Here is my implementation, it makes use of a Black Red Tree (NGenerics library) for performance and makes a single pass on the list of input integers. I tested a list of up to 100,000 entries randomly generated and got results around 3.5 seconds, here is the code in C#:

using System;
using System.Collections.Generic;
using System.Text;
using System.Linq;
using System.IO;
using NGenerics.DataStructures.Trees;

namespace ConsoleApplication1
{
    class Program
    {
        public class Pair
        {
            public int First { get; set; }
            public int Second { get; set; }
        }
        public static LinkedList<Pair> GetPairs(int[] input, int sum)
        {
            RedBlackTree<int, List<int>> indexBuffer = new RedBlackTree<int, List<int>>();
            LinkedList<Pair> resultPairs = new LinkedList<Pair>();
            for (int i = 0; i < input.Length; i++)
            {
                List<int> compliments, toAddTo;
                bool foundCompliment = indexBuffer.TryGetValue(input[i], out compliments);

                if (foundCompliment)
                    foreach (int index in compliments)
                        resultPairs.AddLast(new Pair { First = i, Second = index });

                int addIndex = sum - input[i];
                if (!indexBuffer.TryGetValue(addIndex, out toAddTo))
                    indexBuffer.Add(addIndex, (toAddTo = new List<int>()));
                toAddTo.Add(i);

            }
            return resultPairs;
        }
        private static Random rnd = new Random(DateTime.Now.Millisecond);
        public static int[] GetRandomInput(int size, int min, int max)
        {
            int[] result = new int[size];
            for (int i = 0; i < size; i++)
                result[i] = rnd.Next(min, max);
            return result;
        }
        
        static void Main(string[] args)
        {
            DateTime start = DateTime.Now;
            int[] input = GetRandomInput(100000, -200, 200);
            LinkedList<Pair> pairs = GetPairs(input, 55);
            DateTime end = DateTime.Now;
            Console.WriteLine("Found {0} pairs in {1} seconds.", pairs.Count, end.Subtract(start).TotalSeconds);
            Console.Read();
        }
    }
}

A heads up for whoever wants to mess with this code on larger input data: This solution eats up a lot of memory and once you hit a limit, things get ugly, I am still trying to figure out what makes it use that much memory. I ran the tests on a Desktop with 4 GB memory.

arithma · Jan 28, 2011

This algorithm returns in a fraction of a second. It's made to benchmark head to head with MSD's code. I think there must be something fundamentally ruining your memory usage and speed.
An example output for my algo is (this is for a million numbers of input, distribution of numbers apply):

1068470710
Time = 2216milliseconds

#include <iostream>
#include <vector>
#include <algorithm>
#include <cstdlib>
using namespace std;

int count_pairs_equal_to_sum(vector<int> v, int sum){
	int count = 0;
	sort(v.begin(), v.end());

	auto upper = lower_bound(v.begin(), v.end(), sum / 2);
	if(upper == v.end())
		return count;

	if(*upper * 2 == sum){
		auto upper_next = upper_bound(upper, v.end(), *upper);
		auto n = upper_next - upper;
		count += n*(n-1)/2;
		upper = upper_next;
	}
	auto bottom = upper;

	while(upper != v.end()){
		auto bottom_prev = bottom;
		bottom = lower_bound(v.begin(), bottom, sum - *upper);
		if(*upper + *bottom == sum){
			count += (upper_bound(bottom, bottom_prev, *bottom) - bottom) * (upper_bound(upper, v.end(), *upper) - upper);
		}
		if(bottom == v.begin()){
			return count;
		}
		upper = lower_bound(upper, v.end(), sum - *(bottom-1));
	}

	return count;
}

int main(){
	//int arr[] = {1, 2, 2, 2, 2, 2, 2};
	auto start = clock();
	vector<int> v;
	for(int i = 0; i < 1000000; i++){
		v.push_back(rand()%400-200);
	}
	int c = count_pairs_equal_to_sum(v, 55);
	auto end = clock();
	cout << count_pairs_equal_to_sum(v, 55) << endl;
	cout << "Time = " << (end - start)*1000/CLOCKS_PER_SEC << "milliseconds" << endl;
	return 0;
}

MSD · Jan 28, 2011

Other than the fact that you are comparing your count function with my GetPairs method which actually returns a list of Pairs, thus it is obvious a count will be faster, still your approach decreases the size of the data to deal with as the program progresses which is faster than my approach of dealing with the same tree from start till end.

arithma · Jan 28, 2011

First of all, your code will return more pairs than there should be with an input like [1 2 2 2 3] and a target sum of 4. It will return 9 pairs instead of 3 pairs that refer to the indices of the two's.

Secondly, there's another optimization in there. Blocks of a repeated single integer are dealt with in bulk. These blocks are found using the already set bounds, using a binary search.

After converting my code to generate a list instead of a count, I was surprised to have runtimes of up to 48seconds.
However compiling in release mode did bring it down to something like this (for an input of 100*1000):

10685371
Time = 1135milliseconds

This is the changed code

#include <iostream>
#include <vector>
#include <algorithm>
#include <cstdlib>
#include <tuple>
#include <ctime>
#include <list>
using namespace std;

list<tuple<int, int>> get_pairs_equal_to_sum(vector<int> v, int sum){
	list<tuple<int, int>> result;
	sort(v.begin(), v.end());

	auto upper = lower_bound(v.begin(), v.end(), sum / 2);
	if(upper == v.end())
		return result;

	if(*upper * 2 == sum){
		auto upper_next = upper_bound(upper, v.end(), *upper);
		auto n = upper_next - upper;
		tuple<int, int> t(*upper, *upper);
		for(int i = 0; i < n*(n-1)/2; i++){
			result.push_back(t);
		}
		upper = upper_next;
	}
	auto bottom = upper;

	while(upper != v.end()){
		auto bottom_prev = bottom;
		bottom = lower_bound(v.begin(), bottom, sum - *upper);
		if(*upper + *bottom == sum){
			int count = (upper_bound(bottom, bottom_prev, *bottom) - bottom) * (upper_bound(upper, v.end(), *upper) - upper);
			tuple<int, int> t(*bottom, *upper);
			for(int i = 0; i < count; i++){
				result.push_back(t);
			}
		}
		if(bottom == v.begin()){
			return result;
		}
		upper = lower_bound(upper, v.end(), sum - *(bottom-1));
	}

	return result;
}

int main(){
	//int arr[] = {1, 2, 2, 2, 2, 2, 2};
	auto start = clock();
	vector<int> v;
	for(int i = 0; i < 100 * 1000; i++){
		v.push_back(rand()%400-200);
	}
	auto c = get_pairs_equal_to_sum(v, 55);
	auto end = clock();
	cout << c.size() << endl;
	cout << "Time = " << (end - start)*1000/CLOCKS_PER_SEC << "milliseconds" << endl;
	return 0;
}

MSD · Jan 29, 2011

I have not tested suggested input with code above, but I have worked on the code a bit to increase performance using ArrayList instead of LinkedList among other things and here is the code which as I stated before is not the most performant anyways. The following code works with input suggested:

using System;
using System.Collections.Generic;
using System.Text;
using System.Linq;
using System.IO;
using System.Collections;

namespace FindThePair
{
    class Program
    {
        public class Pair
        {
            public int First { get; set; }
            public int Second { get; set; }
        }
        public static ArrayList GetPairs(int[] input, int sum)
        {
            Dictionary<int, ArrayList> indexBuffer = new Dictionary<int, ArrayList>();
            ArrayList resultPairs = new ArrayList();
            for (int i = 0; i < input.Length; i++)
            {
                ArrayList compliments, toAddTo;
                bool foundCompliment = indexBuffer.TryGetValue(input[i], out compliments);

                if (foundCompliment)
                    foreach (int index in compliments)
                        resultPairs.Add(new Pair { First = i, Second = index });

                int addIndex = sum - input[i];
                if (!indexBuffer.TryGetValue(addIndex, out toAddTo))
                    indexBuffer.Add(addIndex, (toAddTo = new ArrayList()));
                toAddTo.Add(i);

            }
            return resultPairs;
        }
        
        private static Random rnd = new Random(DateTime.Now.Millisecond);
        public static int[] GetRandomInput(int size, int min, int max)
        {
            int[] result = new int[size];
            for (int i = 0; i < size; i++)
                result[i] = rnd.Next(min, max);
            return result;
        }

        static void Main(string[] args)
        {

            DateTime start = DateTime.Now;
            //int[] input = GetRandomInput(400000, -200, 200);
            int[] input = { 1, 2, 2, 2, 3 };
            ArrayList pairs = GetPairs(input, 4);
            int count = pairs.Count;
            DateTime end = DateTime.Now;
            Console.WriteLine("Found {0} pairs in {1} seconds.", count, end.Subtract(start).TotalSeconds);
            Console.Read();
        }
    }
}

Ayman · Jan 29, 2011

xterm wroteYou live close by, we can schedule a session. (Everyone else is more than welcome as well)

Hopefully after the exams I can pay you a visit, greatly appreciated man :)

Joe · Jan 8, 2012

My solution takes advantage of python's dictionaries.
It consists of building a hash map tracking the number of occurrences of each member in the list. It will then test the existence of dict[total-i].

The script also takes into account multiple occurrences of correct pairing.

#!/usr/bin/python

total = 13
a = [4,5,6,1,3,8,3,6]


# dict {  value: nbr_of_ocurrences}
# in the case of my random array 'a' here's the dict
# I build:
#
# {
#   1: 1, 
#   3: 2, 
#   4: 1, 
#   5: 1, 
#   6: 2,
#   8: 1
# }

d = {}
for i in a:
    if self.has_key(i):
        self[i] += 1
    else:
        self[i] = 1

for i in d.keys():
    if i < total/2 and d.has_key(total-i): #Note you can only divide by 2 if all the members are positive integers
        for first in range(d[i]):
            for second in range(d[total-i]):
                print (i, total-i)

xterm · Jan 9, 2012

for i in a:
    if self.has_key(i):
        self[i] += 1
    else:
        self[i] = 1

can simply be written as:

for i in set(a):
    d[i] = a.count(i)