Read a file of text, determine the n most frequently used words, and print out a sorted list of those words along with their frequencies.
Can you provide the text? Or should we just get a random one?
It doesn't really matter, you can choose randomly. I should be able to test it on a text of my liking.
Easy solution in Python:
#!/usr/local/bin/python2.7

class WCount:

    def __init__(self, filename=None):
        self.filename = filename
        self._dict = {}

    def wordcount(self):
        with open(self.filename, "r") as f:
            # iterate the file directly; no need to catch StopIteration
            for line in f:
                for word in line.split():
                    self.addcount(word)
        self.pprint()


    def addcount(self, word):
        if word in self._dict:    # has_key() is deprecated
            self._dict[word] += 1
        else:
            self._dict[word] = 1

    def pprint(self):
        # print words sorted by descending frequency
        for word in sorted(self._dict, key=self._dict.get, reverse=True):
            print "{}\t{}".format(word, self._dict[word])


def main():
    import sys 

    w = WCount(sys.argv[1])
    w.wordcount()

if __name__ == "__main__":
    main()
Well, here is mine!!!
words = input("Pease insert your text: ").replace(",", " ").replace(".", " ").replace(":", " ").replace(";", " ").replace("!", " ").replace("?", " ").lower().split()

print(words)

listOfWords = []
freqOfWords = []

for i in words:
    if i not in listOfWords:
        listOfWords.append(i)
        freqOfWords.append(words.count(i))

freqAndWords = []

for i in range(len(freqOfWords)):
    freqAndWords.append(str(freqOfWords[i]) + " = " + listOfWords[i])

# sort numerically on the frequency, not as strings ("10" would sort before "2")
freqAndWords.sort(key=lambda entry: int(entry.split(" = ")[0]))

for entry in reversed(freqAndWords):
    print(entry)
PERL!
#!/usr/bin/perl

use strict;
use warnings;

my ($n, $f) = @ARGV;
my @words;
my %unique;
open my $in, "<", $f or die "can't open $f: $!";
push(@words, split(' ', lc $_)) while (<$in>);
close $in;

foreach my $word (@words) {
    $word =~ s/[[:punct:]]//g;
    $unique{$word}++;    # autovivifies to 1 on first occurrence
}

# sort words by descending frequency and print the top $n
my @sorted = sort { $unique{$b} <=> $unique{$a} } keys %unique;
$n = @sorted if $n > @sorted;
print "$_: $unique{$_}\n" for @sorted[0 .. $n - 1];
I'm sure it can be condensed into a one-liner but I'm not smart enough to do it.

By the way, I recognized this problem from this article on Donald Knuth and Doug McIlroy. McIlroy's solution is absolutely brilliant, and shows the real power of the Unix style.
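For comparison, here is a sketch of how compact the same job gets in Python using the standard library's collections.Counter (assuming Python 2.7+, where Counter is available; this is an equivalent of the idea, not McIlroy's actual shell pipeline):

import re
import sys
from collections import Counter

# a sketch: letters only, lowercased, top n words by count
n, filename = int(sys.argv[1]), sys.argv[2]
with open(filename) as f:
    words = re.findall(r"[a-z]+", f.read().lower())
for word, count in Counter(words).most_common(n):
    print("{}\t{}".format(word, count))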
Sort[Tally[StringSplit[StringReplace[ToLowerCase@Import["lipsum.txt"], RegularExpression["\\W"] -> " "]]], #1[[2]] > #2[[2]] &] // TableForm
geek, what language is this? Its use of square brackets weirds me out ...
Ruby
text = File.read("/path/to/file")
words = text.split
bucket = Hash.new(0)    # default count of 0 for unseen words
words.each { |w| bucket[w] += 1 }
# sort_by returns a new array; sort descending by frequency and print it
bucket.sort_by { |_word, freq| -freq }.each do |word, freq|
  puts "#{word}\t#{freq}"
end
rahmu wrote: geek, what language is this? Its use of square brackets weirds me out ...
It smells like Mathematica. Correct me if I'm wrong.
<?php
$f = 'filename.txt';
$h = fopen($f, 'r');
$words = explode(" ", strtolower(fread($h, filesize($f))));
fclose($h);
$freq = array();
foreach ($words as $word) {
	isset($freq[$word]) ? $freq[$word]++ : $freq[$word] = 1;
}
arsort($freq);
var_dump($freq);
?>
samer wrote: It smells like Mathematica. Correct me if I'm wrong.
It is, nice one geek :)
Should the sorting be by frequency or by word?
<?php
$text = file_get_contents("./file.txt");
$words = explode(" ", $text);
$index = array();
foreach ($words as $word) {
    $tWord = strtolower(trim($word)); // remove needless spaces and change to lower case
    if (!empty($tWord)) {
        isset($index[$tWord]) ? $index[$tWord]++ : $index[$tWord] = 1;
    }
}
ksort($index); // sorts alphabetically by key (word), not by frequency
?>
<pre>
<?php print_r($index); ?>
</pre>
No C++?
#include <iostream>
#include <fstream>
#include <string>
#include <algorithm>
#include <map>
using namespace std;
int main()
{
    fstream file("file.txt", ios_base::in);
    string word;
    map<string, int> dictionary;
    map<string, int>::iterator p;
    while (!file.eof()) {    // note: eof() is only set after a read fails
        file >> word;
        transform(word.begin(), word.end(), word.begin(), ::tolower);
        p = dictionary.find(word);
        if (p != dictionary.end()) p->second++;
        else dictionary.insert(make_pair(word, 1));
    }
    file.close();
    for (map<string, int>::const_iterator iter = dictionary.begin(); iter != dictionary.end(); ++iter)
        cout << iter->first << '\t' << iter->second << '\n';
}
@ZeRaW: Your solution doesn't sort the output by frequency, it merely stores all the words in a dictionary.
Also, it seems to bug in certain cases. Try putting the following in the file:
file.txt
j jj j
output:

j 3
jj 1
rahmu wrote: @ZeRaW: Your solution doesn't sort the output by frequency, it merely stores all the words in a dictionary.
Also, it seems to bug in certain cases. Try putting the following in the file:
file.txt
j jj j
output:

j 3
jj 1
std::map sorts by key, so it sorts alphabetically by word; that was my first question: are you sorting by frequency or alphabetically? :)

In any case, my output for j jj j is (VS 2010):
j       2
jj      1
I am not sure what compiler you are using, but the default behavior of std::map is to sort by key. The discrepancy is probably the while(!file.eof()) loop: eof() only turns true after a read fails, so the loop body runs one extra time and, depending on the library, the leftover word can get counted again.

Smaller code :)
int main()
{
    fstream file("file.txt", ios_base::in);
    string word;
    map<string, int> words;
    while (file >> word) {    // loop on the read itself, so a failed final read never gets counted
        words[word]++;        // operator[] value-initializes the count to 0 on first access
    }
    file.close();
    for (map<string, int>::const_iterator iter = words.begin(); iter != words.end(); ++iter)
        cout << iter->first << '\t' << iter->second << '\n';
}
So all the solutions are the same: create a dictionary {word: occurrences}, then sort by value and print.
(I'm not sure about geek's solution; I don't really speak... Mathematica?)

Basically, we're all relying on the "sort()" functions provided by our favorite languages. Where's the challenge?


In the spirit of "thinking outside the box", I tried to come up with something different: I store the words in a dictionary that has numbers as the keys, and the values are lists containing the corresponding words.

Here's the code; notice mainly the addcount function. The rest is syntactic sugar.
#!/usr/local/bin/python2.7

class WCount:

    def __init__(self, filename=None):
        self.filename = filename
        self._dict = {}

    def wordcount(self):
        with open(self.filename, "r") as f:
            # iterate the file directly; no need to catch StopIteration
            for line in f:
                for word in line.split():
                    self.addcount(word)
        self.pprint()


    def addcount(self, word):
        # find the bucket (current count) that holds this word, if any
        _tmpindex = 0
        for n in self._dict:
            if word in self._dict[n]:
                _tmpindex = n
                break

        # move the word from bucket n up to bucket n + 1
        if _tmpindex != 0:
            self._dict[_tmpindex].remove(word)

        if _tmpindex + 1 not in self._dict:
            self._dict[_tmpindex + 1] = []

        self._dict[_tmpindex + 1].append(word)

    def pprint(self):
        # buckets with the highest counts first; skip emptied buckets
        for num in sorted(self._dict.iterkeys(), reverse=True):
            if len(self._dict[num]) != 0:
                print "{}\t{}".format(num, self._dict[num])


def main():
    import sys 

    w = WCount(sys.argv[1])
    w.wordcount()

if __name__ == "__main__":
    main()
The code still needs a lot of polishing, but it basically implements my idea. I've never studied algorithms before, so if anyone has any criticism, I'm all ears (well, eyes).

Maybe we can come up with a better solution? Are there any Knuths in the house? ;)


EDIT: Just tested it; no time difference for small files. For very large files, my second solution is extremely slow. Maybe there's something wrong with my implementation. Or my reasoning. Probably both.
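The slowness is almost certainly addcount scanning every bucket for every input word, which makes it roughly O(distinct words) per word. Here is a minimal sketch of the same bucket idea with an extra word-to-count index for O(1) bucket lookups; FastWCount is a hypothetical name, and this is untested on big files:

# Same count -> words buckets, plus a word -> count index so addcount no
# longer has to scan every bucket. FastWCount is a hypothetical subclass
# of the WCount class above.
class FastWCount(WCount):

    def __init__(self, filename=None):
        WCount.__init__(self, filename)
        self._counts = {}  # word -> current count

    def addcount(self, word):
        old = self._counts.get(word, 0)
        if old:
            # list.remove() is still O(bucket size); sets would fix that too
            self._dict[old].remove(word)
        self._counts[word] = old + 1
        self._dict.setdefault(old + 1, []).append(word)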
std::map is a self-balancing binary search tree; all operations are O(log n).

@rahmu: I am not sure about your logic. You are trying to avoid sorting, but in the implementation the append/remove operations could be costly. As a C++ implementation I would pre-allocate a large enough frequency array and minimize append/remove operations. I'll try to implement something :)
Fast implementation of rahmu's algorithm.
Some notes:
- using std::map as the word-to-count lookup table (a tree, not a hash table)
- preallocating the bucket array to avoid dynamic allocations
- I am pretty sure this performs slowly for large files (and is expensive)
#include <iostream>
#include <fstream>
#include <string>
#include <algorithm>
#include <map>
#include <vector>
#include <list>
using namespace std;
int main()
{
    string word;
    const size_t maxSize = 4000;    // maximum expected frequency per word
    fstream file("file.txt", ios_base::in);
    map<string, size_t> words;                    // word -> current count
    vector<list<string> > arrayList(maxSize);     // count -> words having that count
    while (file >> word) {
        transform(word.begin(), word.end(), word.begin(), ::tolower);
        arrayList[words[word]].remove(word);      // drop the word from its old bucket
        words[word]++;
        arrayList[words[word]].push_back(word);   // and append it to the next one
    }
    file.close();
    // print buckets in increasing frequency, skipping empty ones
    for (size_t i = 1; i < arrayList.size(); ++i) {
        if (arrayList[i].size() != 0) {
            cout << "Frequency: " << i << '\t' << " Values: ";
            for (list<string>::const_iterator iter = arrayList[i].begin(); iter != arrayList[i].end(); ++iter)
                cout << *iter << ", ";
            cout << endl;
        }
    }
}
As expected, for a large file (lorem ipsum, 1.52 MB) my first implementation takes 0.2-0.3 seconds and this one takes 0.8-0.9 seconds.