So all the solutions are the same: build a dictionary mapping each word to its number of occurrences, then sort by value and print.
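For reference, that common approach might look like this in Python (a sketch; `Counter` is just a convenience here, a plain dict and a loop do the same thing):

```python
from collections import Counter

def wordcount_sorted(text):
    """Count words, then sort by occurrence count, highest first."""
    counts = Counter(text.split())
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)

# "the" appears twice, so it sorts to the front.
for word, n in wordcount_sorted("the cat sat on the mat"):
    print("{}\t{}".format(n, word))
```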
(I'm not sure about geek's solution; I don't really speak... Mathematica?)
Basically, we're all relying on the "sort()" functions provided by our favorite languages. Where's the challenge?
In the spirit of "thinking outside the box", I tried to come up with something different. I store the words in a dictionary whose keys are occurrence counts and whose values are lists of the words seen that many times.
Here's the code; notice mainly the "addcount" function. The rest is syntactic sugar.
#!/usr/local/bin/python2.7

class WCount:
    def __init__(self, filename=None):
        self.filename = filename
        self._dict = {}  # count -> list of words seen that many times

    def wordcount(self):
        with open(self.filename, "r") as f:
            for line in f:
                for word in line.split():
                    self.addcount(word)
        self.pprint()

    def addcount(self, word):
        # Find the bucket the word currently lives in (0 if unseen).
        _tmpindex = 0
        for n in self._dict.keys():
            if word in self._dict[n]:
                _tmpindex = n
        if _tmpindex != 0:
            self._dict[_tmpindex].remove(word)
        # Move the word up to the next bucket.
        if (_tmpindex + 1) not in self._dict:
            self._dict[_tmpindex + 1] = []
        self._dict[_tmpindex + 1].append(word)

    def pprint(self):
        for num in sorted(self._dict.keys(), reverse=True):
            if self._dict[num]:
                print "{}\t{}".format(num, self._dict[num])

def main():
    import sys
    w = WCount(sys.argv[1])
    w.wordcount()

if __name__ == "__main__":
    main()
The code still needs a lot of polishing, but it basically implements my idea. I've never studied algorithms before, so if anyone has any criticism, I'm all ears (well, eyes).
Maybe we can come up with a better solution? Are there any Knuths in the house? ;)
EDIT: Just tested it. No time difference on small files, but on very large files my second solution is extremely slow. I suspect the culprit is addcount: it scans every bucket, and every word in it, for each incoming word, which makes the whole run roughly quadratic. Or maybe my reasoning is wrong too. Probably both.
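If that quadratic scan is indeed the problem, one possible fix (a sketch only; `WCountIndexed` and `_where` are names I'm inventing here, not from the code above) is to keep a second dictionary mapping each word to the bucket it currently sits in, so addcount never has to search. Using sets for the buckets also makes the removal O(1) instead of a linear `list.remove()`:

```python
class WCountIndexed(object):
    """Sketch: same count->words layout, plus a word->count index."""

    def __init__(self):
        self._buckets = {}  # count -> set of words seen that many times
        self._where = {}    # word -> its current count (which bucket it is in)

    def addcount(self, word):
        old = self._where.get(word, 0)
        if old:
            # O(1) with a set; a list.remove() here would be a linear scan
            self._buckets[old].discard(word)
        self._buckets.setdefault(old + 1, set()).add(word)
        self._where[word] = old + 1

    def pprint(self):
        for num in sorted(self._buckets, reverse=True):
            if self._buckets[num]:
                print("{}\t{}".format(num, sorted(self._buckets[num])))
```

The trade-off is a second dictionary's worth of memory for an O(1) update per word, which is what the plain `{word: count}` solutions get for free.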