LebGeeks

A community for technology geeks in Lebanon.


#1 September 20 2010

Joe
Member

Exercise - WordCount

This will be the first of a series of programming exercises I plan to propose on the forum. The idea is simple: I propose a subject, and anyone who wants to participate can post their code. It is not a competition. The idea is to get code review from other developers. So here goes:

WordCount

You are asked to create a program that counts the number of words in a text file.

Definition of a word: the longest continuous sequence of alphanumeric characters. Words can be separated by a space character (" "), \t, \r or \n.

For the sake of simplicity, we won't go into the validation of the input, considering that it will always be a valid text file (as opposed to a bitmap for example).

Oh and one last thing: the program should be written in C.

I'm gonna start working on it, and post my code as soon as I'm done. Do not hesitate to post your code, questions or whatever.

Next exercise will be Java :)

Have fun.

N.B: It is not a competition, don't try to post your code super fast. Take your time to develop a pretty (I want to say artistic) solution.


#2 September 21 2010

rolf
Member

Re: Exercise - WordCount

<?php

$file = "somedir/somefile.txt";
$count = str_word_count(file_get_contents($file));
echo "$file contains $count words";

?>

Then autogenerate C code from that using HipHop.

Sorry, couldn't resist :-)
When I have some free time, hopefully later this week, I'll get into the real problem, because I want to get into C.


#3 September 21 2010

mrmat
Member

Re: Exercise - WordCount

Here's mine. Not much of a C/C++ guy, but concepts are all the same.

By the way rahmu, why not disallow the use of libraries such as RegEx, etc.?

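#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

/* Returns true if c separates two words.
   c is an int so that EOF can be told apart from real characters. */
bool isDelimiter(int c)
{
    const char word_delimiters[] = {'\n', '\r', '\t', ' '};
    size_t i;

    if (c == EOF) {
        return true;
    }
    for (i = 0; i < sizeof(word_delimiters); i++) {
        if (c == word_delimiters[i]) {
            return true;
        }
    }
    return false;
}

/* Returns the number of words in the file, or -1 if it cannot be opened. */
int getWordsCount(void)
{
    FILE *fp;
    int curChar, prevChar;
    int NOW = 0; /* Number Of Words */

    fp = fopen("C:\\words.txt", "r");
    if (fp == NULL) {
        return -1;
    }
    prevChar = '\n'; /* treat the beginning of the file as a delimiter */

    while (true) {
        curChar = fgetc(fp);

        /* A word ends where a delimiter (or EOF) follows a non-delimiter. */
        if (isDelimiter(curChar) && !isDelimiter(prevChar)) {
            NOW++;
        }
        if (curChar == EOF) {
            break;
        }
        prevChar = curChar;
    }

    fclose(fp);
    return NOW;
}

int main(void)
{
    int count = getWordsCount();

    if (count < 0) {
        printf("Could not open the file.\n");
        return 1;
    }
    printf("Number of Words = %i\n", count);
    return 0;
}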

The above code assumes that non-alphanumeric characters such as brackets are part of a word. That can be changed by adding them to the word_delimiters array, or the whole code can be changed to check whether each character is alphanumeric rather than checking for a delimiter.
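
The alphanumeric check mentioned above could look roughly like this. It is only a sketch: countAlnumWords is a made-up name, it works on a string rather than on the file, and it relies on isalnum from <ctype.h>, so anything that is not a letter or a digit (brackets, punctuation, whitespace) ends a word.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: counts maximal runs of alphanumeric characters in text. */
int countAlnumWords(const char *text)
{
    int count = 0;
    int inWord = 0; /* 1 while we are inside a run of letters/digits */
    size_t i;

    for (i = 0; text[i] != '\0'; i++) {
        /* Cast to unsigned char: isalnum() is undefined for negative values. */
        if (isalnum((unsigned char)text[i])) {
            if (!inWord) {
                count++; /* first character of a new word */
                inWord = 1;
            }
        } else {
            inWord = 0; /* any other character ends the word */
        }
    }
    return count;
}
```

With this rule "foo(bar)" counts as two words, whereas a purely delimiter-based check counts it as one.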

Last edited by mrmat (September 21 2010)


#4 September 21 2010

Joe
Member

Re: Exercise - WordCount

Nice one mrmat!

I have just one remark: I would suggest you stay away from stdbool.h and the bool type when coding in C. It was introduced as part of C99, and is still not supported by all compilers. As a general rule, I tend to ignore this header, and consider that bool was introduced with C++ (which is true to a certain extent).
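
For compilers without C99 support, one common workaround is to fall back to a hand-rolled bool. A rough sketch, assuming the compiler defines __STDC_VERSION__ when it supports C99 or later (isSeparator is an invented function, only there to show the type in use):

```c
/* Use <stdbool.h> when the compiler claims C99 or later,
   otherwise define a simple stand-in for bool/true/false. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#include <stdbool.h>
#else
typedef int bool;
#define true 1
#define false 0
#endif

/* Hypothetical example function using the type. */
bool isSeparator(int c)
{
    return c == ' ' || c == '\t' || c == '\r' || c == '\n';
}
```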

Also, I consider the line

prevChar = '\n';

to be misleading. I understand that it's meant for the first iteration of the loop, but some clearer code should be considered. Typically, this is one of those times where I would use a do ... while loop instead of a simple while.
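
One way the do ... while idea might look, as a sketch rather than a drop-in replacement: the loop body runs one extra time for EOF, so the last word is closed without a special case, and the starting state is named explicitly instead of being hidden in a fake '\n'. The names countWordsInStream and prevWasDelimiter are invented for this example.

```c
#include <stdio.h>

/* Hypothetical helper: counts whitespace-separated words read from fp. */
int countWordsInStream(FILE *fp)
{
    int c;
    int count = 0;
    int prevWasDelimiter = 1; /* nothing read yet, so we are not inside a word */

    do {
        int isDelimiter;

        c = fgetc(fp);
        /* EOF is treated as a delimiter so that it closes a trailing word. */
        isDelimiter = (c == EOF || c == ' ' || c == '\t' || c == '\r' || c == '\n');

        if (isDelimiter && !prevWasDelimiter) {
            count++; /* a word has just ended */
        }
        prevWasDelimiter = isDelimiter;
    } while (c != EOF);

    return count;
}
```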

All in all good code. I'm glad I got quality code as a response to my first exercise.

I'll post mine tomorrow, not using bool, you'll tell me then what you think ;-)


#5 September 22 2010

mrmat
Member

Re: Exercise - WordCount

Thanks for the bool warning.

I'm aware that the line

prevChar = '\n';

is sort of misleading. But its main purpose is to treat the beginning of the file as a delimiter, so that it isn't counted as a word; have a close look at the code and you'll understand. I tried using do ... while, but as far as I remember it failed somewhere. I'm sure that a better approach can be used.

Anyway, waiting impatiently to see what you come up with.

Last edited by mrmat (September 22 2010)


#6 September 22 2010

Joe
Member

Re: Exercise - WordCount

Here's my code:

#include <stdio.h>
#include <stdlib.h>

enum {out, in}; /* In this case enum is more appropriate than bool */

void wordCount (FILE* myFile)
{
    int myChar, wordNumber, state;

    /* Initializing variables */
    myChar = 0;
    wordNumber = 0;
    state = out; /* Starting out of a word */

    while ((myChar = fgetc(myFile)) != EOF)
    {
        printf("%c", myChar);
        if (myChar == ' ' || myChar == '\t' || myChar == '\n' || myChar == '\r')
        {
            state = out;
        }
        else if (state == out) /* We enter a new word */
        {
            state = in;
            wordNumber++;
        }
    }
    
    printf("\n%d word(s)\n", wordNumber);
}


int main (int argc, char *argv[])
{
    FILE* myFile = NULL;

    if (argc < 2) /* Is the file path supplied ? */
    {
        printf("No file supplied as an argument.\n");
        return EXIT_FAILURE;
    }

    /* Only touch argv[1] once we know it exists */
    myFile = fopen(argv[1], "r");

    if (myFile == NULL) /* Testing if fopen worked */
    {
        printf("There was a problem accessing the file. Please make sure the file exists and is available.\n");
        return EXIT_FAILURE;
    }

    wordCount (myFile);
    fclose(myFile);

    return 0;
}

It doesn't use bool; instead it uses an enum, which has many advantages (the most important being improved source code readability). I know that enum was added to the original C later on. I wonder if it is part of the C89 standard.

Instead of enums, defines could be used.
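
For comparison, here is roughly what the #define variant might look like. This is a sketch under the assumption that only the state constants change; STATE_OUT, STATE_IN and countWords are illustrative names, and the function returns the count instead of printing it.

```c
#include <stdio.h>

#define STATE_OUT 0 /* currently between words */
#define STATE_IN  1 /* currently inside a word */

/* Hypothetical variant of wordCount() that returns the number of words. */
int countWords(FILE *myFile)
{
    int myChar;
    int wordNumber = 0;
    int state = STATE_OUT; /* starting out of a word */

    while ((myChar = fgetc(myFile)) != EOF) {
        if (myChar == ' ' || myChar == '\t' || myChar == '\n' || myChar == '\r') {
            state = STATE_OUT;
        } else if (state == STATE_OUT) {
            state = STATE_IN; /* we enter a new word */
            wordNumber++;
        }
    }
    return wordNumber;
}
```

One difference worth keeping in mind: macros are substituted by the preprocessor before compilation, while enum constants are real identifiers that follow the usual scope rules.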

Also, in this code, the file is supplied as a command-line argument, so you don't have to change the source code (and recompile) every time you want to count a different file.

I hope the code is clear enough. Do not hesitate to give any remarks and criticism, I'm listening :)


#7 September 22 2010

Padre
Member

Re: Exercise - WordCount

wouldn't have written it this way, but i guess it works :)

Last edited by Padre (September 22 2010)


#8 September 22 2010

Joe
Member

Re: Exercise - WordCount

Show us some code :)


#9 May 3 2012

Joe
Member

Re: Exercise - WordCount

My first one liner!

# filename is a string with your file's name.
with open(filename) as f: print reduce(lambda x, y: x+y, [len(x.split()) for x in f])

The lambda could (should) be replaced by operator.add, but the import statement would've kept this from being a one-liner.

Also it's stupid that you cannot just use '+'.


#10 May 3 2012

xterm
Moderator

Re: Exercise - WordCount

>.<

len(open(filename).read().split())

Off-topic: Your code reminds me of when I got so excited with reduce, that I used it to join a list of strings instead of simply using join.


#11 May 3 2012

Joe
Member

Re: Exercise - WordCount

ARRRGH you win!

:)

Off-topic: Yeah, I guess reduce tends to do that to you. You want to use it everywhere, just because you can.


#12 May 4 2012

NuclearVision
Member

Re: Exercise - WordCount

I'm not into Objective-C, but if I were, I would write code that only counts spaces and separators.
I can write it in Python.
Take the expression "How are you mate?". Let s be the number of spaces and n the number of words; then n = s + 1. So my code in Python 2 will be:

expression = raw_input("Enter the expression: ")
i = map(str,list(expression))
def words(list):
    s=list.count(" ")
    return s+1
print words(i)

or

expression = open("%file%path%").read()
i = map(str,list(expression))
def words(list):
    s=list.count(" ")
    return s+1
print words(i)

Last edited by NuclearVision (May 4 2012)


#13 May 4 2012

Joe
Member

Re: Exercise - WordCount

hmmm are you sure?

How many words do you count in the following sentence :)

Hello          Lebgeeks!
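
To make the point concrete, here is a small C sketch (both function names are made up for the illustration) that compares the "spaces + 1" rule with counting the places where a word starts. Only the space character is considered here, as in the Python code under discussion.

```c
#include <stddef.h>

/* The "n = s + 1" rule: number of spaces plus one. */
int spacesPlusOne(const char *text)
{
    int spaces = 0;
    size_t i;

    for (i = 0; text[i] != '\0'; i++) {
        if (text[i] == ' ') {
            spaces++;
        }
    }
    return spaces + 1;
}

/* Counts word starts: a non-space character at the very beginning
   of the text or immediately after a space. */
int wordStarts(const char *text)
{
    int count = 0;
    size_t i;

    for (i = 0; text[i] != '\0'; i++) {
        if (text[i] != ' ' && (i == 0 || text[i - 1] == ' ')) {
            count++;
        }
    }
    return count;
}
```

For "How are you mate?" both functions return 4. For "Hello   Lebgeeks!" written with three spaces, spacesPlusOne returns 4 while wordStarts returns 2.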


#14 June 19 2014

Johnaudi
Member

Re: Exercise - WordCount

while(str.Contains("  ")) str = str.Replace("  ", " ");
int n_words = str.Trim().Split(' ').Length;

I could have used lambda, but I'm really bad at understanding them. I will once I finish reading a few.

Last edited by Johnaudi (June 19 2014)


#15 June 19 2014

NuclearVision
Member

Re: Exercise - WordCount

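j = 0
s = 'your text  here   ' + ' '  # trailing space so the last word is always followed by one
for i in xrange(len(s) - 1):
    # a word ends where a non-space character is followed by a space
    if s[i] != ' ' and s[i + 1] == ' ':
        j += 1
print j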


#16 June 19 2014

xterm
Moderator

Re: Exercise - WordCount

Johnaudi, NuclearVision,

The requirements state that you need to handle \n \r \t as well.


#17 June 19 2014

Johnaudi
Member

Re: Exercise - WordCount

xterm wrote:

Johnaudi, NuclearVision,

The requirements state that you need to handle \n \r \t as well.

Alright:

str = str.Replace('\n', ' ').Replace('\r', ' ').Replace('\t', ' ');
while (str.Contains("  ")) str = str.Replace("  ", " ");
int n_words = str.Trim().Split(' ').Length;

Last edited by Johnaudi (June 19 2014)


#18 June 19 2014

NuclearVision
Member

Re: Exercise - WordCount

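from string import replace
j = 0
s = 'your text  here   ' + ' '  # trailing space so the last word is always followed by one
for i in ["\t", "\r", "\n"]:
    s = replace(s, i, " ")  # turn the other separators into spaces
for i in xrange(len(s) - 1):
    # a word ends where a non-space character is followed by a space
    if s[i] != ' ' and s[i + 1] == ' ':
        j += 1
print j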

Thanks xterm I missed it :)


#19 June 20 2014

xterm
Moderator

Re: Exercise - WordCount

If you want to iterate over the given string manually, there's no reason to do two passes (replace, then split) over the content. It would be better to modify your logic so that the word count is collected in the first loop. Otherwise:

The Python version, as seen above, would be:

len(text.split())

Its C# equivalent would be:

text.Split(
    new char[]{' ', '\t', '\r', '\n'},
    StringSplitOptions.RemoveEmptyEntries
).Length;
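
Since the exercise asked for C, a rough C counterpart of the same idea is strtok with the four delimiters: like RemoveEmptyEntries, it skips runs of consecutive delimiters. This is a sketch only; countTokens is an invented name, and because strtok modifies the buffer it scans, the function works on a private copy of the text.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns the number of tokens in text,
   or -1 if the working copy cannot be allocated. */
int countTokens(const char *text)
{
    const char *delimiters = " \t\r\n";
    size_t size = strlen(text) + 1;
    char *copy = malloc(size);
    char *token;
    int count = 0;

    if (copy == NULL) {
        return -1;
    }
    memcpy(copy, text, size); /* strtok() writes into the buffer it scans */

    for (token = strtok(copy, delimiters); token != NULL;
         token = strtok(NULL, delimiters)) {
        count++;
    }

    free(copy);
    return count;
}
```

Note that strtok keeps internal state between calls, so this version should not be used from several threads at once.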


#20 June 20 2014

NuclearVision
Member

Re: Exercise - WordCount

Thanks xterm, I didn't know that split() with no arguments skips over runs of whitespace!


#21 June 20 2014

Johnaudi
Member

Re: Exercise - WordCount

Thanks, I didn't know we could split on a char array.


#22 February 17 2016

Joe
Member

Re: Exercise - WordCount

I'm learning Go and I'm leveraging the regexp package:

package main

import (
	"fmt"
	"io/ioutil"
	"log"
	"regexp"
)

// getWordsFrom returns every run of word characters (\w+) found in text.
func getWordsFrom(text []byte) [][]byte {
	words := regexp.MustCompile(`\w+`)
	return words.FindAll(text, -1)
}

func main() {
	filename := "main.go"

	data, err := ioutil.ReadFile(filename)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%d\n", len(getWordsFrom(data)))
}

