Hello geeks.

Today's exercise will be a little easier so that more people can participate. At the same time, you will notice that it takes quite a bit of skill to do things right. So without further ado, let's get started:

File Stats

This exercise is a variation on the first exercise we had on the forum: Word Count.

This time, you are asked to report trickier statistics about your file, namely:

* Number of characters.
* Number of characters without the space char.
* Number of words.
* Number of paragraphs.
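
The four numbers above leave room for interpretation, so here is a minimal Go sketch under stated assumptions rather than a reference solution: characters are Unicode code points, "the space char" is only ' ', words are whitespace-separated tokens, and paragraphs are blocks of text separated by one or more blank lines. The `Stats` type and the `computeStats` helper are hypothetical names made up for this example.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"unicode/utf8"
)

// Stats holds the four numbers the exercise asks for.
// The type and its field names are hypothetical.
type Stats struct {
	Chars        int
	CharsNoSpace int
	Words        int
	Paragraphs   int
}

// blankLines matches a run of two or more newlines, which this sketch
// assumes to be the paragraph separator.
var blankLines = regexp.MustCompile(`\n{2,}`)

// computeStats is a hypothetical helper implementing the assumptions
// listed in the lead-in.
func computeStats(text string) Stats {
	chars := utf8.RuneCountInString(text)
	s := Stats{
		Chars:        chars,
		CharsNoSpace: chars - strings.Count(text, " "),
		Words:        len(strings.Fields(text)),
	}

	// Normalize Windows line endings so that "\r\n\r\n" reads as "\n\n",
	// then drop leading and trailing whitespace before splitting.
	normalized := strings.TrimSpace(strings.ReplaceAll(text, "\r\n", "\n"))
	if normalized != "" {
		s.Paragraphs = len(blankLines.Split(normalized, -1))
	}
	return s
}

func main() {
	fmt.Printf("%+v\n", computeStats("Hello geeks.\r\n\r\nFile stats, again.\r\n"))
}
```

Other definitions are just as defensible (counting bytes instead of code points, or treating tabs as spaces), which is one reason different solutions can report different numbers for the same file.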

As a reference, you can test your program on this file.

Feel free to create a GUI or deploy a webapp that does this; you don't have to limit yourself to the console. Oh, and you're free to use exotic languages. Bonus points for whoever does it in Erlang :D

Optional extension for Unix developers:

The file should be passed as a command-line argument, and your program should also accept these command-line options:

-o or --output: specify a name for an output file.
-v or --verbose: show the commands on the screen.
-h or --help: display a help message.
Do the *nixes have any library support for handling those command-line arguments?
Yes. The short-option getopt() is specified by POSIX and declared in <unistd.h>, so you should find it on any Unix. For long options such as --output you need getopt_long() from <getopt.h>, which started as a GNU C library extension, so it is not guaranteed on every Unix. Most Linux distributions come equipped with it, and I know Solaris has its own implementation. That's as far as I know.

It's somewhat complicated to use, but believe me it is easier than having to parse that command line yourself :)
Hmm, I couldn't get the paragraph count right... I'm using C# and checking whether each character is a '\n' or a '\r', but it doesn't seem to work...
Does the . (dot) count as a word?
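
The exercise does not say, so the answer depends on which definition of "word" you pick. As a small Go sketch, here are two common definitions side by side: whitespace-separated tokens, where a lone dot counts, and runs of word characters matched by the regular expression \w+, where it does not. Both function names are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// wordChars matches runs of ASCII letters, digits or underscores.
var wordChars = regexp.MustCompile(`\w+`)

// countByWhitespace treats every whitespace-separated token as a word,
// so a lone "." counts.
func countByWhitespace(text string) int {
	return len(strings.Fields(text))
}

// countByWordChars treats every run of word characters as a word,
// so a lone "." is ignored but "don't" splits into "don" and "t".
func countByWordChars(text string) int {
	return len(wordChars.FindAllString(text, -1))
}

func main() {
	sample := "Stop . Now"
	fmt.Println(countByWhitespace(sample), countByWordChars(sample)) // prints "3 2"
}
```

Neither choice is wrong; it just has to be applied consistently if you want to compare results with someone else's program.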
string allString = "";
int spacesCount = 0;
int carriageReturn = 0;
int multipleCR = 0;
string path = "YourFilePath";

int AllCharacters = 0;
int excludingSpaces = 0;
int Words = 0;
int Paragraphs = 0;

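private void ReadFile()
{
    // Needs "using System.IO;" at the top of the file.
    // The using block closes the file once it has been read.
    using (StreamReader sr = new StreamReader(path))
    {
        allString = sr.ReadToEnd();
    }
}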

private void Display()
{
    string trimmed = allString.Trim();
    int nmb = allString.Length - trimmed.Length; // The number of leading and trailing white spaces...

    // 1 // All Characters...
    AllCharacters = allString.Length;

    // 2 // Calculating the number of spaces... // Including Leading and Trailing White Spaces...
    for (int i = 0; i < allString.Length; i++)
    {
        if (allString[i] == ' ')
            spacesCount++;

        #region Carriage Return...

        if (allString[i] == 13) // ASCII code of the carriage return ('\r')
        {
            // Check whether another line break follows immediately. If so, increment i by 1 to skip the '\n'...
            carriageReturn++;

            // +2, since a line break is \r\n, so two consecutive breaks read \r\n\r\n.
            // The bounds check avoids reading past the end of the string.
            if (i + 2 < allString.Length && allString[i + 2] == 13)
            {
                multipleCR++;
                i++; // Skip the next character...
            }
        }
        #endregion
    }

    int paragCount = carriageReturn + 1 - multipleCR;
    int wordCount = spacesCount - nmb + paragCount;

    // 2 // Characters excluding spaces... The fields are ints, so no ToString() here.
    excludingSpaces = allString.Length - spacesCount; // Could have used: allString.Replace(" ", "").Length;

    // 3 // The +1 in paragCount accounts for the last word...
    Words = wordCount;

    // 4 // Paragraphs Count...
    Paragraphs = paragCount;
}
And Rahmu, I couldn't use the test file you included.

I won't be trying again right now: I have a morning class tomorrow and I desperately need to sleep.

Edit: Results for the file you included:

All Characters: 2924
Excluding WhiteSpaces: 2532
Word Count: 400
Paragraphs: 8
Didn't bother much with the requirements, so I'll fail this. Just built something real quick to get something close to Georges's result.

P.S.: You won't be able to test this on the groovy web console because GAE disables fetchurl, but if you have groovy installed, just throw it in the console.
html_data = "http://pastebin.com/raw.php?i=0AG377K8".toURL().text.split('.dtd">')[1]
data = new XmlParser().parseText(html_data).depthFirst().pre.text().trim()

chars          = data.length()
chars_no_space = data.findAll { c -> c != ' ' }.size()
words          = data.split(' ').size()
paragraphs     = data.split('\n\n').size()
Result:

chars : 2910
chars_no_space : 2518
words : 393
paragraphs : 8
private void calculateStats()
{
    int totalChars = 0;
    int numNoSpace = 0;
    int numWords = 0;
    int numParags = 0;

    if (string.IsNullOrEmpty(_contents))
    {
        MessageBox.Show("File was empty!", "Error!");
    }
    else
    {
        foreach (char c in _contents)
        {
            totalChars += 1;
            if (c != ' ') numNoSpace += 1;
        }

        // Length gives the same number as Count() without needing System.Linq.
        numWords = _contents.Split(new string[] { " " }, StringSplitOptions.None).Length;
        // "\n\r" matches the middle of "\r\n\r\n", i.e. a blank line in a file with Windows line endings.
        numParags = _contents.Split(new string[] { "\n\r" }, StringSplitOptions.None).Length;

        NumCharsTB.Text = totalChars.ToString();
        NumWordsTB.Text = numWords.ToString();
        NumParagsTB.Text = numParags.ToString();
        NoSpaceTB.Text = numNoSpace.ToString();
    }
}
Produces the same results as xterm's code.

@Georges, are you sure about the Word Count?
/* 
 * Word Count version 0.2
 * 
 * Changelog: 
 *      - Added character and paragraph count
 *      - Command line arguments. 
 * 
 * 
 * author:          Joe "rahmu" Hakim Rahme
 * last modified:   21/01/2010
 * 
 */


#include <stdio.h>
#include <stdlib.h>

enum {out, in}; /* In this case enum is more appropriate than bool */

void wordCount (FILE* myFile){

    /* Initializing variables */
    int myChar = 0;
    int wordNumber = 0;
    int charNumber = 0; /* Counts every character read, including newlines */
    int parNumber = 0; 
    int state = out; /* Starting out of a word */
    int newlines = 2; /* Newlines seen since the last text; 2 or more means a blank line */

    while ((myChar = fgetc(myFile)) != EOF){
        charNumber++;

        /* '\r' is treated as plain whitespace so that "\r\n" counts as one newline */
        if (myChar == ' ' || myChar == '\t' || myChar == '\r'){ 
            state = out;
        }

        else if (myChar == '\n'){
            state = out;
            newlines++;
        }

        else {
            if (newlines >= 2){ /* First text after a blank line: we enter a new paragraph */
                parNumber++;
            }
            newlines = 0;

            if (state == out){ /* We enter a new word */
                state = in;
                wordNumber++;
            }
        }
    }
    
    fprintf(stdout, "%d word(s)\n", wordNumber);
    fprintf(stdout, "%d character(s)\n", charNumber);
    fprintf(stdout, "%d paragraph(s)\n", parNumber);
}


int main (int argc, char *argv[]) 
{
    FILE* myFile = NULL;

    if (argc < 2) /* Is the file path supplied ? Check before touching argv[1] */
    {
        fprintf (stderr, "No file supplied as an argument.\n");
        return EXIT_FAILURE;
    }

    myFile = fopen(argv[1], "r");

    if (myFile == NULL) /* Testing if fopen worked */
    {
        fprintf(stderr, "There was a problem accessing the file. Please make sure the file exists and is available.\n");
        return EXIT_FAILURE;
    }

    wordCount (myFile);
    fclose(myFile);

    return EXIT_SUCCESS;
}
5 years later
Go code showing:
  • File manipulation
  • CLI argument parsing
  • regex
package main

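import (
	"bytes"
	"fmt"
	"log"
	"os"
	"regexp"
	"unicode/utf8"
)

// Counts Unicode characters (runes) rather than bytes, so the result is
// consistent with the rune-based regex counts below.
func get_characters_count(text []byte) int {
	return utf8.RuneCount(text)
}

func regex_non_space_count(text []byte) int {
	chars := regexp.MustCompile("[^ ]")
	return len(chars.FindAll(text, -1))
}

// \S+ treats any run of non-whitespace as one word; with \w+, "don't"
// would be counted as two words.
func get_words_count(text []byte) int {
	words := regexp.MustCompile(`\S+`)
	return len(words.FindAll(text, -1))
}

// Counts blocks of text rather than separators: n blank-line separators
// delimit n+1 paragraphs. \r? also accepts Windows line endings.
func get_paragraph_count(text []byte) int {
	trimmed := bytes.TrimSpace(text)
	if len(trimmed) == 0 {
		return 0
	}
	separators := regexp.MustCompile(`(\r?\n){2,}`)
	return len(separators.FindAll(trimmed, -1)) + 1
}

func main() {
	// Exactly one argument (the file path) is expected; checking for != 2
	// also catches the case where no argument was given.
	if len(os.Args) != 2 {
		log.Fatal("Wrong number of CLI arguments")
	}

	// os.ReadFile replaces the deprecated ioutil.ReadFile (Go 1.16+).
	data, err := os.ReadFile(os.Args[1])
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("chars count:\t%d\n", get_characters_count(data))
	fmt.Printf("non ' ' count:\t%d\n", regex_non_space_count(data))
	fmt.Printf("words count:\t%d\n", get_words_count(data))
	fmt.Printf("parag count:\t%d\n", get_paragraph_count(data))
}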