Computer Systems Organization (Spring 2018)

Lab-1: Basic C Programming

Segment-1 due: 2/11
Segment-2 due: 2/18

The goal of this lab is to get your hands dirty doing some basic C programming and debugging. We have structured the lab in two segments. In the first segment, you write simple C functions in programs that we have already structured in skeleton code for you. In the second segment, you write complete C programs by yourself from the scratch.

Before you start the lab:

Set up your VM environment by following instructions here..
Set up your lab repository by following instructions here.

Programming style:

In this and subsequent labs, you will be graded for style and we will deduct up to 20% of total lab points for bad style based on our subjective evaluation. Please read this style guide and follow the advice.

Segment-1: mini-exercises (Due: 2/11)

Obtain and update your lab files by doing:

$ cd cso18-labs
$ git fetch upstream
$ git merge upstream/master

In segment-1, you shall complete the C exercises found in files part1.c, part2.c, ..., part7.c.

File modification

For the mini-exercises in subdirectory mini/, the only files that you should modify are part1.c, part2.c, ..., part7.c and part7.h. You may read other files but you should not modify them..

It is recommended that you complete and test each exercise individually. For example, suppose you have completed the exercise in part1.c. Test by typing the following:

$ make
$ build/part1
variable should be 5, actually is 3!
Aborted

From the above error message, you can see that it did not pass the test. Debug and try again. Complete and test each part individually before moving on to the next one.

Once you've passed the tests for all 7 exercises in mini/, you can double check your test-passing status by typing:

$ ./grade-lab
basic [part1.c]: 
  set_to_five: OK
  array_sum: OK
bubble sort [part2.c]: 
  swap: OK
  bubble_sort: OK
prime sieves [part3.c]: 
  initialize_array: OK
  mark_multiples: OK
  prime_number_sieves: OK
  find_smallest_divisor: OK
point structure [part4.c]: 
  set_point: OK
  point_dist: OK
linked list [part5.c]: 
  list_insert: OK
  list_end: OK
  list_size: OK
  list_find: OK
  list_remove: OK
bitwise operators [part6.c]: 
  get_exponent_field: OK
  clear_msb: OK
  bit_at_index: OK
binary tree [part7.c]: 
  preorder_traveral: OK
  inorder_traversal: OK
  inorder_traversal_with_dup: OK
Score: 100/100

The above shows the example output of a successful full test.

Saving changes while you are working on Lab1

You should frequently save your work to protect against laptop failures and other unforeseen troubles. You save the changes by first "committing" them to your local lab repo and then "pushing" those changes to the repo stored on github.com

$ git commit -am "saving lab1 changes"
$ git push origin

Note that whenever you add a new file (you do not add any new file in Phase-1, but will do so in Phase-2), you need to manually tell git to ``track it'' with git add. Otherwise, the file will not be committed by git commit. After you've pushed your changes with git push, they are safely stored on github.com. Even if your laptop catches on fire in the future, those pushed changes can still be retrieved. However, you must remember that git commit by itself does not save your changes on github.com (it only saves your changes locally). So, don't forget to type that git push origin.

Handin Procedure

To handin your files, simply commit and push them to github.com

$ git commit -am "hand in lab1, segment-1"
$ git push origin

We will fetching your lab files from Github.com at the specified deadline and grade them.

Debugging the lab

Below are some advice on how to debug common problems encountered in doing this lab:

Remember to recompile changed code. Whenever you've changed a file, always type make to re-compile before executing again.
Write your own simple test code. Don't rely solely on the lab's testing infrastructure (i.e. ./grade-lab) to test the correctness of your code. Except for part7, we do not distribute the source of the test code (hence you will only see cso18-labs/clab/mini/static/part1_harness.o and not cso18-labs/clab/mini/static/part1_harness.c). This makes it hard to debug. So you should write your own tester.

Let's say you want to test the array_sum function in file part1.c. To write your own test code, create a file (e.g. called test-part1.c) with a main function that invokes array_sum in various ways to test its correctness.
An example test-part1.c might look something like this.
Compile your test code by typing:
```
$ gcc -std=c99 test_part1.c part1.c
```
Read part7_harness.c test code. For part7, we distribute the source of the tester code in part7_harness.c. Read this file carefully to debug your part7 code.
Learn to use gdb. It is absolutely essential that you learn to use gdb. This tool is a must for helping you handle ``segmentation fault'', which you'll see a lot when doing labs. Let's say you have a segfault on part1. Invoke gdb by typing:
```
$ gdb build/part1
(gdb) bt
```
bt is the gdb command to print the stack trace which tells you where the problematic execution occurs and how the program got there. Recitation 2 has covered the use of gdb. There are also many tutorials online on gdb.

Challenge question for part7 (2 out of 8 total lab points)

In part7.c, you've implemented a sorted binary tree and tested it using small inputs (up to 100 inputs). Now test it with very large inputs, e.g. by typing build/part7 -n 10000, which attempts to insert the same 10000 strings in a tree (and doing it in two different orders). Read part7_harness.c and run the code to investigate its behavior. Answer the following questions.

Q1: Run the test with different large number of strings. Does inserting the same set of strings in the two different order incur different runtime? Why?
Q2: Run the test with increasingly large number of strings. At which number does the program eventually fail? Does performing insertion in different order both fail? If not, why does one fail and not the other?

Write your answer in a file called challenge.md and add it to the repo git add challenge.md. Commit and push with git commit -am "complete challenge"; git push origin

Segment-2: Write C from scratch (Due: 2/18)

In segment-2 of this lab, we provide no crutches (no skeleton code, no Makefile, no grading testers) to make you get comfortable with the overall experience of writing C programs and testing them on UNIX. You are to write three programs from scratch.

All the programs you are to write should be reside in the subdirectories under cso18-labs/clab/scratch/. However, unlike in segment-1, you will notice that all sub-directories are completely empty.

Segment-2 Exercise 1: Averaging columns in a *.csv file

Put all your files for this exercise in cso18-labs/clab/scratch/avgcsv directory.

Write a C program to parse a *.csv file where fields are separated by the ';' character and print out the average of each column (which is an integer), separated by the ';' character. You may assume that the number of fields in a *.csv file is no more than 1000

Your executable file must be named avgcsv. It should take as argument the name of a csv file and outputs the average of each column, separated by ';'.

For example, suppose the contents of the example.csv are as follows:

$ cat example.csv   
10;25;56
22;10;100

Then, the expected output of the command ./avgcsv example.csv should be:

$ ./avgcsv example.csv
16.0;17.5;78.0

We expect you to write a Makefile to generate the avgcsv executable file from your source code. Recitation 1 teaches you how to write a Makefile.

You need to read from a file. Refer to [Kernighan and Ritchie] Chapter 7.5 on how to do this.

We do not provide you with any tests and will test your program under a few *.csv files of our own choosing during grading time. We highly encourage you to write unit tests for your program.

Once finished, add relevant source files and Makefile to your git repo by typing git add *.c *.h Makefile. Commit and push git commit -am "avgcsv"; git push origin

Segment-2 Exercise 2: Find the ten most frequent words

Put all your files for this exercise in cso18-labs/clab/scratch/wordcount directory.

Write a C program to find the ten most frequently used words in a given file. Your executable file must be named wordcount. It should take as argument the name of a file and outputs ten lines, one for each of the ten most frequently used words in the file in sorted order. If two words w1, w2 have the same occurance counts and w1 is alphabetically smaller than w2, then w1 precedes w2. Each line shows the occurance counts, followed by a whitespace, and then followed by the corresponding word.

For example, suppose the contents of the example.txt are given below:

$ cat example.txt
A potential victory for Donald J. Trump may hinge on one important (and large) 
group of Americans: whites who did not attend college.

Polls have shown a deep division between whites of different education levels and 
economic circumstances. A lot rides on how large these groups will be on Election Day: 
All pollsters have their own assessment of who will show up, and their predictions 
rely on these evaluations.

Then, the expected output of the ./wordcount example.txt command should be:

$ ./wordcount example.txt
2 large
2 their
2 these
2 whites
2 who
2 will
3 a
3 and
3 of
4 on

In order to count the words, your program needs to transform the given textfile into a collection of words. First, you need to split the textfile into words. Each word refers to a consecutive sequence of alpha-numeric characters. Words are separated by one or more non-alphanumeric characters. (Hint: usage the C library function isalnum. For usage, type man isalnum) Second, you need to ``canonicalize`` (aka ``normalize'') each word by turning any uppercase letters into a lower case one.

To count the occurances of words, there are several strategies.

You can implement a hash table to store the mapping from each word (C strings) to its occurance counter. C's standard library does not have hash tables nor dictionary, so you'll have to implement your own. After you've counted all the words, you'll need to sort them by their occurance counters.
You can sort all words first. For sorting, you should learn to use the library function qsort (type man qsort to learn how to use it). You can then sum consecutive identical words in the sorted list to count them. Last, you need to sort words by their occurance counts.

Like in exercise 1, you should write a Makefile to compile your program and write your own unit tests to check the correctness. During grading, we'll test your program using text files of our own choosing.

Note that we will test your program using large text files. Your program must not run slower than O(n*log_n), otherwise, you will not be able to pass the test. You may assume that the miximum size of a word is no more than 100 characters

Once finished, add relevant source files and Makefile to your git repo by typing git add *.c *.h Makefile. Commit and push git commit -am "wordcount"; git push origin

Segment-2 Exercise 3: Game of Life

Put all your files for this exercise in cso18-labs/clab/scratch/gameoflife directory.

Write a C program to simulate the Conway's game of life.

In a game of life of size n by n, the universe is two dimensional and consists of n by n cells. Each cell is in one of two possible states, "live"/"dead" (or "populated"/"unpopulated"). Every cell interacts with its eight neighbours, which are the cells that are horizontally, vertically, or diagonally adjacent. At each step in time, the following transitions occur:

Any live cell with fewer than two live neighbours dies, as if caused by under-population.
Any live cell with two or three live neighbours lives on to the next generation.
Any live cell with more than three live neighbours dies, as if by over-population.
Any dead cell with exactly three live neighbours becomes a live cell, as if by reproduction.

The initial pattern constitutes the seed of the system. The first generation is created by applying the above rules simultaneously to every cell in the seed; births and deaths occur simultaneously, and the discrete moment at which this happens is sometimes called a tick. The rules continue to be applied repeatedly to create further generations.

What about boundary conditions? We treat the borders of the 2D world as if they wrap around. In a universe of n x n cells, let's refer to the top-left cell as in position (0,0) and the bottom right cell as in position (n-1,n-1). If borders wrap around, then the 8 neighbors of (i,j) are:

((i+1)%n,j) right neighbor
((i-1)%n,j) left neighbor
(i,(j+1)%n) top neighbor
(i,(j-1)%n) bottom neighbor
((i+1)%n,(j+1)%n) topright neighbor
((i-1)%n,(j+1)%n) topleft neighbor
((i+1)%n,(j-1)%n) bottomright neighbor
((i-1)%n,(j-1)%n) bottomleft neighbor

Your executable file must be named gameoflife. It should take two arguments. The first argument is the name of a seed pattern file. The second argument is the number of ticks to run for the simulation. The seed pattern file contains one line per row of the universe. If the cell is "dead", its position is marked with the '.' character. If the cell is "live", the position is marked with the 'x' character. The seed file also effectively specifies the size of the universe to simulate.

For example, the contents of an example seed file example_seed are as follows:

$ cat example_seed
.....
.....
.xxx.
.....
.....

Then, the expected output of the command ./gameoflife example_seed 1 should be:

$ ./gameoflife example_seed 1
.....
..x..
..x..
..x..
.....

The expected output of the command ./gameoflife example_seed 2 should be:

$ ./gameoflife example_seed 2
.....
.....
.xxx.
.....
.....

Like in exercise 1 and 2, you should write a Makefile to compile your program and write your own unit tests to check the correctness. During grading, we'll test your program using seed files of our own choosing. You may assume that the seed file used for testing contains a world of no more than 1000x1000

Once finished, add relevant source files and Makefile to your git repo by typing git add *.c *.h Makefile. Commit and push git commit -am "gameoflife"; git push origin

Handin Procedure

To handin your files, simply commit and push them to github.com

$ git commit -am "hand in lab1, segment-2"
$ git push origin

We will fetching your lab files from Github.com at the specified deadline and grade them.

Lab-1: Basic C Programming

Segment-1 due: 2/11Segment-2 due: 2/18

Before you start the lab:

Programming style:

Segment-1: mini-exercises (Due: 2/11)

File modification

Saving changes while you are working on Lab1

Handin Procedure

Debugging the lab

Challenge question for part7 (2 out of 8 total lab points)

Segment-2: Write C from scratch (Due: 2/18)

Segment-2 Exercise 1: Averaging columns in a *.csv file

Segment-2 Exercise 2: Find the ten most frequent words

Segment-2 Exercise 3: Game of Life

Handin Procedure

Segment-1 due: 2/11
Segment-2 due: 2/18