FAQ for Warmup Project #1

Summer 2025

CSCI 402

(Note: this page can change without notice!)

[ This is a BACKUP server. Do NOT use unless the primary server is DOWN. ]

Quick index:

general
compiler
- is it okay to get a "trigraph" compiler warning?
- is it okay to get a "[-Wint-to-pointer-cast]" compiler warning when compiling "listtest.c"?
C-string
- what is a C-string?
- which C-string manipulation functions should I need to learn?
file name vs. path name
- is tfile a file name or a path?
pointers
My402List
- how do I start this part of the assignment?
- the spec for Unlink(elem) says that I must not check if elem is on the list, what does it mean?
listtest
- why does "make listtest" not working?
- how do I debug Unrecoverable error in RandomShuffle()?
sort command
grading guidelines
gdb
valgrind
- what is valgrind and when should I use it?
submission
- how can I update my README.txt file after I have made a submission?
- can I find out what I have submitted?

General

Is the "grading guidelines" important?

According to our fairness policy, we must not grade one student differently from other students. Following the "grading guidelines" is the ONLY way we will grade all assignments. The grading guidelines is considered part of the spec for all assignments and you are expected to have tested your code against the grading guidelines.

Please note that the grading guidelines says that if you submit an incomplete README file, you can lose up to 20 points! Therefore, you must make sure that your README file satisfies all the requirements. You must replace every instance of "?" in the sample README file with your own evaluation.

Can I use Eclipse?

You can use it for your own development work. Please understand that even if everything runs perfectly under Eclipse, it doesn't count for anything! You have to make sure that your program runs perfectly on a 32-bit Ubuntu 16.04 system without Eclipse because the grader is not permitted to use Eclipse to compile your program!

Thanks to Zhiyi Xu, one of our graders in Spring 2014! He wrote a tutorial on how to make Eclipse work with the kernel assignments. You can use the relevant part for the warmup assignments.

A student from a previous semester, Luis Perez Cruz, also mentioned a link about how to use valgrind with Eclipse.

Another student from a previous semester, Kai Lu, also mentioned a link about how to install Valgrind plugin in Eclipse and some useful hints.

Can I use VS Code?

You can use it for your own development work. Please understand that even if everything runs perfectly under VS Code, it doesn't count for anything! You have to make sure that your program runs perfectly on a 32-bit Ubuntu 16.04 system without VS Code because the grader is not permitted to use VS Code to compile your program!

Thanks to Dhruv Gupta, one of the students in Fall 2020! He wrote a tutorial on how to set up VS Code to work with the warmup assignments.

For the kernel assignments, you can use VS Code as a code editor. I don't think you can use it to debug your kernel code.

Given all the problems people are have with vscode, I do not recommend VS Code! I personally don't use VS Code. It uses too much memory and slows things down quite a bit because it's trying to do all these fancy editing tricks. My recommendation is that it's best to stay away from VS Code!

If you want to used an editor with a graphical UI inside the 32-bit Ubuntu 16.04 system, please give Sublime Text a try. From what I've heard, quite a few students like it.

What do you mean by part (A) and part (B) of this assignment?

The grading guidelines is part of the spec. The grading guidelines is divided into two sections. First comes the "plus points" section, which adds up to be 100 points. Then comes the "minus points" section that take points away. Your goal is to get 100 points in the "plus points" section and lose 0 point in the "minus points" section.

For warmup1, in the "plus points" section, it's divided into part (A) and part (B).

For part (A), the only file you need to write is "my402list.c". This file should be around 100 lines long. (All this means is that if your "my402list.c" is 500 lines long, you probably misunderstood the spec.) Once your "my402list.c" is perfect, if follow the grading guidelines and do all the steps and then run through the scripts (available as "section-A.csh"), your printout should look like "section-A.out".

Once your "my402list.c" can pass "listtest", you should move to part (B). In part (B), the only file you need to write is "warmup1.c" which would take a "tfile" (or stdin -- and expect the data to be in the tfile format) as input, parse every line to create "Transaction" objects and add these "Transaction" objects to a My402List. Then your code would sort the list according to the integer values of the transaction times and print the sorted list in the format specified in the spec. (Some students would split "warmup1.c" into multiple .c files and that's also fine.)

What error message should I print?

Since there are gazillian ways an error can happen, the spec only specifies what's right (and that's how it is with typical specs). The point of error messages is to tell the user as closely as possible where you think the problem is to help the user fix what needs to be fixed. So, you need to think about what makes sense to your user and try to be as reasonably helpful as possible or you will lose points.

If the commandline is malformed, you must tell the user why you think the commandline is malformed AND tell them what is the right form for the commandline (by giving the "usage information", i.e., the "commandline syntax" in the spec). For example, if the user typed "./warmup1 sort file1 file2" in the commandline, you can print something like:

    malformed commandline - incorrect number of commandline arguments
    usage: warmup1 sort [tfile]

The 2nd line was there to give the "usage information", i.e., the correct commandline syntax.

If you are reading input data and you have detected that a particular line is invalid, you must tell the user the line number of that line. Otherwise, your error message is not very helpful! Also, if you are parsing input lines and a particular field is invalid, you must tell the user which field is invalid (no need to say why that field is invalid). Otherwise, your error message is not very helpful! (Of course, if the file is empty, there is no line number. In that case, you should tell the user that the file is not supposed to be empty.) Please also understand that an error inside the input file is not a "malformed command" and therefore you must not give "usage information" since that would be confusing to the user.

You should give an error message from the perspective of your code. If you detect the error using a certain piece of code, why did your code think that there is an error? Give that information in your error message. For example, if you are counting the number of tabs in an input line and you see that line 42 does not have 3 tabs, you can simply tell the user that the error you see is that line 42 does not have 3 tabls.

If there are multiple errors in a file, which one do you report? For our warmup assignments, there is no requirement to tell the user where all of the errors are. You get to pick (from the perspective of your code) which one to report. The rule for warmup assignments is that when you find an error in the input file, you must print an error message and quit your program immediately. If there are multiple errors on the line of input you are processing, you just need to pick one error and report it. Your error message must be a correct error message.

Compiler

Is it okay to get a "trigraph" compiler warning?

It's an easy thing to get around, so please change your code so that there's no compiler warnings. In this case, you can break the string containing question marks into two string and just concatenate them together. I know this is kind of silly because it's really not the problem with your code but it's the problem with the compiler. But since it's easy to get around, we should all try to make sure that our code gets zero compiler warnings.

Is it okay to get a "[-Wint-to-pointer-cast]" compiler warning when compiling "listtest.c"?

If you are getting the following:

    listtest.c:80:38: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]

and a bunch of similar compiler warnings on Ubuntu, it's probably because your laptop is a 64-bit machine. If that's the case, don't worry about it since it's warning you that casting a 64-bit pointer to a 32-bit interger may cause problem. Just make sure that you don't get these warning messages on a 32-bit Ubuntu 16.04 system because that's the system the graders must use to grade your assignments.

C-string manipulation

What is a C-string?

In C++, there is a "string" data type. In C, there is no such thing. Instead, a string is implemented as a null-terminated array of characters which sometimes is referred to as "C-string".

If buf is a C-string that can hold up to 79 characters, you can declare it as (the extra byte is for storing the "null character", i.e., '\0', in case the length of the string is 79):

    char buf[80];

Please understand that if you use the following declaration:

    char *buf;

you are only declaring that buf is a pointer that points to a C-string of an unknown/unspecified length. Furthermore, on a 32-bit machine, buf, if declared this way, refers to a memory location that contains a 4-byte data which is meant to be interpreted as an address to an array of characters (and you do not have to interpreted it as an address if you do type-casting carefully).

It's very important to understand the difference between the above two ways of declaring buf. One "creates" 80 bytes of memory storage and the other one "creates" 4 bytes of memory storage. I put "creates" in quotes because nothing is really "created". The correct term would be "allocated" (i.e., "allocate space" in the address space).

There are lots of C-string manipulation functions. Which are the ones that I need to know?

The weenix kernel uses the following C-string manipulation functions (they are actually implelmented in the kernel):

    int    memcmp(const void *cs, const void *ct, size_t count);
    void  *memcpy(void *dest, const void *src, size_t count);
    int    strncmp(const char *cs, const char *ct, size_t count);
    int    strcmp(const char *cs, const char *ct);
    char  *strcpy(char *dest, const char *src);
    char  *strncpy(char *dest, const char *src, size_t count);
    void  *memset(void *s, int c, size_t count);
    size_t strnlen(const char *s, size_t count);
    size_t strlen(const char *s);
    char  *strchr(const char *s, int c);
    char  *strrchr(const char *s, int c);
    char  *strstr(const char *s1, const char *s2);
    char  *strcat(char *dest, const char *src);
    char  *strdup(const char *s);
    char  *strtok(char *s, const char *d);

If these are good enough to implement an operating system, they should be all you need! Do man on all these functions to see the man pages for them. The function prototype may be slightly different on other operating systems, but the functionalities should be the same.

File name vs. path name

In the commandline syntax "warmup1 sort [tfile]", when tfile is specified, is it a file name or a path? If it's a name then how can we locate the file?

If the commandline is "warmup1 sort xyz", argc will be 3, argv[1] will contain "sort", and argv[2] will contain "xyz". Once you verify that argc is 3 and argv[1] contains "sort", you shouldn't care if argv[2] contains a filename or a filepath. Just call something like open(argv[2]). If argv[2] contains a filename (i.e., argv[2] does not contain a '/' character), the OS will look for the file in the "current working directory" of your "command shell" (or simply, "shell") of your Terminal program. To see the "current working directory" of your "command shell", just type "pwd" in your Terminal. When you start a Terminal program in your Ubuntu system, your default "current working directory" is the "home directory" of Ubuntu account and you can change the "current working directory" using the "cd" shell command.

Pointers

If I declare a list as My402List *list, does it mean system will allocate a space with sizeof(My402List) to list?

No. If you do:

        My402List *list;

you are declaring a pointer variable "list". On a 32-bit Ubuntu 16.04 system, "list" will be 4 bytes long and it's expected to contain a memory location which is the starting address of sizeof(My402List) bytes of data. Also, on a 32-bit Ubuntu 16.04 system, if you don't initialize "list", it will point to some random memory location which is most likely an invalid memory location. Therefore, if you do:

        list->num_members;

you will be doing the following... Use the content of "list" and treat it as a memory location that is the starting address of sizeof(My402List) bytes of data. Use the offset that corresponds to "num_members" of My402List and access that offset from the starting address stored in "list".

In "listtest.c", a list is often declared as:

        My402List list;

In this case, "list" is NOT a pointer type. "list" is a data structure and "&list" (i.e., address of "list") is the starting address of sizeof(My402List) bytes of data.

By the way, there is no "reference type" in C. So, "&list" has nothing to do with a C++ "reference".

I'm still confused! Do I need to allocate memory for My402List?

There are 2 ways to allocate memory (or "create space") for a My402List.

1) Static allocation. This is the method used in "listtest.c". List data structures are declared as local variable:

    My402List list;

In this case, list (which is a data structure) lives in the stack frame of the function where list is declared. This is also known as an "automatic variable". When the function is entered, memory space is created for this list data structure in the stack frame. When the function returns, the memory space for the list is automatically destroyed (and you must not use it after the function returns). To initialize such as list, you would do:

    My402ListInit(&list);

2) Dynamic allocation. This is not done in "listtest.c". Here's what you would do:

    My402List *pList=(My402List*)malloc(sizeof(My402List));

In this case, pList points to a data structure in the dynamic/heap segment. The data structure stays in the heap forever until you free() it. To initialize such a list, you would do:

    My402ListInit(pList);

Do I have to allocate memory for the anchor?

It's very important you understand the answer to this question.

A struct is a "data structure" and it's just a contiguous bytes/block of memory. In "my402list.h", My402List is a struct and My402ListElem is also a struct. The size of a struct can be obtained using the sizeof() macro. On 32-bit Ubuntu 16.04, sizeof(My402List) is 68 and sizeof(My402ListElem) is 12. If you look at "my402list.h", the My402ListElem struct has 3 pointers. Each pointer is 4 bytes long on a 32-bit machine. Therefore, sizeof(My402ListElem) must be 12. What about My402List? num_members takes up 4 bytes. anchor takes up 12 bytes. The others are all function points and there are 13 of them. A function pointer is just a pointer, so it's 4 bytes long. All the function pointers together takes up 52 bytes. 4+12+52=68. Therefore, sizeof(My402List) is 68.

Where is the anchor in My402List? It's 4 bytes away from the base address of a My402List. If you have a My402List struct at address 0x12345678, then the address of num_member is 0x12345678 (since its offset from the beginning of My402List is zero). The address of anchor would be 0x123435678+4=0x1234567c. If you have:

    My402List list;

If we use the above numbers, &list would be 0x12345678 and this struct is 68 bytes long and it already has an anchor within that 68 bytes of struct and the anchor's address is 0x1234567c. in this case, since list is not a pointer, you will get a compiler error if you do list->anchor.

How about this case?

    My402List *list_ptr=(My402List*)malloc(sizeof(My402List));

In this case, if list_ptr contains 0x12345678, the struct that's pointed to by list_ptr is 68 bytes long and it already has an anchor within that 68 bytes and the anchor's address is 0x1234567c. In this case, since list_ptr is a pointer, you can do list_ptr->anchor. list_ptr->anchor is simply a short hand for (*list_ptr).anchor since (*list_ptr) is a struct.

How should I allocate memory for My402ListElem?

When you call malloc(n), you are asking the memory allocator to allocate n contiguous bytes of memory from the heap/dynamic memory segment and return the base address (i.e., the first address) of this block of memory to you. So, if you call malloc(sizeof(My402ListElem*)), since sizeof() any pointer type is 4 on the 32-bit machine, if you do:

    My402ListElem *elem_ptr = malloc(sizeof(My402ListElem*));

elem_ptr will point to a block of 4 bytes of memory! Then if you do:

    elem_ptr->next = ...;
    elem_ptr->prev = ...;
    elem_ptr->obj = ...;

you would be writing into memory locations beyond the end of this block of memory (of size 4). This is called buffer overflow. Since you are writing into memory locations that you are not supposed to (i.e., beyond the end of this block of memory), you are corrupting your address space.

How do you know sizeof(My402ListElem*) is 4? You can modify "listtest.c" and add the following line of code at the beginning of main() to see it for your self:

    printf("sizeof(My402ListElem*) = %d\n", sizeof(My402ListElem*);
    printf("sizeof(My402ListElem) = %d\n", sizeof(My402ListElem);

The right thing to do would be:

    My402ListElem *elem_ptr = malloc(sizeof(My402ListElem));

because sizeof(My402ListElem) is 12 since the My402ListElem data structure contains 3 points. Then when you do:

    elem_ptr->next = ...;
    elem_ptr->prev = ...;
    elem_ptr->obj = ...;

you would be only modifying memory locations within the allocated memory block of 12 bytes.

What does it mean for my program to get a "segmentation fault"?

It means that you are accessing a memory location that you are not supposed to. For example, memory location 0. But you can also get a segmentation fault if you access some other memory locations that you are not allowed to access (i.e., outside of any of your memory segments). The first thing you need to do is to find out exactly where you are getting a segmentation fault. So, run your code under gdb. When your program crashed, type:

    where

in gdb to see a "stack trace". (In gdb, you can also use the command "bt", which means "back trace", to do the same thing.) It will display what stack frames you have. If you look at slide 38 of Lecture 2, it's showing that main() called family_tree() and family_tree() called family_tree() recursively. So, when you type "where", it will show something like that to do. It is extremely important that you learn how to read such a "stack trace". In the stack trace, it will give you the line numbers in your code where one function calls another and it will show you exactly which line number your program crashed. So, you should look at your code at that line number and try to figure out why your code is generating a segmentation fault. Or you can use gdb commands to look at the values of variables to see if there's something you are not supposed to do.

One common reason is that you are dereferencing a NULL pointer. For example, if your code looks like:

    My402ListElem *elem = NULL;

    if (elem->next == NULL) {
        ...
    }

Then on the line where you do elem->next, since elem is NULL, doing elem->anything would be "dereferencing a NULL pointer". If you don't know exactly what "dereferencing a NULL pointer" means, you can search the Internet or you can ask me about it.

In the above example, when you crashed on the line where you tried to dereference elem, in gdb, you should do:

    print elem

and you should see that elem is 0x0 and then you would know that your program crashed with a segmentation fault when you tried to access memory location 0.

My402List

How do I start doing the first part of the assignment, i.e., implementing all the functions mentioned in "my402list.h"?

The first step is to create "my402list.c". Since you are given "listtest.c" and it's suppose to work, you can just make a copy of it and rename the copy as "my402list.c". Then delete all the C code there (variables and functions) and keep only the beginning part of it (i.e., all the #include stuff). Then copy all the function declarations (and just the function declarations, which are all the lines that begins with the "extern" keyword) from "my402list.h" into it. Then remove all the "extern" keywords from the function declarations and replace the semicolons from the function declarations with a pair of curly braces (i.e., "{}"). Save your new "my402list.c" and type "make". You should noticed that you will get a bunch of compiler errors because some functions are suppose to return something. Edit "my402list.c" and have them return 0 or NULL so that the program will compile. Once this is done, you just need to implement each function separately.

When you implement a function, you should first ask yourself, "What is the precondition of this function (i.e., what must be true at the beginning of the function if the function is called)?" Other than "My402ListInit()", one obvious precondition is that the first argument must be a valid list. This means that you must assume that if the function is called, the first argument is a valid list and you never have to check if the list is a valid list. What if it's not a valid list? Well, it's not your problem! Whoever that's using your list must make sure that it's passing you a valid list Please see my comment about Unlink() below.

Now take a piece of paper and draw (by hand) what a valid list look like (for a list containing no elements, a list containing 1 element, a list containing 2 elements, a list containing 3 elements, etc.) You need to check with the warmup #1 spec to make sure what you are drawing is consisten with the spec. For each of the functions, modify the picture to show what the list should look like when the function is about to return. If you need to create a new data structure, then you know what you need to call malloc(). (Conversely, if you don't need to create a new data structure, then you also know what you don't need to call malloc()!) This would basically be a postcondition of the function (i.e., what must be true at the end of the function). Once you know the preconditions and postconditions of a function, all you need to do is to write the code that will take you to satisfy the postconditions for all possible inputs of your function.

The spec for Unlink(elem) says that I must not check if elem is on the list, what does it mean?

Unlink(elem) refers to My402ListUnlink(list,elem).

The spec regarding Unlink() means that if you must NOT check to see if "elem" is on the given "list" before you proceed to unlink "elem" from the "list". The reason is that this can take too long and your list implementation will be too inefficient. For example, if the "list" contains one million elements, on the average, it will take you half a million checks to verify that "elem" is on the "list". That's O(n) behavior where n is the length of the list.

The MAIN reason anyonen would use a doubly-linked list is so that insertion and deletion are O(1) operations (i.e., independent of the length of the "list"). If you check if "elem" is on the "list", then the operation cannot be O(1).

Therefore, when you Unlink(), you must NOT check if "elem" is on the "list" or not. What if "elem" is NOT on the "list"?! Your program may crash! Is that your fault (you, as the list implementor)? Absolutely not! The user of your list (i.e., application programmer) should NEVER call Unlink(elem) with "elem" NOT on the "list"! It would be your user's fault.

listtest

Why does "make listtest" not working?

When you download Makefile from the spec, you need to right-click on the link and select "Save As". Please don't click on the link and then do "Save" in your web browser because your web browser may modify the file.

Also, you need to save the file as "Makefile". On Windows, the browser would want to save the file as "Makefile.txt" and you need to change the file name to be "Makefile".

I got a "Unrecoverable error in RandomShuffle()" error message. How do I debug?

"Unrecoverable error in RandomShuffle()" is just an error message printed by the code in "listtest.c". Therefore, you need to read the code of "listtest.c" and try to understand under what condition it will print that error message. The next step is to figure out why the condition occur.

Here is a very important debugging trick (well, not much of a trick) I would like all of you to learn... If you look for 64 in "listtest.c", you should be able to find that the length of a test list is setup to 64. You should see that a list of 64 integers are created and manipulated. Let's say that something is wrong with your list. As "listtest" starts to manipulate your list, things starting to get out of whack. How can you keep track of 64 elements?! That's too much! So, when you are trying to find a bug, change "64" to something small, like 3 (if that doesn't make your bug appears, try 4, and so on). Recompile and re-run listtest and see what's the smallest value you can have and still have your bug to show up. Then debug your code under gdb. Set a breakpoint at a place where you know that the list is in a consistent state. Since the list is short, I want you to DRAW your list out on a piece of paper. Write down the value of EVERY FIELD of every list element. Then single step through your code and see how these values change. Hopefully, you will find your bug soon.

To print out a pointer value in gdb, you can do something like:

        printf "0x%08x\n", (unsigned int)(elem->next)

If you read the "listtest.c" code, you should notice that it's using a random number generator so that RandomShuffle() can shuffle the list differenly every time you run listtest. This can make it very difficult to debug. You should also notice that if you run listtest with the "-seed=value" commandline option, it will "seed" the random number generator with "value" and RandomShuffle() will shuffle the list the same way every time if you use the same "value".

In summary, to make this easier to debug, you should change num_items to a as small of a number as possible and use the "-seed=N" commandline argument with an N that will cause your code to crash with certainty.

sort command

I'm done implementing My402List and listtest runs perfectly. How do I start doing the 2nd part of the assignment, i.e., implementing the "sort" command?

Assming that your implementation of My402List is prefect and listtest runs perfectly. Here's what you can do next...

First, rename listtest.c as warmup1.c by doing the following:

    mv listtest.c warmup1.c

Then use a text editor (such as vi, emacs, nano, etc.) to edit the Makefile that you were using and replace every instance of the string "listtest" with the string "warmup1". At this point, if you type:

    make

An executable file warmup1 should get created. If you run this program by doing:

    ./warmup1

it should not produce any output since this is exactly the old listtest. Now you can use a text editor to edit warmup1.c, delete most of the code there since you don't need it for the sort command (but probably keep the bubble sort code) and change the rest to implement the sort command.

Why does the commandline syntax use "warmup1" but we must run our program by doing "./warmup1 ..."?

The commandline syntax usually doesn't specify where the program is. You can look at the man pages of Linux programs and most of them don't specify the location of the program. When you run "./warmup1", the "." refers to the current working directory of your command shell (try "ls ."). So, "./warmup1" means to run the "warmup1" program in the current working directory. If you want to change your current working directory, you can use the "cd" command.

If you don't specify the location of the program you want to run, then command shell would "search" for the name of the program in a "search path", which is specified by the PATH environment variable. To see your PATH environment variable, do:

    echo $PATH

The search path is a list of directory names, separated by the ":" character, and it specify a directory search order when the command shell needs to locate a program to execute. If "." is not in the search path of your command shell, running "warmup1" would result in a "command not found" error message. To add "." to your search path, you can do:

    export PATH $PATH:.

if your command shell is bash. If your command shell is tcsh, you can do:

    set path = ($path .); rehash

Please note that if you want to add "." to your search path, "." should be the last place you search.

The spec says that when you are reading a line of input and if amount is ≥ 10 million, it's an error and you should quit the program. The spec also says that when you print, if amount is ≥ 10 million, you need to print it in a certain way. Are these two statements in conflict?

You need to understand exactly what a "conflict" is! Two statements are "in conflic" or "inconsistent" means that they cannot be simultaneously true. There is actually no conflict between the two statements!

If you make sure that no amount is ≥ 10 million, then you should not encounter the case when you print where an amount is ≥ 10 million. Therefore, the "if" part of that particular statement will never be true. Then it doesn't matter what the "then" part of the statement says. So, may be a bit redundant, but still correct.

Please note that even if none of the amount can be ≥ 10 million, the balance can be ≥ 10 million or ≤ -10 million. Therefore, if you use the same routine to print out amount and balance, the ≥ 10 million check becomes relevant.

How do I find out what the current time is?

You can use either time() (run "man -s 2 time" on a 32-bit Ubuntu 16.04 system) or gettimeofday() (run "man gettimeofday" on a 32-bit Ubuntu 16.04 system).

How do parse a line of input?

Please checkout Week 2 discussion section slides (starting around page 5). Here's a quick summary of what you should do when you read a line from the input: Please be familiar with the tfile format.

I would first see if the line is too long. If it is, it's an error.
Then I would count the number of tabs (i.e., '\t' characters)in a line. If it's not equal to 3, then it's an error.
Otherwise, I would replace the '\t' characters with '\0' and use 4 character points to point to each part of that line. Each of these pointers would, by definition, now point to a C-string.
Then you validate each field separately.
- The first field should be trivial to validate.
- For the 2nd field, just follow the spec. If it's too long, then it's an error. Otherwise, make sure every character is between '0' and '9' and then convert the entire string into a number and make sure that it's within range.
- For the 3rd field, look for the period (since it's required). Make sure that there are exactly 2 digits to the right of the period and make sure that they are all digits to the left of the period and it's not too long. Then convert the number to the left of the period to be an integer and make sure that it's in range. Then convert the 2 digits to the right of the period to be another number. Multiply the first number by 100 and add the 2nd number to get the transaction amount in cents.
- For the 4th field, you need to remove leading space characters and make sure that the remaining string is not an empty string.
I may have missed something, but you get the idea.

Please understand that the purpose of the "warmup1" executable is simply to process input that conforms to the tfile format. Therefore, your "warmup1" code should assume that the input will be in the tfile format. The only way to know if the input is in the tfile format or not is to parse the input (Linux/Unix file name extension has no meaning). If it turns out that input does not conform to the tfile format, you must print an error message and quit your program immediately (and it doesn't matter if your program has generated any printout or not).

How big of a buffer should I use to read a line of input using fgets()?

There are no rules in general. So, you need to figure out what's reasonable for this assignment. In the spec, it says:

Furthermore, if a line is longer than 1,024 characters (including the '\n' at the end of a line), it is considered an error.

So, what should you use to hold a line of input? I would do:

    char buf[2000];

Why 2000? Why not?! Isn't that a waste of space? What a few bytes between friends?! :-)

If you use 2000, when fgets() return, you can use strlen() to get the length of the line you have read into the buffer. If the length is > 1024, you print an error message and quit your program. Easy! Right?!

You can use strlen() to get the length of the line becaus the man pages of fgets() says:

A terminating null byte ('\0') is stored after the last character in the buffer.

Not every string related function would do that! You need to read the man pages carefully so you know exactly what a function would do for you.

Finally, why do:

    char buf[2000];

and not use malloc()? Well, we should use malloc() when we have to create objects dynamically in cases when we don't know how many objects to create ahead of time. In this case, we know that we can use a buffer of size 2000, so there is no good justification to allocated these 2000 bytes dynamically using malloc().

If buf[] above is declared as a local variable, please remember that function arguments and local variables live inside the stack frame. Such a variable is called an automatic variable because it is created "automatically" when the function is called (i.e., it doesn't exist before the function is called). It's also important to understand that when you return from the function, automatic variables created for that function will be automatically destroyed and you must not refer to the contents of those variables once the function has returned. The "life time" of an automatic variable for a particular function is the "life time" of that function (i.e., between the start of that function and the end of that function). Very very important concept!

Which sorting algorithm should I use?

The grading guidelines says that you must use My402List and must not use arrays to implement "sort" or you will lose 30 points. So, you can implement any list-based sorting algorithm (and use My402List as the list).

My intention was for everyone to adapt existing code! In "listtest.c", there is a bubble sort algorithm and you can easily adapt it and use it to implement the "sort" command. The reason for doing it this way is to encourage you to read other people's code and use existing code. Later on, when you do the kernel assignments, you will have to read a lot of other people's code and work with those code. If you have the mindset that you always want to do things your way or implement algorithms from scratch, you may have a hard time working with existing code. I'm hoping that you will follow the procedure mentioned above and keep the bubble sort code and modify it for your needs.

My program is crashing in ctime(). How am I suppose to use ctime()?

If you do "man ctime", it says:

    char *ctime(const time_t *timep);

Let's take the first timestamp from test.tfile in the spec: 1230728833. If you do:

    time_t t = 1230728833;
    char buf[26];
    strcpy(buf, ctime(&t));
    /* buf[25] is guaranteed to be '\0' and buf[24] is guaranteed to be '\n' */
    buf[24] = '\0';
    printf("%s\n", buf);

You can write a small program where your main() function is just the above and debug it and use the "display buf" gdb command to see how the string in buf changes value. Once you understand how ctime() works, you can add your code back into your warmup1 code.

26 is a weird number! How do I know to use 26 in the above code? Well, because the man page says so!

Is ctime() timezone-dependent?

Yes! The grading data is generated in the "America/Los_Angeles" timezone (also known as the "US/Pacific-New" timezone). To change the timezone to "America/Los_Angeles", please run the following command:

    sudo timedatectl set-timezone America/Los_Angeles

The grader's machine will be set to this timezone.

If you are in a different timezone and you want to go back to the original timezone, you can simply run:

    timedatectl

before the above command and remember what your previous timezone was and set it back when you are done with testing your code.

Can I use setlocale() to display the amount value directly as a currency?

As long as you do it correctly! Please understand that our spec is locale-independent (i.e., even in places where you should use a comma for a decimal point, you are still suppose to use a period for a decimal point in your program printout). So, if you want to use locale, you should set the locale programatically so that your program works no matter what locale the user/grader is using. This way, you won't be surprised (since you don't have access to the grading account and your program is suppose to work there).

Someone from a previoius semester has suggested the following:

    setlocale(LC_NUMERIC, "en_US");

I have no idea if it works or not. In general, I do not recommend that you write locale-specific code in a CS class! But if you want to mess around with locale, it's your responsibility to make sure that your code works no matter what! In the end, if your submission doesn't work in the grading's 32-bit Ubuntu 16.04 system, there's nothing I can do about it because you have been warned.

Of course, another way it to write your own function to do the formatting. It should be reasonably straightforward. You should keep your balance as an integer in cents. Then you would divide the balance in cents by 100 and format the quotient and remainder separately. For the quotient, you can write a recursive function to do the formatting (to add commas at the right places). You can pass in a buffer when you make a recursive call and, just before the recursive function returns, you append to the buffer with what you've just formatted (at most 3 digits at a time). In the recursive function, you can divide the number you passed in by by 1000 and format the remainder into 3 digits and add a comma in front of it. Then you see if the quotient is ≥ 1000. If it is, you pass the quotient in recurviely into the recursive function. Otherwise, recursion would stop. Anyway, I'm not being very precise. But it's something like that.

Recusrive code may be conceptually more difficult to write. Below is a piece of code that can give you some idea of what this function may looks like without using recursion:

    void FormatCents(int amt_in_cents, char buf[80])
    {
        /* only handle non-negative amt_in_cents */
        if (amt_in_cents >= 1000000000) {
            snprintf(buf, 80, "?,???,???");
            return;
        }
        int cents = amt_in_cents % 100;
        int dollars = amt_in_cents / 100;
        if (dollars >= 1000000) {
            char dollar_buf[80];
            FormatDollarsWithTwoCommas(dollars, dollar_buf);
            snprintf(buf, 80, "%s.%02d", dollar_buf, cents);
        } else if (dollars >= 1000) {
            char dollar_buf[80];
            FormatDollarsWithOneComma(dollars, dollar_buf);
            snprintf(buf, 80, "%s.%02d", dollar_buf, cents);
        } else {
            snprintf(buf, 80, "%d.%02d", dollars, cents);
        }
    }

This code takes the first argument and split it into two parts called dollars and cents, format them into C-strings, and insert a period between the two strings. The same idea can be used to write the FormatDollarsWithTwoCommas() and FormatDollarsWithOneComma() functions (although instead of using "%02d" to print cents, you will need to use "%03d" to print a number ≥ 0 and < 1000 with leading zeroes). (Of course, FormatDollarsWithTwoCommas() will need to call FormatDollarsWithOneComma() if the number of thousands in the input amount is ≥ 1,000.)

For warmup1, after you format the amount in buf, you need to get the length of what's in buf and put it in the right place in the printout. Anyway, this is just for illustrating the basic idea.

How can I capture all the printout that flew by in a terminal when I run the grading script?

In Linux/Unix, there is a command called script and can create a transcript of everything that gets displayed in a termianl into a file named typescript. So, you can do the following:

    script
    [ copy and paste part of the grading script here ]
    exit

and typescript will contain everything that got printed when you run the script. Please do "man script" to see some useful commandline options.

How come I get "command not found" when I run "warmup1 sort ..."?

I don't know if you have noticed that in the grading guidelines, we always run your warmup1 by doing the following:

    ./warmup1 sort ...

May be you should do the same thing.

The period at the beginning refers to your "current working directory", which is the directory your login shell is in (you change your "current working directory" with the "cd" shell command). So, the command above says, "run warmup1 from the current working directory". If you don't specify where your warmup1 is, your login shell cannot find it!

How come all my Transcation objects are the same?

Let's say that you are calling My402ListAppend(list,obj). If obj is the same value every time you call My402ListAppend(list,obj), what would it look like? Draw a picture that looks like the the picture in the spec and have obj to contain exactly the same address. Then the obj fields of all list elements will point to the same Transaction object. This would be the most likely scenario that all your Transcation objects are the same. If this is not what you want, you should check to see if this is happening.

For part (B), every time you call My402ListAppend(list,obj), you need to append a different obj. How can you do that? You dynamically create each object and pass the pointer to the object as obj.

If it turns out that you are indeed using a different Transaction object each time you call My402ListAppend() and you are getting the same description field in all these Transaction objects, then you need to take a look at how the description field is declared in the Transaction data type. For example, if you declare it as:

    char *description;

then it's a just a pointer. If you copy pointers (using the '=' operator), you are just copying an address and you end up having 2 pointers pointing to the same memory location. It's very important to understand and different between copying pointers (using the '=' operator) and copying the objects the pointer is pointing to!

In this particular case, one solution is to do the following instead:

    char description[DESC_SIZE];

What should DESC_SIZE be? 1024 would certainly be safe because you know that the longest input line is 1024 characters long! But this may be a bit wasteful in memory usage (although we don't really worry about that in warmup assignments). Since you only need to keep the part of the description that you need to print, you can also look at the spec and figure out what's the smallest size you need (remember to add 1 to it because '\0' takes up a byte). Then when you create an Transaction object, you just can use strncpy() to make sure you don't overflow the buffer.

Do I need to check to see if a file is a directory or not?

There is no need to do that. Just open it as a file and when you read from it, you will see that it does not look like a tfile and you should print an appropriate error message about that fact that it's an invalid tfile and then quit your program immediately.

But if you really want to check if a file is a directory or not, please do:

    man -s 2 stat

You will need to be familiar with the stat() system call when you are doing kernel 2.

What's the maximum balance I need to handle in my code?

Since I have recommended that you keep trace of the transaction amount and balance in integers to avoid numeric precision problems, we need to think about the possibility of integer overflow problem. Since a transaction amount has a maximum, the only issue left is with the balance. Since the spec doesn't say that there is a limit on the number of lines in the input, theoretically, the balance can go to infinity! So, what should you do?

Well, can you handle infinity? The answer is actually yes! There is something called arbitrary-precision arithmetic (available as BIGNUM in openssl). But should you use that? Absolutely no, because it's too slow! Then what can you do?

Well, in programming, this is a design decision where you need to hard-code a limit on what your program can handle, i.e., limitation of your program by your design. You need to decide what you can handle and convince yourself that your design decision is reasonable (given the purpose of your program). For the maximum balance, is unsigned int (which is a 32-bit number) reasonable? Or, is unsigned long long (which is a 64-bit number) more reasonable? You decide! And it's probably not a bad idea to document all your "design decisions" in the README file for your own reference (and this is not a requirement since the last section of the README file is totally optional and the grader will not read that section).

grading guidelines

What is "usage information"?

"Usage" information means "how to use this program". So, it is simply the "commandline syntax" mentioned in the spec. For warmup 1, it's simply (and EXACTLY):

    warmup1 sort [tfile]

So, for any malformed command (where your program prints an error message saying that the command is malformed), all you have to do is to tell the user what is a "well-formed command" by printing:

    usage: warmup1 sort [tfile]

to satisfy the "MUST give usage information" requirement.

Under gdb, can I have my program read from the output of "cat" (to debug commands used in the grading guidelines)?

The first command that uses "cat" in the grading guidelines that you will run is:

    cat w1data/f15 | ./warmup1 sort > f15.sort

By looking at the above command, it seems that stdin is just the content of "f15" in the "w1data" directory. Then running the above command under gdb should look something like:

    gdb ./warmup1
    (gdb) run sort < w1data/f15 > f15.sort

If you search the Internet, you will find that people say that the above way is the right way to do it. Well, they are wrong! (Programming exercise: can you write a program to prove/demonstrate that they are wrong?)

Around slide 61 of the "a simple OS" lecture slides, we have seen the real meaning of "<", i.e., a regular file is "mapped/connected" to file descriptor 1.

When you read from the keyboard, you are reading from a "stream". One difference between a stream and a regular file is that if you haven't closed a stream, you don't know how long the stream is. For a regular file, you can always find out the file size. Another difference is that with a regular file, you can control where you want to read from (by using something like lseek()). If you try to do the same thing with a stream, there is no guarantee that it will work. Therefore, if you have to control where you want to read from and you don't know if you will be reading from a regular file or a stream, don't use functions like lseek().

Anyway, the bottomline is that there is no way that I know of to exactly "simulate" the way "cat" is used in the grading guidelines when you are debugging under gdb! You can redirect input from the file in the "w1data" directory, but you need to understand that there is a difference (and you should know exactly what the difference is by understanding what I said the above). If you have a bug that only shows up when your program reads data from stdin, one thing you can try is to run your program under valgrind and hope that it will tell you where your bugs are.

What does "Binary file differ" mean?

When you see "Binary file ... differ", it means that "diff" is thinking that binary files are being compared and they are different.

My solutions in the "w1data" directory are text files, therefore, it's saying that your output file is a binary file! Since you are not trying to generate a binary file, how did you end up generating a binary file? It's because you have a bug in your code and you need to find that bug.

How do you find binary characters in your output (e.g., a .sort file)? One way is to do a hexdump of your .sort file. On Linux, you can run "xxd -g 1 FILENAME". Then you look at the output and look for a binary data byte. Well, that may not be so easy.

Another way is to do a hexdump of my solution and do a hexdump of your program output and do a "diff" on the two hexdumps. Since the output of the hexdump program is guaranteed to be a text file, "diff" will tell you the first place these two files hexdumps are different. Hopefully, you can pin point the location of your binary character soon after.

By the way, the printout of the "xxd" program is in 3 columns. Each line of the printout corresponds to 16 bytes of the input file. The first column contains base addresses (in hex) of the corresponding blocks of 16 data bytes. The second column contains the hex representation of each by of data. The third column contains ASCII representation of corresponding 16 data bytes (where a period is printed if the corresponding data byte is not an ASCII character).

How can I understand the printout of the "diff" program?

The best way is to create two text files (say, "a" and "b") where you know their exact contents then run "diff" on them to see what printout gets generated by the "diff" program.

For example, you can create file "a" using a text editor (such a "pico") with the following content:

    This is line 1
    This is line 2
    This is line 3

Then do the following;

Do "ls -l a" to see the file size of "a".
Do a hexdump of "a" by typing "xxd -g 1 a" and see the exact content of file "a" (especially to see where all the "\n" characters are -- they have a hex code of "0a").
Make a copy of "a" and call it "b" using the cp command. Type "diff a b" and see that if two files are identical, the "diff" program prints nothing. Also, do "ls -l a b" to see that the file sizes of "a" and "b" are the same.
Delete one line from "b", then run "diff" and see what "diff" program would print when the first file has one more line than the second file. Do "ls -l a b" to see the file sizes of "a" and "b" and see that they are different.
Copy "a" over "b" using the cp command. Add one line in "b", then run "diff" and see what "diff" program would print when the second file has one more line than the first file. Do "ls -l a b" to see the file sizes of "a" and "b" and see that they are different.
Copy "a" over "b" using the cp command. Change one character in "b", then run "diff" and see what "diff" program would print when the files are different in line one. Do "ls -l a b".
Open "b" in a text editor and delete all the lines in it. Do "ls -l a b" to make sure that the file size of "b" is zero (i.e., "b" is an empty file). Then run "diff" and see what "diff" program would print when the second file is empty.
And so on.

How should I handle the case for /etc in the grading guidelines without checking to see if /etc is a directory?

You can treat "/etc" just like a regular file. If you use fopen() to open "/etc" for reading, you would succeed and a file pointer would be returned to you. Then you call fgets() to read a line from this file. I think on Ubuntu 16.04, fgets() will return NULL, which means that there is no more data to read. Then you would close this file.

So, is "/etc" a valid tfile? No! Why not? Because a valid tfile must contain at least one transcation!

The bottomline here is that "/etc" can be handled in exactly the same manner as an empty file (and you can print the same error message). Do you have to print an error message saying that the input is a directory? No! Is it a correct error message to say that "/etc" contains no transcations? Can you argue that it's not a correct error message?

What should I do with something like "Cannot compile : -? pts" in the README file?

The instructions for the SELF-GRADING section says:

    Replace each "?" below with a numeric value:

So, you need to replace "?" with a number.

First, you should look for "Cannot compile" in the grading guidelines. You should see something like:

    Cannot compile : -5 to -10, depending on effort to make it work

So, your choice is either DEDUCT nothing or something between 5 to 10 points.

The README requirement is that you must replace every "(Comments?)" with your own evaluation. So, if you think you won't lose any points (i.e., subtract 0 point), then I would replace the line in the README file by:

    Cannot compile : -0 pts

If you expect 6 points to be deducted, then you should replace that line by:

    Cannot compile : -6 pts

What does "/bin/rm: No match." mean?

Since that line starts with "/bin/rm", you should look for "/bin/rm" in the grading guidelines! As it turns out, it come from this command:

    /bin/rm -f f?.sort f??.sort

You can copy and paste the above command in a Terminal and see what it does. In tcsh, if you run the above command, it will match "?" with any character. So, the above is try to delete any file that looks like "f?.sort" or "f??.sort" where a "?" is a wildcard that match any single character. If it cannot find any match, it prints:

    /bin/rm : No match.

So, this is totally harmless if it cannot find such a file to remove/delete.

By the way, I think there is a bug in tcsh that it prints "/bin/rm: Not match", which it's not supposed to do! But if you run the above command under bash, then it will print nothing, which is what it's supposed to do. (Please see the man pages of "rm" to see why I think it's a tcsh bug.)

If I cannot open a file, how can I tell why I cannot open a file?

If you cannot open a file, you just need to print an error message telling the user what file you cannot open. For this assignment, there is no requirement that you know why you cannot open a file. If you read the grading guidelines carefully, you should see that there is no requirement to print something like "access denied":

What's required is that your error message must be a correct error message. Therefore, if you cannot open a file and your error message says that you cannot open a file, it's a perfectly good error message.

If you really want to know what the OS is telling you why you cannot open a file, you should read the man pages of the function you are calling (something like fopen() or open()) and see how to check for errors. If the man pages say that "errno" (slide 1 of Lecture 4 Part 1) will be set, then you can use perror() to print an errno-specific error message. perror() is a bit weird and you need to try it out to see how it prints error messages so that you can pass the right argument to it.

What does "Too slow" refer to in the grading guidelines?

It's refer to the running time of part (B) of the grading guidelines.

To run the entire part (B) of the grading guidelines, the grader will copy the entire part (B) commands and paste it into the Terminal. This should be equivalent to running the "section-B.csh" script. To see how long it would take to run "section-B.csh", the grader would use the "time" command like this (pleasae do "man time" to see what the "time" program does):

    time ./section-B.csh

The last line in the printout is how long it takes to run "./section-B.csh" and it can look something like:

    0.052u 0.056s 0:00.37 27.0%	0+0k 2584+664io 7pf+0w

The first number means that it spent 0.052 seconds of user time (whatever that means). The 2nd number means that it spent 0.056 seconds of system time (whatever that means). The 3rd number is the only important number, as far as grading is concern, and it's the time it spent to run "./section-B.csh", and in this example, the running time is 0 minute and 0.37 seconds.

How can I find out why my program takes so long to run?

If your program runs correctly but too slowly, you can use "gprof" (a profiling tool) to find out where your program spends most of its time.

To produce profiling data when your program is run, you should run gcc with the "-pg" commandline argument (when you compile a .c file into a .o file and when you link all the .o files to create your executable program). Run your program first (e.g., "./warmup1 sort test.tfile") first and when it finishes running, a file called "gmon.out" will contain all the profiling data. To look at the profiling data, do:

    gprof warmup1 gmon.out > warmup1.gprof.txt
    more warmup1.gprof.txt

For more information about gprof, please do "man gprof". Please understand that gprof can only track functions that have been compiled with the "-pg" commandline option. Therefore, it cannot track system calls or any C++ library function calls that did not compiled with the "-pg" commandline option.

Also, you need to make sure that you are not running your code inside a shared folder because weird things happen in a shared folder (e.g., things get really slow)!

gdb

I can see that the description field of my transaction object got corrupted, how do I find out when it got corrupted?

Since it's the description field that's getting corrupted, you need to set a watch point (using the "display" gdb command) on the description field. This way, as you single step (using either the "next" or "step" gdb command), the description field in question will get printed out at every step and you will be able to find out exactly which step would cause junk to appear in the description field.

If you give gdb an array of characters to "display", it will assume that it's the beginning of a C-string (i.e., terminated with '\0'). An array of characters is compatible with (char*). So, you just need to give gdb the address of the description field in question.

Therefore, what you should do is that, at the time in your program when the description field is correct, you need to get the address of the description field and give that to gdb to "display". Let's say that you have just built a transaction object of type Transcation and you point to this object using a pointer ptr. Let's say the name of the description field is desc. Then the address of the desc field of ptr would be ptr->desc. So, in gdb, you can do:

    print ptr->desc

and it will show you the address of ptr->desc and the C-string stored at this address. Let's say that the address you see is 0x12345678. Then you should do:

    display (char*)0x12345678

From this point on, after anything you do, it will print the C-string value at 0x12345678.

By the way, given what you know about the address space, the address itself give you some information. If the address is high in the address space (i.e., 0xbf?????? or 0xbe??????), then you know that you are pointing to someone's stack frame. You also know two things. (1) If that function returns, the stack frame will be popped and garbage may appear over that space. (2) If that address is pointed to by multiple transaction objects (i.e., shared by multiple transaction objects), when one of them changes that string, the other will see the changes. If the address is low in the address space (i.e., only 7 hexdigits long), then you know that you are pointing to the data segment, bss segment, or the heap/dynamic segment, although you cannot tell which directly.

Another point I want to make is the following. What if you do:

    display (Transaction*)0x12345678

You will see that since you ask gdb to interpret 0x12345678 as the start address of a data structure (namely, Transaction), it will not print the fields of this data structure. But what if you do:

    display *((Transaction*)0x12345678)

Is that okay?! Give it a try! You should see that gdb will simply pretend that at address 0x12345678 is a block of memory of size sizeof(Transaction) and would print each field of the data structure. What you need to understand is that this is the nature of memory. It's just a collection of memory addresses and in each memory address stores a byte of data. An address can also be viewed as the start of a block of memory (i.e., the "base address" of a block of memory). There's really no "data structures" in memory. It's all about how you interpret what's inside a block of memory. Starting with a base address, if you want to call the first byte a "transcation type", the next 4 bytes an integer, the next 4 bytes another integer, etc., there is no problem doing that! This is also how the operating system view the memory of an application program.

My program crashes normally but it does not crash under gdb, how can this happen?

I have seen cases like this before. So, even though this doesn't happen very often, it can happen. I am not 100% sure exactly how this happens since I don't know the details of gdb. I've read stuff on the Internet and here are some possible explanations, although I cannot verify any of them.

Basically, some are saying that if you run your program under gdb, the layout of the address space would be different. So, if your crash only happens with a particular address space layout, this might be the reason. Since you don't know if this is the reason or not, you might want to try some of these tricks. For example, you can do the following under gdb and see if it helps:

    (gdb) set disable-randomization off

Another thing you can try is to do post-mortem debugging. Set up your environment so that when your program crashes, a "core" file will be generated (which is an image of your address space). Afterwards, do the following:

    gdb warmup1 core

When you are in gdb, do "where" to see exactly where it crashed. You can also examine memory locations.

Data in my data structure got turned into garbage, how can a debug this?

Let's say that you have defined a data structure called Transcation as follows (just an example, yours don't have to look like this):

    typedef struct tagTransaction {
        int type;
        unsigned int timestamp;
        long long cents;
        char description[1024];
    } Transcation;

and you did:

    Transaction *t = (Transaction*)malloc(sizeof(Transaction));

and you fill all t's data fields with values and you print them out in gdb and they all look perfect. Later on, you noticed that t->description turned into garbage and you don't know which line of your code caused this to happen. Here's what you can do in gdb to set a spy point to monitor the values inside this data structure. Type:

    print *t

to see what the content looks right. Then do:

    print t

to get the address of this block of memory. Let's say you get (which is my favorite address):

    (Transcation *) 0x12345678

You can then add the following spy point using the following "display" gdb command:

    display *((Transcation *)0x12345678)

This way, whatever you do, it will take 0x12345678 as the address of a Transcation data structure and print its content. From this point on, you can keep doing single-stepping using either the step or the next gdb command. If you see the description field turns into garbage, you should look into the last thing you did.

How can I debug my program if stdin came from a pipe (like in the grading guidelines)?

As mentioned in the general Programming FAQ,

    cat w1data/f0 | ./warmup1 sort

is the same as:

    ./warmup1 sort < w1data/f0

if (and that's a big *if*) your program reads the input correctly.

What if your program doesn't read the input correctly? How can you debug? The 2nd form can be debugged in the usual way. But the first form cannot. The problem with the first form is that it's running two programs ("cat" and "./warmup1") and the command shell uses a Unix pipe (in this case, an "unnamed" pipe) to connect the stdout of the first program to the stdin of the 2nd program. If you want to debug this, you need to create a "named pipe" so that the stdout of the first program goes into the input of the named pipe and run gdb to debug the 2nd program and make it read stdin from the output of the named pipe. According to this article, you can do the following:

First, you need to create the named pipe and give it a name. Let's call this named pipe "mygdbpipe" and do:

    mkfifo mygdbpipe
    ls -l mygdbpipe

You might see something like this:

    prw-r--r-- 1 student student 0 Sep 13 13:12 mygdbpipe

The first character in the printout is a "p" to signify that this file is a "pipe file system object" (or simply, a "pipe").

Now run the 1st program and redirect its stdout into the pipe:

    cat w1data/f0 > mygdbpipe

Then start gdb as usual:

    gdb warmup1

In gdb, set a breakpoint at the beginning of the main() function and run warmup1 and redirect its stdin to come from the pipe:

    (gdb) break main
    (gdb) run sort < mygdbpipe

Later in the semester, we will learn that we can think of the Unix pipe as an instance of the Producer-Consumer Problem.

Please note that since mygdbpipe is a pipe and if you don't clear the pipe, next time when you use it, it may have unread data from previous use. To clear the pipe, you just have to read all the available data from it:

    cat mygdbpipe

The above would freeze when the pipe is cleared and you have to press <Ctl+C> to terminate the "cat" program.

valgrind

What is valgrind and when should I use it?

Memory corruption bugs are one of the toughest bugs to crack.

What is memory corruption? It means that your program is writing into memory locations that it's not suppose to write into. What's worse is that writing into such memory locations are perfectly legal and it won't cause a crash. After you have "corrupted memory", nothing in your code is guaranteed to work any more (including C library code and any other library code). Eventually, your program will crash, most likely in malloc() or free(), if the memory you have corrupted is in the data structures used by the memory allocator (we will cover memory allocator in Ch 3).

If your program dies in malloc() or free() and it's not clear why the crash has anything to do with malloc() or free(), then you should suspect that you have memory corruption bugs. In this case, your should try to run your program under valgrind. valgrind does not guarantee to catch your bugs. But when valgrind complained about an error, chances are, it's a real error and you should fix your code so that valgrind doesn't complain about it any more.

The problem is that the printout of valgrind is very difficult to read. You need to look for line numbers in the source code what you have written and look at your code very very carefully and try to figure out why valgrind is complaining about that line of code. Often times, you have to look at the lines above it to see how the variables that appeared in that line were declared. If you have trouble reading the printout of valgrind, please feel free to forward the valgrind printout to me, along with the line in your code valgrind is complaining about, so I can take a look.

Please ONLY deal with the very first error reported by valgrind and ignore everything else! If you fix the first bug, may be all the bugs will disappear. So, don't waste time on anything else.

Please ignore memory leaks reported by valgrind. In warmup assignements, we do not care about memory leaks!

submission

How can I update my README file after I have made a submission?

If it's before the submission deadline, you have 2 options.

Make a new submission and get a new timestamp. The timestamp is what we will use to determine if your submission is early or late and how much extra credit or penalty you will get.
Send me modification instructions within 24 hours of the submission deadline. During that time period, the first 3 lines are free. Then it costs you 3 points per line of modification.

If you go with (2), we will use the timestamp of your last submission to determine if your submission is early or late and how much extra credit or penalty you will get.

So, if you have a lot of changes to your README file, (1) would be the way to go. If you have 3 lines or less changes, (2) would be the way to go. If you have more than 3 lines of changes but not a lot of changes, you need to decide for yourself which way you want to go.

Can I find out what I have submitted?

There is no way to do that! When you make a submission, an e-mail containing your submission ticket is sent to your USC e-mail address. You must not delete that e-mail!