[ This is a BACKUP server. Do NOT use unless the primary server is DOWN. ]
Quick index:- general
- is the grading guidelines important?
- can I use Eclipse?
- can I use VS Code?
- what do you mean by part (A) and part (B) of this assignment?
- what error message should I print?
- compiler
- is it okay to get a "trigraph" compiler warning?
- is it okay to get a "[-Wint-to-pointer-cast]" compiler warning when compiling "listtest.c"?
- C-string
- file name vs. path name
- pointers
- My402List
- listtest
- sort command
- how do I start this part of the assignment?
- why does the commandline syntax use "warmup1" but we must run our program by doing "./warmup1 ..."?
- is there a conflict in the spec about Amount?
- how do I find out what the current time is?
- how do parse a line of input?
- how big of a buffer should I use to read a line of input using fgets()?
- which sorting algorithm should I use?
- how am I suppose to use ctime()?
- is ctime() timezone-dependent?
- can I use setlocale() to format the Amount and Balance?
- how can I capture all the printout that flew by in a terminal when I run the grading script?
- how come I get "command not found" when I run "warmup1"?
- how come all my Transcation objects are the same?
- do I need to check to see if a file is a directory or not?
- what's the maximum balance I need to handle in my code?
- grading guidelines
- what is "usage information"?
- under gdb, can I have my program read from the output of "cat"?
- what does "Binary file differ" mean?
- how can I understand the printout of the "diff" program?
- how should I handle the case for /etc in the grading guidelines?
- what should I do with something like "Cannot compile : -? pts" in the README file?
- what does "/bin/rm: No match." mean?
- if I cannot open a file, how can I tell why I cannot open a file?
- what does "Too slow" refer to in the grading guidelines?
- how can I find out why my program takes so long to run?
- gdb
- I can see that the description field of my transaction object got corrupted, how do I find out when it got corrupted?
- my program crashes normally but it does not crash under gdb, how can this happen?
- data in my data structure got turned into garbage, how can a debug this?
- how can I debug my program if stdin came from a pipe (like in the grading guidelines)?
- valgrind
- submission
Please note that the grading guidelines says that if you submit an incomplete README file, you can lose up to 20 points! Therefore, you must make sure that your README file satisfies all the requirements. You must replace every instance of "?" in the sample README file with your own evaluation.
Thanks to Zhiyi Xu, one of our graders in Spring 2014! He wrote a tutorial on how to make Eclipse work with the kernel assignments. You can use the relevant part for the warmup assignments.
A student from a previous semester, Luis Perez Cruz, also mentioned a link about how to use valgrind with Eclipse.
Another student from a previous semester, Kai Lu, also mentioned a link about how to install Valgrind plugin in Eclipse and some useful hints.
Thanks to Dhruv Gupta, one of the students in Fall 2020! He wrote a tutorial on how to set up VS Code to work with the warmup assignments.
For the kernel assignments, you can use VS Code as a code editor. I don't think you can use it to debug your kernel code.
Given all the problems people are have with vscode, I do not recommend VS Code! I personally don't use VS Code. It uses too much memory and slows things down quite a bit because it's trying to do all these fancy editing tricks. My recommendation is that it's best to stay away from VS Code!
If you want to used an editor with a graphical UI inside the 32-bit Ubuntu 16.04 system, please give Sublime Text a try. From what I've heard, quite a few students like it.
For warmup1, in the "plus points" section, it's divided into part (A) and part (B).
For part (A), the only file you need to write is "my402list.c". This file should be around 100 lines long. (All this means is that if your "my402list.c" is 500 lines long, you probably misunderstood the spec.) Once your "my402list.c" is perfect, if follow the grading guidelines and do all the steps and then run through the scripts (available as "section-A.csh"), your printout should look like "section-A.out".
Once your "my402list.c" can pass "listtest", you should move to part (B). In part (B), the only file you need to write is "warmup1.c" which would take a "tfile" (or stdin -- and expect the data to be in the tfile format) as input, parse every line to create "Transaction" objects and add these "Transaction" objects to a My402List. Then your code would sort the list according to the integer values of the transaction times and print the sorted list in the format specified in the spec. (Some students would split "warmup1.c" into multiple .c files and that's also fine.)
If the commandline is malformed, you must tell the user why you think the commandline is malformed AND tell them what is the right form for the commandline (by giving the "usage information", i.e., the "commandline syntax" in the spec). For example, if the user typed "./warmup1 sort file1 file2" in the commandline, you can print something like:
malformed commandline - incorrect number of commandline arguments
usage: warmup1 sort [tfile]
The 2nd line was there to give the "usage information", i.e., the correct commandline syntax.
If you are reading input data and you have detected that a particular line is invalid, you must tell the user the line number of that line. Otherwise, your error message is not very helpful! Also, if you are parsing input lines and a particular field is invalid, you must tell the user which field is invalid (no need to say why that field is invalid). Otherwise, your error message is not very helpful! (Of course, if the file is empty, there is no line number. In that case, you should tell the user that the file is not supposed to be empty.) Please also understand that an error inside the input file is not a "malformed command" and therefore you must not give "usage information" since that would be confusing to the user.
You should give an error message from the perspective of your code. If you detect the error using a certain piece of code, why did your code think that there is an error? Give that information in your error message. For example, if you are counting the number of tabs in an input line and you see that line 42 does not have 3 tabs, you can simply tell the user that the error you see is that line 42 does not have 3 tabls.
If there are multiple errors in a file, which one do you report? For our warmup assignments, there is no requirement to tell the user where all of the errors are. You get to pick (from the perspective of your code) which one to report. The rule for warmup assignments is that when you find an error in the input file, you must print an error message and quit your program immediately. If there are multiple errors on the line of input you are processing, you just need to pick one error and report it. Your error message must be a correct error message.
listtest.c:80:38: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
and a bunch of similar compiler warnings on Ubuntu, it's probably because your laptop is a 64-bit machine.
If that's the case, don't worry about it since it's warning you that casting a 64-bit pointer to a 32-bit interger
may cause problem. Just make sure that you don't get these warning messages on a 32-bit Ubuntu 16.04 system
because that's the system the graders must use to grade your assignments.
If buf is a C-string that can hold up to 79 characters, you can declare it as (the extra byte is for storing the "null character", i.e., '\0', in case the length of the string is 79):
char buf[80];
Please understand that if you use the following declaration:
char *buf;
you are only declaring that buf is a pointer that points to
a C-string of an unknown/unspecified length. Furthermore, on a 32-bit machine, buf, if declared this way,
refers to a memory location that contains a 4-byte data which is meant to be interpreted as an address to an array of characters
(and you do not have to interpreted it as an address if you do type-casting carefully).
It's very important to understand the difference between the above two ways of declaring buf. One "creates" 80 bytes of memory storage and the other one "creates" 4 bytes of memory storage. I put "creates" in quotes because nothing is really "created". The correct term would be "allocated" (i.e., "allocate space" in the address space).
int memcmp(const void *cs, const void *ct, size_t count);
void *memcpy(void *dest, const void *src, size_t count);
int strncmp(const char *cs, const char *ct, size_t count);
int strcmp(const char *cs, const char *ct);
char *strcpy(char *dest, const char *src);
char *strncpy(char *dest, const char *src, size_t count);
void *memset(void *s, int c, size_t count);
size_t strnlen(const char *s, size_t count);
size_t strlen(const char *s);
char *strchr(const char *s, int c);
char *strrchr(const char *s, int c);
char *strstr(const char *s1, const char *s2);
char *strcat(char *dest, const char *src);
char *strdup(const char *s);
char *strtok(char *s, const char *d);
If these are good enough to implement an operating system, they should be all you need!
Do man on all these functions to see the man pages for them. The function prototype
may be slightly different on other operating systems, but the functionalities should be the same.
My402List *list;
you are declaring a pointer variable "list".
On a 32-bit Ubuntu 16.04 system, "list" will be 4 bytes long and it's expected to
contain a memory location which is the starting address of sizeof(My402List) bytes of data.
Also, on a 32-bit Ubuntu 16.04 system, if you don't initialize "list",
it will point to some random memory location which is most likely an invalid memory location.
Therefore, if you do:
list->num_members;
you will be doing the following...
Use the content of "list" and treat it as a memory location
that is the starting address of sizeof(My402List) bytes of data.
Use the offset that corresponds to "num_members" of My402List
and access that offset from the starting address stored in "list".
In "listtest.c", a list is often declared as:
My402List list;
In this case, "list" is NOT a pointer type.
"list" is a data structure and "&list" (i.e., address of "list")
is the starting address of sizeof(My402List) bytes of data.
By the way, there is no "reference type" in C. So, "&list" has nothing to do with a C++ "reference".
1) Static allocation. This is the method used in "listtest.c". List data structures are declared as local variable:
My402List list;
In this case, list (which is a data structure) lives in the stack frame of the function where list is declared.
This is also known as an "automatic variable".
When the function is entered, memory space is created for this list data structure in the stack frame.
When the function returns, the memory space for the list is automatically destroyed
(and you must not use it after the function returns).
To initialize such as list, you would do:
My402ListInit(&list);
2) Dynamic allocation. This is not done in "listtest.c". Here's what you would do:
My402List *pList=(My402List*)malloc(sizeof(My402List));
In this case, pList points to a data structure in the dynamic/heap segment.
The data structure stays in the heap forever until you free() it. To initialize such a list, you would do:
My402ListInit(pList);
A struct is a "data structure" and it's just a contiguous bytes/block of memory. In "my402list.h", My402List is a struct and My402ListElem is also a struct. The size of a struct can be obtained using the sizeof() macro. On 32-bit Ubuntu 16.04, sizeof(My402List) is 68 and sizeof(My402ListElem) is 12. If you look at "my402list.h", the My402ListElem struct has 3 pointers. Each pointer is 4 bytes long on a 32-bit machine. Therefore, sizeof(My402ListElem) must be 12. What about My402List? num_members takes up 4 bytes. anchor takes up 12 bytes. The others are all function points and there are 13 of them. A function pointer is just a pointer, so it's 4 bytes long. All the function pointers together takes up 52 bytes. 4+12+52=68. Therefore, sizeof(My402List) is 68.
Where is the anchor in My402List? It's 4 bytes away from the base address of a My402List. If you have a My402List struct at address 0x12345678, then the address of num_member is 0x12345678 (since its offset from the beginning of My402List is zero). The address of anchor would be 0x123435678+4=0x1234567c. If you have:
My402List list;
If we use the above numbers, &list would be 0x12345678 and this struct is 68 bytes long and it already has an anchor within that
68 bytes of struct and the anchor's address is 0x1234567c.
in this case, since list is not a pointer, you will get a compiler error if you do list->anchor.
How about this case?
My402List *list_ptr=(My402List*)malloc(sizeof(My402List));
In this case, if list_ptr contains 0x12345678, the struct that's pointed to by list_ptr is 68 bytes long and
it already has an anchor within that 68 bytes and the anchor's address is 0x1234567c.
In this case, since list_ptr is a pointer, you can do list_ptr->anchor.
list_ptr->anchor is simply a short hand for (*list_ptr).anchor since (*list_ptr) is a struct.
My402ListElem *elem_ptr = malloc(sizeof(My402ListElem*));
elem_ptr will point to a block of 4 bytes of memory!
Then if you do:
elem_ptr->next = ...;
elem_ptr->prev = ...;
elem_ptr->obj = ...;
you would be writing into memory locations beyond the end of this block of memory (of size 4).
This is called buffer overflow. Since you are writing into memory locations that
you are not supposed to (i.e., beyond the end of this block of memory), you are corrupting your address space.
How do you know sizeof(My402ListElem*) is 4? You can modify "listtest.c" and add the following line of code at the beginning of main() to see it for your self:
printf("sizeof(My402ListElem*) = %d\n", sizeof(My402ListElem*);
printf("sizeof(My402ListElem) = %d\n", sizeof(My402ListElem);
The right thing to do would be:
My402ListElem *elem_ptr = malloc(sizeof(My402ListElem));
because sizeof(My402ListElem) is 12 since the My402ListElem data structure contains 3 points.
Then when you do:
elem_ptr->next = ...;
elem_ptr->prev = ...;
elem_ptr->obj = ...;
you would be only modifying memory locations within the allocated memory block of 12 bytes.
where
in gdb to see a "stack trace". (In gdb, you can also use the command "bt", which means "back trace", to do the same thing.)
It will display what stack frames you have.
If you look at slide 38 of Lecture 2, it's showing that main() called family_tree() and family_tree() called family_tree() recursively.
So, when you type "where", it will show something like that to do.
It is extremely important that you learn how to read such a "stack trace".
In the stack trace, it will give you the line numbers in your code where one
function calls another and it will show you exactly which line number your program crashed.
So, you should look at your code at that line number and try to figure out why your code is generating a segmentation fault.
Or you can use gdb commands to look at the values of variables to see if there's something you are not supposed to do.
One common reason is that you are dereferencing a NULL pointer. For example, if your code looks like:
My402ListElem *elem = NULL;
if (elem->next == NULL) {
...
}
Then on the line where you do elem->next, since elem is NULL,
doing elem->anything would be "dereferencing a NULL pointer".
If you don't know exactly what "dereferencing a NULL pointer" means, you can search the Internet or you can ask me about it.
In the above example, when you crashed on the line where you tried to dereference elem, in gdb, you should do:
print elem
and you should see that elem is 0x0 and then you would know that
your program crashed with a segmentation fault when you tried to access memory location 0.
When you implement a function, you should first ask yourself, "What is the precondition of this function (i.e., what must be true at the beginning of the function if the function is called)?" Other than "My402ListInit()", one obvious precondition is that the first argument must be a valid list. This means that you must assume that if the function is called, the first argument is a valid list and you never have to check if the list is a valid list. What if it's not a valid list? Well, it's not your problem! Whoever that's using your list must make sure that it's passing you a valid list Please see my comment about Unlink() below.
Now take a piece of paper and draw (by hand) what a valid list look like (for a list containing no elements, a list containing 1 element, a list containing 2 elements, a list containing 3 elements, etc.) You need to check with the warmup #1 spec to make sure what you are drawing is consisten with the spec. For each of the functions, modify the picture to show what the list should look like when the function is about to return. If you need to create a new data structure, then you know what you need to call malloc(). (Conversely, if you don't need to create a new data structure, then you also know what you don't need to call malloc()!) This would basically be a postcondition of the function (i.e., what must be true at the end of the function). Once you know the preconditions and postconditions of a function, all you need to do is to write the code that will take you to satisfy the postconditions for all possible inputs of your function.
The spec regarding Unlink() means that if you must NOT check to see if "elem" is on the given "list" before you proceed to unlink "elem" from the "list". The reason is that this can take too long and your list implementation will be too inefficient. For example, if the "list" contains one million elements, on the average, it will take you half a million checks to verify that "elem" is on the "list". That's O(n) behavior where n is the length of the list.
The MAIN reason anyonen would use a doubly-linked list is so that insertion and deletion are O(1) operations (i.e., independent of the length of the "list"). If you check if "elem" is on the "list", then the operation cannot be O(1).
Therefore, when you Unlink(), you must NOT check if "elem" is on the "list" or not. What if "elem" is NOT on the "list"?! Your program may crash! Is that your fault (you, as the list implementor)? Absolutely not! The user of your list (i.e., application programmer) should NEVER call Unlink(elem) with "elem" NOT on the "list"! It would be your user's fault.
Also, you need to save the file as "Makefile". On Windows, the browser would want to save the file as "Makefile.txt" and you need to change the file name to be "Makefile".
Here is a very important debugging trick (well, not much of a trick) I would like all of you to learn... If you look for 64 in "listtest.c", you should be able to find that the length of a test list is setup to 64. You should see that a list of 64 integers are created and manipulated. Let's say that something is wrong with your list. As "listtest" starts to manipulate your list, things starting to get out of whack. How can you keep track of 64 elements?! That's too much! So, when you are trying to find a bug, change "64" to something small, like 3 (if that doesn't make your bug appears, try 4, and so on). Recompile and re-run listtest and see what's the smallest value you can have and still have your bug to show up. Then debug your code under gdb. Set a breakpoint at a place where you know that the list is in a consistent state. Since the list is short, I want you to DRAW your list out on a piece of paper. Write down the value of EVERY FIELD of every list element. Then single step through your code and see how these values change. Hopefully, you will find your bug soon.
To print out a pointer value in gdb, you can do something like:
printf "0x%08x\n", (unsigned int)(elem->next)
If you read the "listtest.c" code, you should notice that it's using a random number generator so that RandomShuffle() can shuffle the list differenly every time you run listtest. This can make it very difficult to debug. You should also notice that if you run listtest with the "-seed=value" commandline option, it will "seed" the random number generator with "value" and RandomShuffle() will shuffle the list the same way every time if you use the same "value".
In summary, to make this easier to debug, you should change num_items to a as small of a number as possible and use the "-seed=N" commandline argument with an N that will cause your code to crash with certainty.
First, rename listtest.c as warmup1.c by doing the following:
mv listtest.c warmup1.c
Then use a text editor (such as vi, emacs, nano, etc.)
to edit the Makefile that you were using and replace every instance of the string "listtest"
with the string "warmup1". At this point, if you type:
make
An executable file warmup1 should get created. If you run this program by doing:
./warmup1
it should not produce any output since this is exactly the old listtest.
Now you can use a text editor to edit warmup1.c, delete most of the code
there since you don't need it for the sort command (but probably keep the bubble sort code)
and change the rest to implement the sort command.
If you don't specify the location of the program you want to run, then command shell would "search" for the name of the program in a "search path", which is specified by the PATH environment variable. To see your PATH environment variable, do:
echo $PATH
The search path is a list of directory names, separated by the ":" character,
and it specify a directory search order when the command shell needs to locate a program to execute.
If "." is not in the search path of your command shell, running "warmup1" would result in a "command not found" error message.
To add "." to your search path, you can do:
export PATH $PATH:.
if your command shell is bash. If your command shell is tcsh, you can do:
set path = ($path .); rehash
Please note that if you want to add "." to your search path, "." should be the last place you search.
If you make sure that no amount is ≥ 10 million, then you should not encounter the case when you print where an amount is ≥ 10 million. Therefore, the "if" part of that particular statement will never be true. Then it doesn't matter what the "then" part of the statement says. So, may be a bit redundant, but still correct.
Please note that even if none of the amount can be ≥ 10 million, the balance can be ≥ 10 million or ≤ -10 million. Therefore, if you use the same routine to print out amount and balance, the ≥ 10 million check becomes relevant.
- I would first see if the line is too long. If it is, it's an error.
- Then I would count the number of tabs (i.e., '\t' characters)in a line. If it's not equal to 3, then it's an error.
- Otherwise, I would replace the '\t' characters with '\0' and use 4 character points to point to each part of that line. Each of these pointers would, by definition, now point to a C-string.
- Then you validate each field separately.
- The first field should be trivial to validate.
- For the 2nd field, just follow the spec. If it's too long, then it's an error. Otherwise, make sure every character is between '0' and '9' and then convert the entire string into a number and make sure that it's within range.
- For the 3rd field, look for the period (since it's required). Make sure that there are exactly 2 digits to the right of the period and make sure that they are all digits to the left of the period and it's not too long. Then convert the number to the left of the period to be an integer and make sure that it's in range. Then convert the 2 digits to the right of the period to be another number. Multiply the first number by 100 and add the 2nd number to get the transaction amount in cents.
- For the 4th field, you need to remove leading space characters and make sure that the remaining string is not an empty string.
- I may have missed something, but you get the idea.
Furthermore, if a line is longer than 1,024 characters (including the '\n' at the end of a line), it is considered an error.So, what should you use to hold a line of input? I would do:
char buf[2000];
Why 2000? Why not?! Isn't that a waste of space? What a few bytes between friends?! :-)
If you use 2000, when fgets() return, you can use strlen() to get the length of the line you have read into the buffer. If the length is > 1024, you print an error message and quit your program. Easy! Right?!
You can use strlen() to get the length of the line becaus the man pages of fgets() says:
A terminating null byte ('\0') is stored after the last character in the buffer.Not every string related function would do that! You need to read the man pages carefully so you know exactly what a function would do for you.
Finally, why do:
char buf[2000];
and not use malloc()? Well, we should use malloc() when we have to create objects dynamically
in cases when we don't know how many objects to create ahead of time. In this case, we know that we can use a buffer of size 2000,
so there is no good justification to allocated these 2000 bytes dynamically using malloc().
If buf[] above is declared as a local variable, please remember that function arguments and local variables live inside the stack frame. Such a variable is called an automatic variable because it is created "automatically" when the function is called (i.e., it doesn't exist before the function is called). It's also important to understand that when you return from the function, automatic variables created for that function will be automatically destroyed and you must not refer to the contents of those variables once the function has returned. The "life time" of an automatic variable for a particular function is the "life time" of that function (i.e., between the start of that function and the end of that function). Very very important concept!
My intention was for everyone to adapt existing code! In "listtest.c", there is a bubble sort algorithm and you can easily adapt it and use it to implement the "sort" command. The reason for doing it this way is to encourage you to read other people's code and use existing code. Later on, when you do the kernel assignments, you will have to read a lot of other people's code and work with those code. If you have the mindset that you always want to do things your way or implement algorithms from scratch, you may have a hard time working with existing code. I'm hoping that you will follow the procedure mentioned above and keep the bubble sort code and modify it for your needs.
char *ctime(const time_t *timep);
Let's take the first timestamp from test.tfile in the spec: 1230728833. If you do:
time_t t = 1230728833;
char buf[26];
strcpy(buf, ctime(&t));
/* buf[25] is guaranteed to be '\0' and buf[24] is guaranteed to be '\n' */
buf[24] = '\0';
printf("%s\n", buf);
You can write a small program where your main() function is just
the above and debug it and use the "display buf" gdb command to see how
the string in buf changes value.
Once you understand how ctime() works, you can add your code back into your warmup1 code.
26 is a weird number! How do I know to use 26 in the above code? Well, because the man page says so!
sudo timedatectl set-timezone America/Los_Angeles
The grader's machine will be set to this timezone.
If you are in a different timezone and you want to go back to the original timezone, you can simply run:
timedatectl
before the above command and remember what your previous timezone was and set it back when you are done with testing your code.
Someone from a previoius semester has suggested the following:
setlocale(LC_NUMERIC, "en_US");
I have no idea if it works or not. In general, I do not recommend that you write locale-specific
code in a CS class! But if you want to mess around with locale, it's your responsibility to make
sure that your code works no matter what! In the end, if your submission doesn't work in the grading's
32-bit Ubuntu 16.04 system, there's nothing I can do about it because you have been warned.
Of course, another way it to write your own function to do the formatting. It should be reasonably straightforward. You should keep your balance as an integer in cents. Then you would divide the balance in cents by 100 and format the quotient and remainder separately. For the quotient, you can write a recursive function to do the formatting (to add commas at the right places). You can pass in a buffer when you make a recursive call and, just before the recursive function returns, you append to the buffer with what you've just formatted (at most 3 digits at a time). In the recursive function, you can divide the number you passed in by by 1000 and format the remainder into 3 digits and add a comma in front of it. Then you see if the quotient is ≥ 1000. If it is, you pass the quotient in recurviely into the recursive function. Otherwise, recursion would stop. Anyway, I'm not being very precise. But it's something like that.
Recusrive code may be conceptually more difficult to write. Below is a piece of code that can give you some idea of what this function may looks like without using recursion:
void FormatCents(int amt_in_cents, char buf[80])
{
/* only handle non-negative amt_in_cents */
if (amt_in_cents >= 1000000000) {
snprintf(buf, 80, "?,???,???");
return;
}
int cents = amt_in_cents % 100;
int dollars = amt_in_cents / 100;
if (dollars >= 1000000) {
char dollar_buf[80];
FormatDollarsWithTwoCommas(dollars, dollar_buf);
snprintf(buf, 80, "%s.%02d", dollar_buf, cents);
} else if (dollars >= 1000) {
char dollar_buf[80];
FormatDollarsWithOneComma(dollars, dollar_buf);
snprintf(buf, 80, "%s.%02d", dollar_buf, cents);
} else {
snprintf(buf, 80, "%d.%02d", dollars, cents);
}
}
This code takes the first argument and split it into two parts called dollars and cents,
format them into C-strings, and insert a period between the two strings. The same idea can be used to
write the FormatDollarsWithTwoCommas() and FormatDollarsWithOneComma() functions (although instead of using "%02d" to print cents, you will need to
use "%03d" to print a number ≥ 0 and < 1000 with leading zeroes).
(Of course, FormatDollarsWithTwoCommas() will need to call FormatDollarsWithOneComma()
if the number of thousands in the input amount is ≥ 1,000.)
For warmup1, after you format the amount in buf, you need to get the length of what's in buf and put it in the right place in the printout. Anyway, this is just for illustrating the basic idea.
script
[ copy and paste part of the grading script here ]
exit
and typescript will contain everything that got printed when you run the script.
Please do "man script" to see some useful commandline options.
./warmup1 sort ...
May be you should do the same thing.
The period at the beginning refers to your "current working directory", which is the directory your login shell is in (you change your "current working directory" with the "cd" shell command). So, the command above says, "run warmup1 from the current working directory". If you don't specify where your warmup1 is, your login shell cannot find it!
For part (B), every time you call My402ListAppend(list,obj), you need to append a different obj. How can you do that? You dynamically create each object and pass the pointer to the object as obj.
If it turns out that you are indeed using a different Transaction object each time you call My402ListAppend() and you are getting the same description field in all these Transaction objects, then you need to take a look at how the description field is declared in the Transaction data type. For example, if you declare it as:
char *description;then it's a just a pointer. If you copy pointers (using the '=' operator), you are just copying an address and you end up having 2 pointers pointing to the same memory location. It's very important to understand and different between copying pointers (using the '=' operator) and copying the objects the pointer is pointing to!
In this particular case, one solution is to do the following instead:
char description[DESC_SIZE];What should DESC_SIZE be? 1024 would certainly be safe because you know that the longest input line is 1024 characters long! But this may be a bit wasteful in memory usage (although we don't really worry about that in warmup assignments). Since you only need to keep the part of the description that you need to print, you can also look at the spec and figure out what's the smallest size you need (remember to add 1 to it because '\0' takes up a byte). Then when you create an Transaction object, you just can use strncpy() to make sure you don't overflow the buffer.
But if you really want to check if a file is a directory or not, please do:
man -s 2 stat
You will need to be familiar with the stat() system call when you are doing kernel 2.
Well, can you handle infinity? The answer is actually yes! There is something called arbitrary-precision arithmetic (available as BIGNUM in openssl). But should you use that? Absolutely no, because it's too slow! Then what can you do?
Well, in programming, this is a design decision where you need to hard-code a limit on what your program can handle, i.e., limitation of your program by your design. You need to decide what you can handle and convince yourself that your design decision is reasonable (given the purpose of your program). For the maximum balance, is unsigned int (which is a 32-bit number) reasonable? Or, is unsigned long long (which is a 64-bit number) more reasonable? You decide! And it's probably not a bad idea to document all your "design decisions" in the README file for your own reference (and this is not a requirement since the last section of the README file is totally optional and the grader will not read that section).
warmup1 sort [tfile]
So, for any malformed command (where your program prints an error message saying that the command is malformed),
all you have to do is to tell the user what is a "well-formed command" by printing:
usage: warmup1 sort [tfile]
to satisfy the "MUST give usage information" requirement.
cat w1data/f15 | ./warmup1 sort > f15.sort
By looking at the above command, it seems that stdin is just the content of
"f15" in the "w1data" directory. Then running the above command under gdb
should look something like:
gdb ./warmup1
(gdb) run sort < w1data/f15 > f15.sort
If you search the Internet, you will find that people say that the above way is the right way to do it.
Well, they are wrong! (Programming exercise: can you write a program to prove/demonstrate that they are wrong?)
Around slide 61 of the "a simple OS" lecture slides, we have seen the real meaning of "<", i.e., a regular file is "mapped/connected" to file descriptor 1.
When you read from the keyboard, you are reading from a "stream". One difference between a stream and a regular file is that if you haven't closed a stream, you don't know how long the stream is. For a regular file, you can always find out the file size. Another difference is that with a regular file, you can control where you want to read from (by using something like lseek()). If you try to do the same thing with a stream, there is no guarantee that it will work. Therefore, if you have to control where you want to read from and you don't know if you will be reading from a regular file or a stream, don't use functions like lseek().
Anyway, the bottomline is that there is no way that I know of to exactly "simulate" the way "cat" is used in the grading guidelines when you are debugging under gdb! You can redirect input from the file in the "w1data" directory, but you need to understand that there is a difference (and you should know exactly what the difference is by understanding what I said the above). If you have a bug that only shows up when your program reads data from stdin, one thing you can try is to run your program under valgrind and hope that it will tell you where your bugs are.
My solutions in the "w1data" directory are text files, therefore, it's saying that your output file is a binary file! Since you are not trying to generate a binary file, how did you end up generating a binary file? It's because you have a bug in your code and you need to find that bug.
How do you find binary characters in your output (e.g., a .sort file)? One way is to do a hexdump of your .sort file. On Linux, you can run "xxd -g 1 FILENAME". Then you look at the output and look for a binary data byte. Well, that may not be so easy.
Another way is to do a hexdump of my solution and do a hexdump of your program output and do a "diff" on the two hexdumps. Since the output of the hexdump program is guaranteed to be a text file, "diff" will tell you the first place these two files hexdumps are different. Hopefully, you can pin point the location of your binary character soon after.
By the way, the printout of the "xxd" program is in 3 columns. Each line of the printout corresponds to 16 bytes of the input file. The first column contains base addresses (in hex) of the corresponding blocks of 16 data bytes. The second column contains the hex representation of each by of data. The third column contains ASCII representation of corresponding 16 data bytes (where a period is printed if the corresponding data byte is not an ASCII character).
For example, you can create file "a" using a text editor (such a "pico") with the following content:
This is line 1
This is line 2
This is line 3
Then do the following;
- Do "ls -l a" to see the file size of "a".
- Do a hexdump of "a" by typing "xxd -g 1 a" and see the exact content of file "a" (especially to see where all the "\n" characters are -- they have a hex code of "0a").
- Make a copy of "a" and call it "b" using the cp command. Type "diff a b" and see that if two files are identical, the "diff" program prints nothing. Also, do "ls -l a b" to see that the file sizes of "a" and "b" are the same.
- Delete one line from "b", then run "diff" and see what "diff" program would print when the first file has one more line than the second file. Do "ls -l a b" to see the file sizes of "a" and "b" and see that they are different.
- Copy "a" over "b" using the cp command. Add one line in "b", then run "diff" and see what "diff" program would print when the second file has one more line than the first file. Do "ls -l a b" to see the file sizes of "a" and "b" and see that they are different.
- Copy "a" over "b" using the cp command. Change one character in "b", then run "diff" and see what "diff" program would print when the files are different in line one. Do "ls -l a b".
- Open "b" in a text editor and delete all the lines in it. Do "ls -l a b" to make sure that the file size of "b" is zero (i.e., "b" is an empty file). Then run "diff" and see what "diff" program would print when the second file is empty.
- And so on.
So, is "/etc" a valid tfile? No! Why not? Because a valid tfile must contain at least one transcation!
The bottomline here is that "/etc" can be handled in exactly the same manner as an empty file (and you can print the same error message). Do you have to print an error message saying that the input is a directory? No! Is it a correct error message to say that "/etc" contains no transcations? Can you argue that it's not a correct error message?
Replace each "?" below with a numeric value:
So, you need to replace "?" with a number.
First, you should look for "Cannot compile" in the grading guidelines. You should see something like:
Cannot compile : -5 to -10, depending on effort to make it work
So, your choice is either DEDUCT nothing or something between 5 to 10 points.
The README requirement is that you must replace every "(Comments?)" with your own evaluation. So, if you think you won't lose any points (i.e., subtract 0 point), then I would replace the line in the README file by:
Cannot compile : -0 pts
If you expect 6 points to be deducted, then you should replace that line by:
Cannot compile : -6 pts
/bin/rm -f f?.sort f??.sortYou can copy and paste the above command in a Terminal and see what it does. In tcsh, if you run the above command, it will match "?" with any character. So, the above is try to delete any file that looks like "f?.sort" or "f??.sort" where a "?" is a wildcard that match any single character. If it cannot find any match, it prints:
/bin/rm : No match.So, this is totally harmless if it cannot find such a file to remove/delete.
By the way, I think there is a bug in tcsh that it prints "/bin/rm: Not match", which it's not supposed to do! But if you run the above command under bash, then it will print nothing, which is what it's supposed to do. (Please see the man pages of "rm" to see why I think it's a tcsh bug.)
What's required is that your error message must be a correct error message. Therefore, if you cannot open a file and your error message says that you cannot open a file, it's a perfectly good error message.
If you really want to know what the OS is telling you why you cannot open a file, you should read the man pages of the function you are calling (something like fopen() or open()) and see how to check for errors. If the man pages say that "errno" (slide 1 of Lecture 4 Part 1) will be set, then you can use perror() to print an errno-specific error message. perror() is a bit weird and you need to try it out to see how it prints error messages so that you can pass the right argument to it.
To run the entire part (B) of the grading guidelines, the grader will copy the entire part (B) commands and paste it into the Terminal. This should be equivalent to running the "section-B.csh" script. To see how long it would take to run "section-B.csh", the grader would use the "time" command like this (pleasae do "man time" to see what the "time" program does):
time ./section-B.csh
The last line in the printout is how long it takes to run "./section-B.csh" and it can look something like:
0.052u 0.056s 0:00.37 27.0% 0+0k 2584+664io 7pf+0w
The first number means that it spent 0.052 seconds of user time (whatever that means).
The 2nd number means that it spent 0.056 seconds of system time (whatever that means).
The 3rd number is the only important number, as far as grading is concern, and it's the time it spent to run "./section-B.csh",
and in this example, the running time is 0 minute and 0.37 seconds.
To produce profiling data when your program is run, you should run gcc with the "-pg" commandline argument (when you compile a .c file into a .o file and when you link all the .o files to create your executable program). Run your program first (e.g., "./warmup1 sort test.tfile") first and when it finishes running, a file called "gmon.out" will contain all the profiling data. To look at the profiling data, do:
gprof warmup1 gmon.out > warmup1.gprof.txt
more warmup1.gprof.txt
For more information about gprof, please do "man gprof". Please understand that gprof can only track
functions that have been compiled with the "-pg" commandline option. Therefore, it cannot track system calls
or any C++ library function calls that did not compiled with the "-pg" commandline option.
Also, you need to make sure that you are not running your code inside a shared folder because weird things happen in a shared folder (e.g., things get really slow)!
If you give gdb an array of characters to "display", it will assume that it's the beginning of a C-string (i.e., terminated with '\0'). An array of characters is compatible with (char*). So, you just need to give gdb the address of the description field in question.
Therefore, what you should do is that, at the time in your program when the description field is correct, you need to get the address of the description field and give that to gdb to "display". Let's say that you have just built a transaction object of type Transcation and you point to this object using a pointer ptr. Let's say the name of the description field is desc. Then the address of the desc field of ptr would be ptr->desc. So, in gdb, you can do:
print ptr->desc
and it will show you the address of ptr->desc and the C-string stored at this address.
Let's say that the address you see is 0x12345678. Then you should do:
display (char*)0x12345678
From this point on, after anything you do, it will print the C-string value at 0x12345678.
By the way, given what you know about the address space, the address itself give you some information. If the address is high in the address space (i.e., 0xbf?????? or 0xbe??????), then you know that you are pointing to someone's stack frame. You also know two things. (1) If that function returns, the stack frame will be popped and garbage may appear over that space. (2) If that address is pointed to by multiple transaction objects (i.e., shared by multiple transaction objects), when one of them changes that string, the other will see the changes. If the address is low in the address space (i.e., only 7 hexdigits long), then you know that you are pointing to the data segment, bss segment, or the heap/dynamic segment, although you cannot tell which directly.
Another point I want to make is the following. What if you do:
display (Transaction*)0x12345678
You will see that since you ask gdb to interpret 0x12345678 as the start address of a data structure (namely, Transaction),
it will not print the fields of this data structure. But what if you do:
display *((Transaction*)0x12345678)
Is that okay?! Give it a try! You should see that gdb will simply pretend that at address 0x12345678 is a block of memory
of size sizeof(Transaction) and would print each field of the data structure. What you need to understand is that this is the nature of memory.
It's just a collection of memory addresses and in each memory address stores a byte of data.
An address can also be viewed as the start of a block of memory (i.e., the "base address" of a block of memory).
There's really no "data structures" in memory. It's all about how you interpret what's inside a block of memory.
Starting with a base address, if you want to call the first byte a "transcation type", the next 4 bytes an integer, the next 4 bytes another integer, etc.,
there is no problem doing that! This is also how the operating system view the memory of an application program.
Basically, some are saying that if you run your program under gdb, the layout of the address space would be different. So, if your crash only happens with a particular address space layout, this might be the reason. Since you don't know if this is the reason or not, you might want to try some of these tricks. For example, you can do the following under gdb and see if it helps:
(gdb) set disable-randomization off
Another thing you can try is to do post-mortem debugging. Set up your environment so that when your
program crashes, a "core" file will be generated (which is an image of your address space).
Afterwards, do the following:
gdb warmup1 core
When you are in gdb, do "where" to see exactly where it crashed. You can also examine
memory locations.
typedef struct tagTransaction {
int type;
unsigned int timestamp;
long long cents;
char description[1024];
} Transcation;
and you did:
Transaction *t = (Transaction*)malloc(sizeof(Transaction));
and you fill all t's data fields with values and you print them out in gdb and they all look perfect.
Later on, you noticed that t->description turned into garbage and you don't know which line of your code
caused this to happen. Here's what you can do in gdb to set a spy point to monitor the values inside this
data structure. Type:
print *t
to see what the content looks right. Then do:
print t
to get the address of this block of memory. Let's say you get (which is my favorite address):
(Transcation *) 0x12345678
You can then add the following spy point using the following "display" gdb command:
display *((Transcation *)0x12345678)
This way, whatever you do, it will take 0x12345678 as the address of a Transcation data structure and print its content.
From this point on, you can keep doing single-stepping using either the step or the next gdb command.
If you see the description field turns into garbage, you should look into the last thing you did.
cat w1data/f0 | ./warmup1 sort
is the same as:
./warmup1 sort < w1data/f0
if (and that's a big *if*) your program reads the input correctly.
What if your program doesn't read the input correctly? How can you debug? The 2nd form can be debugged in the usual way. But the first form cannot. The problem with the first form is that it's running two programs ("cat" and "./warmup1") and the command shell uses a Unix pipe (in this case, an "unnamed" pipe) to connect the stdout of the first program to the stdin of the 2nd program. If you want to debug this, you need to create a "named pipe" so that the stdout of the first program goes into the input of the named pipe and run gdb to debug the 2nd program and make it read stdin from the output of the named pipe. According to this article, you can do the following:
First, you need to create the named pipe and give it a name. Let's call this named pipe "mygdbpipe" and do:
mkfifo mygdbpipe
ls -l mygdbpipe
You might see something like this:
prw-r--r-- 1 student student 0 Sep 13 13:12 mygdbpipe
The first character in the printout is a "p" to signify that this file is a "pipe file system object" (or simply, a "pipe").
Now run the 1st program and redirect its stdout into the pipe:
cat w1data/f0 > mygdbpipe
Then start gdb as usual:
gdb warmup1
In gdb, set a breakpoint at the beginning of the main() function and run warmup1 and redirect its stdin to come from the pipe:
(gdb) break main
(gdb) run sort < mygdbpipe
Later in the semester, we will learn that we can think of the Unix pipe as an instance of the Producer-Consumer Problem.
Please note that since mygdbpipe is a pipe and if you don't clear the pipe, next time when you use it, it may have unread data from previous use. To clear the pipe, you just have to read all the available data from it:
cat mygdbpipe
The above would freeze when the pipe is cleared and you have to press <Ctl+C> to terminate the "cat" program.
What is memory corruption? It means that your program is writing into memory locations that it's not suppose to write into. What's worse is that writing into such memory locations are perfectly legal and it won't cause a crash. After you have "corrupted memory", nothing in your code is guaranteed to work any more (including C library code and any other library code). Eventually, your program will crash, most likely in malloc() or free(), if the memory you have corrupted is in the data structures used by the memory allocator (we will cover memory allocator in Ch 3).
If your program dies in malloc() or free() and it's not clear why the crash has anything to do with malloc() or free(), then you should suspect that you have memory corruption bugs. In this case, your should try to run your program under valgrind. valgrind does not guarantee to catch your bugs. But when valgrind complained about an error, chances are, it's a real error and you should fix your code so that valgrind doesn't complain about it any more.
The problem is that the printout of valgrind is very difficult to read. You need to look for line numbers in the source code what you have written and look at your code very very carefully and try to figure out why valgrind is complaining about that line of code. Often times, you have to look at the lines above it to see how the variables that appeared in that line were declared. If you have trouble reading the printout of valgrind, please feel free to forward the valgrind printout to me, along with the line in your code valgrind is complaining about, so I can take a look.
Please ONLY deal with the very first error reported by valgrind and ignore everything else! If you fix the first bug, may be all the bugs will disappear. So, don't waste time on anything else.
Please ignore memory leaks reported by valgrind. In warmup assignements, we do not care about memory leaks!
- Make a new submission and get a new timestamp.
The timestamp is what we will use to determine if your submission is early or late
and how much extra credit or penalty you will get.
- Send me modification instructions within 24 hours of the submission deadline. During that time period, the first 3 lines are free. Then it costs you 3 points per line of modification.
So, if you have a lot of changes to your README file, (1) would be the way to go. If you have 3 lines or less changes, (2) would be the way to go. If you have more than 3 lines of changes but not a lot of changes, you need to decide for yourself which way you want to go.