Lab #4

[ This content is protected and may not be shared, uploaded, or distributed. ]

(Please also check out the PA2 FAQ.)

Lab 4 has five parts:

Part A - simple web server
Part B - simple web client
Part C - run your simple web client against your simple web server
Part D - persistent HTTP connection
Part E - wireshark

Please note that this lab assumes that you have watched the PA2 lecture videos.

Part A (`lab4a`) - simple web server (useful for PA2):

The server part of your lab3d has the basic structure of a simple web server! All you have to do is to change it to handle HTTP protocol messages (instead of ASCII lines of text messages)!
For this part of the lab, you need to change it to handle HTTP request messages from a web client (such as wget): parse the lines coming from the web client to understand what the client is asking for, send the requested data back to the client in the format of an HTTP response message, then immediately shut down and close the connection (i.e., our web server can only handle a single request from a client and send a single response to the client).
The HTTP protocol was designed so that HTTP request messages and HTTP response messages have the same basic structure. Every HTTP request message or HTTP response message consists of an HTTP header and an HTTP body, separated by an empty line. An HTTP header is simply lines of text, while an HTTP body contains binary data. In Lab 3, a line is terminated by an <LF> character (i.e., '\n'). In HTTP, each line in the HTTP header must be terminated by a two-character sequence, <CR><LF> (i.e., "\r\n"). The empty line that separates the HTTP header from the HTTP body is, therefore, two characters long.
The first line in an HTTP request header is special and it is known as the request line. It has the format "METHOD URI VERSION", where METHOD, URI, and VERSION are strings that do not contain space characters and they are separated by a space character. The HTTP specification is very extensive and the terminology used there may be slightly different from ours. For this lab exercise, you only have to process the "GET" method (and you must refuse to process any other method). The URI specifies the location of a resource at the server (i.e., that's what the client wants to "download"). For this lab exercise, a resource is simply a path name in a file system. The VERSION should be "HTTP/1.x" where "x" can be anything.
The general format of the URI is "/.../.../.../.../LASTPART" where the "..." are directory or subdirectory names and "LASTPART" denotes everything after the last forward-slash ('/') character.
The first line in an HTTP response header is special and it is known as the status line. It has the format: "VERSION STATUS REASON" where VERSION and STATUS are strings that do not contain space characters and they are separated by a space character while REASON is a phrase that may contain space characters.
All the other lines in an HTTP request header or an HTTP response header must be in the format "KEY:VALUE". When you parse an HTTP header, you should look at each line and look for the colon character: temporarily replace it with a null (i.e., '\0') character to get the KEY, get the VALUE part by skipping all the leading spaces after the colon, then put the colon back where it was. If you don't recognize a particular KEY, you must ignore it and not process its VALUE. The VALUE part can also have multiple parts, with a semicolon character (i.e., ';') as the separator.
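The KEY:VALUE parsing described above can be sketched in C++ as follows. This is a minimal sketch under stated assumptions, not the required implementation: the name parse_header_line() is hypothetical, it copies substrings out of a std::string instead of temporarily writing a '\0' into the buffer, and it assumes the trailing "\r\n" has already been removed from the line.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: split one header line such as "Content-Length: 63"
// into KEY ("Content-Length") and VALUE ("63").
// Assumes the trailing "\r\n" has already been removed from the line.
// Returns false if the line contains no colon (a malformed header line).
bool parse_header_line(const std::string& line, std::string& key, std::string& value)
{
    std::size_t colon = line.find(':');   // the FIRST colon separates KEY from VALUE
    if (colon == std::string::npos) {
        return false;
    }
    key = line.substr(0, colon);          // everything before the colon
    std::size_t start = colon + 1;
    while (start < line.size() && line[start] == ' ') {
        start++;                          // skip the leading spaces of VALUE
    }
    value = line.substr(start);           // the rest of the line is VALUE
    return true;
}
```

Splitting at the first colon matters: a line such as "Host: localhost:12345" yields the KEY "Host" and the VALUE "localhost:12345", with the port's colon left intact.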
VERY IMPORTANT: You should always think of an HTTP body as binary data even though sometimes it happens to be text. If you always treat an HTTP body as binary data, your code will always work! How do you read/write binary data? To be sure, use the read() and write() system calls! Do not use C++ code to read/write binary data unless you are 100% sure that it would do exactly what you need and would satisfy the requirements of our labs and programming assignments!
The executable of this part of the lab must be named lab4a. The usage information (i.e., commandline syntax) for lab4a is as follows (similar to lab3d):
    lab4a PORT
where PORT is a TCP port number.
To write code for this part of the lab, you must start by copying "lab3d.cpp" (from Part D of Lab 3) into "lab4a.cpp" and create a Makefile so that when the user types "make lab4a", the "lab4a" executable will be created. Please do it this way to make it easier when you need to do Part B of this lab.
A web server needs to serve requests for a particular host and provide service at a particular TCP port number. For all our labs and programming assignments, your server must service requests for the host known as "localhost", and the TCP port number you are providing service for comes from the PORT commandline argument.
VERY IMPORTANT: On some systems, such as viterbi-scf1.usc.edu and viterbi-scf2.usc.edu, "localhost" does not work for unknown reasons and you have to use "127.0.0.1" instead. By default, we will use "127.0.0.1" instead of "localhost" for all our networking labs and programming assignments. Therefore, whenever you see "localhost" in a lab spec or a programming assignment spec, you should replace it with "127.0.0.1". When you write your code, please always use the compiler-specified LOCALHOST and do not use "localhost" or "127.0.0.1" explicitly.
Let's say that your lab4a web server is running on "localhost:12345" (i.e., on "localhost" servicing port 12345). If you contact this web server by running the following command:
    wget -O x http://localhost:12345/x/y/z.html
The wget program will send the following HTTP request to your web server (for readability, I have put them on separate "lines"; but please understand that they are just one long stream of characters):
    GET /x/y/z.html HTTP/1.1\r\n
    User-Agent: Wget/1.17.1 (linux-gnu)\r\n
    Accept: */*\r\n
    Accept-Encoding: identity\r\n
    Host: localhost:12345\r\n
    \r\n
For this lab exercise, you only have to process the "request line" and you must ignore the rest (other than printing them out). You should modify the talk_to_client() function in "lab4a.cpp" to use the read_a_line() function you wrote in lab3d to read each line in the HTTP header, and make sure you see the empty line that denotes the end of the HTTP header before you send anything back to the client. Please print the entire HTTP request header (including the empty line) to cout. You should indent these lines by preceding every line with a <TAB> (i.e., "\t") character so that they stand out.
When you process the "request line", you must verify that the METHOD in the "request line" is "GET" and the VERSION is "HTTP/1.x" where "x" can be anything (even if it's empty). You must then take the URI in the "request line" and append it to the string: "lab4data/" to create a file system path and expect it to refer to a file in a subdirectory of the directory where you have your lab4a executable. Let's say that you store this file system path in a variable called path. You should use the following code to get the size (number of bytes) of the file:
    #include <string>
    #include <sys/stat.h>

    using std::string;

    /**
     * Use this code to return the file size of path.
     *
     * You should be able to use this function as is.
     *
     * @param path - a file system path.
     * @return the file size of path, or (-1) if failure.
     */
    static
    int get_file_size(string path)
    {
        struct stat stat_buf;
        if (stat(path.c_str(), &stat_buf) != 0) {
            return (-1);
        }
        return (int)(stat_buf.st_size);
    }
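The request-line checks described above (method must be "GET", version must be "HTTP/1.x", path is "lab4data/" followed by the URI) can be sketched as follows. This is a minimal sketch, not the required code: parse_request_line() is a hypothetical name, the trailing "\r\n" is assumed to be already removed, and std::istringstream splits on any run of whitespace, which is slightly more lenient than the single-space format described above.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: validate a request line such as
// "GET /x/y/z.html HTTP/1.1" (trailing "\r\n" already removed) and,
// on success, store the file system path "lab4data/" + URI in path.
bool parse_request_line(const std::string& line, std::string& path)
{
    std::istringstream iss(line);
    std::string method, uri, version, extra;
    if (!(iss >> method >> uri >> version) || (iss >> extra)) {
        return false;                     // not exactly three fields
    }
    if (method != "GET") {
        return false;                     // only GET is supported
    }
    if (version.compare(0, 7, "HTTP/1.") != 0) {
        return false;                     // must be "HTTP/1.x" ("x" may be empty)
    }
    // "lab4data/" + "/x/y/z.html" gives "lab4data//x/y/z.html"; POSIX path
    // resolution treats the repeated slash as a single '/'.
    path = "lab4data/" + uri;
    return true;
}
```

The resulting path can then be passed to get_file_size() to decide between the 404 and 200 responses.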
One very important rule in networking is that if you get a well-formed request, you must always send a response! If get_file_size() returns (-1), it means that the file does not exist. In this case, you must send the following HTTP response message to the client (for readability, I have put them on separate "lines"; but please understand that they are just one long stream of characters):
    HTTP/1.1 404 Not Found\r\n
    Server: lab4a\r\n
    Content-Type: text/html\r\n
    Content-Length: 63\r\n
    \r\n
    <html><head></head><body><h1>404 Not Found</h1></body></html>\r\n
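The Content-Length of 63 above counts every byte of the body, including its trailing "\r\n". One way to keep the header and body consistent is to compute the length from the body itself, as in this sketch (kNotFoundBody and make_404_header() are hypothetical names). Following the rule about better_write_header() below, you would still write the returned header with better_write_header() and the body with better_write().

```cpp
#include <string>

// The 404 body from the spec: 63 bytes including the trailing "\r\n".
const std::string kNotFoundBody =
    "<html><head></head><body><h1>404 Not Found</h1></body></html>\r\n";

// Hypothetical helper: build the 404 response header (ending with the empty
// line) for the given body. Content-Length is computed from body.size() so
// the header can never disagree with the bytes actually sent.
std::string make_404_header(const std::string& body)
{
    std::string header = "HTTP/1.1 404 Not Found\r\n";
    header += "Server: lab4a\r\n";
    header += "Content-Type: text/html\r\n";
    header += "Content-Length: " + std::to_string(body.size()) + "\r\n";
    header += "\r\n";                     // empty line ends the header
    return header;
}
```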
If get_file_size() returns a valid file size (i.e., anything ≥ 0), it means that the file exists and you should store the file size in a variable. Then you must open that file using the open() system call (this should succeed since the stat() system call was successful previously). Before you send out any content of the file, you must first send the following HTTP response header followed by an empty line (for readability, I have put them on separate "lines"; but please understand that they are just one long stream of characters):
    HTTP/1.1 200 OK\r\n
    Server: lab4a\r\n
    Content-Type: application/octet-stream\r\n
    Content-Length: NUMBER\r\n
    \r\n
where NUMBER must be the size of the file you got from get_file_size(). The above tells the web client that the HTTP response body is NUMBER bytes of binary data of unknown type (which is what "application/octet-stream" means, i.e., a stream of 8-bit data bytes). Please print the entire HTTP response header (including the empty line) to cout. You should indent these lines by preceding every line with a <TAB> (i.e., "\t") character so that they stand out.
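Building the 200 OK header and printing it with <TAB> indentation can be sketched as below. The names make_200_header() and indent_header() are hypothetical. Stripping "\r\n" and printing '\n' instead is a readability choice for the transcript, not a requirement of the spec; print the result with std::cout << indent_header(header).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: build the 200 OK response header for a body of
// file_size bytes, ending with the empty line.
std::string make_200_header(int file_size)
{
    std::string header = "HTTP/1.1 200 OK\r\n";
    header += "Server: lab4a\r\n";
    header += "Content-Type: application/octet-stream\r\n";
    header += "Content-Length: " + std::to_string(file_size) + "\r\n";
    header += "\r\n";                     // empty line ends the header
    return header;
}

// Hypothetical helper: return the header with every line (including the
// empty line) prefixed by a <TAB>; each "\r\n" is replaced by '\n'.
std::string indent_header(const std::string& header)
{
    std::string out;
    std::size_t start = 0;
    while (start < header.size()) {
        std::size_t end = header.find("\r\n", start);
        if (end == std::string::npos) {
            end = header.size();          // last line lacks "\r\n"
        }
        out += '\t';
        out += header.substr(start, end - start);
        out += '\n';
        start = end + 2;                  // skip past "\r\n"
    }
    return out;
}
```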
In Lab 3, it was mentioned that whenever you write data into the socket, you must use better_write(). To make debugging easier, I strongly recommend that whenever you write any part of a message header (including the empty line) into the socket, you call better_write_header() instead. If you read the code in better_write_header() in "my_readwrite.cpp", you will see that when the special debugging flag there is not turned on, better_write_header() simply calls better_write(). So, they are the same thing if the debugging flag is off. You must never call better_write_header() to write any part of a message body into the socket, because we must treat a message body as binary data and better_write_header() assumes that you are writing ASCII data (and that's okay for message headers since all our message headers are ASCII data)!
Then you must stay in a loop: read the content of the file at most 1,024 bytes at a time using the read() system call, write all the data that you have read into the socket using the write() system call, and keep repeating reading and writing until there is no more data to read (i.e., read() returns a value ≤ 0). Then close the file, and shut down and close the socket. It's important that the number of bytes you have sent in the HTTP body is exactly NUMBER or you may confuse the web client!
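The read/write loop above can be sketched as follows. This is a minimal sketch, not the required code: copy_file_to_socket() and write_all() are hypothetical names, and write_all() only stands in for the better_write() function from Lab 3, whose exact signature is not repeated here. It keeps calling write() because a single write() may write fewer bytes than requested.

```cpp
#include <cerrno>
#include <unistd.h>

// Hypothetical helper: keep calling write() until all len bytes are written.
// Returns false on error.
static bool write_all(int fd, const char* buf, ssize_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0 && errno == EINTR) {
            continue;                     // interrupted by a signal; retry
        }
        if (n <= 0) {
            return false;
        }
        buf += n;
        len -= n;
    }
    return true;
}

// Copy everything from file_fd to socket_fd, at most 1,024 bytes at a time.
// Returns the total number of bytes sent, or -1 if a write failed.
long copy_file_to_socket(int file_fd, int socket_fd)
{
    char buf[1024];
    long total = 0;
    for (;;) {
        ssize_t n = read(file_fd, buf, sizeof(buf));
        if (n <= 0) {
            break;                        // 0 = end of file, -1 = error
        }
        if (!write_all(socket_fd, buf, n)) {
            return -1;
        }
        total += n;
    }
    return total;
}
```

Comparing the returned total against NUMBER is a cheap way to detect that the body you sent does not match the Content-Length you promised.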
When you are done with implementing lab4a, please do the following:
Create an empty directory (call it "lab4") and change directory into it.
Download lab4data.tar.gz into that directory and type:
    tar xvf lab4data.tar.gz
This should create a subdirectory called "lab4data" with a bunch of files in it.
Start two Terminals and change into the "lab4" directory. Make sure that your command shell is tcsh. (If your command shell is bash, just type "tcsh" in both Terminals to switch to running tcsh.)
In the first Terminal, type "script lab4a.script" to start a transcript. Then type:
    uname -a
    cat /etc/os-release
    make clean
    make lab4a
    ./lab4a 12345
In the 2nd Terminal, type:
    wget -O x http://localhost:12345/textbooks-2-small.jpg
Wait for download to finish, then type:
    ls -l lab4data/textbooks-2-small.jpg
    ls -l x
    diff x lab4data/textbooks-2-small.jpg
The wget web client will talk to your lab4a server at port 12345 on "localhost" and it should download "lab4data/textbooks-2-small.jpg" and save it as "x". The "ls -l" commands should show you the file sizes of "lab4data/textbooks-2-small.jpg" and "x" and they should be the same size. The "diff" command compares "lab4data/textbooks-2-small.jpg" and "x" and it should not generate any printout because these files should be identical.
Do the above again, but this time with debugging turned on for wget (and read the printout carefully to see if what you see makes sense):
    wget --debug -O x http://localhost:12345/textbooks-2-small.jpg
    ls -l lab4data/textbooks-2-small.jpg
    ls -l x
    diff x lab4data/textbooks-2-small.jpg
If the above is not working right, please fix your code until it works correctly.
Otherwise, please continue with the following commands:
    foreach f (textbooks-2-small.jpg textbooks-3-small.jpg usc-seal-1597x360.png viterbi-seal-rev-770x360.png)
        wget -O x http://localhost:12345/$f
        diff x lab4data/$f
        echo -n 'Make sure "diff" command above did not print anything, then press Enter to continue... '
        set junk=$<
    end
    foreach f (hamlet.txt random.garbage rfc7540.txt rfc793.txt)
        wget -O x http://localhost:12345/$f
        diff x lab4data/$f
        echo -n 'Make sure "diff" command above did not print anything, then press Enter to continue... '
        set junk=$<
    end
In the first Terminal, type <Ctrl+c> to kill your server. Then type "exit" to close the transcript.
Alternatively, you can also do everything inside one Terminal and run tmux. You can split the screen vertically and run the client and server in separate panes.

Part B (`lab4b`) - simple web client (useful for PA2):

The client part of your lab4a has the basic structure of a simple web client (if you did Part A correctly)! All you have to do is to change it to handle commandline arguments a little differently, send an "HTTP request message" to a web server, parse the "HTTP response message" received from the web server, save the HTTP message body in a (binary) file, then shut down and close the connection.
The executable of your web client for this lab exercise must be named "lab4b" (please modify the Makefile so that when the user types "make lab4b", this executable will be generated). It should be able to talk to any standard web server. The usage information (i.e., commandline syntax) for "lab4b" is as follows:
    lab4b -c HOST PORT URI OUTPUTFILE
where "-c" is required (to indicate that you are running a client program), HOST is a host name of the web server, PORT is the port number the web server is listening on, URI is the string that goes into the 2nd field in the "request line" in the HTTP request header, and OUTPUTFILE specifies where the downloaded content should go. Please note that for this lab exercise, if the first character in the URI is not the forward-slash ('/') character, it's not an error and you must prepend a '/' character to the URI in the "request line" you will send. The last character in the above URI must not be the forward-slash ('/') character. If the last character in URI is '/', please print an appropriate error message and quit your program immediately without sending a request to the server. (Please note that some of these checks ideally should be done by a web server. We are doing these checks in the client simply because this is a lab exercise. For example, in PA2, it would be the server that's checking whether the last character in URI is '/' or not.)
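The two URI rules above (prepend a missing leading '/', reject a trailing '/') can be sketched as follows; normalize_uri() is a hypothetical name. Rejecting an empty URI is an assumption not stated in the spec, but it is consistent with the rules: prepending '/' to an empty URI would produce "/", which ends with '/'.

```cpp
#include <string>

// Hypothetical helper: apply the lab4b URI rules in place.
// Returns false (the caller should print an error and quit without
// contacting the server) if the URI is empty or its last character is '/'.
bool normalize_uri(std::string& uri)
{
    if (uri.empty() || uri.back() == '/') {
        return false;
    }
    if (uri[0] != '/') {
        uri.insert(uri.begin(), '/');     // "cs353/x.gif" -> "/cs353/x.gif"
    }
    return true;
}
```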
Please note that if you run lab4b with:
    lab4b PORT
since there is no "-c" immediately after lab4b, your lab4b should behave identically to your lab4a in Part A of this lab (since we are using the same executable for both the client and the server).
To write code for this part of the lab, you must start by copying "lab4a.cpp" (from Part A above) into "lab4b.cpp".
Your client in lab3d (and therefore, lab4a) only talks to a server on "localhost". For this lab exercise, you should replace the "localhost" in your code with the "HOST" commandline argument mentioned above. Your HTTP request should look like the following (for readability, I have put them on separate "lines"; but please understand that they are just one long stream of characters):
    GET URI HTTP/1.1\r\n
    User-Agent: lab4b\r\n
    Accept: */*\r\n
    Host: HOST:PORT\r\n
    \r\n
where URI, HOST, and PORT are from the commandline arguments and your HTTP request body must be empty. Please print the entire HTTP request header (including the last empty line) to cout. You should indent these lines by preceding every line with a <TAB> character so that they stand out.
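Assembling the request header above can be sketched as follows; make_request_header() is a hypothetical name, and the URI is assumed to have already been checked and given its leading '/'. Since the request has no body, the header (ending with the empty line) is the entire request message.

```cpp
#include <string>

// Hypothetical helper: build the complete lab4b request (header plus the
// empty line; there is no body) for an already-normalized URI.
std::string make_request_header(const std::string& uri,
                                const std::string& host,
                                const std::string& port)
{
    std::string header = "GET " + uri + " HTTP/1.1\r\n";
    header += "User-Agent: lab4b\r\n";
    header += "Accept: */*\r\n";
    header += "Host: " + host + ":" + port + "\r\n";
    header += "\r\n";                     // empty line ends the header
    return header;
}
```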
When you get a response back from the web server, you must print the entire HTTP response header (including the last empty line) to cout (and indent every line with a <TAB> character). You must also parse the first line in the HTTP response header for an HTTP version string, followed by a space character, followed by a status code ("200" means "OK", "404" means "not found", etc.). You should ignore the remaining characters in the first line. If the status line looks valid (i.e., has a valid HTTP version string and a status code), you must parse every line in the HTTP response header into KEY/VALUE pairs. If KEY is "Content-Length" (case-insensitive), the corresponding VALUE is the number of bytes of binary data in the HTTP response body that you must save into OUTPUTFILE. If there is no "Content-Length" key in any of the HTTP response header lines, you must print an error message, shut down and close the connection, and quit your program without reading any additional data from the socket.
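Parsing the status line and finding "Content-Length" case-insensitively can be sketched as below. Both function names are hypothetical, the lines are assumed to have their "\r\n" removed, and a "valid" status code is taken here to mean exactly three digits, which is an assumption. strcasecmp() comes from the POSIX header <strings.h>.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <strings.h>
#include <vector>

// Hypothetical helper: parse a status line such as "HTTP/1.1 200 OK".
// Returns the status code, or -1 if the line does not start with an
// "HTTP/" version string, a space, and three digits.
int parse_status_line(const std::string& line)
{
    std::size_t sp = line.find(' ');
    if (sp == std::string::npos || line.compare(0, 5, "HTTP/") != 0) {
        return -1;
    }
    if (line.size() < sp + 4) {
        return -1;                        // too short to hold 3 digits
    }
    for (std::size_t i = sp + 1; i < sp + 4; i++) {
        if (!std::isdigit(static_cast<unsigned char>(line[i]))) {
            return -1;
        }
    }
    return std::atoi(line.substr(sp + 1, 3).c_str());  // ignore the reason phrase
}

// Hypothetical helper: search the remaining header lines for a KEY equal to
// "Content-Length" (case-insensitive). Returns its value, or -1 if absent.
long find_content_length(const std::vector<std::string>& lines)
{
    for (const std::string& line : lines) {
        std::size_t colon = line.find(':');
        if (colon == std::string::npos) {
            continue;                     // not a KEY:VALUE line
        }
        std::string key = line.substr(0, colon);
        if (strcasecmp(key.c_str(), "Content-Length") == 0) {
            // strtol() skips the leading spaces of VALUE
            return std::strtol(line.c_str() + colon + 1, nullptr, 10);
        }
    }
    return -1;
}
```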
When you are done with implementing lab4b, please do the following:
Change directory into the "lab4" directory mentioned above.
Type "script lab4b.script" to start a transcript. Then type:
    uname -a
    cat /etc/os-release
    make clean
    make lab4b 
    /bin/rm -f x?
    ./lab4b -c merlot.usc.edu 80 /index.html x0
    ./lab4b -c merlot.usc.edu 80 /cs353/images/usctommy.gif x1
    ./lab4b -c merlot.usc.edu 80 /cs353/images/upc_map.gif x2
    ./lab4b -c merlot.usc.edu 80 /william/usc/images/upc_map.pdf x3
    ./lab4b -c merlot.usc.edu 80 /cs353/images/viterbi-seal-rev-770x360.png x4
    ./lab4b -c merlot.usc.edu 80 /cs353/images/usc-seal-1597x360.png x5
    ./lab4b -c merlot.usc.edu 80 /cs353/invalid x6
    ls -l x?
Please read the printout of the "ls" command above and verify the following:
The file size of "x0" should be 190 bytes.
The file size of "x1" should be 1689 bytes.
The file size of "x2" should be 416140 bytes.
The file size of "x3" should be 746248 bytes.
The file size of "x4" should be 95957 bytes.
The file size of "x5" should be 250749 bytes.
The file size of "x6" should be 196 bytes.
Of course, just because the file sizes are the same doesn't mean that the file contents are identical. (For now, we will not verify the contents of these files.)
Type "exit" to close the transcript. Make sure that you see all the HTTP response headers in the transcript.

Part C (`lab4c`) - run your simple web client against your simple web server (useful for PA2):

This part has no coding (other than maybe bug fixing). When you are done with Part A and Part B above, please do the following:
Change directory into the "lab4" directory mentioned above. Type "script lab4c.script" to start a transcript. (If your command shell is bash, the "foreach" command below will not work and you should first type "tcsh" to change your command shell to tcsh before proceeding.) Then type:
    ./lab4b 12345 &
Please note that there is an "&" character at the end of the above command. This tells the command shell to run the command in the background. When you run a command in the background, your command shell will print a prompt to indicate that it's ready to run another command.
Type the following commands:
    foreach f (textbooks-2-small.jpg textbooks-3-small.jpg usc-seal-1597x360.png viterbi-seal-rev-770x360.png)
        ./lab4b -c localhost 12345 /$f x
        chmod 600 x
        diff x lab4data/$f
        echo -n 'Make sure "diff" command above did not print anything, then press Enter to continue... '
        set junk=$<
    end
    foreach f (hamlet.txt random.garbage rfc7540.txt rfc793.txt)
        ./lab4b -c localhost 12345 /$f x
        chmod 600 x
        diff x lab4data/$f
        echo -n 'Make sure "diff" command above did not print anything, then press Enter to continue... '
        set junk=$<
    end
    fg
Please note that the "fg" above is the "foreground" command. It brings the command that's running in the background into the foreground. This way, you can kill it with a <Ctrl+c>.
Press <Ctrl+c> to kill the simple web server. Type "exit" to close the transcript.

Part D (`lab4d`) - persistent HTTP connection (useful for PA2):

In this part of the lab, we will continue with the code you have in Part A and Part B of this lab and make the client and the server handle persistent HTTP connections.
Please first do the following:

Copy "lab4b.cpp" (from Part B above) into "lab4d.cpp".
Modify your Makefile from Part B above so that when you type "make lab4d" in the commandline, the executable lab4d will be created.

The usage information (i.e., commandline syntax) for running lab4d as a web server is as follows:
    lab4d PORT
where PORT is a TCP port number your server must listen on.
The usage information (i.e., commandline syntax) for running lab4d as a web client is as follows:
    lab4d -c HOST PORT URI1 OUTPUTFILE1 [URI2 OUTPUTFILE2 ...]
where HOST is a host name of the web server, PORT is the port number the web server is listening on, URI1, URI2, ... are the URIs you must request to download from the web server using a single persistent HTTP connection (i.e., send and receive multiple HTTP request messages and response messages over the same connection), and OUTPUTFILE1, OUTPUTFILE2, ... are where the corresponding downloaded contents should go. Of course, the number of URIs must match the number of OUTPUTFILEs in the commandline arguments.
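Checking that every URI has a matching OUTPUTFILE can be sketched as follows. pair_up_downloads() is a hypothetical name; the sketch assumes main() has already checked argv[1] is "-c" and taken HOST and PORT from argv[2] and argv[3], so that the remaining arguments are std::vector<std::string>(argv + 4, argv + argc).

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: given the arguments that follow "-c HOST PORT",
// pair each URI with its OUTPUTFILE. Returns false if there are no pairs
// or if the last URI is missing its OUTPUTFILE.
bool pair_up_downloads(const std::vector<std::string>& rest,
                       std::vector<std::pair<std::string, std::string>>& jobs)
{
    if (rest.empty() || rest.size() % 2 != 0) {
        return false;
    }
    jobs.clear();
    for (std::size_t i = 0; i < rest.size(); i += 2) {
        jobs.emplace_back(rest[i], rest[i + 1]);   // (URI, OUTPUTFILE)
    }
    return true;
}
```

Validating all arguments before connecting means a bad commandline never opens a connection that would then be left half-used.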

Modify the server part:
In Part A of this lab, your server reads an HTTP request message from the client, sends an HTTP response message to the client, and then shuts down and closes the socket. For this part of the lab, your server needs to stay in an infinite loop and alternate between reading an HTTP request message and sending an HTTP response message over the same connection. You keep doing so until the read() system call returns either a zero (which means that the client has closed the connection) or a (-1) (which means that the connection was broken somehow). Then you break out of the infinite loop, and shut down and close the connection.
The wget client uses a persistent HTTP connection to download multiple files from a web server. This happens when it first downloads an HTML file. After it has downloaded an HTML file, it will parse the HTML file to look for embedded images. If there are embedded images, it will use the same connection to download these images one at a time by sending HTTP request messages. If your server is listening on port 12345 and you run the following command:
    wget -r -l 1 http://localhost:12345/persistent.html
you should get the following HTTP request (for readability, I have put them on separate "lines"; but please understand that they are just one long stream of characters):
    GET /persistent.html HTTP/1.1\r\n
    User-Agent: Wget/1.17.1 (linux-gnu)\r\n
    Accept: */*\r\n
    Accept-Encoding: identity\r\n
    Host: localhost:12345\r\n
    \r\n
For this lab, when you parse the request line in the HTTP request header, you must look at the file name extension in the URI (i.e., everything after the last period in the URI part of the request line). In the above example, the URI in the request line is the string "/persistent.html". Therefore, the file name extension in the URI is the string "html". If the file name extension in the URI is "html" (case-insensitive) and you are sending back a 200 OK HTTP response message, the "Content-Type" key in the HTTP response header must have a value of "text/html". This is equivalent to saying that the last 5 characters in the URI are ".html" (case-insensitive). Since "lab4data/persistent.html" is 186 bytes long, if you get the above request, you must send a 200 OK HTTP response message with the following response header (including the empty line):
    HTTP/1.1 200 OK\r\n
    Server: lab4a\r\n
    Content-Type: text/html\r\n
    Content-Length: 186\r\n
    \r\n
The HTTP response header must be immediately followed by 186 bytes of binary data which corresponds to the content of the "lab4data/persistent.html" file.
If the file name extension in the URI is anything else, you should send back the same HTTP response message as Part A of this lab.
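The extension rule above can be sketched as a check on the last 5 characters of the URI; content_type_for() is a hypothetical name, and strcasecmp() comes from the POSIX header <strings.h>.

```cpp
#include <string>
#include <strings.h>

// Hypothetical helper: choose the Content-Type for a 200 OK response.
// A URI whose last 5 characters are ".html" (case-insensitive) gets
// "text/html"; anything else gets "application/octet-stream", as in Part A.
std::string content_type_for(const std::string& uri)
{
    const std::string ext = ".html";
    if (uri.size() >= ext.size() &&
        strcasecmp(uri.c_str() + uri.size() - ext.size(), ext.c_str()) == 0) {
        return "text/html";
    }
    return "application/octet-stream";
}
```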
No matter what response you have sent, after sending the response message, you must go back to the top of the infinite loop to read the next HTTP request message from the client and then send back an HTTP response message, and so on. You must not shutdown or close the connection after each message.
As in Part A of this lab, please print (to cout) the HTTP request header you have received and the HTTP response header you have sent, each line indented by a <TAB> character.
Modify the client part:

In Part B of this lab, your client sends an HTTP request message to the server, reads an HTTP response message from the server, and shuts down and closes the socket. For this part of the lab, your client must send an HTTP request message for URI1 to the server, read the HTTP response message from the server and save the HTTP response body into OUTPUTFILE1, send an HTTP request message for URI2 to the server using the same socket, read the HTTP response message from the server and save the HTTP response body into OUTPUTFILE2, and so on. When your client has exhausted the commandline arguments, it must shut down and close the socket and self-terminate.
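On a persistent connection, the client must read exactly Content-Length bytes of each body: reading further would swallow the beginning of the next response. A minimal sketch of that idea follows; read_body() is a hypothetical name, and the sketch assumes the header was read one byte at a time (as a read_a_line() built on single-byte reads would do), so no body bytes have been consumed yet.

```cpp
#include <cstddef>
#include <unistd.h>

// Hypothetical helper: copy exactly content_length bytes of the response
// body from socket_fd to out_fd. It never asks read() for more than the
// bytes still owed, so the next response on the same persistent connection
// stays unread in the socket. Returns false if the connection ends early
// or a write fails.
bool read_body(int socket_fd, int out_fd, long content_length)
{
    char buf[1024];
    long remaining = content_length;
    while (remaining > 0) {
        std::size_t want = remaining < (long)sizeof(buf) ? (std::size_t)remaining
                                                         : sizeof(buf);
        ssize_t n = read(socket_fd, buf, want);
        if (n <= 0) {
            return false;                 // closed (0) or broken (-1) too early
        }
        for (ssize_t off = 0; off < n; ) {  // handle partial writes
            ssize_t w = write(out_fd, buf + off, n - off);
            if (w <= 0) {
                return false;
            }
            off += w;
        }
        remaining -= n;
    }
    return true;
}
```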
As in Part B of this lab, please print (to cout) the HTTP request header you have sent and the HTTP response header you have received.
When you are done with implementing all the above, please do the following:
Type "script lab4d1.script" to start a transcript. Then type:
    uname -a
    cat /etc/os-release
    make clean
    make lab4d
    ./lab4d 12345
Start another Terminal window and cd into the same "lab4" directory.
In the second Terminal window, type:
    wget -r -l 1 http://localhost:12345/persistent.html
Wait for download to finish then do:
    foreach f (persistent.html textbooks-2-small.jpg textbooks-3-small.jpg usctommy.gif)
        diff localhost:12345/$f lab4data/$f
        echo -n 'Make sure "diff" command above did not print anything, then press Enter to continue... '
        set junk=$<
    end
The above command should produce no printout. If there is any printout, please fix your server code and try again.
Remove the download directory by doing the following:
    /bin/rm -rf localhost:12345
Continue with the following in the 2nd window:
Type "script lab4d2.script" to start a transcript. Then type:
    ./lab4d -c merlot.usc.edu 80 /cs353/images/upc_map.pdf x0 /cs353/images/usctommy.gif x1
Wait for download to finish then do:
    diff lab4data/upc_map.pdf x0
    diff lab4data/usctommy.gif x1
The above command should produce no printout. If there is any printout, please fix your client code and try again.
Type "exit" to close the transcript.
Continue with the following in the 2nd window:
Type "script lab4d3.script" to start a transcript. Then type:
    ./lab4d -c localhost 12345 upc_map.pdf x2 persistent.html x3 upc_map.gif x4 
Wait for download to finish then do:
    diff lab4data/upc_map.pdf x2
    diff lab4data/persistent.html x3
    diff lab4data/upc_map.gif x4
The above command should produce no printout. If there is any printout, please fix your server code and try again.
Type "exit" to close the transcript.
Do the following in the first window:

Press <Ctrl+c> to kill the server.
Type "exit" to close the transcript.

Alternatively, you can also do everything inside one Terminal and run tmux. You can split the screen vertically and run the client and server in separate panes.

Part E (`lab4e`) - `wireshark` (may be useful for debugging PA2):

This part has no coding and nothing to turn in. As with Part E of Lab 3, if you are running on a shared server (such as viterbi-scf1.usc.edu or viterbi-scf2.usc.edu), please skip this part of the lab if you cannot run wireshark.
Repeat the first part of Part E of Lab 3. This time, identify the request line and identify all the "lines" in the HTTP request header all the way to the "empty line".
Click on the 2nd HTTP message (which corresponds to the HTTP response message sent from the server back to the client). Identify the status line and identify all the "lines" in the HTTP response header all the way to the "empty line". Find the "Content-Length" KEY and the corresponding VALUE and verify that this VALUE is exactly the number of bytes in the response body.
Change the filter value to "tcp.port == 12345 && ip.addr == 127.0.0.1" to inspect the "application level" data being exchanged between your client application and the server application in Part C and Part D of this lab. This can be helpful to debug your code in case you have sent extra bytes of data or you have skipped some data. Make sure that there are no null characters (i.e., '\0') in an HTTP request or response header and that every line in the HTTP request or response header is terminated with "\r\n" and make sure that you can identify the empty line that defines the end of an HTTP request or response header.
Tshark
If you are running on AWS Free Tier, wireshark there is either very very slow or it crashes over VNC. In this case, you can use tshark, which is basically wireshark without the graphical user interface or an interactive user interface. Using tshark, you can capture all the packets just like wireshark and have them go into a file. When you are done capturing all the data, run tshark to print everything you would see in wireshark so you can inspect all the packets that were captured. Let's try the following:
Open two Terminal windows and type the following into the first window to capture TCP data created when you download "index.html" from merlot.usc.edu.
    tshark -i any -w test.pcap -f "host 68.181.32.44 and tcp port 80"
In the above command, "any" refers to any interface and the argument following the "-w" commandline option is the name of the output file. Please use the file name extension ".pcap" to mean that the file is a "raw data capture (binary) file".
In the 2nd Terminal window, type the following to download "index.html" from merlot.usc.edu:
    wget -O x http://68.181.32.44/
In the first window, press <Ctrl+C> to kill the tshark program, then type the following to see a top-level summary of the packets you have captured:
    tshark -r test.pcap --color
In the 2nd window, type the following to create a full dump of the packets you have captured and we will send the printout into a text file:
    tshark -r test.pcap -V -x > test.out
You can open test.out with a text editor to examine what's in every captured packet.
In the first window, look for HTTP frames in the summary printout and find the corresponding frame in test.out and see if you can find an HTTP message near the end of that frame.
For example, you may see something like the following in the hexdump portion of a frame (the colors are mine):
    0000  00 04 00 01 00 06 02 4c f1 9f ae 47 00 00 08 00   .......L...G....
    0010  45 00 00 b5 53 8e 40 00 40 06 75 c5 0a 00 02 0f   E...S.@.@.u.....
    0020  44 b5 20 2c 81 ec 00 50 b8 44 f6 3f 84 d4 0a 02   D. ,...P.D.?....
    0030  50 18 72 10 71 97 00 00 47 45 54 20 2f 20 48 54   P.r.q...GET / HT
    0040  54 50 2f 31 2e 31 0d 0a 55 73 65 72 2d 41 67 65   TP/1.1..User-Age
    0050  6e 74 3a 20 57 67 65 74 2f 31 2e 31 37 2e 31 20   nt: Wget/1.17.1
    0060  28 6c 69 6e 75 78 2d 67 6e 75 29 0d 0a 41 63 63   (linux-gnu)..Acc
    0070  65 70 74 3a 20 2a 2f 2a 0d 0a 41 63 63 65 70 74   ept: */*..Accept
    0080  2d 45 6e 63 6f 64 69 6e 67 3a 20 69 64 65 6e 74   -Encoding: ident
    0090  69 74 79 0d 0a 48 6f 73 74 3a 20 6d 65 72 6c 6f   ity..Host: merlo
    00a0  74 2e 75 73 63 2e 65 64 75 0d 0a 43 6f 6e 6e 65   t.usc.edu..Conne
    00b0  63 74 69 6f 6e 3a 20 4b 65 65 70 2d 41 6c 69 76   ction: Keep-Aliv
    00c0  65 0d 0a 0d 0a                                    e....
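Using the same technique as in Lab 3, you can identify the IP header, the TCP header with a "header length" of 5, and the application data. In this frame, the first 16 bytes are the Linux "cooked capture" header that is recorded when capturing on the "any" interface. The IP header is the 20 bytes starting at offset 0x10 (beginning with 45). The TCP header is the 20 bytes starting at offset 0x24 (beginning with 81 ec); its "header length" of 5 is the high-order 4 bits of the byte 50 at offset 0x30 and means 5 × 4 = 20 bytes. The application data (the HTTP request) starts at offset 0x38 with 47 45 54 ("GET").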
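The same header arithmetic can be done in code. Below is a minimal C++ sketch, assuming the frame starts with a 16-byte Linux cooked-capture header (as in this capture on the "any" interface) followed by an IPv4 header and a TCP header. The names `kFrame`, `FrameLayout`, and `locate_payload()` are hypothetical, made up for this illustration, and only the first 0x48 bytes of the frame are included.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// First 0x48 bytes of the frame in the hexdump above.
const std::uint8_t kFrame[] = {
    0x00, 0x04, 0x00, 0x01, 0x00, 0x06, 0x02, 0x4c, 0xf1, 0x9f, 0xae, 0x47, 0x00, 0x00, 0x08, 0x00,
    0x45, 0x00, 0x00, 0xb5, 0x53, 0x8e, 0x40, 0x00, 0x40, 0x06, 0x75, 0xc5, 0x0a, 0x00, 0x02, 0x0f,
    0x44, 0xb5, 0x20, 0x2c, 0x81, 0xec, 0x00, 0x50, 0xb8, 0x44, 0xf6, 0x3f, 0x84, 0xd4, 0x0a, 0x02,
    0x50, 0x18, 0x72, 0x10, 0x71, 0x97, 0x00, 0x00, 0x47, 0x45, 0x54, 0x20, 0x2f, 0x20, 0x48, 0x54,
    0x54, 0x50, 0x2f, 0x31, 0x2e, 0x31, 0x0d, 0x0a,
};

struct FrameLayout {
    std::size_t ip_offset;       // start of the IP header
    std::size_t tcp_offset;      // start of the TCP header
    std::size_t payload_offset;  // start of the application data
    std::size_t payload_length;  // application bytes carried by this packet
};

// Assumes a 16-byte Linux cooked-capture header, then IPv4, then TCP.
FrameLayout locate_payload(const std::uint8_t* frame) {
    const std::size_t kSllHeaderLen = 16;
    FrameLayout f{};
    f.ip_offset = kSllHeaderLen;
    const std::uint8_t* ip = frame + f.ip_offset;
    assert((ip[0] >> 4) == 4);  // version field must say IPv4
    // IHL is the low 4 bits of the first IP byte, counted in 32-bit words.
    const std::size_t ip_header_len = (ip[0] & 0x0f) * 4u;
    // Total Length (IP header + TCP header + data) is bytes 2-3, big-endian.
    const std::size_t ip_total_len = (static_cast<std::size_t>(ip[2]) << 8) | ip[3];
    f.tcp_offset = f.ip_offset + ip_header_len;
    const std::uint8_t* tcp = frame + f.tcp_offset;
    // TCP "header length" is the high 4 bits of byte 12, counted in 32-bit words.
    const std::size_t tcp_header_len = (tcp[12] >> 4) * 4u;
    f.payload_offset = f.tcp_offset + tcp_header_len;
    f.payload_length = ip_total_len - ip_header_len - tcp_header_len;
    return f;
}
```

For this frame the sketch places the TCP header at 0x24 and the HTTP request at 0x38, with 181 - 20 - 20 = 141 bytes of application data, which matches the 0xc5-byte frame minus its first 0x38 bytes.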

Templates

All pseudo-code below is incomplete, and error checking is often left out. Feel free to send your questions (and not your code) to the instructor.

It's very important that you check for error conditions so that you can break out of certain infinite loops (for example, read() returns 0 when the other side has closed the connection).

Pseudo-code for `lab4d` server (not necessarily complete):

    do forever /* in each iteration, handle one persistent client connection */
        socket_fd = my_accept()
        talk_to_client(socket_fd)
        shutdown(socket_fd)
        close(socket_fd)
    end-do

Pseudo-code for `talk_to_client(socket_fd)` for `lab4d` server:

    do forever /* in each iteration, read one request and send one response */
        do forever /* this loop reads all lines in a request header */
            line = read_a_line(socket_fd)
            if first line then
                uri = parse(line)
            else if line is "\r\n" then
                break;
            end-if
        end-do
        fd = open_file_for_reading(uri)
        write response header and blank line into socket_fd
        do forever
            data = read(fd, 1024);
            if data valid then
                write(socket_fd, data, data.size);
            else
                break;
            end-if
        end-do
        close(fd)
    end-do

Please note that data.size above refers to the return value of the read() system call when it returns the number of bytes read. Please also note that the above is just pseudo-code and you cannot really write code this way, because there is no such thing as data.size! When you read from a file using the read() system call, you must use the return value of read() to know how many bytes of data were read from the file and store that information in a local variable. That's what data.size above refers to. Please see ReadBinaryFromSocket() in the PA2 FAQ for more details.
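To make "data.size" concrete, here is a minimal C++ sketch of the 2nd inner loop of talk_to_client(), in which the return value of read() is kept in a local variable `n`. The names `copy_fd()` and `write_all()` are hypothetical (this is not the PA2 FAQ's ReadBinaryFromSocket()), and the sketch assumes blocking file descriptors.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <sys/types.h>
#include <unistd.h>

// Writes all len bytes of buf to fd. write() may write fewer bytes than
// requested (especially on sockets), so keep going until everything is out.
// Returns false on error.
bool write_all(int fd, const char* buf, std::size_t len) {
    std::size_t off = 0;
    while (off < len) {
        ssize_t w = write(fd, buf + off, len - off);
        if (w < 0) {
            if (errno == EINTR) continue;  // interrupted by a signal: retry
            return false;
        }
        off += static_cast<std::size_t>(w);
    }
    return true;
}

// Copies everything from in_fd (e.g., the file opened for reading) to
// out_fd (e.g., socket_fd). Returns the number of bytes copied, or -1 on error.
long long copy_fd(int in_fd, int out_fd) {
    assert(in_fd >= 0 && out_fd >= 0);
    char buf[1024];
    long long total = 0;
    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof(buf));  // n is what the pseudo-code calls data.size
        if (n == 0) break;                          // end of file: done
        if (n < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        // Send exactly the n bytes that were read, not sizeof(buf).
        if (!write_all(out_fd, buf, static_cast<std::size_t>(n))) return -1;
        total += n;
    }
    return total;
}
```

The key point is that only the first n bytes of buf are valid after each read(); writing sizeof(buf) bytes instead would send garbage at the end of the file.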
The above pseudo-code has two inner infinite loops inside the outer infinite loop. It's very important to follow this recipe and read an entire request message before proceeding to the 2nd inner infinite loop! Some students decide to read just the first line from the socket in the first inner infinite loop because all the other lines are "not useful". Please do not do that: I have seen cases on Mac OS X machines where doing so made the code malfunction, apparently because the data remaining in the socket caused problems in the 2nd inner infinite loop. This is really not supposed to happen, but unfortunately, it does. Also, on a persistent connection, any header lines left unread would be read as if they were the beginning of the next request. Therefore, it's best to read an entire request message before you send a response message.
It's highly recommended that you write a function to read all the lines in a request header (plus the empty line) and have this function return an object that represents a request message that you have received. This function must not return until it has read an entire request message from the socket. This function needs to be very precise in the sense that it must not read an extra byte of data from the socket and it must not miss a single byte of data from the socket. Once you are confident that this function works perfectly, you can use this function or modify this function to read other messages in future labs and assignments.
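A minimal C++ sketch of such a function, assuming blocking descriptors. It reads one byte at a time so that it never consumes a byte past the empty line. The names `read_a_line()` and `read_header()` are hypothetical; reading a byte at a time is simple but slow, and a buffered reader would be faster but would then have to hand any extra buffered bytes to the code that reads the body.

```cpp
#include <cassert>
#include <cerrno>
#include <string>
#include <vector>
#include <sys/types.h>
#include <unistd.h>

// Reads one header line, one byte at a time, so that no byte after the
// terminating "\r\n" is consumed. The returned line includes the "\r\n".
// Returns false if the connection was closed or an error occurred first.
bool read_a_line(int fd, std::string& line) {
    assert(fd >= 0);
    line.clear();
    for (;;) {
        char c;
        ssize_t n = read(fd, &c, 1);
        if (n == 0) return false;  // peer closed the connection
        if (n < 0) {
            if (errno == EINTR) continue;
            return false;
        }
        line.push_back(c);
        if (c == '\n' && line.size() >= 2 && line[line.size() - 2] == '\r') return true;
    }
}

// Reads every line of a header up to and including the empty line.
// header[0] is the first line (the request line or the status line);
// the empty line itself is consumed but not stored.
// Returns false if the connection ended before the empty line arrived.
bool read_header(int fd, std::vector<std::string>& header) {
    header.clear();
    std::string line;
    for (;;) {
        if (!read_a_line(fd, line)) return false;
        if (line == "\r\n") return true;  // end of header
        header.push_back(line);
    }
}
```

Because read_header() stops right after the empty line, whatever follows (a body, or the next request on a persistent connection) is still in the socket for the next piece of code to read. A false return value is the cue to break out of the outer loop in talk_to_client().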

Pseudo-code for `lab4d` client:

    socket_fd = create_client_socket_and_connect()
    for j = 1 to K do
        write request header and blank line into socket_fd to request URIj
        do forever /* this loop reads all lines in response header */
            line = read_a_line(socket_fd)
            if line is first line then
                /* do nothing */
            else if line is "\r\n" then
                break;
            else
                (key, value) = parse(line)
                if key is "Content-Length" then
                    content_length = value
                end-if
            end-if
        end-do
        fd = open_file_for_writing(OUTPUTFILEj)
        bytes_left = content_length
        while bytes_left > 0 do
            if bytes_left > 1024 then
                data = read(socket_fd, 1024);
            else
                data = read(socket_fd, bytes_left);
            end-if
            write(fd, data, data.size)
            bytes_left = bytes_left - data.size
        end-do
        close(fd)
    end-for
    shutdown(socket_fd)
    close(socket_fd)

The code for open_file_for_writing() is in "lab4data/copoyfile.cpp". You should copy the code for open_file_for_reading() and open_file_for_writing() into your code.
Please note that data.size above refers to the return value of the read() system call when it returns the number of bytes read.
It's highly recommended that you write a function to read all the lines in a response header (plus the empty line) and have this function return an object that represents a header of a response message that you have received. This function must not return until it has read an entire response header (including the empty line) from the socket. This function needs to be very precise in the sense that it must not read an extra byte of data from the socket and it must not miss a single byte of data from the socket. Once you are confident that this function works perfectly, you can use this function or modify this function to read other messages in future labs and assignments. (It also should be clear that you can use this function to read a request message mentioned above!)
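On the client side, the body must be read using exactly Content-Length bytes so that the next response on the persistent connection stays in the socket. Below is a minimal C++ sketch, assuming blocking descriptors; `parse_content_length()` and `receive_body()` are hypothetical names, and the Content-Length parsing is deliberately simplified (for example, it does not check for overflow). Unlike the pseudo-code, it also breaks out of the loop when read() returns 0, which is one of the error conditions you need to check for.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <string>
#include <strings.h>  // strncasecmp
#include <sys/types.h>
#include <unistd.h>

// Returns N for a "Content-Length: N\r\n" header line, or -1 if the line is
// not a Content-Length line. Header names are matched case-insensitively.
long long parse_content_length(const std::string& line) {
    const std::string key = "Content-Length:";
    if (line.size() < key.size() ||
        strncasecmp(line.c_str(), key.c_str(), key.size()) != 0) return -1;
    std::size_t i = key.size();
    while (i < line.size() && (line[i] == ' ' || line[i] == '\t')) ++i;
    long long value = 0;
    bool seen_digit = false;
    while (i < line.size() && line[i] >= '0' && line[i] <= '9') {
        value = value * 10 + (line[i] - '0');
        seen_digit = true;
        ++i;
    }
    return seen_digit ? value : -1;
}

// Reads exactly content_length bytes from sock_fd and writes them to out_fd.
// It never asks read() for more than bytes_left, so the next response on a
// persistent connection is left untouched in the socket.
// Returns false if the server closed the connection early or an error occurred.
bool receive_body(int sock_fd, int out_fd, long long content_length) {
    assert(content_length >= 0);
    char buf[1024];
    long long bytes_left = content_length;
    while (bytes_left > 0) {
        std::size_t want = bytes_left > 1024 ? 1024 : static_cast<std::size_t>(bytes_left);
        ssize_t n = read(sock_fd, buf, want);  // n is what the pseudo-code calls data.size
        if (n == 0) return false;              // connection closed: break out of the loop
        if (n < 0) {
            if (errno == EINTR) continue;
            return false;
        }
        ssize_t off = 0;                       // write all n bytes (handle short writes)
        while (off < n) {
            ssize_t w = write(out_fd, buf + off, static_cast<std::size_t>(n - off));
            if (w < 0) {
                if (errno == EINTR) continue;
                return false;
            }
            off += w;
        }
        bytes_left -= n;
    }
    return true;
}
```

A response header reader like the one described above can call parse_content_length() on each header line and then pass the result to receive_body() together with the descriptor of OUTPUTFILEj.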

Grading

Below is the grading breakdown:

(1 pt) submitted a valid lab4.tar.gz file with all the required files using the submission procedure below
(1 pt) contents in "lab4a.script", "lab4b.script", and "lab4c.script" are correct
(1 pt) contents in "lab4d1.script", "lab4d2.script", and "lab4d3.script" are correct
(1 pt) "Makefile" works for "make lab4a", "make lab4b", and "make lab4d"
(1 pt) source code of your simple web server/client program in "lab4a.cpp", "lab4b.cpp", and "lab4d.cpp" looks right

Minimum deduction is 0.5 pt for anything that's incorrect. Please note that for the "Makefile" item, you can only get credit for it if your "source code" is relevant to this lab; therefore, you can only get as many points as the "source code" item in the best case.

Please keep in mind that even though lab grading is "light", it doesn't mean that you can just put anything into your submission! It's still your responsibility to make sure that the files in your submission contain information that's relevant to the tests you were supposed to run. Use the "more" command to view your script/log files to make sure that they contain the right information. If a file has the wrong content in it, delete it, create the file again, and verify it. If most of the content in your script/log files is wrong and you did not notice it, we will most likely have to take points off.

Submission

To submit your work, you must first tar all the files you want to submit into a tarball and gzip it to create a gzipped tarfile named "lab4.tar.gz". Then you upload "lab4.tar.gz" to our Bistro submission server.

Change into the "lab4" directory you have created above and enter the following command to create your submission file "lab4.tar.gz" (if you don't have any ".h" files, don't include "*.h*" at the end):

    tar cvzf lab4.tar.gz lab4*.script Makefile *.c* *.h*
    ls -l lab4.tar.gz

The last command shows you how big the created "lab4.tar.gz" file is. If "lab4.tar.gz" is larger than 1MB in size, the submission server will not accept it.

If you use an IDE, the IDE may put your source code in subdirectories. In that case, you need to modify the commands above so that you include ALL the necessary source files and subdirectories (and don't include any binary files), and make sure that your code can be compiled without the IDE, since the grader is not allowed to use an IDE to compile your code.

You should read the output of the above commands carefully to make sure that "lab4.tar.gz" is created properly. If you don't understand the output of the above commands, you need to learn how to read it! It's your responsibility to ensure that "lab4.tar.gz" is created properly.

To check the content of "lab4.tar.gz", you can use the following command:

    tar tvf lab4.tar.gz

Please read the output of the above command carefully to see what files were included in "lab4.tar.gz" and what their file sizes are, and make sure that they make sense.

Please enter your USC e-mail address and your submission PIN below. Then click on the Browse button, and locate and select your submission file (i.e., "lab4.tar.gz"). Then click on the Upload button to submit your "lab4.tar.gz". (Be careful what you click! Do NOT submit the wrong file!) If you see an error message, please read the dialog box carefully, fix what needs to be fixed, and repeat the procedure. If you don't know your submission PIN, please visit this web site to have your PIN e-mailed to your USC e-mail address.


If the command is executed successfully and if everything checks out, a ticket will be issued to you to let you know "what" and "when" your submission made it to the Bistro server. The next web page you see will display such a ticket, and the ticket should look like the sample shown in the submission web page (of course, the actual text will be different, but the format should be similar). Make sure you follow the Verify Your Ticket instructions to verify the SHA1 hash of your submission to make sure that you did not accidentally submit the wrong file. Also, an e-mail (showing the ticket) will be sent to your USC e-mail address. Please read the ticket carefully to know exactly "what" and "when" your submission made it to the Bistro server. If there are problems, please contact the instructor.

It is extremely important that you also verify your submission after you have submitted "lab4.tar.gz" electronically, to make sure that everything you have submitted is everything you wanted us to grade. If you don't verify your submission and you end up submitting the wrong files, please understand that, due to our fairness policy, there's absolutely nothing we can do.

Finally, please be familiar with the Electronic Submission Guidelines and information on the bsubmit web page.

(5 points total)

Simple Web Server & Simple Web Client

Part A (`lab4a`) - simple web server (useful for PA2):

Part B (`lab4b`) - simple web client (useful for PA2):

Part C (`lab4c`) - run your simple web client against your simple web server (useful for PA2):

Part D (`lab4d`) - persistent HTTP connection (useful for PA2):

Modify the server part:

Modify the client part:

Part E (`lab4e`) - `wireshark` (may be useful for debugging PA2):

Tshark

Pseudo-code for `lab4d` server (not necessarily complete):

Pseudo-code for `talk_to_client(socket_fd)` for `lab4d` server:

Pseudo-code for `lab4d` client:
