[ This content is protected and may not be shared, uploaded, or distributed. ]
(Please also check out PA2 FAQ.)
Lab 4 has four parts:
Please note that this lab assumes that you have watched the PA2 lecture videos.
Part A (lab4a) - simple web server (useful for PA2):
The server part of your lab3d has the basic structure of a simple web server!
All you have to do is to change it to handle HTTP protocol messages (instead of ASCII lines of text messages)!
For this part of the lab, you need to change it to handle HTTP request messages from a web client (such as
wget),
parse the lines coming from the web client to understand what the client is asking for, send the
requested data back to the client in the format of an HTTP response message, then immediately shutdown and close the connection
(i.e., our web server can only handle a single request from a client and send a single response to the client).
The HTTP protocol was designed so that an HTTP request message and an HTTP response message have the same basic structure.
Every HTTP request message or HTTP response message consists of an HTTP header and an HTTP body, separated by an empty line.
An HTTP header is simply lines of text, while an HTTP body contains binary data. In Lab 3,
a line is terminated by an <LF> character (i.e., '\n'). In HTTP, each line in the HTTP header must be terminated by a two-character sequence,
<CR><LF> (i.e., "\r\n"). The empty line that separates the HTTP header from the HTTP body is, therefore, two characters long.
The first line in an HTTP request header is special and it is known as the request line.
It has the format: "METHOD URI VERSION" where METHOD, URI, and VERSION
are strings that do not contain space characters and they are separated by a space character.
The HTTP specification is very extensive and the terminology
used there may be slightly different from ours.
For this lab exercise, you only have to process the "GET" method (and you must refuse to process any other method).
The URI specifies the location of a resource at the server (i.e., that's what the client wants to "download")
For this lab exercise, a resource is simply a path name in a file system.
The VERSION should be "HTTP/1.x" where "x" can be anything.
The general format of the URI is
"/.../.../.../.../LASTPART" where the "..." are directory or subdirectory names
and "LASTPART" denotes everything after the last forward-slash ('/') character.
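The request line format described above can be split into its three fields with a few lines of C++. This is only a minimal sketch: RequestLine and parse_request_line() are hypothetical names that are not part of any provided code, and it assumes the line was already read from the socket, with or without the trailing "\r\n".

```cpp
#include <sstream>
#include <string>

// Hypothetical holder for the three fields of a request line.
struct RequestLine {
    std::string method;
    std::string uri;
    std::string version;
};

// Splits "METHOD URI VERSION\r\n" into its three fields.
// Returns false unless the line has exactly three whitespace-separated fields.
bool parse_request_line(const std::string &line, RequestLine &out)
{
    std::string s = line;
    // Drop the trailing "\r\n" (or a lone "\n") if present.
    while (!s.empty() && (s.back() == '\n' || s.back() == '\r')) s.pop_back();
    std::istringstream iss(s);
    std::string extra;
    if (!(iss >> out.method >> out.uri >> out.version)) return false;
    if (iss >> extra) return false; // more than three fields
    return true;
}
```

For example, parsing "GET /x/y/z.html HTTP/1.1\r\n" should give a method of "GET", a URI of "/x/y/z.html", and a version of "HTTP/1.1".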
The first line in an HTTP response header is special and it is known as the status line.
It has the format: "VERSION STATUS REASON" where VERSION and STATUS
are strings that do not contain space characters and they are separated by a space character while REASON is a phrase that may contain space characters.
All the other lines in an HTTP request header or an HTTP response header must be in the format: "KEY:VALUE".
When you parse an HTTP header,
you should look at each line, look for the colon character,
temporarily replace it with a null (i.e., '\0') character to get
the KEY and get the VALUE part by removing all the leading spaces, then put the colon back where it was.
If you don't recognize a particular KEY, you must ignore it and not process the VALUE.
The VALUE part can also have multiple parts with a semicolon character (i.e., ";") being the separator.
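Here is one possible way to split a "KEY:VALUE" line. Please note that this sketch uses std::string::find() and substr() instead of the temporary null character technique described above; the result should be the same, but it is a different technique. The function name parse_header_line() is hypothetical.

```cpp
#include <string>

// Splits one header line of the form "KEY:VALUE\r\n" at the first colon.
// Leading spaces are removed from VALUE. Returns false if there is no colon.
bool parse_header_line(const std::string &line, std::string &key, std::string &value)
{
    std::string s = line;
    // Drop the trailing "\r\n" (or a lone "\n") if present.
    while (!s.empty() && (s.back() == '\n' || s.back() == '\r')) s.pop_back();
    std::string::size_type colon = s.find(':');
    if (colon == std::string::npos) return false;
    key = s.substr(0, colon);
    // Skip the spaces that follow the colon.
    std::string::size_type start = s.find_first_not_of(' ', colon + 1);
    value = (start == std::string::npos) ? "" : s.substr(start);
    return true;
}
```

Only the first colon is used as the separator, so a line such as "Host: localhost:12345\r\n" gives a KEY of "Host" and a VALUE of "localhost:12345".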
VERY IMPORTANT: You should always think of HTTP body as binary data even though sometimes it happens to be text.
If you always treat an HTTP body as binary data, your code will always work!
How do you read/write binary data? You must use the read/write() system call to be sure!
Do not use C++ code to read/write binary data unless you are 100% sure that they would do exactly what you need and would satisfy the requirements
of our labs and programming assignments!
The executable of this part of the lab must be named lab4a.
The usage information (i.e., commandline syntax) for lab4a is as follows (similar to lab3d):
lab4a PORT
where PORT is a TCP port number.
To write code for this part of the lab,
you must start by copying "lab3d.cpp" (from Part D of Lab 3) into "lab4a.cpp"
and create a Makefile so that when the user types "make lab4a",
the "lab4a" executable will be created.
Please do it this way to make it easier when you need to do Part B of this lab.
A web server needs to serve requests for a particular host and providing service at a particular TCP port number.
For all our labs and programming assignments, your server must service requests for the host known as
"localhost" and
the TCP port number you are providing service for would come from the PORT commandline argument.
VERY IMPORTANT: On some systems, such as viterbi-scf1.usc.edu and viterbi-scf2.usc.edu,
"localhost" does not work for unknown reasons and you have to use "127.0.0.1" instead.
By default, we will use "127.0.0.1" instead of "localhost" for all our networking labs and programming assignments.
Therefore, whenever you see "localhost" in a lab spec or a programming assignment spec, you should replace it with "127.0.0.1".
When you write your code, please always use the compiler-specified LOCALHOST and do not use "localhost" or "127.0.0.1" explicitly.
Let's say that your lab4a
web server is running on "localhost:12345" (i.e., on "localhost" servicing port 12345).
If you contact this web server by running the following command:
wget -O x http://localhost:12345/x/y/z.html
The wget program will send the following HTTP request to your web server (for readability, I have put them on separate "lines"; but please
understand that they are just one long stream of characters):
GET /x/y/z.html HTTP/1.1\r\n
User-Agent: Wget/1.17.1 (linux-gnu)\r\n
Accept: */*\r\n
Accept-Encoding: identity\r\n
Host: localhost:12345\r\n
\r\n
For this lab exercise, you only have to process the "request line" and you must ignore the rest (other than printing them out).
You should modify the talk_to_client() function in "lab4a.cpp" to use the read_a_line() function you wrote
in lab3d
to read each line in the HTTP header and make sure you see the empty line that denotes the end of the HTTP header before you send anything back to the client.
Please print the entire HTTP request header (including the empty line) to cout. You should indent these lines by preceding every line
with a <TAB> (i.e., "\t") character so that they stand out.
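The indentation described above can be sketched as follows. The helper indent_header() is a hypothetical name; it assumes you have collected the header lines (each still ending with "\r\n") into one string before printing.

```cpp
#include <string>

// Returns a copy of `header` in which every line (each ends with '\n')
// is preceded by a <TAB>, ready to be sent to cout.
std::string indent_header(const std::string &header)
{
    std::string out;
    bool at_line_start = true;
    for (char c : header) {
        if (at_line_start) out += '\t';
        out += c;
        at_line_start = (c == '\n');
    }
    return out;
}
```

You would then print the result with something like "cout << indent_header(header);". The empty line at the end of the header also gets a <TAB> in front of it.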
When you process the "request line", you must verify that the METHOD in the "request line" is "GET" and the VERSION is "HTTP/1.x"
where "x" can be anything (even if it's empty).
You must then take the URI in the "request line" and append it to the string: "lab4data/" to create a file system path
and expect it to refer to a file in a subdirectory of the directory where you have your lab4a executable.
Let's say that you store this file system path in a variable called path.
You should use the following code to get the size (number of bytes) of the file:
#include <sys/stat.h>
#include <string>

using namespace std;

/**
 * Use this code to return the file size of path.
 *
 * You should be able to use this function as is.
 *
 * @param path - a file system path.
 * @return the file size of path, or (-1) if failure.
 */
static
int get_file_size(string path)
{
    struct stat stat_buf;
    if (stat(path.c_str(), &stat_buf) != 0) {
        return (-1);
    }
    return (int)(stat_buf.st_size);
}
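The request line checks and the path construction described before get_file_size() can be sketched as follows. Both function names are hypothetical, and the sketch assumes that the request line has already been split into METHOD, URI, and VERSION.

```cpp
#include <string>

// Returns true if the request line fields are acceptable for this lab:
// METHOD must be "GET" and VERSION must start with "HTTP/1."
// (the "x" after the period can be anything, even empty).
bool request_is_acceptable(const std::string &method, const std::string &version)
{
    const std::string prefix = "HTTP/1.";
    return method == "GET" && version.compare(0, prefix.size(), prefix) == 0;
}

// Appends the URI to "lab4data/" as described in the text.
// A URI that starts with '/' yields "lab4data//..."; POSIX path
// resolution treats repeated slashes inside a path as one.
std::string uri_to_path(const std::string &uri)
{
    return "lab4data/" + uri;
}
```

The returned path is what you would pass to get_file_size().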
One very important rule in networking is that if you get a well-formed request, you must always send a response!
If get_file_size() returns (-1), it means that the file does not exist.
In this case, you must send the following HTTP response message to the client
(for readability, I have put them on separate "lines"; but please understand that they are just one long stream of characters):
HTTP/1.1 404 Not Found\r\n
Server: lab4a\r\n
Content-Type: text/html\r\n
Content-Length: 63\r\n
\r\n
<html><head></head><body><h1>404 Not Found</h1></body></html>\r\n
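To see where the 63 in "Content-Length" comes from, here is a minimal sketch that builds the above message and computes the length from the body itself. The function name make_404_response() is hypothetical.

```cpp
#include <string>

// Builds the complete 404 response message (header, empty line, and body).
std::string make_404_response()
{
    // The body includes the trailing "\r\n", so its size is 61 + 2 = 63 bytes.
    const std::string body =
        "<html><head></head><body><h1>404 Not Found</h1></body></html>\r\n";
    std::string msg = "HTTP/1.1 404 Not Found\r\n";
    msg += "Server: lab4a\r\n";
    msg += "Content-Type: text/html\r\n";
    msg += "Content-Length: " + std::to_string(body.size()) + "\r\n";
    msg += "\r\n";
    msg += body;
    return msg;
}
```

Computing the length from the body (instead of typing 63 by hand) keeps the header and the body consistent if you ever change the body.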
If get_file_size() returns a valid file size (i.e., anything ≥ 0), it means that the file exists and
you should store the file size in a variable.
Then you must open that file using the open() system call (this should succeed since the stat() system call was successful previously).
Before you send out any content of the file, you must first send the following HTTP response header followed by an empty line
(for readability, I have put them on separate "lines"; but please understand that they are just one long stream of characters):
HTTP/1.1 200 OK\r\n
Server: lab4a\r\n
Content-Type: application/octet-stream\r\n
Content-Length: NUMBER\r\n
\r\n
where NUMBER must be the size of the file you got from get_file_size().
The above tells the web client that the HTTP response body is NUMBER bytes
of binary data of unknown type (which is what "application/octet-stream" means, i.e., a stream of 8-bit data bytes).
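The response header above can be put together like this. This is just a sketch; make_200_header() is a hypothetical name and file_size is assumed to be the value returned by get_file_size().

```cpp
#include <string>

// Builds the 200 OK response header, including the empty line.
// The file content (the HTTP body) is NOT part of the returned string.
std::string make_200_header(int file_size)
{
    std::string hdr = "HTTP/1.1 200 OK\r\n";
    hdr += "Server: lab4a\r\n";
    hdr += "Content-Type: application/octet-stream\r\n";
    hdr += "Content-Length: " + std::to_string(file_size) + "\r\n";
    hdr += "\r\n";
    return hdr;
}
```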
Please print the entire HTTP response header (including the empty line) to cout. You should indent these lines by preceding every line
with a <TAB> (i.e., "\t") character so that they stand out.
In Lab 3, it was mentioned that whenever you write data into the socket, you must use better_write().
To make debugging easier, I strongly recommend that whenever you write any part of a message header (including the empty line) into the socket,
you should call better_write_header() instead. If the special debugging flag in "my_readwrite.cpp"
is not turned on, please read the code in better_write_header() to see that
better_write_header() would simply call better_write(). So, they are the same thing if the debugging flag is off.
It's important that you must never call better_write_header() to write any part of a message body into the socket
because we must treat a message body as binary data and better_write_header() assumes that you are writing ASCII data
(and that's okay for message headers since all our message headers are ASCII data)!
Then you must stay in a loop and read the content of the file at most 1,024 bytes at a time using
the read() system call, write all the data that you have read into the socket using
the write() system call, keep repeating reading and writing until there is no more data to read
(i.e., read() returns a value ≤ 0). Then you close the file and shutdown and close the socket.
It's important that the number of bytes you have sent in the HTTP body is exactly NUMBER or you
may confuse the web client!
When you are done with implementing lab4a, please do the following:
- Create an empty directory (call it "lab4") and change directory into it.
- Download lab4data.tar.gz into that directory and type:
tar xvf lab4data.tar.gz
This should create a subdirectory called "lab4data" with a bunch of files in it.
- Start two Terminals and change into the "lab4" directory. Make sure that your command shell is tcsh.
(If your command shell is bash, just type "tcsh" in both Terminals to switch to running tcsh.)
- In the first Terminal, type "script lab4a.script" to start a transcript.
Then type:
uname -a
cat /etc/os-release
make clean
make lab4a
./lab4a 12345
- In the 2nd Terminal, type:
wget -O x http://localhost:12345/textbooks-2-small.jpg
Wait for download to finish, then type:
ls -l lab4data/textbooks-2-small.jpg
ls -l x
diff x lab4data/textbooks-2-small.jpg
The wget web client will talk to your lab4a server at port 12345 on "localhost"
and it should download "lab4data/textbooks-2-small.jpg" and save it as "x".
The "ls -l" commands should show you the file sizes of "lab4data/textbooks-2-small.jpg" and "x"
and they should be the same size.
The "diff" command compares "lab4data/textbooks-2-small.jpg" and "x" and
it should not generate any printout because these files should be identical.
- Do the above again, but this time with debugging turned on for wget (and read the printout carefully to see if what you see makes sense):
wget --debug -O x http://localhost:12345/textbooks-2-small.jpg
ls -l lab4data/textbooks-2-small.jpg
ls -l x
diff x lab4data/textbooks-2-small.jpg
- If the above is not working right, please fix your code until they work correctly.
- Otherwise, please continue with the following commands:
foreach f (textbooks-2-small.jpg textbooks-3-small.jpg usc-seal-1597x360.png viterbi-seal-rev-770x360.png)
wget -O x http://localhost:12345/$f
diff x lab4data/$f
echo -n 'Make sure "diff" command above did not print anything, then press any key to continue... '
set junk=$<
end
foreach f (hamlet.txt random.garbage rfc7540.txt rfc793.txt)
wget -O x http://localhost:12345/$f
diff x lab4data/$f
echo -n 'Make sure "diff" command above did not print anything, then press any key to continue... '
set junk=$<
end
In the first Terminal, type <Ctrl+c> to kill your server. Then type "exit" to close the transcript.
Alternatively, you can also do everything inside one Terminal and run tmux.
You can split the screen vertically and run the client and server in separate panes.
Part B (lab4b) - simple web client (useful for PA2):
The client part of your lab4a has the basic structure of a simple web client (if you did Part A correctly)!
All you have to do is to change it to handle commandline arguments a little differently, send an "HTTP request message" to a web server,
parse the "HTTP response message" received from the web server, save the HTTP message body in a (binary) file, then shutdown and close the connection.
The executable of your web client for this lab exercise must be named "lab4b"
(please modify the Makefile so that when the user types "make lab4b",
this executable will be generated).
It should be able to talk to any standard web server. The usage information (i.e., commandline syntax)
for "lab4b" is as follows:
lab4b -c HOST PORT URI OUTPUTFILE
where "-c" is required (to indicate that you are running a client program),
HOST is a host name of the web server, PORT is the port number the web server is listening on,
URI is the string that goes into the 2nd field in the "request line" in the HTTP request header,
and OUTPUTFILE specifies where the downloaded content should go.
Please note that for this lab exercise, if the first character in the URI is not the forward-slash ('/') character,
it's not an error and you must prepend a '/' character to the URI in the "request line" you will send.
The last character in the above URI must not be the forward-slash ('/') character.
If the last character in URI is '/', please print an appropriate error message and quit your program
immediately without sending a request to the server.
(Please note that some of these checks ideally should be done by a web server. We are doing these checks in the client simply because this is a lab exercise.
For example, in PA2, it would be the server that's checking whether the last character in URI is '/' or not.)
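The two URI checks above can be sketched as follows. The function name normalize_uri() is hypothetical, and rejecting an empty URI is my assumption since the text above does not mention that case.

```cpp
#include <string>

// Returns false if the URI must be rejected (empty, or ends with '/').
// Otherwise stores in `out` the URI to place in the request line,
// with a leading '/' added when it is missing.
bool normalize_uri(const std::string &uri, std::string &out)
{
    if (uri.empty() || uri.back() == '/') return false;
    out = (uri.front() == '/') ? uri : "/" + uri;
    return true;
}
```

If normalize_uri() returns false, the client should print an error message and quit before creating any connection.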
Please note that if you run lab4b with:
lab4b PORT
since there is no "-c" immediately after lab4b, your lab4b should behave identically to
your lab4a in Part A of this lab
(since we are using the same executable for both the client and the server).
To write code for this part of the lab,
you must start by copying "lab4a.cpp" (from Part A above) into "lab4b.cpp".
Your client in lab3d
(and therefore, lab4a) only talks to a server on "localhost".
For this lab exercise, you should replace the "localhost" in your code with the "HOST" commandline argument mentioned above.
Your HTTP request should look like the following
(for readability, I have put them on separate "lines"; but please understand that they are just one long stream of characters):
GET URI HTTP/1.1\r\n
User-Agent: lab4b\r\n
Accept: */*\r\n
Host: HOST:PORT\r\n
\r\n
where URI, HOST, and PORT are from the commandline arguments and your HTTP request body must be empty.
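The request message above can be built as one string before it is written into the socket. This is a minimal sketch; make_request() is a hypothetical name, and the URI is assumed to already start with a '/' character.

```cpp
#include <string>

// Builds the HTTP request header (including the empty line).
// The request body is empty, so this is the entire request message.
std::string make_request(const std::string &uri,
                         const std::string &host,
                         const std::string &port)
{
    std::string req = "GET " + uri + " HTTP/1.1\r\n";
    req += "User-Agent: lab4b\r\n";
    req += "Accept: */*\r\n";
    req += "Host: " + host + ":" + port + "\r\n";
    req += "\r\n";
    return req;
}
```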
Please print the entire HTTP request header (including the last empty line) to cout.
You should indent these lines by preceding every line with a <TAB> character so that they stand out.
When you get a response back from the web server, you must print the entire HTTP response header (including the last empty line) to cout
(and indent every line with a <TAB> character).
You also must parse the first line in the HTTP response header for an HTTP version string, followed
by a space character, followed by a status code ("200" means "OK", "404" means "not found", etc.). You should ignore the remaining
characters in the first line. If the status line looks valid
(i.e., has a valid HTTP version string and a status code), you must parse every line in the HTTP response header into
KEY/VALUE pairs. If KEY is "Content-Length" (case insensitive), the corresponding VALUE
is the number of bytes of binary data in the HTTP response body that you must save into OUTPUTFILE.
If there is no "Content-Length" key in any of the HTTP response header lines, you must
print an error message and shutdown and close the connection and quit your program without reading any additional data from the socket.
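The status line check and the case-insensitive KEY comparison described above can be sketched as follows. Both function names are hypothetical. The sketch assumes that a valid version string starts with "HTTP/1." and that a status code is exactly three digits.

```cpp
#include <cctype>
#include <string>

// Case-insensitive comparison of two ASCII strings.
bool equals_ignore_case(const std::string &a, const std::string &b)
{
    if (a.size() != b.size()) return false;
    for (std::string::size_type i = 0; i < a.size(); i++) {
        if (std::tolower((unsigned char)a[i]) != std::tolower((unsigned char)b[i])) {
            return false;
        }
    }
    return true;
}

// Extracts the status code from "VERSION STATUS REASON\r\n".
// Returns -1 if the line does not look like a valid status line.
int parse_status_code(const std::string &line)
{
    if (line.compare(0, 7, "HTTP/1.") != 0) return -1;
    std::string::size_type sp = line.find(' ');
    if (sp == std::string::npos || line.size() < sp + 4) return -1;
    int code = 0;
    for (std::string::size_type i = sp + 1; i < sp + 4; i++) {
        if (!std::isdigit((unsigned char)line[i])) return -1;
        code = code * 10 + (line[i] - '0');
    }
    return code;   // the REASON part is ignored
}
```

For every other header line, you would compare the KEY against "Content-Length" using equals_ignore_case() and convert the VALUE to a number if they match.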
When you are done with implementing lab4b, please do the following:
Part C (lab4c) - run your simple web client against your simple web server (useful for PA2):
This part has no coding (other than maybe bug fixing).
When you are done with Part A and Part B above, please do the following:
- Change directory into the "lab4" directory mentioned above.
Type "script lab4c.script" to start a transcript.
(If your command shell is bash, the "foreach" command below will not work and you should first type "tcsh" to
change your command shell to tcsh before proceeding.)
Then type:
./lab4b 12345 &
Please note that there is an "&" character at the end of the above command. This tells the command shell to run the command in the background.
When you run a command in the background, your command shell will print a prompt to indicate that it's ready to run another command.
- Type the following commands:
foreach f (textbooks-2-small.jpg textbooks-3-small.jpg usc-seal-1597x360.png viterbi-seal-rev-770x360.png)
./lab4b -c localhost 12345 /$f x
chmod 600 x
diff x lab4data/$f
echo -n 'Make sure "diff" command above did not print anything, then press any key to continue... '
set junk=$<
end
foreach f (hamlet.txt random.garbage rfc7540.txt rfc793.txt)
./lab4b -c localhost 12345 /$f x
chmod 600 x
diff x lab4data/$f
echo -n 'Make sure "diff" command above did not print anything, then press any key to continue... '
set junk=$<
end
fg
Please note that the "fg" above is the "foreground" command. It brings the command that's running in the background into the foreground.
This way, you can kill it with a <Ctrl+c>.
- Press <Ctrl+c> to kill the simple web server.
Type "exit" to close the transcript.
Part D (lab4d) - persistent HTTP connection (useful for PA2):
In this part of the lab, we will continue with the code you have in Part A and Part B of this lab
and make the client and the server handle persistent HTTP connections.
Please first do the following:
- Copy "lab4b.cpp" (from Part B above) into "lab4d.cpp"
- Modify your Makefile from Part B above
so that when you type "make lab4d" in the commandline, the executable lab4d will be created.
The usage information (i.e., commandline syntax) for running lab4d as a web server is as follows:
lab4d PORT
where PORT is a TCP port number your server must listen on.
The usage information (i.e., commandline syntax) for running lab4d as a web client is as follows:
lab4d -c HOST PORT URI1 OUTPUTFILE1 [URI2 OUTPUTFILE2 ...]
where HOST is a host name of the web server, PORT is the port number the web server is listening on,
URI1, URI2, ... are the URIs you must request to download from the web server using a single persistent HTTP connection
(i.e., send and receive multiple HTTP request messages and response messages over the same connection),
and OUTPUTFILE1, OUTPUTFILE2, ... are where the corresponding downloaded contents should go.
Of course, the number of URIs must match the number of OUTPUTFILEs in the commandline arguments.
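One way to check that the URIs and the OUTPUTFILEs match up is sketched below. The function name collect_downloads() is hypothetical; for the client commandline syntax above, the first URI would be at argv[4].

```cpp
#include <string>
#include <utility>
#include <vector>

// Collects (URI, OUTPUTFILE) pairs from argv, starting at argv[first].
// Returns false if there is no pair or if a URI has no matching OUTPUTFILE.
bool collect_downloads(int argc, const char *const argv[], int first,
                       std::vector<std::pair<std::string, std::string> > &out)
{
    int remaining = argc - first;
    if (remaining < 2 || remaining % 2 != 0) return false;
    for (int i = first; i < argc; i += 2) {
        out.push_back(std::make_pair(std::string(argv[i]), std::string(argv[i + 1])));
    }
    return true;
}
```

The client can then loop over the returned pairs and send one request per pair over the same socket.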
Modify the server part:
In Part A of this lab, your server reads an HTTP request message from the client,
sends an HTTP response message to the client, and then shuts down and closes the socket. For this part of the lab,
your server needs to stay in an infinite loop and alternate between reading an HTTP request message and sending
an HTTP response message over the same connection. You keep doing so until the read() system call
returns either a zero (which means that the client has closed the connection) or a (-1) (which
means that the connection was broken somehow). Then you break out of the infinite loop and then you shutdown and close the connection.
The wget client uses a persistent HTTP connection to download multiple files from a web server.
This happens when it first downloads an HTML file. After it has downloaded an HTML file, it will
parse the HTML file to look for embedded images. If there are embedded images, it will use the
same connection to download these images one at a time by sending HTTP request messages.
If your server is listening on port 12345 and you run the following command:
wget -r -l 1 http://localhost:12345/persistent.html
you should get the following HTTP request (for readability, I have put them on separate "lines"; but please
understand that they are just one long stream of characters):
GET /persistent.html HTTP/1.1\r\n
User-Agent: Wget/1.17.1 (linux-gnu)\r\n
Accept: */*\r\n
Accept-Encoding: identity\r\n
Host: localhost:12345\r\n
\r\n
For this lab, when you parse the request line in the HTTP request header,
you must look at the file name extension in the URI (i.e., everything after the last period in the URI part of the request line).
In the above example, the URI in the request line is the string "/persistent.html".
Therefore, the file name extension in the URI is the string "html".
If the file name extension in the URI is "html" (case-insensitive) and you are sending back
a 200 OK HTTP response message, the "Content-Type" key in the HTTP response header must have a value of "text/html".
This is equivalent to saying that the last 5 characters in the URI are ".html" (case-insensitive).
Since "lab4data/persistent.html" is 186 bytes long, if you get the above request,
you must send a 200 OK HTTP response message with the following response header (including the empty line):
HTTP/1.1 200 OK\r\n
Server: lab4a\r\n
Content-Type: text/html\r\n
Content-Length: 186\r\n
\r\n
The HTTP response header must be immediately followed by 186 bytes of binary data
which corresponds to the content of the "lab4data/persistent.html" file.
If the file name extension in the URI is anything else, you should send back the same HTTP response message as
Part A of this lab.
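The "Content-Type" rule above can be sketched as follows. The function name content_type_for_uri() is hypothetical.

```cpp
#include <cctype>
#include <string>

// Returns "text/html" if the file name extension in the URI (everything
// after the last period) is "html" (case-insensitive). Otherwise, returns
// the same "application/octet-stream" used in Part A.
std::string content_type_for_uri(const std::string &uri)
{
    std::string::size_type dot = uri.rfind('.');
    if (dot != std::string::npos) {
        std::string ext = uri.substr(dot + 1);
        for (char &c : ext) c = (char)std::tolower((unsigned char)c);
        if (ext == "html") return "text/html";
    }
    return "application/octet-stream";
}
```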
No matter what response you have sent, after sending the response message,
you must go back to the top of the infinite loop to read the next HTTP request message
from the client and then send back an HTTP response message, and so on.
You must not shutdown or close the connection after each message.
As in Part A of this lab,
please print (to cout) the HTTP request header you have received and the HTTP response header you have sent,
each line indented by a <TAB> character.
Modify the client part:
In Part B of this lab, your client sends an HTTP request message to the server,
reads an HTTP response message from the server, and shuts down and closes the socket. For this part of the lab,
your client must send an HTTP request message for URI1 to the server,
read the HTTP response message from the server and save the HTTP response body into OUTPUTFILE1,
send an HTTP request message for URI2 to the server using the same socket,
read the HTTP response message from the server and save the HTTP response body into OUTPUTFILE2, and so on.
When your client has exhausted the commandline arguments, it must shutdown and close the socket and self-terminate.
As in Part B of this lab,
please print (to cout) the HTTP request header you have sent and the HTTP response header you have received.
When you are done with implementing all the above, please do the following:
- Type "script lab4d1.script" to start a transcript. Then type:
uname -a
cat /etc/os-release
make clean
make lab4d
./lab4d 12345
- Start another Terminal window and cd into the same "lab4" directory.
- In the second Terminal window, type:
wget -r -l 1 http://localhost:12345/persistent.html
- Wait for download to finish then do:
foreach f (persistent.html textbooks-2-small.jpg textbooks-3-small.jpg usctommy.gif)
diff localhost:12345/$f lab4data/$f
echo -n 'Make sure "diff" command above did not print anything, then press any key to continue... '
set junk=$<
end
The above command should produce no printout. If there is any printout, please fix your server code and try again.
- Remove the download directory by doing the following:
/bin/rm -rf localhost:12345
Continue with the following in the 2nd window:
Do the following in the first window:
- Press <Ctrl+c> to kill the server.
- Type "exit" to close the transcript.
Alternatively, you can also do everything inside one Terminal and run tmux.
You can split the screen vertically and run the client and server in separate panes.
Part E (lab4e) - wireshark (may be useful for debugging PA2):
This part has no coding and nothing to turn in.
As with Part E of Lab 3, if you are running on a shared server (such as viterbi-scf1.usc.edu or viterbi-scf2.usc.edu),
please skip this part of the lab if you cannot run wireshark.
Repeat the first part of Part E of Lab 3.
This time, identify the request line and identify all the "lines" in the HTTP request header
all the way to the "empty line".
Click on the 2nd HTTP message (which corresponds to the HTTP response message sent from the server back to the client).
Identify the status line and identify all the "lines" in the HTTP response header
all the way to the "empty line". Find the "Content-Length" KEY and the corresponding VALUE and verify that
this VALUE is exactly the number of bytes in the response body.
Change the filter value to "tcp.port == 12345 && ip.addr == 127.0.0.1"
to inspect the "application level" data being exchanged between
your client application and the server application in Part C and Part D of this lab.
This can be helpful to debug your code in case you have sent extra bytes of data or you have skipped some data.
Make sure that there are no null characters (i.e., '\0') in an HTTP request or response header and that every line
in the HTTP request or response header is terminated with "\r\n" and make sure that you can identify the empty line
that defines the end of an HTTP request or response header.
Tshark
If you are running on AWS Free Tier, wireshark there is either very very slow or
it crashes over VNC.
In this case, you can use tshark, which is basically wireshark without the graphical user interface or an interactive user interface.
Using tshark, you can capture all the packets just like wireshark and have them go into a file. When you are done capturing
all the data, run tshark to print everything you would see in wireshark so you can inspect all the packets that were captured.
Let's try the following:
- Open two Terminal windows and type the following into the first window to capture TCP data created when you download "index.html"
from merlot.usc.edu.
tshark -i any -w test.pcap -f "host 68.181.32.44" -f "tcp port 80"
In the above command, "any" refers to any interface
and the argument following the "-w" commandline option is the name of the output file.
Please use the file name extension ".pcap" to mean that the file is a "raw data capture (binary) file".
- In the 2nd Terminal window, type the following to download "index.html" from merlot.usc.edu:
wget -O x http://68.181.32.44/
- In the first window, press <Ctrl+C> to kill the tshark program, then type the following to see a top-level summary of the packets you have captured:
tshark -r test.pcap --color
- In the 2nd window, type the following to create a full dump of the packets you have captured and we will send the printout into a text file:
tshark -r test.pcap -V -x > test.out
- You can open test.out with a text editor to examine what's in every captured packet.
- In the first window, look for HTTP frames in the summary printout and find the corresponding frame in test.out
and see if you can find an HTTP message near the end of that frame.
For example, you may see something like the following in the hexdump portion of a frame (the colors are mine):
0000 00 04 00 01 00 06 02 4c f1 9f ae 47 00 00 08 00 .......L...G....
0010 45 00 00 b5 53 8e 40 00 40 06 75 c5 0a 00 02 0f E...S.@.@.u.....
0020 44 b5 20 2c 81 ec 00 50 b8 44 f6 3f 84 d4 0a 02 D. ,...P.D.?....
0030 50 18 72 10 71 97 00 00 47 45 54 20 2f 20 48 54 P.r.q...GET / HT
0040 54 50 2f 31 2e 31 0d 0a 55 73 65 72 2d 41 67 65 TP/1.1..User-Age
0050 6e 74 3a 20 57 67 65 74 2f 31 2e 31 37 2e 31 20 nt: Wget/1.17.1
0060 28 6c 69 6e 75 78 2d 67 6e 75 29 0d 0a 41 63 63 (linux-gnu)..Acc
0070 65 70 74 3a 20 2a 2f 2a 0d 0a 41 63 63 65 70 74 ept: */*..Accept
0080 2d 45 6e 63 6f 64 69 6e 67 3a 20 69 64 65 6e 74 -Encoding: ident
0090 69 74 79 0d 0a 48 6f 73 74 3a 20 6d 65 72 6c 6f ity..Host: merlo
00a0 74 2e 75 73 63 2e 65 64 75 0d 0a 43 6f 6e 6e 65 t.usc.edu..Conne
00b0 63 74 69 6f 6e 3a 20 4b 65 65 70 2d 41 6c 69 76 ction: Keep-Aliv
00c0 65 0d 0a 0d 0a e....
Using the same technique as in Lab 3, you can identify the IP header (in blue), the TCP header (in green) with
a "header length" of 5 (in red), and application data (in orange).
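The offset of the application data in the hexdump above can be checked with a little arithmetic. In the sketch below, payload_offset() is a hypothetical helper, and it assumes that the 16 bytes before the IP header (which starts with 0x45 at offset 0x0010) form the link-layer header of the capture.

```cpp
#include <cstddef>

// Byte offset of the application data within a captured frame, given:
//   link_header_len - length of the link-layer header in bytes
//   ip_first_byte   - first byte of the IP header
//                     (low 4 bits = header length in 32-bit words)
//   tcp_offset_byte - byte 12 of the TCP header
//                     (high 4 bits = header length in 32-bit words)
std::size_t payload_offset(std::size_t link_header_len,
                           unsigned char ip_first_byte,
                           unsigned char tcp_offset_byte)
{
    std::size_t ip_header_len = (ip_first_byte & 0x0f) * 4;
    std::size_t tcp_header_len = (tcp_offset_byte >> 4) * 4;
    return link_header_len + ip_header_len + tcp_header_len;
}
```

With the frame above, payload_offset(16, 0x45, 0x50) gives 56 (i.e., 0x38), which is exactly where the bytes 47 45 54 ("GET") begin.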
All pseudo-code is incomplete and error checking is often left out in pseudo-code.
Feel free to send your questions (and not your code) to the instructor.
It's very important that you check for error conditions so you can break out of certain infinite loops.
Pseudo-code for lab4d server (not necessarily complete):
do forever /* in each iteration, handle one persistent client connection */
    socket_fd = my_accept()
    talk_to_client(socket_fd)
    shutdown(socket_fd)
    close(socket_fd)
end-do
Pseudo-code for talk_to_client(socket_fd) for lab4d server:
do forever /* in each iteration, read one request and send one response */
    do forever /* this loop reads all lines in a request header */
        line = read_a_line(socket_fd)
        if first line then
            uri = parse(line)
        else if line is "\r\n" then
            break;
        end-if
    end-do
    fd = open_file_for_reading(uri)
    write response header and blank line into socket_fd
    do forever
        data = read(fd, 1024);
        if data valid then
            write(socket_fd, data, data.size);
        else
            break;
        end-if
    end-do
    close(fd)
end-do
Please note that data.size above refers to the return value of the read() system call
when it returns the number of bytes read.
Please also note that the above is just pseudo-code and you cannot really write code this way
because there is no such thing as data.size!
When you read from a file using the read() system call, you must use the return
value of read() to know how many bytes of data were read from the file and store
that information inside a local variable. That's what data.size above is referring to.
Please see ReadBinaryFromSocket() in the PA2 FAQ for more detail.
The above pseudo-code has two inner infinite loops inside the outer infinite loop.
It's very important to follow this recipe to read an entire request message before proceeding to the 2nd inner infinite loop!
Some students decide to just read the first line from the socket in the first inner infinite loop because all the other lines are "not useful".
Please do not do that because I have seen cases on Mac OS X machines where if you do that, your code may not function properly.
Apparently, the remaining data in the socket can cause problems in the 2nd inner infinite loop! This is really not supposed to happen.
But unfortunately, it does. Therefore, it's best if you read an entire request message before you send a response message.
It's highly recommended that you write a function to
read all the lines in a request header (plus the empty line) and have this function return
an object that represents a request message that you have received.
This function must not return until it has read an entire request message
from the socket. This function needs to be very precise in the sense that
it must not read an extra byte of data from the socket and it must not
miss a single byte of data from the socket. Once you are confident that
this function works perfectly, you can use this function or modify this
function to read other messages in future labs and assignments.
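One way to get this precision is to read the socket one byte at a time, so that nothing past the empty line is ever consumed. The sketch below is only an illustration under that assumption: the RequestHeader struct and read_request_header() are made-up names, and this read_a_line() is a stand-in that may differ from the one you wrote in Lab 3. It does not retry when read() is interrupted.

```cpp
#include <cassert>
#include <string>
#include <vector>
#include <unistd.h>

// Stand-in for read_a_line(): reads one byte at a time up to and including
// '\n'. Returns false if end-of-file or an error occurs before '\n' is seen.
bool read_a_line(int fd, std::string& line)
{
    line.clear();
    for (;;) {
        char ch;
        ssize_t n = read(fd, &ch, 1);
        if (n <= 0) return false;
        line.push_back(ch);
        if (ch == '\n') return true;
    }
}

// Hypothetical object representing a request header.
struct RequestHeader {
    std::string request_line;             // "METHOD URI VERSION\r\n"
    std::vector<std::string> other_lines; // remaining header lines
};

// Reads the request line, the other header lines, and the empty line.
// Returns true only if the empty line ("\r\n") was reached.
bool read_request_header(int fd, RequestHeader& hdr)
{
    hdr.other_lines.clear();
    if (!read_a_line(fd, hdr.request_line)) return false;
    std::string line;
    while (read_a_line(fd, line)) {
        if (line == "\r\n") return true;  // header is complete
        hdr.other_lines.push_back(line);
    }
    return false;                         // connection ended mid-header
}
```

Reading a byte at a time is slow but simple, and it guarantees that any data after the empty line is still in the socket for whoever reads next.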
socket_fd = create_client_socket_and_connect()
for j = 1 to K do
write request header and blank line into socket_fd to request URIj
do forever /* this loop reads all lines in response header */
line = read_a_line(socket_fd)
if line is first line then
/* do nothing */
else if line is "\r\n" then
break;
else
(key, value) = parse(line)
if key is "Content-Length" then
content_length = value
end-if
end-if
end-do
fd = open_file_for_writing(OUTPUTFILEj)
bytes_left = content_length
while bytes_left > 0 do
if bytes_left > 1024 then
data = read(socket_fd, 1024);
else
data = read(socket_fd, bytes_left);
end-if
write(fd, data, data.size)
bytes_left = bytes_left - data.size
end-do
end-for
The code for open_file_for_writing() is in "lab4data/copoyfile.cpp".
You should copy the code for open_file_for_reading() and open_file_for_writing() into your code.
Please note that data.size above refers to the return value of the read() system call
when it returns the number of bytes read.
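The body-reading loop in the pseudo-code could be sketched in C++ as follows. The function name save_body() is made up for this illustration. Unlike the pseudo-code, the sketch stops if read() returns 0 or a negative value, since otherwise the loop would never end when the server closes the connection early.

```cpp
#include <cassert>
#include <unistd.h>

// Hypothetical helper: read exactly content_length bytes from socket_fd and
// write them to file_fd. Returns true only if every byte was transferred.
bool save_body(int socket_fd, int file_fd, long content_length)
{
    assert(content_length >= 0);
    char buf[1024];
    long bytes_left = content_length;
    while (bytes_left > 0) {
        // never ask for more than what is left of this response body
        size_t want = sizeof(buf);
        if (bytes_left < (long)sizeof(buf)) want = (size_t)bytes_left;
        // bytes_read is what the pseudo-code calls "data.size"
        ssize_t bytes_read = read(socket_fd, buf, want);
        if (bytes_read <= 0) return false;  // peer closed early, or error
        ssize_t off = 0;
        while (off < bytes_read) {
            ssize_t n = write(file_fd, buf + off, bytes_read - off);
            if (n <= 0) return false;
            off += n;
        }
        bytes_left -= bytes_read;
    }
    return true;
}
```

Because the function never reads more than bytes_left, the first byte of the next response header stays in the socket, which matters when the same connection is used for K requests.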
It's highly recommended that you write a function to
read all the lines in a response header (plus the empty line) and have this function return
an object that represents a header of a response message that you have received.
This function must not return until it has read an entire response header (including the empty line)
from the socket. This function needs to be very precise in the sense that
it must not read an extra byte of data from the socket and it must not
miss a single byte of data from the socket. Once you are confident that
this function works perfectly, you can use this function or modify this
function to read other messages in future labs and assignments.
(It also should be clear that you can use this function to read a request message mentioned above!)
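The (key, value) = parse(line) step in the pseudo-code could look something like the sketch below. All three function names are made up for this illustration. The key comparison ignores case because HTTP header names are case-insensitive, and the value has its surrounding spaces and the trailing "\r\n" removed before it is converted to a number.

```cpp
#include <cassert>
#include <cctype>
#include <cstdlib>
#include <string>

// Split "Key: value\r\n" into key and value. Returns false if the line has
// no ':' (for example, the first line of a response).
bool parse_header_line(const std::string& line, std::string& key, std::string& value)
{
    std::string::size_type colon = line.find(':');
    if (colon == std::string::npos) return false;
    key = line.substr(0, colon);
    std::string::size_type start = line.find_first_not_of(" \t", colon + 1);
    std::string::size_type end = line.find_last_not_of(" \t\r\n");
    if (start == std::string::npos || end == std::string::npos || end < start) {
        value.clear();                     // nothing after the ':'
    } else {
        value = line.substr(start, end - start + 1);
    }
    return true;
}

// Case-insensitive comparison of two header names.
bool same_key(const std::string& a, const std::string& b)
{
    if (a.size() != b.size()) return false;
    for (std::string::size_type i = 0; i < a.size(); i++) {
        if (std::tolower((unsigned char)a[i]) != std::tolower((unsigned char)b[i])) return false;
    }
    return true;
}

// Returns the Content-Length value if this line carries a valid one,
// otherwise returns -1.
long content_length_from(const std::string& line)
{
    std::string key, value;
    if (!parse_header_line(line, key, value)) return -1;
    if (!same_key(key, "Content-Length")) return -1;
    char* end = nullptr;
    long n = std::strtol(value.c_str(), &end, 10);
    if (end == value.c_str() || *end != '\0' || n < 0) return -1;
    return n;
}
```

Returning -1 for a missing or malformed value lets the caller tell "no Content-Length seen" apart from a body of length 0.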
Below is the grading breakdown:
- (1 pt) submitted a valid lab4.tar.gz file with all the required files using the submission procedure below
- (1 pt) contents in "lab4a.script", "lab4b.script", and "lab4c.script" are correct
- (1 pt) contents in "lab4d1.script", "lab4d2.script", and "lab4d3.script" are correct
- (1 pt) "Makefile" works for "make lab4a", "make lab4b", and "make lab4d"
- (1 pt) source code of your simple web server/client program in "lab4a.cpp", "lab4b.cpp", and "lab4d.cpp" looks right
Minimum deduction is 0.5 pt for anything that's incorrect.
Please note that you can only get credit for the "Makefile" item if your "source code" is relevant to this lab; therefore, at best,
you can get as many points for it as you get for the "source code" item.
Please keep in mind that even though lab grading is "light", it doesn't mean that you can just put anything
into your submission! It's still your responsibility to make sure that the files in your submission contain
information that's relevant to the tests you were supposed to run.
Use the "more" command to view your script/log files to make sure that they contain the right information.
If a file has the wrong stuff in it, you should delete it and create the file again and verify.
If most of the stuff in your script/log files is wrong and you did not notice it, we will most likely have to take points off.
To submit your work, you must first tar all the files you want to submit into a tarball and
gzip it to create a gzipped tarfile named "lab4.tar.gz".
Then you upload "lab4.tar.gz" to our Bistro submission server.
Change into the "lab4" directory you have created above and enter the following command
to create your submission file "lab4.tar.gz" (if you don't have any ".h" files, don't include "*.h*" at the end):
tar cvzf lab4.tar.gz lab4*.script Makefile *.c* *.h*
ls -l lab4.tar.gz
The last command shows you how big the created "lab4.tar.gz" file is.
If "lab4.tar.gz" is larger than 1MB in size, the submission server will not accept it.
If you use an IDE, the IDE may put your source code in subdirectories. In that case,
you need to modify the commands above so that you include ALL
the necessary source files and subdirectories (and don't include any binary files)
and make sure that your code can be compiled without the IDE since the grader is not allowed to use an IDE to compile your code.
You should read the output of the above commands carefully to make sure that "lab4.tar.gz" is created properly.
If you don't understand the output of the above commands, you need to learn how to read it!
It's your responsibility to ensure that "lab4.tar.gz" is created properly.
To check the content of "lab4.tar.gz", you can use the following command:
tar tvf lab4.tar.gz
Please read the output of the above command carefully to see which files were included in "lab4.tar.gz"
and what their file sizes are, and make sure that they make sense.
Please enter your USC e-mail address and your submission PIN below. Then click on the Browse button
and locate and select your submission file (i.e., "lab4.tar.gz").
Then click on the Upload button to submit your "lab4.tar.gz".
(Be careful what you click! Do NOT submit the wrong file!)
If you see an error message, please read the dialog box carefully, fix what needs to be fixed, and repeat the procedure.
If you don't know your submission PIN, please visit this web site to have your PIN e-mailed to your USC e-mail address.
If the command is executed successfully and if everything checks out,
a ticket will be issued to you to let you know "what" and "when"
your submission made it to the Bistro server. The next web page you
see would display such a ticket and the ticket should look like
the sample shown in the submission web page
(of course, the actual text would be different, but the format should be similar).
Make sure you follow the Verify Your Ticket instructions
to verify the SHA1 hash of your submission to make sure that you did not accidentally submit the wrong file.
Also, an e-mail (showing the ticket) will be sent to your USC e-mail address.
Please read the ticket carefully to know exactly "what" and "when"
your submission made it to the Bistro server.
If there are problems, please contact the instructor.
It is extremely important that you also verify your submission
after you have submitted "lab4.tar.gz" electronically to make
sure that what you have submitted is everything you wanted us to grade.
If you don't verify your submission and
you end up submitting the wrong files, please understand that due to our fairness policy,
there's absolutely nothing we can do.
Finally, please be familiar with the Electronic Submission Guidelines
and information on the bsubmit web page.