Note that Part 1 may be completed any time before the due date -- there is no additional information you need to fill out on the submit form for Part 1.

Using Telnet, submit a valid HTTP/1.1 GET request for the resource whose URL is http://www.courses.fas.harvard.edu/~cscie12/assgn/part1a.cgi

In order to identify yourself, send your FAS username as the "User-Agent" in your HTTP request. For example, the FAS user jharvard would include the following HTTP request header:
User-Agent: jharvard
Use your FAS username in place of "jharvard". If you do not include this information, you will not receive credit!

Using Telnet, submit a valid HTTP/1.1 HEAD request for the resource whose URL is http://www.courses.fas.harvard.edu/~cscie12/assgn/part1b.cgi

In order to identify yourself, send your FAS username as the "User-Agent" in your HTTP request. For example, the FAS user jharvard would include the following HTTP request header:
User-Agent: jharvard
Use your FAS username in place of "jharvard". If you do not include this information, you will not receive credit!

Using Telnet, submit a valid HTTP/1.1 GET request for the resource whose URL is http://www.courses.fas.harvard.edu/~cscie12/assgn/part1c.cgi
In order to identify yourself, send your FAS username in the query string, using the parameter name FASusername and the value being your FAS username. The parameter name and value are case sensitive!

Extra Credit: Using Telnet, submit a valid HTTP/1.1 POST request for the resource whose URL is http://www.courses.fas.harvard.edu/~cscie12/assgn/part1d.cgi
In order to identify yourself, send your FAS username in the body of the HTTP request, using the parameter name FASusername and the value being your FAS username. The parameter name and value are case sensitive!

Using Telnet, submit a valid HTTP/1.1 GET request for the resource whose URL is http://www.courses.fas.harvard.edu/~cscie12/assgn/part1e.cgi
In order to identify yourself, send your FAS username as the value of a Cookie, whose name is FASusername. The cookie name and value are case sensitive!

Using Telnet, submit a valid HTTP/1.1 GET request for the resource whose URL is http://www.courses.fas.harvard.edu/~cscie12/assgn/part1f/
This resource is restricted, and you will need to use the username "cscie12" and the password "webdev" (all lowercase, no quotes) to access it. In order to identify yourself, send your FAS username as the "User-Agent" in your HTTP request.

When does the resource located at http://www.courses.fas.harvard.edu/~cscie12/assgn/part2.html expire?

Create a directory and put a simple HTML file and an image in it. Using directives in an .htaccess file, make the HTML file expire 5 minutes after it has been accessed and make the image expire 1 day after it has been accessed. The content of the HTML file and the image are not important -- the only important characteristic is the expiration time.
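As a rough illustration of the Part 1 requests described above, the following Python sketch only builds the text that would be typed into a Telnet session; it does not open a connection. The helper name build_request, the "Connection: close" header, and the form content type used for the POST body are assumptions made for this example, not requirements stated in the assignment. The username jharvard is the placeholder from the text.

```python
import base64
from urllib.parse import urlencode

CRLF = "\r\n"
HOST = "www.courses.fas.harvard.edu"
USER = "jharvard"  # placeholder username from the assignment text


def build_request(method, path, host, headers=None, body=""):
    """Return the raw text of an HTTP/1.1 request (hypothetical helper)."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]  # Host is mandatory in HTTP/1.1
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    if body:
        # Assumption: the body is sent as an ordinary URL-encoded form.
        lines.append("Content-Type: application/x-www-form-urlencoded")
        lines.append(f"Content-Length: {len(body.encode('ascii'))}")
    # A blank line separates the headers from the (possibly empty) body.
    return CRLF.join(lines) + CRLF + CRLF + body


# GET and HEAD identified by User-Agent
get_req = build_request("GET", "/~cscie12/assgn/part1a.cgi", HOST,
                        {"User-Agent": USER, "Connection": "close"})
head_req = build_request("HEAD", "/~cscie12/assgn/part1b.cgi", HOST,
                         {"User-Agent": USER, "Connection": "close"})

# Username in the query string
query = urlencode({"FASusername": USER})
query_req = build_request("GET", "/~cscie12/assgn/part1c.cgi?" + query, HOST,
                          {"Connection": "close"})

# Username in the POST body
post_req = build_request("POST", "/~cscie12/assgn/part1d.cgi", HOST,
                         {"Connection": "close"}, body=query)

# Username as a cookie
cookie_req = build_request("GET", "/~cscie12/assgn/part1e.cgi", HOST,
                           {"Cookie": f"FASusername={USER}", "Connection": "close"})

# Basic authentication: base64 of "username:password"
credentials = base64.b64encode(b"cscie12:webdev").decode("ascii")
auth_req = build_request("GET", "/~cscie12/assgn/part1f/", HOST,
                         {"Authorization": f"Basic {credentials}",
                          "User-Agent": USER, "Connection": "close"})

if __name__ == "__main__":
    print(post_req)
```

Printing any of these strings shows the request line, the headers, the blank line, and the body in the order a server expects them. When typing by hand in Telnet, the blank line after the headers is what tells the server the request is complete.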
Important Hint and Help for Part 3

Create custom error documents for your Web site on your FAS account. You should have custom error documents for status codes of 404 and 401. Note that IE may display its own "error page" instead of your custom one if your custom page is small (in terms of file size). Also, lwp-request may display its own error message instead of your custom one. These "features" are a reason why everyone should know how to use telnet as an HTTP client!

Create a directory within your public_html directory, put a simple HTML file in it, and restrict access to the directory such that requests coming from the domain .w3.org are allowed. Requests coming from all other domains should be denied.

Create another directory within your public_html directory, put a simple HTML file in it, and restrict access to the directory using Basic authentication. Allow the user gradebot, with the password letmein, access to the directory via HTTP Basic authentication.
Saturday, August 13, 2011
Assignment 6 Extension: Due Monday, May 6
Assignment 7: Link Checking and Log Analysis
Your task is to find the links that result in a non-200 HTTP status response (e.g. that result in 404 Not Found, 301 Moved Permanently, 302 Moved Temporarily). I recommend that you use checkbot to do this.

ice% ~cscie12/bin/checkbot --help
Checkbot 1.66 command line options:
  --debug            Debugging mode: No pauses, stop after 25 links.
  --verbose          Verbose mode: display many messages about progress.
  --url url          Start URL
  --match match      Check pages only if URL matches `match'
                     If no match is given, the start URL is used as a match
  --exclude exclude  Exclude pages if the URL matches 'exclude'
  --ignore ignore    Do not list error messages for pages that the URL matches 'ignore'
  --file file        Write results to file, default is checkbot.html
  --mailto address   Mail brief synopsis to address when done.
  --note note        Include Note (e.g. URL to report) along with Mail message.
  --proxy URL        URL of proxy server for external http and ftp requests.
  --internal-only    Only check internal links, skip checking external links.
  --sleep seconds    Sleep for secs seconds between requests (default 2)
  --timeout seconds  Timeout for http requests in seconds (default 120)
  --interval seconds Maximum time interval between updates (default 10800)
  --dontwarn codes   Do not write warnings for these HTTP response codes
  --enable-virtual   Use only virtual names, not IP numbers for servers

Options --match, --exclude, and --ignore can take a perl regular expression
as their argument

Use 'perldoc checkbot' for more verbose documentation.
Checkbot WWW page     : http://degraaff.org/checkbot/
Mail bugs and problems: checkbot@degraaff.org

Checkbot will produce output in HTML (filename of "checkbot.html") in the directory in which you start checkbot.
So, as an example, you could do the following:
  make a directory for your checkbot results
  cd to the directory
  change permissions for the directory
  run checkbot
    --verbose will let you see what checkbot is doing
    --sleep 0 will cause checkbot to not pause between requests (it will finish faster)
  change permissions on the HTML files that checkbot produced
  view the results from a web browser

ice% mkdir ~/public_html/checkbot
ice% cd ~/public_html/checkbot
ice% chmod a+rx ./
ice% ~cscie12/bin/checkbot \
? --verbose --sleep 0 \
? --url http://www.courses.fas.harvard.edu/~cscie12/haystack/
...output not shown...
ice% ls
checkbot-www.courses.fas.harvard.edu.html
checkbot.html
ice% chmod a+r *.html

Now view the "checkbot.html" page with a web browser. The report details will be in the "checkbot-www.courses.fas.harvard.edu.html" page.
For each link that does not give a "200 OK" HTTP response, you will need to report:
  the URL that gave a non-200 response
  the status code
  the URL of the page that contained the link

The haystack is located at: http://www.courses.fas.harvard.edu/~cscie12/haystack/

You will analyze the log file of this course (/home/c/s/cscie12/logs/cscie12.log.gz) from September through November and provide an "executive" summary (i.e. be short and to the point) of your analysis. Where appropriate, you should link to any reports generated from Analog. Draw some conclusions about the use of the site; don't simply cite numbers.
Note that this log does not contain log entries for the Discussion Group or the Lecture Videos. Also, the hostnames of the machines have been changed to protect privacy -- for example, heitmeyer.mediaone.net might be changed to something like gibnax.mediaone.net
How much was the site used?
What areas of the sites were most heavily used? Does this correspond to how you used the site?
What weeks, days, times of the day and days of the week showed the most activity? Why?
What sites and pages outside of Harvard referred people to the sites?
What browsers and versions (and operating systems) were widely used? Based on these statistics, would you recommend reliance on CSS?

Log files
/home/c/s/cscie12/logs/cscie12.log.gz ("combined log format")
I have already generated an Analog Report from the above log file. (Note: the only report you will need to generate is for the "browser summary".)

Hints and reminders
You should run analog on the "ice" machines only.
I have configured analog (~cscie12/bin/analog) to know where the log file for the course is located -- you do not need to specify a location.
The logs are in gzipped format -- analog knows how to handle decompressing these files. If you are curious (good for you!) and want to look at the contents of the file, you can do that with the "zcat" command. Be careful, these files are roughly a quarter million lines long -- you'll want to pipe them through "more" if you just want to look at a few pages.

ice% zcat ~cscie12/logs/cscie12.log.gz | more

Simply hit "CTRL-C" when you've had enough.
Analog is in ~cscie12/bin/analog
You do not need to copy Analog to your directory.
You do not need to copy the log file to your directory.
The "-A" command turns off all analog reports
The "+A" command turns on all analog reports

Analog Reports and command line options
For example, to turn off all reports ("-A") and produce a text output ("+a") of the "browser summary" report ("+b" see http://www.analog.cx/docs/output.html), the following command would work:

ice% ~cscie12/bin/analog -A +a +b | more
...output not shown...

To turn off all reports and produce an HTML output ("-a") of the "browser summary" report and to direct the HTML output to a file called "browser_summary.html":

ice% ~cscie12/bin/analog -A -a +b > browser_summary.html
ice% chmod a+r browser_summary.html
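For anyone curious about what the "combined log format" entries look like beyond paging through zcat output, here is a minimal Python sketch that parses such lines and tallies status codes, hours of the day, and user agents. It is not a substitute for Analog. The sample lines and hostnames are invented for illustration, the function names are hypothetical, and the regular expression is a simplification that does not handle escaped quotes inside fields.

```python
import gzip
import re
from collections import Counter

# Simplified pattern for Apache "combined" log lines:
# host ident user [time] "request" status size "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields, or None if the line does not match."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None


def summarize(lines):
    """Count status codes, hours of the day, and user agents."""
    statuses, hours, agents = Counter(), Counter(), Counter()
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue  # skip lines the simplified pattern cannot parse
        statuses[entry["status"]] += 1
        # The timestamp looks like 10/Oct/2000:13:55:36 -0400;
        # the hour is the field after the first colon.
        hours[entry["time"].split(":")[1]] += 1
        agents[entry["agent"]] += 1
    return statuses, hours, agents


def read_gzipped_log(path):
    """Yield decoded lines from a gzipped log file."""
    with gzip.open(path, "rt", errors="replace") as handle:
        for line in handle:
            yield line


# Invented sample entries, for illustration only.
SAMPLE = [
    'gibnax.mediaone.net - - [10/Oct/2000:13:55:36 -0400] "GET /~cscie12/ HTTP/1.0" '
    '200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"',
    'host2.example.org - - [10/Oct/2000:13:58:01 -0400] "GET /~cscie12/missing.html HTTP/1.0" '
    '404 209 "-" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)"',
    'host3.example.org - - [11/Oct/2000:09:12:44 -0400] "GET /~cscie12/lectures/ HTTP/1.1" '
    '200 5120 "-" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)"',
]

statuses, hours, agents = summarize(SAMPLE)

if __name__ == "__main__":
    print(statuses.most_common())
    print(hours.most_common())
    print(agents.most_common(1))
```

To try it on the course log, summarize(read_gzipped_log(path)) could be called with the path to the gzipped file. Reading line by line keeps memory use low even for a file of roughly a quarter million lines.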