Harvard CS205 - Projects

Projects - Spring 2019

Extreme-scale data science at the convergence of big data and massively parallel computing is enabling simulation, modelling and real-time analysis of complex natural and social phenomena at unprecedented scales. The aim of the project is to gain practical experience with this interplay by applying parallel computation principles to solve a compute- and data-intensive problem.

Your final project is to solve a data-intensive or compute-intensive problem with parallel processing on the AWS cloud or on Harvard’s supercomputer, Odyssey (or both!). You will identify a compute or data science problem, analyse its compute scaling requirements, collect the data, design and implement parallel software, and demonstrate scaled performance of an end-to-end application.
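One common way to report scaled performance is through speedup and parallel efficiency computed from measured run times. The following is a minimal sketch, not a required method: the function names are hypothetical and the timings in the usage example are made-up placeholder values, not real measurements.

```python
def speedup(t_serial, t_parallel):
    """Speedup S = T_1 / T_p for a fixed problem size."""
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("timings must be positive")
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, workers):
    """Parallel efficiency E = S / p, where p is the number of workers."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return speedup(t_serial, t_parallel) / workers


def scaling_table(timings):
    """Build (workers, speedup, efficiency) rows.

    `timings` maps worker count -> wall-clock seconds and must contain
    an entry for 1 worker, which serves as the serial baseline.
    """
    t1 = timings[1]
    return [
        (p, speedup(t1, t), efficiency(t1, t, p))
        for p, t in sorted(timings.items())
    ]


if __name__ == "__main__":
    # Placeholder timings for illustration only (seconds per worker count).
    example = {1: 120.0, 2: 64.0, 4: 40.0}
    for p, s, e in scaling_table(example):
        print(f"workers={p:2d}  speedup={s:.3f}  efficiency={e:.3f}")
```

Using the 1-worker run as the baseline keeps the table self-consistent; if your serial code differs from the parallel code run on one worker, state which baseline you used.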

Rules

Project Requirements: Consider that your project should:

Group Size: Students are required to form teams and to partition the work among the team members. The final project must be done in teams with 4 students each (exceptions by permission of the instructor). You can use the course forum to find prospective team members. You may also find and discuss project ideas on the forum. In general, we do not anticipate that the grades for each group member will be different. However, we reserve the right to assign different grades to each group member if it becomes apparent that one of them put in a vastly different amount of effort than the others.

Project Milestones: There are five milestones for your final project. It is critical to note that no extensions will be given for any of these milestones for any reason. Projects submitted after the final due date will not be graded.

Project Proposal: Your group needs to present a project proposal (and submit the PDF of the presentation) to the teaching staff with the following sections:

The aim of the session is to review each proposal; we may suggest modifications if necessary. Our main concern is the amount of effort a given project will require; either too much or too little is unacceptable. You will have 5, and ONLY 5, minutes to briefly summarize your proposal, followed by 5 minutes of discussion time. You have to prepare 2-3 slides for your proposal. We will enforce the 5-minute time limit. This presentation is a chance for you to get feedback.

Project Progress (Design): Your group needs to present a progress report (and submit the PDF of the presentation) to the teaching staff covering the main aspects of the design of the parallel application, with the following sections:

The aim of the session is to briefly summarize your progress with a focus on the design part. You will have 5, and ONLY 5, minutes to briefly summarize your progress. You have to prepare 2-3 slides for your progress report. We will enforce the 5-minute time limit. This presentation is a chance for you to get feedback from the teaching staff, and to come up with ways around roadblocks you encounter. It is also a chance for the staff to ensure that your project is on track, and that your project is still in the appropriate-amount-of-work range.

Project Deliverables:

Project Web Site: An important piece of your final project is a public web site that describes all the great work you did for your project. The web site serves as the final project report, and needs to describe your complete project. You can use GitHub Pages, or the README file on the GitHub repository, so you can easily refer to the software at the GitHub repository. You should assume the reader has no prior knowledge of your project and has not read your proposal. It should address the following aspects:

Your web page should include screenshots of your software that demonstrate how it functions. You should include a link to your source code.

Project Software: Your final project can be implemented using any API or programming language you would like. Make your own repository on GitHub with a link to your project web page. The software, evaluation data sets, and test cases should be available on the repo. Include a README that describes the code and application files, and how your program should be run. We will be grading these projects on a variety of platforms, so you must include detailed instructions on how to compile and run your code. If we cannot run your application from the instructions included with your submission, we will not be able to grade this portion of your project. Your performance results should be reproducible, so you should provide all the information about the system and the environment needed to reproduce your tests.
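As one possible way to capture part of the system and environment information mentioned above, the sketch below records basic platform details using only the Python standard library. The function name and the chosen fields are assumptions for illustration; a real report would likely also list compiler versions, library versions, instance or node types, and job scheduler settings, which this sketch does not collect.

```python
import json
import os
import platform


def collect_environment():
    """Return a dictionary of basic system facts for a reproducibility note.

    The selection of fields is illustrative, not a required format.
    """
    return {
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "processor": platform.processor(),
        "logical_cpus": os.cpu_count(),
        "python_version": platform.python_version(),
        "python_implementation": platform.python_implementation(),
    }


if __name__ == "__main__":
    # Print as JSON so the output can be pasted into a README or saved
    # alongside the performance results.
    print(json.dumps(collect_environment(), indent=2))
```

Running the script on each platform used for benchmarking and committing its output next to the timing data gives readers a record of where each result came from.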

Project Presentations: You will have 10, and ONLY 10, minutes to briefly present your project followed by 5 minutes of discussion time. You may prepare 4-5 slides for your summary, but we will enforce the 10-minute time limit. Focus the majority of your presentation on your main contributions rather than on technical details. What do you feel is the coolest part of your project? What insights did you gain? What is the single most important thing you would like to show the class? Upload the presentation to the GitHub repo and on Canvas.

Project Grading: The final project grades are dependent on the following criteria:

Projects will be graded on the depth of the work undertaken and on communication (web site, presentation):


Examples from Previous Years