Course Project Guideline - New
Data Analysis
In big data processing, data analysis is a critical step because it draws conclusions from
new datasets that matter to a specific domain. Trending application domains today
include healthcare, insurance, transportation, and social media. As part of this project, you
need to identify: a) A trending topic that is interesting to you. b) Novel and influential analytic
methods that are relevant to this topic (you need to find some papers). c) New datasets that have
not been fully explored by data scientists (preferably datasets released within the past year).
Impact: you may create new data science insights that can impact society and improve
people's lives. For example, by carefully analyzing taxi trajectory data, you may help to cut
down the waiting time for each user and make trips more environmentally friendly.
Examples: a) A real-time sentiment analysis of Twitter feeds with the NASDAQ index (i.e.,
analyze the correlation between Twitter feeds and hourly movements of the NASDAQ index), b)
Use deep learning techniques to predict stock prices.
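As a rough illustration of example a), the sketch below computes a Pearson correlation between an hourly sentiment score and hourly index returns. The function names and the numbers are made up for illustration only; a real project would need an actual sentiment model, real index data, and careful time alignment.

```python
from math import sqrt


def hourly_returns(levels):
    """Relative change between consecutive hourly index levels."""
    return [(b - a) / a for a, b in zip(levels, levels[1:])]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy values, NOT real market or Twitter data.
index_levels = [100.0, 101.0, 100.5, 102.0, 101.0]  # one level per hour
sentiment = [0.4, -0.2, 0.6, -0.3]  # mean sentiment for each hour-to-hour move

r = pearson(sentiment, hourly_returns(index_levels))
print(f"correlation between sentiment and hourly return: {r:.3f}")
```

A correlation computed this way says nothing about causation or predictive power; it is only a starting point for the analysis.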
Self-proposed Projects
If you have some ideas that may not fit exactly what is listed above (e.g., you want to design
a totally new algorithm, or new statistical measures of interestingness, etc.), talk to the lecturer.
Impact: "The only limit to your impact is your imagination and commitment." (Tony Robbins)
Examples: a new machine learning platform.
https://github.com/openimages/dataset
http://www.cs.cmu.edu/~enron/
https://webobservatory.soton.ac.uk/
……
For sample big data projects, you can refer to the following links. However, we encourage
you to think beyond them and explore significantly new ideas (for example, by identifying
a trending topic that is interesting to you and then applying novel and influential analytic
methods that are relevant to this topic).
https://www.coursera.org/specializations/big-data
http://hadoopproject.com/big-data-projects/
https://blog.kaspersky.com/cool-big-data-projects/8186/
……
5. Project Proposal
The purpose of the project proposal is to provide background material for the work that you
are to complete and to describe the actual experiments and expected results. The proposal should
be sufficiently detailed so that I can understand specifically what you are going to do. An important
part of the proposal is my understanding of your proposed work. If I think the project is too large
I will ask you to trim it down. The following is a general outline for the proposal. All page counts
are for 11pt. font, single spaced, single column.
You need to submit your project proposal to LumiNUS, and the file name should include the
student IDs of all team members (usually starting with A). For example, if the group has four
members, the file name can be:
A1234567X-A1234567Y-A1234567Z-A1234567T-proposal.pdf
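The naming convention above can be sketched as a small helper. The function name `submission_name` and the alphanumeric check on IDs are assumptions made for illustration; the guideline itself only specifies the hyphen-separated pattern shown in the example.

```python
def submission_name(student_ids, suffix, extension):
    """Join student IDs with hyphens, then append the suffix and extension.

    The alphanumeric check is an assumption, not a rule from the guideline.
    """
    if not student_ids:
        raise ValueError("at least one student ID is required")
    for sid in student_ids:
        if not sid.isalnum():
            raise ValueError(f"unexpected student ID: {sid!r}")
    return "-".join(student_ids) + f"-{suffix}.{extension}"


team = ["A1234567X", "A1234567Y", "A1234567Z", "A1234567T"]
print(submission_name(team, "proposal", "pdf"))
```

The same helper would cover the other deliverables by changing the suffix and extension.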
6. Submission Requirements
a) Final Report
You need to submit a report of at most 20 single-column pages covering the
problem, the solution(s), and what will be demonstrated. All page counts are for 11pt. font,
single spaced, single column.
You should compare the different solutions qualitatively as well as provide an experimental
analysis. In general, the final report should contain the following:
1. Project introduction. This can be the same as for the proposal.
2. Methodology and experimentation. How did you perform the experimentation? This can be a
revised version of the proposal.
3. Discussion of results.
4. Problems encountered and lessons learnt.
5. Personal contribution (written by each student individually).
6. Project summary.
7. References.
8. Workload of each group member (for example, which member completed which task and which
member wrote which section of the report).
b) Code
Code can be written in any programming language (C, C++, Java, MATLAB, R…).
c) File Name
You need to compress all the documents and code into a zip (or rar) file, and the file name
should include the student IDs of all team members. For example, if the group has four members,
the file name should be:
A1234567X-A1234567Y-A1234567Z-A1234567T-FinalReport.zip
Note: Please ensure that the code and report are complete before submission! You need to
ensure that others can reproduce your work using the code and report that you provide.
7. Project Presentation
Presentations will be submitted as videos. Your presentation should cover the same aspects
of the project mentioned above. You need to submit your video and slides to LumiNUS with
the following file name (suppose that your group has four members):
A1234567X-A1234567Y-A1234567Z-A1234567T-VideoSlide.zip
After submitting the presentation, you are encouraged to add more solid results and findings to
the report, beyond those presented in the oral presentation.
8. Grading Criteria
Although students work on different projects, to keep grading fair and reasonable
we will evaluate your work from the following perspectives. For each category, we give
some example issues for your reference.
(a) Linguistic ability: whether the report and slides are well prepared, and whether figures and
tables are well presented.
(b) Complexity and novelty of the problem (you must carefully review the previous studies on
the same problem): whether the literature review is comprehensive (e.g., including both web
URL and research articles), whether a comparison between the previous studies and the
proposed study is clearly presented, whether you present the challenge/complexity of the
problem.
(c) Tools and algorithms: We expect you to use the systems covered in the lectures. You are
welcome to use other systems and tools beyond those. Their usage and implementation
must be clearly presented.
(d) Comprehensiveness of the analysis and findings: If you focus on system design and
performance analysis, we expect you to have some in-depth system analysis and optimization
reasoning. If you focus on data analytics, you need to carefully support your claims and
findings (say, with different approaches for the same tasks, with multiple data sets, from
different angles of the same data sets).
(e) Impact of your project or findings: Are your findings new? Are your findings
impactful/meaningful?
(f) Datasets used (Large? New?): a reasonable guideline for “Large” is that the data set should
be bigger than 10 GB (many current desktops have 4GB-8GB of main memory), and the
data set should have been released within the past year. That means:
a. For the data set size, "over 10GB" is a guideline. It is your job to explain why the data
set is sufficient for your findings. For instance, if you want to analyze the long-term
behavior of a service, one day of data is clearly NOT sufficient.
b. For "new", please try to choose a data set released within the past year. We are not
interested in data sets that have been widely studied, unless you can justify that
you will study an old data set from a NEW angle.
(g) Oral presentation and demo: it will be a strong plus to have a demo. Your presentation
should be well prepared.
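For criterion (f), a quick way to check a local copy of a data set against the 10 GB guideline is sketched below. The helper names are hypothetical, and treating 1 GB as 10^9 bytes is an assumption; the guideline does not say whether decimal or binary gigabytes are meant.

```python
import os

GB = 10**9  # assumption: decimal gigabytes


def dataset_size_bytes(root):
    """Total size in bytes of all regular files under root (symlinks skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def meets_size_guideline(size_bytes, threshold_gb=10):
    """True if the size is strictly larger than the threshold in GB."""
    return size_bytes > threshold_gb * GB
```

Passing the check is not sufficient on its own: you still need to explain why the data is adequate for your findings.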
9. Submission Policies
For all submissions related to course project:
No multiple submissions allowed: Each team should make sure that it submits
exactly once. If a team submits two versions, we will retain the earliest version and discard all
later versions. The team will then be graded on the earliest version. For this reason, if you
want to update your submission, you should delete your old submission first and then submit a
new one.
Policy on late submission: For fairness, reports submitted after the deadline but no more than
48 hours late will still be graded, with a penalty of 20%. Namely, I will first grade the
report normally, and then multiply the mark by 80% to get the final mark for that report. Reports
submitted more than 48 hours after the deadline will not be accepted and will receive a mark of 0.
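The late-submission rule can be written out as a small function. This is only a sketch of the arithmetic described above; the function name and the choice to measure lateness in hours are assumptions.

```python
def final_mark(raw_mark, hours_late):
    """Apply the late policy: on time = full mark, up to 48 hours late = 80%,
    more than 48 hours late = 0."""
    if hours_late <= 0:
        return float(raw_mark)
    if hours_late <= 48:
        return raw_mark * 0.8
    return 0.0


print(final_mark(90, 0))   # submitted on time
print(final_mark(90, 10))  # 10 hours late, 20% penalty
print(final_mark(90, 50))  # more than 48 hours late
```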
10. Plagiarism
You are reminded that plagiarism is a very SERIOUS offense, and disciplinary action
(including possible expulsion from the university) will be taken against any individual or team
found plagiarizing. The individual or team that is being plagiarized will also be punished if it is
found to have allowed the work to be plagiarized voluntarily.