Department of Information and Computing Sciences

Data science and society

Website: website containing additional information
Course code: INFOMDSS
Credits: 7.5 ECTS
Period: period 1 (week 36 through 45, i.e., 3-9-2018 through 9-11-2018; retake week 1)
Participants: 120 enrolments so far
Schedule: The official schedule can be found in Osiris.
form        group    time             weeks  room           teacher
innovation                                                  Marco Spruit
lecture              Tue 15.15-17.00  37-44  RUPPERT-PAARS  Marco Spruit, Matthieu Brinkhuis
                     Thu 11.00-12.45  36     UNNIK-GROEN
tutorial    group 1  Tue 13.15-15.00  37-44  RUPPERT-111    #SSOI, student assistant MK
            group 2  Tue 13.15-15.00  37-44  RUPPERT-B
            group 3  Tue 13.15-15.00  37-44  RUPPERT-C
Contents: At the end of this course, you will be able to:
  1. Understand the role of data science and its societal impact
  2. Recognise the knowledge discovery processes in applied data science
  3. Identify trends and developments in big data technologies
  4. Apply selected big data technologies to solve real-world problems

The following study materials are required readings for the written exams, next to all lecture slides:

Mid-term End-term REQUIRED Literature
X - White, T. (2015). Hadoop: The Definitive Guide. 4th edition. O'Reilly.
[CH 1 (Hadoop), CH 2 (MapReduce), CH 3 (HDFS), CH 19 (Spark)]
- X Chambers, B., & Zaharia, M. (2018). Apache Spark - The Definitive Guide. O'Reilly.
[CH 1 (About), 2 (Overview), 3 (Toolset), 10 (Spark SQL)]
X X Pritzker, P., and May, W. (2015). NIST Big Data Interoperability Framework (NBDIF): Volume 1: Definitions. NIST Special Publication 1500-1. Final Version 1. National Institute of Standards and Technology.
[esp. CH 2, Appendix A (Definitions)]
X - Dean, J., & Ghemawat, S. (2008). MapReduce: simplified data processing on large clusters. Communications of the ACM, 51(1), 107-113.
[All pages: provides a scientific introduction to MapReduce]
X - Spruit, M., & Lytras, M. (2018). Applied Data Science in Patient-centric Healthcare: Adaptive Analytic Systems for Empowering Physicians and Patients. Telematics and Informatics, 35(4), 643–653.
[All pages: provides a scientific introduction to the research field of applied data science, including analytic systems and meta-algorithmic modelling]
- X Lazer, D., Kennedy, R., King, G., & Vespignani, A. (2014). The parable of Google Flu: traps in big data analysis. Science, 343(6176), 1203-1205.
- X Broniatowski, D., Paul, M., & Dredze, M. (2014). Twitter: big data opportunities. Science, 345(6193), 148-148.
[Discusses Lazer et al. (2014)]

In addition, the following materials form the course foundation and are therefore required background reading:

Mid-term End-term REQUIRED BACKGROUND Literature
X - Davenport, T. H., & Patil, D. J. (2012). Data Scientist: The Sexiest Job of the 21st Century. Harvard Business Review, 90(5), 70-76.
[All pages: Inspirational introduction to the course]
X - Stair, R. & Reynolds, G. (2012 or newer). Fundamentals of Information Systems. Sixth Edition. Cengage: Boston, MA. ISBN-13: 978-0-8400-6218-5.
[CH 1 (Information systems overview), CH 3 (Database systems and Business Intelligence)]
X X Chapman, P., Clinton, J., Kerber, R., Khabaza, T., Reinartz, T., Shearer, C., and Wirth, R. (2000). CRISP-DM 1.0 Step-by-step Data Mining Guide.
[esp. CH 1 (Introduction), CH 2 (Reference model)]

Finally, various literature is recommended throughout the course, including but not limited to:

Mid-term End-term RECOMMENDED Literature
Manyika, J. (2011). Big data: The next frontier for innovation, competition, and productivity. McKinsey Global Institute, McKinsey & Company.
Linden, A., Krensky, P., Hare, J., Idoine, C., Sicular, S., & Vashisth, S. (2017). Magic Quadrant for Data Science Platforms. Gartner.
Spruit, M., & Jagesar, R. (2016). Power to the People! Meta-algorithmic modelling in applied data science. In Fred, A. et al. (Eds.), Proc. 8th Int. Conf. on Knowledge Discovery (pp. 400–406). KDIR 2016, November 11-13, 2016, Porto, Portugal: ScitePress.
[Introduces Applied Data Science and Meta-Algorithmic Modelling]
Ghemawat, S., Gobioff, H., & Leung, S. (2003). The Google file system. SIGOPS Operating Systems Review, 37(5), 29-43.
[Provides a scientific overview of HDFS's predecessor GFS]
Cattell, R. (2011). Scalable SQL and NoSQL data stores. ACM Sigmod Record, 39(4), 12-27.
[Surveys both SQL and NoSQL database systems]
Ambrose, M. (2015). Lessons from the avalanche of numbers: big data in historical perspective. I/S: A Journal of Law and Policy for the Information Society (ISJLP), 11, 201.
[The Big Data revolution from a historical perspective]
Course form: There will be 6 contact hours per week: one 2-hour workshop slot to practice with big data tools (Hadoop and Spark with R and Python within an Azure environment), and two 2-hour lecture slots for both regular and guest lectures, which investigate big data technologies and their societal impact.

The following assignments are among the key parts of the course:

  • Book review: Explore data science and its societal impact
  • Mid-term e-exam on data engineering with Hadoop
  • End-term e-exam on data analytics with Spark
The weekly course schedule can be found here:
Exam form: The graded deliverables determine the final course grade as follows:
[A] Book review
[B] Mid-term exam
[C] End-term exam
[D] Optional bonus for extraordinary participation/performance

Grade = [A]*0.10 + [B]*0.40 + [C]*0.50 + [D]

Minimum effort to qualify for 2nd chance exam:
  1. All components need to be graded with a 4.0 or higher in order to qualify for the 2nd chance exam. The 2nd chance exam grade will then replace either component B or C. It may also include an extensive market survey report assignment in addition to a second chance online exam.
  2. You need to have at least completed either B or C in order to qualify for the all-encompassing 2nd chance exam.
  3. If all grades are 4.0 or higher, and your final grade according to the course grade formula is 5.5 or higher, you will pass the course (without a second chance exam).
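
As a minimal sketch, the grade formula and the pass rule above could be expressed as follows. The function names are hypothetical and not part of any course tooling. The sketch assumes that "all grades" refers to components A, B and C only, that the bonus [D] is simply added on top, and that no rounding is applied before comparing against 5.5; none of these details is confirmed by the course description.

```python
# Weights taken from the course grade formula:
# Grade = [A]*0.10 + [B]*0.40 + [C]*0.50 + [D]
WEIGHT_BOOK_REVIEW = 0.10
WEIGHT_MID_TERM = 0.40
WEIGHT_END_TERM = 0.50

MIN_COMPONENT_GRADE = 4.0  # every graded component must reach this
MIN_FINAL_GRADE = 5.5      # final grade needed to pass


def final_grade(book_review, mid_term, end_term, bonus=0.0):
    """Weighted final grade; the bonus [D] is added unweighted (assumption)."""
    return (
        WEIGHT_BOOK_REVIEW * book_review
        + WEIGHT_MID_TERM * mid_term
        + WEIGHT_END_TERM * end_term
        + bonus
    )


def passes_course(book_review, mid_term, end_term, bonus=0.0):
    """True if A, B and C are all 4.0 or higher and the final grade is 5.5 or higher."""
    components = (book_review, mid_term, end_term)
    all_sufficient = all(grade >= MIN_COMPONENT_GRADE for grade in components)
    grade = final_grade(book_review, mid_term, end_term, bonus)
    return all_sufficient and grade >= MIN_FINAL_GRADE


if __name__ == "__main__":
    # All components at least 4.0 and a weighted grade of about 5.6: pass.
    print(passes_course(7.0, 6.0, 5.0))
    # Weighted grade of about 6.4, but the mid-term is below 4.0: no pass.
    print(passes_course(10.0, 3.5, 8.0))
```

The second call illustrates why the per-component minimum matters: a high weighted average alone does not satisfy the pass rule when one component falls below 4.0.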
Description: This is the introductory course for the Applied Data Science profile, the Applied Data Science postgraduate MSc programme, and the Business Informatics (MBI) programme. As such, its primary objective is to inspire you and introduce you to the emerging domain of Applied Data Science from a Big Data Technologies perspective.

Communication takes place privately on MS Teams in the infomdss group.

NB: Self-study programming support is available for free through the DataCamp for the Classroom learning platform.