Building a Data Science Education Research Plan for Teacher-Scholars

Breakout session at the Electronic Conference on Teaching Statistics

Mine Dogucu, Sinem Demirci, Harry Bendekgey,
Federica Zoe Ricci, Catalina Medina
University of California Irvine

2024-06-12

Hello

A headshot of a woman with curly, short, ear-length hair, green eyes, and red lipstick

Mine Dogucu


minedogucu.com
mdogucu
MineDogucu
mastodon.social/@MineDogucu
minedogucu

A headshot of a woman with curly, short, shoulder-length hair and green eyes

Sinem Demirci


sinemdemirci.github.io
sinemdemirci
sinemmdemirci
drsinemdemirci
sdemirci@calpoly.edu

A picture of a smiling man with short brown hair, taken outdoors

Harry Bendekgey


hbendekgey.me
hbendekgey
hbendekgey
hbendekg@uci.edu

A headshot of a smiling woman with long light-brown hair and blue eyes

Federica Zoe Ricci


federicazoe.github.io
federicazoe
federica-zoe-ricci
fzricci@uci.edu

A headshot of a smiling woman with long brown hair and hazel eyes

Catalina Medina


catalinamedina.github.io
CatalinaMedina
catalina-mari-medina
catalmm1@uci.edu

Who are you?

  • Someone who teaches, or may someday teach, data science or statistics.

  • You can have 0 publications or 3422429406 publications in statistics and data science education.

  • Someone who is respectful of others in the statistics and data science education community.

Chat Drop

  • Location

  • Institution

Outline of the Session

          Activity                                     Time
Past      Share Research Findings                      15 min
Present   Small Group Discussions in Breakout Rooms    30 min
Present   Group Discussion in Main Room                20 min
Future    Your contributions to the field

Goals of the Session

  • Become familiar with research examples from the undergraduate data science education literature

  • Generate your own research ideas and a research plan through discussions in breakout rooms

  • Make connections with teacher-scholars who have similar research interests

Research Study

Funding Acknowledgment

Dogucu and Medina have been supported by NSF IIS award #2123366.

Demirci has been supported by the Scientific and Technological Research Council of Türkiye.

Bendekgey and Ricci have been supported by the HPI Research Center in Machine Learning and Data Science at UC Irvine.

Overview

What is undergraduate data science education research about?

Identify current evidence and knowledge gaps in undergraduate data science education

Inform policymakers and data science educators/practitioners about the present status of data science education

We conducted a systematic literature review to examine this topic.

Methods

Step 1

  • We searched for the exact phrase “data science education” (in quotes) in at least one of the following fields: title, abstract, or keywords.

  • We searched six databases and included all publications published up to December 2022.

  • This step yielded 197 publications.

Methods

Step 2

  • We conducted a preliminary screening by reading the ABSTRACT of every publication found in the previous step.

  • We excluded publications because of their format (e.g., posters) or education level (e.g., graduate education).

  • At least two reviewers discussed each publication to decide whether to include or exclude it.

  • We had 130 publications remaining at the end of this step.

Methods

Step 3

  • We conducted a full analysis by reading the FULL TEXT of every publication remaining from the previous step.

  • We continued to exclude publications because of their format (e.g., posters) or education level (e.g., graduate education).

  • At least two reviewers discussed each publication to decide whether to include or exclude it, and the reviewers recorded certain characteristics of the included publications.

  • 77 publications remained at the end of this step; we report results for these publications.
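As a quick check on the screening funnel, the counts from the three steps can be tallied in a short Python sketch. It only restates the numbers reported in the steps above; the step labels and the `screening_flow` helper are hypothetical names chosen for illustration, not part of the study's materials.

```python
# Publications remaining after each step of the review, as reported in Steps 1-3.
steps = [
    ("Step 1: database search", 197),
    ("Step 2: abstract screening", 130),
    ("Step 3: full-text screening", 77),
]


def screening_flow(steps):
    """Return (label, remaining, excluded_at_this_step) for each step.

    Hypothetical helper: exclusions are the drop from the previous step's count.
    """
    flow = []
    previous = None
    for label, remaining in steps:
        excluded = 0 if previous is None else previous - remaining
        flow.append((label, remaining, excluded))
        previous = remaining
    return flow


for label, remaining, excluded in screening_flow(steps):
    print(f"{label}: {remaining} remaining, {excluded} excluded")

# Share of the initial search results that reached the final analysis.
retained = steps[-1][1] / steps[0][1]
print(f"Retained overall: {retained:.0%}")
```

Read this way, 67 publications were excluded at the abstract stage and 53 at the full-text stage, so roughly 39% of the initial search results were analyzed.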

Some Inspirational Examples

Different Topics

Topic                                                      Facilitator
Pedagogical Approach                                       Sinem
Education Technology                                       Catalina
Course / Class Activity Example                            Federica
Program Example                                            Harry
Review of Current State of Data Science Education (DSE)    Mine

In a few minutes, we will join breakout rooms.

If possible, please get ready to turn on your video.

If possible, please get ready to turn on your microphone.

Examples - Pedagogical Approach

Vance, E. A. (2021). Using Team-Based Learning to Teach Data Science. Journal of Statistics and Data Science Education, 29(3), 277–296.

Described the essential elements of Team-Based Learning

Shared the results of a three-year experiment using Team-Based Learning to teach data science

One of their research questions:

  • What are the students’ perceptions of how well they achieved the 31 student learning outcomes?
    • Specifically, how well did students report learning communication and collaboration, relative to other topics?
    • Did students report learning beginning R concepts better than advanced R concepts?

Examples - Pedagogical Approach

Figure depicting respondents' mean self-evaluation scores (±2 SEs) by semester for each overall learning goal category. Communication/Collaboration, consisting of only two Student Learning Outcomes (“Collaborating with teammates” and “Communicating findings and recommendations”), was consistently the highest rated learning goal category, with average ratings a little closer to “Very Well (5)” than to “Pretty Good (4).” The Beginning R category (six Student Learning Outcomes from Chapters 1–12 of R4DS) and Workflow (the Student Learning Outcomes “RStudio,” “RMarkdown,” “OSF (Open Science Framework),” and “GitHub”) were the next highest ranked categories in each semester, with average scores somewhat better than “Pretty Good (4).” The lowest ranked categories were Statistical Thinking and Advanced R (Student Learning Outcomes from Chapters 13–25 of R4DS). The nine Statistical Thinking Student Learning Outcomes had an overall average rating of 3.79, somewhat worse than “Pretty Good (4).” The ten Advanced R Student Learning Outcomes had the lowest rating, 3.66, on average somewhat closer to “Pretty Good (4)” than to “OK (3).”

Figure from Vance (2021)

Examples - Pedagogical Approach

Bhavya, B., Xiao, J., & Zhai, C. (2021, June). Scaling Up Data Science Course Projects: A Case Study. In Proceedings of the Eighth ACM Conference on Learning @ Scale (pp. 311-314).

Examined how to address two challenges in assessing data science group projects

Doudesis, D., & Manataki, A. (2022). Data science in undergraduate medicine: Course overview and student perspectives. International Journal of Medical Informatics, 159, 104668.

Explored student perspectives on a data science course using three focus groups and a small experiment

Students participating in the video-based lab were generally more satisfied, considering the lab more beneficial and enjoyable

Examples - Education Technology

Cuadrado-Gallego, J. J., Demchenko, Y., Losada, M. A., & Ormandjieva, O. (2021, April). Classification and analysis of techniques and tools for data visualization teaching. In 2021 IEEE Global Engineering Education Conference (EDUCON) (pp. 1593-1599). IEEE.

Proposed a system for classifying graphical summary tools used for teaching

Sought to improve the teaching of data visualization by helping teachers choose the best tools for their classes

Examples - Education Technology

Table from the paper with five columns: (1) package, (2) broad, (3) depth, (4) slope, and (5) documentation. The packages are separated into open source software and proprietary software, and the remaining four columns are constructs defined in the manuscript.

Table from Cuadrado-Gallego et al. (2021)

Examples - Education Technology

Konkol’ová, V., & Paralič, J. (2018, November). Active learning in data science education. In 2018 16th International Conference on Emerging eLearning Technologies and Applications (ICETA) (pp. 285-290). IEEE.

Shared and evaluated a Shiny app developed for active learning lectures, including data manipulation, model testing, knowledge checks, and class discussions

Shapiro, B. R., Meng, A., O’Donnell, C., Lou, C., Zhao, E., Dankwa, B., & Hostetler, A. (2020, April). Re-Shape: A method to teach data ethics for data science education. In Proceedings of the 2020 CHI conference on human factors in computing systems (pp. 1-13).

Shared and evaluated tools, and corresponding class activities, that leverage personal data to teach ethics in data science

Examples - Course / Class Activity Example

Allen, G. I. (2021, March). Experiential learning in data science: Developing an interdisciplinary, client-sponsored capstone program. In Proceedings of the 52nd ACM Technical Symposium on Computer Science Education (pp. 516-522).

  • Capstone program: structure, management, and assessment
  • Sponsors' evaluations of the projects
  • Students' evaluations of the capstone experience
  • Lessons learned, shared for others

Examples - Course / Class Activity Example


Table with 6 columns: (1) Sponsor Types (e.g., Company, Energy or Research, Medical); (2) Sponsor Motivations (e.g., Recruiting, Prototyping); (3) Data Science Area (e.g., Natural Language Processing, Data Wrangling); (4) Outcomes from the sponsor's point of view (ranging from Dissatisfied to Very Satisfied); (5) Outcomes from the instructor's point of view (ranging from Poor to Excellent); (6) whether project objectives were met. The different sponsor types and the last three variables are highlighted.


Table from Allen (2021)

Examples - Course / Class Activity Example

Bhavya, B., Boughoula, A., Green, A., & Zhai, C. (2020, February). Collective development of large scale data science products via modularized assignments: An experience report. In Proceedings of the 51st ACM Technical Symposium on Computer Science Education (pp. 1200-1206).

  • Collective and iterative development of a real-world DS product

  • Verified that the goals of the assignment were met

Hicks, S. C., & Irizarry, R. A. (2018). A guide to teaching data science. The American Statistician, 72(4), 382-391.

  • Highlighted gaps in DS courses

  • Proposed course-building principles and a course example to overcome these gaps

Examples - Program Example

Dill-McFarland, K. A., König, S. G., Mazel, F., Oliver, D. C., McEwen, L. M., Hong, K. Y., & Hallam, S. J. (2021). An Integrated, Modular Approach to Data Science Education in Microbiology. PLOS Computational Biology, 17(2), e1008661.

Integrated data science modules into multiple microbiology courses

Evaluated the change in students’ self-reported interest and experience

Examples - Program Example

Dill-McFarland, K. A., König, S. G., Mazel, F., Oliver, D. C., McEwen, L. M., Hong, K. Y., & Hallam, S. J. (2021). An Integrated, Modular Approach to Data Science Education in Microbiology. PLOS Computational Biology, 17(2), e1008661.

Figure depicting respondents' self-reported interest and experience in three areas: Bioinformatics, Computer Science, and Statistics. Data are reported both before and after participating in the data science modules. Before participating, a majority of respondents reported "medium" interest and "low" or "none" experience in Bioinformatics; afterwards, a large portion of those respondents moved to "high" interest and "medium" or "low" experience. Computer Science showed a more even split across levels of interest and experience, with the only notable movement being from "none" to "low" experience. Interest and experience in Statistics changed very little, with most respondents reporting "medium" experience and "medium" interest both before and after participating.

Examples - Program Example

Demchenko, Y., Comminiello, L., & Reali, G. (2019). Designing Customisable Data Science Curriculum Using Ontology for Data Science Competences and Body of Knowledge. In ICBDE ’19 (pp. 124–128).

Showed how the EDISON Data Science Framework, an ontology of data science competencies, can be used to build a curriculum

Farahi, A., & Stroud, J. (2018). The Michigan Data Science Team: A Data Science Education Program with Significant Social Impact. pp. 120–124. doi:10.1109/DSW.2018.8439915

Outlined an extracurricular program for increasing data science competency via service learning

Evaluated members’ satisfaction with the program, their reasons for joining, and their willingness to participate in further program elements

Examples - Review of Current State of DSE

Song, I. Y., & Zhu, Y. (2016). Big data and data science: What should we teach? Expert Systems, 33(4), 364-373.

Examined bachelor’s and master’s degree programs, and their courses, in detail

Examined programs named “data science,” “data analytics,” or “analytics”

Examples - Review of Current State of DSE

Table showing counts of bachelor's and master's programs in the United States as of August 2014. Among bachelor's programs, 3 are in a university/joint department, 3 in computer science, 2 in data science, and 1 in a business department. Among master's programs, 17 are in a university/joint department, 7 in information science, 3 in computer science, 3 in statistics, 1 in information technology, in operational research and 1 in a professional studies department.

Table from Song & Zhu (2016)

Examples - Review of Current State of DSE

Zakaria, M. S. (2023). Data science education programmes in Middle Eastern institutions: A survey study. IFLA Journal, 49(1), 157-179.

This study's main aim is similar to that of Song & Zhu (2016). It may be of particular interest to those based outside the US or interested in comparative education.

Oliver, J. C., & McNeil, T. (2021). Undergraduate data science degrees emphasize computer science and statistics but fall short in ethics training and domain-specific context. PeerJ Computer Science, 7, e441.

Used the NASEM and GDS frameworks to evaluate undergraduate data science programs at 25 universities.

The title states the study's conclusion.

Results - Content Area

Content Area                                         Total
Education Technology                                 19
Review Of Current State Of Data Science Education    16
Course Example                                       14
Pedagogical Approach                                 13
Program Example                                      13
Call To Action                                       6
Class Activity Example                               6
Extra Curricular Activity Example                    1
Guidelines                                           1

Results

Data

Qualitative   Quantitative   Mixed   None
8             6              24      39

Research Questions

With RQ   Without RQ
39        38
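To put the two small tables on a common scale, the counts can be turned into proportions in a few lines of Python. This is a minimal sketch that only reuses the counts shown above; the dictionary names and the `shares` helper are hypothetical and are not taken from the study's dataset.

```python
# Counts copied from the Data and Research Questions tables above.
data_counts = {"Qualitative": 8, "Quantitative": 6, "Mixed": 24, "None": 39}
rq_counts = {"With RQ": 39, "Without RQ": 38}


def shares(counts):
    """Hypothetical helper: convert category counts into proportions of their total."""
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


for name, counts in [("Data", data_counts), ("Research questions", rq_counts)]:
    print(f"{name} (n = {sum(counts.values())})")
    for category, share in shares(counts).items():
        print(f"  {category}: {counts[category]} ({share:.0%})")
```

Both tables account for all 77 included publications, and in each case the split is close to half: 39 publications (about 51%) used no data, and 39 stated a research question.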

Demo

In this demo, we will show where to find the dataset and how to look for other examples.

https://github.com/mdogucu/comp-data-sci

Small Group Discussions

Warning

In breakout rooms, we will collectively generate ideas for research questions and methods. If you have any groundbreaking ideas that may be subject to intellectual property or copyright protection, please refrain from sharing them.

Breakout Rooms

Breakout   Topic                              Facilitator   Worksheet
Room 1     Pedagogical Approach               Sinem         Link
Room 2     Education Technology               Catalina      Link
Room 3     Course / Class Activity Example    Federica      Link
Room 4     Program Example                    Harry         Link
Room 5     Review of Current State of DSE     Mine          Link

Main Group Discussion

  • What are the opportunities and challenges associated with the type of research studies you have identified?

  • How can we overcome the challenges?

  • Any other ideas to help statistics educators contribute more to data science education research?

Resources

THANK YOU

We look forward to reading your contributions to our field. Please update us if any of these ideas come to fruition. We would like to celebrate with you.


For post-session conversations, please use the #ecots-2024 channel in the CAUSE Slack space.