
Learning Analytics and AI
Discourse Analytics
Date
2024
Team
Individual work
My Role
Learning analyst
Target Users / Audience
Educators
Tools Used
Exploratory
Overview
This project analyzes comments on Perusall readings using Natural Language Processing (NLP) techniques. The workflow covers loading the data, tokenizing and cleaning the text, creating n-grams, computing TF-IDF, clustering authors, and scoring sentiment. The aim is to explore word usage, identify the terms that distinguish each paper and author, and compare the emotional tone of comments.
Loading and Understanding the Data:
Imported PerusallMessages.xlsx and examined comment data structure.
Data Distribution Review:
Analyzed comment distribution by papers, pages, and authors.
Identified the most prolific commenters and the most commented-on papers.
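The distribution review above can be sketched in a few lines of Python. The project itself was carried out in Exploratory, so this is only an illustration: the records and the field names "author", "paper", and "page" are assumptions standing in for the actual columns of PerusallMessages.xlsx.

```python
from collections import Counter

# Hypothetical records standing in for rows of PerusallMessages.xlsx.
# The field names are assumptions, not the real column names.
comments = [
    {"author": "Student A", "paper": "Paper 1", "page": 2},
    {"author": "Student A", "paper": "Paper 1", "page": 3},
    {"author": "Student B", "paper": "Paper 1", "page": 2},
    {"author": "Student A", "paper": "Paper 2", "page": 1},
]

# Count comments per author, per paper, and per (paper, page) pair.
comments_per_author = Counter(c["author"] for c in comments)
comments_per_paper = Counter(c["paper"] for c in comments)
comments_per_page = Counter((c["paper"], c["page"]) for c in comments)

# most_common(1) returns a list with the single highest-count entry.
top_author, top_author_count = comments_per_author.most_common(1)[0]
top_paper, top_paper_count = comments_per_paper.most_common(1)[0]

print(top_author, top_author_count)
print(top_paper, top_paper_count)
```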
Tokenizing the Text:
Tokenized text from the comments, creating columns for individual words.
Removed stopwords and non-alphabetical text, then applied stemming to unify similar word forms.
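A minimal sketch of these cleaning steps, assuming a small hand-picked stopword list and a deliberately crude suffix-stripping stemmer. Both are stand-ins chosen for illustration. A real analysis would normally use a full stopword list and an established stemmer such as Porter's, and the exact settings used in Exploratory are not known here.

```python
import re

# Assumed, abbreviated stopword list for illustration only.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "are",
             "this", "that", "it", "for"}


def crude_stem(word):
    """Strip one common suffix. A naive stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, keep alphabetical runs only, drop stopwords, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOPWORDS]


print(preprocess("The students are learning in 2024!"))
```

The regular expression keeps only runs of letters, so numbers and punctuation are dropped before stopword removal and stemming.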
Creating Bi-grams and Tri-grams:
Generated bi-grams and tri-grams to identify common word pairs and groups in comments.
Visualized common combinations of words, focusing on key terms like "student."
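The n-gram step can be illustrated with a sliding window over each comment's tokens. The token lists below are invented examples, not data from the project, and the helper name ngrams is an assumption.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all runs of n consecutive tokens as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Invented, already-cleaned comments for illustration.
docs = [
    ["student", "engagement", "improves", "learning"],
    ["student", "engagement", "matters"],
    ["online", "student", "engagement", "improves", "outcomes"],
]

# Build n-grams within each comment so pairs never span two comments.
bigram_counts = Counter(bg for d in docs for bg in ngrams(d, 2))
trigram_counts = Counter(tg for d in docs for tg in ngrams(d, 3))

# Focus on a key term, here "student".
student_bigrams = {bg: c for bg, c in bigram_counts.items() if "student" in bg}

print(bigram_counts.most_common(2))
print(student_bigrams)
```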
TF-IDF Analysis:
Calculated TF-IDF for both papers and authors, identifying unique, significant terms for each.
Filtered out highly frequent but uninformative words to focus on distinctive language patterns.
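To make the TF-IDF idea concrete, the sketch below uses one common definition: term frequency is a term's share of the tokens in a document, and inverse document frequency is log(N / df), where N is the number of documents and df is the number of documents containing the term. This is an assumed variant, and Exploratory may weight or normalize differently. The paper names and tokens are invented.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute TF-IDF per document for a dict of name -> token list."""
    n_docs = len(docs)

    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))

    scores = {}
    for name, tokens in docs.items():
        counts = Counter(tokens)
        total = len(tokens)
        scores[name] = {
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
    return scores


# Invented token lists, one per paper (all comments on a paper pooled).
papers = {
    "Paper 1": ["student", "feedback", "feedback", "rubric"],
    "Paper 2": ["student", "dashboard", "dashboard", "privacy"],
}

scores = tf_idf(papers)
top_term_paper_1 = max(scores["Paper 1"], key=scores["Paper 1"].get)
print(top_term_paper_1, scores["Paper 1"][top_term_paper_1])
```

A word that appears in every document, such as "student" here, gets an IDF of log(1) = 0, which is why frequent but uninformative words fall out of the ranking. Pooling comments by author instead of by paper gives the per-author version.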
Clustering Authors:
Reduced dimensionality of word data and clustered authors using K-Means clustering.
Visualized relationships between authors based on their word usage.
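The clustering step can be sketched with a small K-Means loop that alternates between assigning each author to the nearest centroid and recomputing centroids. This sketch makes several assumptions: the author vectors are invented word counts, the dimensionality reduction step is omitted, and centroids are initialized from the first k points so the result is deterministic. Production tools typically use random or k-means++ initialization.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, iterations=20):
    """Plain K-Means. Returns (labels, centroids)."""
    # Assumed deterministic initialization: the first k points.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every point.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Invented word-count vectors per author over the vocabulary
# ["feedback", "rubric", "dashboard", "privacy"].
authors = {
    "Student A": [3, 2, 0, 0],
    "Student B": [0, 0, 4, 1],
    "Student C": [2, 3, 0, 1],
    "Student D": [0, 1, 3, 2],
}

labels, centroids = k_means(list(authors.values()), k=2)
clusters = dict(zip(authors, labels))
print(clusters)
```

Authors who favor the same vocabulary end up in the same cluster, which is the relationship the author visualization is meant to show.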
Sentiment Analysis:
Applied sentiment analysis to determine the emotional tone of comments.
Compared sentiment scores across papers and students, identifying trends in positivity and negativity.
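One simple way to score sentiment is with a word lexicon: each comment's score is the sum of the scores of the words it contains, and scores are then averaged per paper or per student. The sketch below assumes a tiny invented lexicon and invented comments. The lexicon or model that Exploratory applies is not specified in this write-up, so this illustrates the general approach rather than the exact method used.

```python
from collections import defaultdict

# Tiny invented lexicon. Real lexicons contain thousands of scored words.
LEXICON = {"great": 2, "helpful": 1, "clear": 1, "confusing": -1, "boring": -2}


def sentiment_score(text):
    """Sum lexicon scores over the words of a comment (unknown words score 0)."""
    words = text.lower().split()
    return sum(LEXICON.get(w.strip(".,!?"), 0) for w in words)


# Invented (paper, comment) pairs for illustration.
comments = [
    ("Paper 1", "Great and helpful example!"),
    ("Paper 1", "The method is clear."),
    ("Paper 2", "This section is confusing."),
    ("Paper 2", "Boring but clear."),
]

# Collect scores per paper, then average them.
scores_by_paper = defaultdict(list)
for paper, text in comments:
    scores_by_paper[paper].append(sentiment_score(text))

mean_sentiment = {
    paper: sum(scores) / len(scores) for paper, scores in scores_by_paper.items()
}
print(mean_sentiment)
```

Replacing the paper name with the author name in each pair gives the per-student comparison.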
Process
Final Deliverable(s)
All visualizations and analyses are compiled into an Exploratory report, which is published online here:
https://exploratory.io/note/nwb6AhO8eK/Assignment-4-Report-ObY6dNy3ZZ