Learning Analytics and AI

Discourse Analytics

Date

2024

Team

Individual work

My Role

Learning analyst

Target Users / Audience

Educators

Tools Used

Exploratory


Overview

This project analyzes comments on Perusall readings using Natural Language Processing (NLP) techniques. The workflow covers loading the data, tokenizing and cleaning the text, creating n-grams, computing TF-IDF, clustering authors, and running sentiment analysis. The aim is to explore word usage, identify key terms, and gauge the emotional tone of the comments.

Loading and Understanding the Data:

Imported PerusallMessages.xlsx and examined comment data structure.

Data Distribution Review:

Analyzed comment distribution by papers, pages, and authors.
Identified the most prolific commenters and the most commented-on papers.
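
The distribution review above can be sketched in plain Python. This is only an illustration: the project itself was done in Exploratory, and the sample rows and the field names (author, paper, page, text) are assumptions, since the actual columns of PerusallMessages.xlsx are not listed here.

```python
from collections import Counter

# Hypothetical sample rows; the real columns in PerusallMessages.xlsx may differ.
comments = [
    {"author": "A", "paper": "Paper 1", "page": 1, "text": "Students learn by doing."},
    {"author": "B", "paper": "Paper 1", "page": 2, "text": "I agree with this point."},
    {"author": "A", "paper": "Paper 2", "page": 1, "text": "The method seems unclear."},
    {"author": "A", "paper": "Paper 1", "page": 2, "text": "Good example of feedback."},
]

def count_by(rows, key):
    """Count how many comments share each value of the given field."""
    return Counter(row[key] for row in rows)

by_author = count_by(comments, "author")
by_paper = count_by(comments, "paper")

# most_common(1) returns a list holding the top (value, count) pair.
top_author, top_author_n = by_author.most_common(1)[0]
top_paper, top_paper_n = by_paper.most_common(1)[0]
print(top_author, top_author_n)
print(top_paper, top_paper_n)
```

The same helper works for any field, so count_by(comments, "page") would give the distribution by page.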

Tokenizing the Text:

Tokenized text from the comments, creating columns for individual words.
Removed stopwords and non-alphabetic tokens, then applied stemming to unify similar word forms.
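
A minimal sketch of these cleaning steps, assuming English text. The stopword list is a tiny illustrative sample, and naive_stem is a hypothetical suffix stripper that stands in for a real stemming algorithm; neither reflects what Exploratory uses internally.

```python
import re

# Tiny illustrative stopword list; real lists are much longer.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "are", "this", "that"}

def naive_stem(word):
    """Strip a few common suffixes. A crude stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters would remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    """Lowercase, keep alphabetic tokens only, drop stopwords, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOPWORDS]

print(preprocess("The students are learning and the student learned 3 things."))
# prints ['student', 'learn', 'student', 'learn', 'thing']
```

After stemming, "students" and "student" count as the same term, as do "learning" and "learned", which is the point of unifying word forms before counting.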

Creating Bi-grams and Tri-grams:

Generated bi-grams and tri-grams to identify common word pairs and groups in comments.
Visualized common combinations of words, focusing on key terms like "student."
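
As a rough illustration of n-gram counting, the sketch below builds bi-grams and tri-grams from already-cleaned token lists and then keeps the pairs that contain "student". The token lists are hypothetical examples, not data from the report.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all runs of n consecutive tokens as tuples."""
    return list(zip(*(tokens[i:] for i in range(n))))

# Hypothetical, already-cleaned token lists (one per comment).
docs = [
    ["student", "learn", "from", "peer", "feedback"],
    ["peer", "feedback", "help", "student", "learn"],
    ["student", "learn", "best", "with", "peer", "feedback"],
]

bigram_counts = Counter(bg for doc in docs for bg in ngrams(doc, 2))
trigram_counts = Counter(tg for doc in docs for tg in ngrams(doc, 3))

# Keep only the bi-grams that contain a term of interest.
student_bigrams = {bg: c for bg, c in bigram_counts.items() if "student" in bg}
print(bigram_counts.most_common(2))
print(student_bigrams)
```

Building n-grams within each comment, rather than over one concatenated text, avoids counting pairs that span two unrelated comments.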

TF-IDF Analysis:

Calculated TF-IDF for both papers and authors, identifying unique, significant terms for each.
Filtered out highly frequent but uninformative words to focus on distinctive language patterns.
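
The sketch below shows one common TF-IDF formulation: term frequency within a document multiplied by the natural log of (number of documents / number of documents containing the term). The exact weighting used by Exploratory may differ, and the papers and tokens are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Compute tf-idf per document.

    tf  = term count / total terms in the document
    idf = ln(number of documents / number of documents containing the term)
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))  # count each term once per document
    scores = {}
    for name, tokens in docs.items():
        counts = Counter(tokens)
        total = len(tokens)
        scores[name] = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return scores

# Hypothetical cleaned tokens grouped by paper; grouping by author works the same way.
docs = {
    "Paper 1": ["student", "feedback", "feedback", "peer"],
    "Paper 2": ["student", "dashboard", "data", "data"],
    "Paper 3": ["student", "motivation", "peer", "goal"],
}
scores = tf_idf(docs)
for name, terms in scores.items():
    top = max(terms, key=terms.get)
    print(name, top, round(terms[top], 3))
```

Under this formulation a word that appears in every document, such as "student" here, gets a score of zero, which is why very frequent words drop out of the distinctive-term lists.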

Clustering Authors:

Reduced the dimensionality of the word data, then clustered authors using K-Means clustering.
Visualized relationships between authors based on their word usage.
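
To make the clustering step concrete, here is a bare-bones K-Means sketch. It assumes the dimensionality reduction has already happened, so each author is represented by a hypothetical 2D point. The author names, the coordinates, and the choice to start from the first k points are all assumptions for illustration.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def k_means(points, k, max_iter=100):
    """Basic K-Means. Uses the first k points as starting centroids (a simplification)."""
    centroids = list(points[:k])
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid for each point.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean_point(members)
    return labels, centroids

# Hypothetical reduced coordinates, one point per author.
authors = {
    "Author A": (0.9, 0.1),
    "Author B": (0.1, 0.9),
    "Author C": (1.0, 0.2),
    "Author D": (0.2, 1.0),
    "Author E": (0.8, 0.0),
}
labels, centroids = k_means(list(authors.values()), k=2)
cluster_of = dict(zip(authors, labels))
print(cluster_of)
```

Authors with similar word usage end up near each other in the reduced space and therefore share a cluster label. Production implementations usually pick starting centroids more carefully and rerun from several starts.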

Sentiment Analysis:

Applied sentiment analysis to determine the emotional tone of comments.
Compared sentiment scores across papers and students, identifying trends in positivity and negativity.
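
One simple way to score sentiment is a word lexicon: each known word carries a positive or negative weight, and a comment's score is the sum of its words' weights. The sketch below assumes that approach; the lexicon, the comments, and the weights are hypothetical, and the method Exploratory applies may be different.

```python
import re
from collections import defaultdict

# Tiny hypothetical lexicon; real sentiment lexicons are far larger.
LEXICON = {"good": 1, "great": 2, "helpful": 1, "clear": 1,
           "bad": -1, "confusing": -2, "unclear": -1, "boring": -1}

def sentiment(text):
    """Sum lexicon scores over the words of a comment (unknown words score 0)."""
    return sum(LEXICON.get(w, 0) for w in re.findall(r"[a-z]+", text.lower()))

# Hypothetical (paper, author, text) rows.
comments = [
    ("Paper 1", "A", "Great example, very helpful."),
    ("Paper 1", "B", "This section is confusing."),
    ("Paper 2", "A", "Clear and good explanation."),
    ("Paper 2", "B", "Good point."),
]

def mean_score_by(rows, index):
    """Average sentiment grouped by the field at the given tuple position."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[index]].append(sentiment(row[2]))
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}

by_paper = mean_score_by(comments, 0)
by_author = mean_score_by(comments, 1)
print(by_paper)   # prints {'Paper 1': 0.5, 'Paper 2': 1.5}
print(by_author)  # prints {'A': 2.5, 'B': -0.5}
```

Averaging per paper and per author gives the two comparisons described above. A lexicon sum ignores negation ("not good") and context, so the scores are best read as rough trends.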

Process

Final Deliverable(s)

All visualizations and analyses are compiled into an Exploratory report, which is published online here:

https://exploratory.io/note/nwb6AhO8eK/Assignment-4-Report-ObY6dNy3ZZ
