Learning Analytics and AI

Predictive Analytics of Student Performance

Date

2024

Team

Individual work

My Role

Learning analyst

Target Users / Audience

Educators

Tools Used

Exploratory

Overview

This project predicts student performance using data from two schools, covering the Math and Portuguese language subjects. The predictors include earlier grades, demographic information, and school-related attributes. The goal is to develop and evaluate four predictive models (linear regression, logistic regression, decision tree, and random forest) to forecast final grades and identify students at risk of failing.

Data Loading and Exploration

Import the Math and Portuguese performance datasets from CSV, making sure the delimiter setting matches the files.
Examine the summary of both datasets and compare it against the metadata. Identify at least two notable distributions across columns.
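
The loading step can be sketched in Python with only the standard library (the project itself was built in Exploratory). The sample text, the column subset, and the semicolon delimiter below are assumptions for illustration, so check the actual files before choosing a delimiter.

```python
import csv
import io

# Hypothetical excerpt: invented rows and an assumed subset of columns.
SAMPLE = (
    "school;sex;age;G1;G2;G3\n"
    "GP;F;18;5;6;6\n"
    "GP;M;17;14;15;15\n"
    "MS;F;16;10;9;11\n"
)

def load_rows(text, delimiter=";"):
    """Parse delimited text into a list of dicts, converting digit-only fields to int."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [
        {key: int(value) if value.isdigit() else value for key, value in row.items()}
        for row in reader
    ]

rows = load_rows(SAMPLE)

# With the wrong delimiter every line collapses into a single column,
# which is the symptom to look for when the summary seems off.
wrong_rows = load_rows(SAMPLE, delimiter=",")

final_grades = [row["G3"] for row in rows]
print("columns:", list(rows[0].keys()))
print("G3 min/max/mean:", min(final_grades), max(final_grades),
      sum(final_grades) / len(final_grades))
print("columns with wrong delimiter:", len(wrong_rows[0]))
```

Comparing the parsed column names and value ranges with the dataset's metadata is a quick way to confirm that the delimiter was read correctly.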

Predicting Final Grade (Linear Regression)

Build linear regression models to predict the final grade (G3) based on one or more previous grades (G1, G2).
Split data into 70% training and 30% testing.
Evaluate the model’s goodness-of-fit (R²) and visualize the relationship between actual and predicted values using scatter plots.
Compare models with different predictor variables (G1, G2, G1+G2, all variables) to assess which model is most accurate and useful.
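
The split, fit, and R² steps above can be sketched in plain Python for the single-predictor case. The grade triples are invented for illustration, and the helper names (`train_test_split`, `fit_line`, `r_squared`) are hypothetical rather than taken from the project, which was built in Exploratory.

```python
import random

def train_test_split(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of the rows and split it into training and test parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Invented (G1, G2, G3) triples for illustration only; not the project's data.
grades = [
    (5, 6, 6), (7, 7, 8), (8, 9, 9), (10, 10, 10), (11, 12, 12),
    (12, 13, 13), (13, 13, 14), (14, 15, 15), (15, 16, 17), (17, 18, 18),
]

train, test = train_test_split(grades)
for name, column in (("G1", 0), ("G2", 1)):
    intercept, slope = fit_line([r[column] for r in train], [r[2] for r in train])
    predicted = [intercept + slope * r[column] for r in test]
    # R² is computed on the held-out 30%, so it reflects out-of-sample fit.
    print(name, "test R^2:", round(r_squared([r[2] for r in test], predicted), 3))
```

Models with several predictors (G1+G2, or all variables) need a multiple-regression solver, which this sketch leaves out. The comparison logic stays the same: fit on the training part and compare R² on the test part.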

Predicting Risk of Failing (Logistic Regression)

Create a new column to classify students as passing or failing based on their average grades.
Build a logistic regression model to predict “PassFail” using all other variables.
Split data into training and testing sets, and visualize the model’s performance with a confusion matrix.
Identify any issues with the model and propose improvements based on the confusion matrix.
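
As a rough illustration of the labelling and confusion-matrix steps, the sketch below derives a pass/fail label from the mean of three grades and tallies predictions against it. The pass mark of 10 is an assumption (the cut-off is not stated here), the grades are invented, and the "always pass" predictions stand in for a hypothetical weak model rather than the project's logistic regression.

```python
PASS_MARK = 10  # assumption: the actual cut-off is not stated

def pass_fail(g1, g2, g3, pass_mark=PASS_MARK):
    """Label a student from the mean of the three grades."""
    return "Pass" if (g1 + g2 + g3) / 3 >= pass_mark else "Fail"

def confusion_matrix(actual, predicted, positive="Fail"):
    """Count TP, FP, FN, TN, treating `positive` as the class of interest."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["TP" if a == positive else "FP"] += 1
        else:
            counts["FN" if a == positive else "TN"] += 1
    return counts

def recall(counts):
    """Share of truly failing students that the model flagged."""
    at_risk = counts["TP"] + counts["FN"]
    return counts["TP"] / at_risk if at_risk else 0.0

# Invented (G1, G2, G3) triples for illustration only.
grades = [(5, 6, 6), (14, 15, 15), (10, 9, 11), (8, 9, 9), (12, 13, 13), (15, 16, 17)]
actual = [pass_fail(*g) for g in grades]

# Hypothetical model that predicts "Pass" for everyone.
always_pass = ["Pass"] * len(actual)
counts = confusion_matrix(actual, always_pass)
accuracy = (counts["TP"] + counts["TN"]) / len(actual)

print(counts)
print("accuracy:", round(accuracy, 2), "recall for Fail:", recall(counts))
```

The example shows one kind of issue a confusion matrix can expose: a model can reach reasonable accuracy while missing every failing student, so the false-negative cell deserves as much attention as the overall hit rate.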

Decision Tree for Risk Prediction

Build a decision tree model that predicts the “PassFail” column while excluding the grade variables (G1, G2, G3), since the label is derived from those grades.
Analyze the decision tree to identify important variables and patterns or rules that predict student outcomes.
Evaluate the model’s utility in identifying students at risk and reflect on whether it should be used for interventions.
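
To show how a tree arrives at a rule, here is a minimal sketch of a one-level tree (a decision stump) that picks the single best split by Gini impurity while skipping the grade columns. It is not the tree built in the project. The feature names `failures` and `absences` and all values are assumptions made up for the example.

```python
GRADE_COLUMNS = {"G1", "G2", "G3"}

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(rows, target="PassFail"):
    """Return (weighted_gini, feature, threshold) for the best rule
    `feature <= threshold`, ignoring the target and the grade columns."""
    features = [k for k in rows[0] if k != target and k not in GRADE_COLUMNS]
    best = None
    for feature in features:
        # The largest value is skipped because it would leave the right side empty.
        for threshold in sorted({r[feature] for r in rows})[:-1]:
            left = [r[target] for r in rows if r[feature] <= threshold]
            right = [r[target] for r in rows if r[feature] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, feature, threshold)
    return best

# Invented students for illustration only; feature names are hypothetical.
students = [
    {"failures": 0, "absences": 2, "G3": 14, "PassFail": "Pass"},
    {"failures": 0, "absences": 4, "G3": 12, "PassFail": "Pass"},
    {"failures": 0, "absences": 10, "G3": 11, "PassFail": "Pass"},
    {"failures": 1, "absences": 6, "G3": 8, "PassFail": "Fail"},
    {"failures": 2, "absences": 3, "G3": 7, "PassFail": "Fail"},
    {"failures": 3, "absences": 12, "G3": 5, "PassFail": "Fail"},
]

score, feature, threshold = best_split(students)
print(f"rule: {feature} <= {threshold} (weighted Gini {score})")
```

A full decision tree repeats this search inside each resulting group. Reading the top splits is how important variables and candidate rules are identified, though a clean rule on a small sample is not evidence that it generalizes.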

Random Forest for Risk Prediction

Train a random forest model on the same dataset to predict the “PassFail” column.
Adjust threshold values and assess the model’s performance through confusion matrices.
Compare the random forest’s effectiveness with other models and determine which threshold is optimal for identifying students at risk.
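
The effect of moving the threshold can be sketched independently of the model. The probabilities below are invented stand-ins for a classifier's predicted probability of failing (for a random forest, typically the share of trees voting "fail"), and the function name `confusion_at` is hypothetical.

```python
def confusion_at(actual_fail, prob_fail, threshold):
    """Flag a student as at risk when the predicted probability of failing
    is at least `threshold`, then count the four outcome types."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for is_fail, prob in zip(actual_fail, prob_fail):
        flagged = prob >= threshold
        if flagged:
            counts["TP" if is_fail else "FP"] += 1
        else:
            counts["FN" if is_fail else "TN"] += 1
    return counts

# Invented outcomes and probabilities for illustration only.
actual_fail = [True, True, True, False, False, False, False, False]
prob_fail = [0.8, 0.55, 0.35, 0.6, 0.4, 0.2, 0.1, 0.05]

for threshold in (0.3, 0.5, 0.7):
    counts = confusion_at(actual_fail, prob_fail, threshold)
    print(threshold, counts)
```

Lowering the threshold catches more at-risk students but flags more students who would have passed. Which threshold counts as best depends on how costly a missed student is compared with an unnecessary intervention.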

Process

Final Deliverable(s)

All visualizations and analyses are compiled into an Exploratory report, which is published online here:

https://exploratory.io/note/nwb6AhO8eK/Assignment-II-Report-xSM4HNA7BS
