Abstract (EN):
This dissertation summarises the author's research work to obtain the title of Master in Data Science and Engineering. The dissertation addresses the theme of predicting the "Time to Degree for Students with Complex Trajectories at U.Porto", developed within the scope of the COMPLEX TRAJECTORIES project. Complex trajectories are characterised by deviations from the ideal pathway of a higher education student. These can be the result of delays in reaching graduation, drop-out, pauses or transfers between programmes. The present work focuses on the complex trajectories of students who changed programme and on the time they take to graduate after the transfer.
The thesis has three main objectives: to characterise students who transfer to another programme and compare their time to degree with that of students who have a non-complex trajectory; to propose a machine-learning model that integrates variables characterising students' previous pathways to infer time to degree after transfer; and to identify the most informative factors for predicting the time to degree upon transfer.
The study follows the trajectory of 2743 students enrolled at U.Porto between the academic years 2005/06 and 2015/16 up to the year 2020. Yearly data on each student were provided by U.Porto, covering demographic, prior academic, institutional and academic data. Twelve datasets were prepared, covering the combinations of three different sets of attributes and four different ways of generating the dataset used to train the models.
Nine regression algorithms were used to predict the time to degree from the predictor variables. The combination of a stratified random split into training and test sets with the Gradient Boosted Trees algorithm achieved the highest performance.
As a result of this thesis, a predictive model capable of estimating the time to degree of students with complex trajectories with a mean absolute error of less than 0.6 years was produced. This tool will help programme directors to estimate the semester in which a student will graduate. The coefficient of determination of the models is higher than 0.7, which is indicative of high predictive power. The thesis demonstrates that the most relevant predictors are the final programme's median time to degree, the number of years since the first enrolment and the number of credits completed plus those enrolled in at the moment of transfer.
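The modelling setup described above (a stratified random split, a gradient boosted trees regressor, and evaluation by mean absolute error and coefficient of determination) can be illustrated with a minimal sketch. It is not the thesis code: the use of scikit-learn, the synthetic data, the quantile binning used to stratify a continuous target, and the names `fit_and_score`, `median_ttd`, `years_since_first` and `credits` are all assumptions made for illustration only.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split


def fit_and_score(X, y, seed=0):
    """Stratified split on a binned target, then fit and evaluate a GBT regressor."""
    # Assumption: stratify the continuous target by its quartile bin so that
    # short and long times to degree appear in both training and test sets.
    strata = np.digitize(y, np.quantile(y, [0.25, 0.5, 0.75]))
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, stratify=strata, random_state=seed
    )
    model = GradientBoostingRegressor(random_state=seed).fit(X_tr, y_tr)
    pred = model.predict(X_te)
    return model, mean_absolute_error(y_te, pred), r2_score(y_te, pred)


# Synthetic stand-in data (hypothetical; not the U.Porto records).
rng = np.random.default_rng(0)
n = 600
median_ttd = rng.uniform(3.0, 6.0, n)                  # final programme median time to degree
years_since_first = rng.integers(1, 6, n).astype(float)  # years since first enrolment
credits = rng.uniform(0.0, 180.0, n)                   # credits completed plus enrolled at transfer
X = np.column_stack([median_ttd, years_since_first, credits])
# Invented relationship, used only so the example has some signal to learn.
y = median_ttd - credits / 90.0 + rng.normal(0.0, 0.3, n)

model, mae, r2 = fit_and_score(X, y)
print(f"MAE: {mae:.2f} years, R2: {r2:.2f}")
print("Feature importances:", model.feature_importances_)
```

The printed feature importances give one simple way to rank predictors; the values obtained here reflect only the synthetic data and say nothing about the thesis results.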
Language:
English
No. of pages:
118
License type: