PLATO

Philosophical Language Analysis for Topic and Opinion Mining

PLATO is a research project that applies Natural Language Processing (NLP) and Machine Learning (ML) techniques to philosophical texts, focusing primarily on Plato's Republic, alongside works by Aristotle, Kant, and other canonical philosophers.


Task 1: Topic Modeling on Plato’s Republic

Apply both classical topic modeling techniques and modern transformer-based methods to the Plato's Republic dataset. The objective is to compare how well traditional and deep learning-based approaches capture the themes and arguments of philosophical texts.

Methods to Be Used

  1. Classical Topic Models
    • Latent Dirichlet Allocation (LDA) – Probabilistic model that assigns words to topics based on word co-occurrence patterns.
    • Latent Semantic Analysis (LSA) – Applies singular value decomposition (SVD) to the term-document matrix to uncover latent semantic structure.
    • Non-Negative Matrix Factorization (NMF) – Factorizes the term-document matrix into non-negative components, which are interpreted as topics.
  2. Transformer-Based Models
    • BERTopic – Uses transformer embeddings and clustering to generate coherent topics.
    • Top2Vec – Learns topic representations by jointly embedding documents and words in a continuous space.

Goals of Task 1

  • Extract dominant themes from Plato’s Republic using different topic modeling methods.
  • Compare topic coherence scores across classical and transformer-based models.
  • Analyze thematic differences between classical and modern NLP techniques.
  • Visualize topic distributions and relationships between themes.
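For the coherence comparison, a simple UMass-style score can rank topics from any of the models above by how often their top words co-occur in the same documents. The function below is a simplified, self-contained illustration (the corpus and topic word lists are hypothetical), not a replacement for a full coherence library such as gensim's CoherenceModel.

```python
# Sketch: a minimal UMass-style topic coherence score. Higher (less
# negative) means the topic's top words co-occur more often in documents.
import math

def umass_coherence(topic_words, documents):
    """Average log conditional co-occurrence over ordered word pairs."""
    doc_sets = [set(doc) for doc in documents]

    def d(*words):
        # Number of documents containing all of the given words.
        return sum(all(w in s for w in words) for s in doc_sets)

    score, pairs = 0.0, 0
    for i in range(1, len(topic_words)):
        for j in range(i):
            wi, wj = topic_words[i], topic_words[j]
            if d(wj):  # skip conditioning words absent from the corpus
                score += math.log((d(wi, wj) + 1) / d(wj))
                pairs += 1
    return score / pairs if pairs else 0.0

# Hypothetical tokenized documents standing in for Republic passages.
docs = [
    "justice virtue soul city".split(),
    "philosopher king ideal city wisdom".split(),
    "soul reason spirit appetite".split(),
]
coherent = umass_coherence(["soul", "city", "justice"], docs)
scattered = umass_coherence(["soul", "wisdom", "reason"], docs)
print(coherent, scattered)
```

Applying the same scoring function to the top words produced by each model gives a common yardstick for the classical-versus-transformer comparison.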

This task will serve as the foundation for subsequent argument mining and complexity modeling in the PLATO project.
