PLATO
Philosophical Language Analysis for Topic and Opinion Mining
- Research Project by Nate Miller and Trevor Ouma
PLATO is a research project that applies Natural Language Processing (NLP) and Machine Learning (ML) techniques to philosophical texts, primarily Plato’s Republic, along with works by Aristotle, Kant, and other philosophers.
Task 1: Topic Modeling on Plato’s Republic
Apply classical topic modeling techniques and modern transformer-based methods to the text of Plato’s Republic. The objective is to compare how well traditional and deep learning-based topic modeling approaches capture themes and arguments in philosophical texts.
Methods to Be Used
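- Classical Topic Models
  - Latent Dirichlet Allocation (LDA) – Probabilistic generative model that treats each document as a mixture of topics and each topic as a distribution over words, inferred from word co-occurrence patterns.
  - Latent Semantic Analysis (LSA) – Singular value decomposition (SVD)-based method that projects a term-document matrix onto a small number of latent dimensions.
  - Non-Negative Matrix Factorization (NMF) – Decomposes the term-document matrix into two non-negative factors, whose components are read as topics.
- Transformer-Based Models
  - BERTopic – Embeds documents with a transformer model, clusters the embeddings, and describes each cluster with class-based TF-IDF (c-TF-IDF) keywords.
  - Top2Vec – Learns topic representations by jointly embedding documents and words in a continuous space. Its default embedding model is Doc2Vec, which is not a transformer; pretrained transformer sentence encoders can be selected instead.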
Goals of Task 1
- Extract dominant themes from Plato’s Republic using different topic modeling methods.
- Compare topic coherence scores across classical and transformer-based models.
- Analyze how the themes surfaced by classical models differ from those surfaced by modern NLP techniques.
- Visualize topic distributions and relationships between themes.
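The goals above do not name a specific coherence measure. As one possible choice, the sketch below computes the UMass coherence of a topic from document co-occurrence counts, using only the standard library. The function name `umass_coherence` and the toy documents are assumptions for illustration; this version sums over word pairs, while some implementations average instead.

```python
import math
from itertools import combinations


def umass_coherence(topic_words, tokenized_docs):
    """Sum of log((D(w_i, w_j) + 1) / D(w_j)) over ordered pairs with j < i.

    topic_words is expected to be sorted from most to least important,
    and D(...) counts the documents containing all the given words.
    """
    doc_sets = [set(doc) for doc in tokenized_docs]

    def doc_freq(*words):
        return sum(1 for doc in doc_sets if all(w in doc for w in words))

    score = 0.0
    for j, i in combinations(range(len(topic_words)), 2):  # guarantees j < i
        w_j, w_i = topic_words[j], topic_words[i]
        d_j = doc_freq(w_j)
        if d_j == 0:
            raise ValueError(f"topic word {w_j!r} does not occur in any document")
        score += math.log((doc_freq(w_i, w_j) + 1) / d_j)
    return score


# Hypothetical tokenized passages, used only to exercise the function.
docs = [
    ["justice", "city", "soul"],
    ["justice", "city", "guardians"],
    ["cave", "shadows", "light"],
    ["cave", "light", "sun"],
]

coherent_score = umass_coherence(["justice", "city"], docs)
mixed_score = umass_coherence(["justice", "cave"], docs)
print(coherent_score, mixed_score)
```

Words that appear together in the same documents give a higher (less negative) score than words that never co-occur, so the same function can be applied to the top words of every model to compare them on a common scale.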
This task will serve as the foundation for subsequent argument mining and complexity modeling in the PLATO project.
References
- BERTopic documentation: https://maartengr.github.io/BERTopic/index.html
- BERTopic GitHub repo: https://github.com/MaartenGr/BERTopic
- Transformer paper ("Attention Is All You Need"): https://arxiv.org/abs/1706.03762
- BERT paper: https://arxiv.org/abs/1810.04805
- Top2Vec GitHub repo: https://github.com/ddangelov/Top2Vec