Legalis
- uni project around court case prediction
- using heavily processed German court case data
- prediction with Random Forest and BERT
About the Project
Legalis is a university project for a machine learning and data science course. I wrote a paper at the University of Oslo about court case outcome prediction, and this is the continuation of that project.
I'm using bulk data from openlegaldata, which includes about 250,000 cases, of which about 38,000 are usable for me. Based on this, I trained a random forest classifier to predict the outcome of court cases.
In the end I reached an accuracy of about 60%, which is not great, but also not bad.
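To give a rough idea of what such a baseline can look like, here is a minimal sketch that assumes scikit-learn and TF-IDF features. The example texts, the label encoding, and the hyperparameters are invented for illustration and are not the actual Legalis data or settings.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Invented stand-ins for case texts (assumption: not real openlegaldata records).
texts = [
    "Der Kläger verlangt die Rückzahlung eines Darlehens.",
    "Die Klägerin begehrt Schadensersatz nach einem Verkehrsunfall.",
    "Der Kläger wendet sich gegen die Kündigung seines Mietvertrags.",
    "Die Klägerin fordert die Zahlung ausstehender Löhne.",
]
# Hypothetical encoding: 1 = claimant wins, 0 = claimant loses.
labels = [1, 0, 0, 1]

# TF-IDF turns each text into a sparse vector; the forest classifies those vectors.
model = make_pipeline(
    TfidfVectorizer(),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
model.fit(texts, labels)

# Predict the label for a new, equally invented case description.
predictions = model.predict(["Der Kläger verlangt die Rückzahlung einer Kaution."])
```

With real data you would hold out a test split and compare the accuracy on it against the share of the majority class, since that share is the baseline a 60% result has to beat.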
Technology & Tools
I very much enjoy the features 🤗 Hugging Face provides and rely on it heavily in my machine learning projects, especially for hosting datasets, models, and apps/Spaces.
For prediction, I trained and optimized a random forest classifier as well as a naive Bayes classifier and a BERT model for text classification.
I used ChatGPT to extract the outcome as a binary label for 2,800 cases and trained the models on those labels. It works great for extracting specific information from longer texts (if you're willing to pay, or if your texts are short).
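The labeling step can be sketched roughly as follows. This is not the prompt or code I actually used: the prompt wording, the meaning of the binary label (claimant wins or loses), the truncation length, and the `ask_model` callable are all assumptions. `ask_model` stands in for whatever function sends a prompt to the chat model and returns its reply as a string.

```python
from typing import Callable, Optional

# Hypothetical prompt; the wording used in the project is not documented here.
PROMPT_TEMPLATE = (
    "Read the following German court decision and answer with a single digit: "
    "1 if the claimant won, 0 if the claimant lost.\n\n{case_text}"
)


def build_prompt(case_text: str, max_chars: int = 4000) -> str:
    """Truncate long cases to keep the request short, then fill the template."""
    return PROMPT_TEMPLATE.format(case_text=case_text[:max_chars])


def parse_label(reply: str) -> Optional[int]:
    """Map a free-text reply to 0 or 1; return None if it starts with neither."""
    cleaned = reply.strip()
    if cleaned.startswith("1"):
        return 1
    if cleaned.startswith("0"):
        return 0
    return None


def label_case(case_text: str, ask_model: Callable[[str], str]) -> Optional[int]:
    """Build the prompt, query the model via ask_model, and parse the reply."""
    return parse_label(ask_model(build_prompt(case_text)))


# Usage with a stub in place of a real API call.
def fake_model(prompt: str) -> str:
    return "1"


label = label_case("Der Klage wird stattgegeben.", fake_model)
```

Returning `None` for ambiguous replies keeps unclear cases out of the training set instead of silently assigning them a label.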
Future
I'm planning to update this to use a Llama 2 or Mistral-based German language model for classification, as the BERT performance was already pretty promising.