High-quality corpora for under-resourced languages

What is MaCoCu?

International sources

We crawl the internet to collect sentences in any of the under-resourced languages of the European Union

Top-tier corpora

We use advanced filtering techniques to keep only the highest-quality text from all the crawled data

Extra knowledge

Our corpora are shipped with additional information for each sentence, such as language variant or domain identification

Feel free to explore, review and contribute to our code

