Innovation through open collaboration

Open-source research is part of our DNA. Here we share what we're building across language models, evaluation, and AI agents.

Recent outputs

Latest research artifacts

Model | Dataset | Paper arXiv:2602.12414

propella-1

A family of small multilingual LLMs for annotating text documents across six categories: core content, classification, quality and value, audience and purpose, safety and compliance, and geographic relevance. These annotations help filter, select, and curate LLM training data at scale. The models outperform much larger general-purpose baselines.
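To illustrate how such annotations might feed a curation step, here is a minimal sketch. It is not the propella-1 output schema: the `Annotation` record, its field names, the 1-to-5 quality scale, and the `select_for_training` helper are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    """Hypothetical annotation record; field names are illustrative only."""
    doc_id: str
    content_quality: int  # assumed scale: 1 (poor) to 5 (excellent)
    is_safe: bool         # assumed safety and compliance flag
    language: str         # ISO 639-1 language code


def select_for_training(annotations, min_quality=4, languages=("de", "en")):
    """Return the IDs of documents that pass every curation filter."""
    return [
        a.doc_id
        for a in annotations
        if a.is_safe
        and a.content_quality >= min_quality
        and a.language in languages
    ]


annotations = [
    Annotation("doc-1", content_quality=5, is_safe=True, language="de"),
    Annotation("doc-2", content_quality=2, is_safe=True, language="en"),
    Annotation("doc-3", content_quality=4, is_safe=False, language="en"),
    Annotation("doc-4", content_quality=4, is_safe=True, language="fr"),
]

print(select_for_training(annotations))  # ['doc-1']
```

Keeping the filter as a plain predicate over per-document records makes it easy to tighten or relax thresholds without re-annotating the corpus.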

DATA-FM @ ICLR 2026 (Spotlight) | 57 languages | Any text, any format | 10B+ documents annotated | fp8 | Apache-2.0 / CC-BY 4.0

Models on Hugging Face | Dataset on Hugging Face | Paper on arXiv

propella-1: propel your data curation to the next level

Model | Paper arXiv:2601.08472

sui-1: Summarization with Unique Identifiers

A 24B-parameter LLM for abstractive summarization with inline citations. Every claim is traceable to its source sentence. It supports documents with more than 2M tokens and outperforms all tested open-weight baselines, including models with 3x more parameters.
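Traceable citations can be checked mechanically. The sketch below is only an illustration: the `[S<number>]` marker format, the `source_sentences` mapping, and the `unresolved_citations` helper are assumptions for this example and may not match the format sui-1 actually emits.

```python
import re

# Assumed marker format for this sketch: "[S<number>]" after each claim.
CITATION = re.compile(r"\[S(\d+)\]")


def unresolved_citations(summary, source_sentences):
    """Return cited sentence IDs that do not exist in the source."""
    cited = {int(m) for m in CITATION.findall(summary)}
    return sorted(cited - set(source_sentences))


# Hypothetical source document, keyed by sentence ID.
source_sentences = {
    1: "The model has 24B parameters.",
    2: "It supports five languages.",
}
summary = "The 24B model [S1] covers five languages [S2] and runs in fp8 [S7]."

print(unresolved_citations(summary, source_sentences))  # [7]
```

An empty result means every citation points at a sentence that exists in the source. It does not show that the cited sentence supports the claim, which still needs human or model-based review.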

24B params | 5 languages | inline citations | very long documents | fp8 | Apache-2.0

View on Hugging Face | Paper on arXiv

Benchmark | Apache-2.0

base-eval

Curated lm-evaluation-harness task configurations for evaluating English and German base models. Every task is validated against reference models, enabling benchmark suites for early-stage pretraining and in-loop evaluation.

English | German | lm-eval | 47 benchmarks | 730+ task configs

View on GitHub

Tool | Apache-2.0

inference-hive

Distributed LLM inference at scale for SLURM clusters. Configure cluster, server, and data settings, then scale across thousands of GPUs with near-linear throughput.
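Near-linear scaling is plausible when each worker handles an independent slice of the input. The following is a generic sketch of that idea, not inference-hive's implementation: the `shard_indices` helper and its parameters are hypothetical.

```python
def shard_indices(num_rows, num_workers, worker_id):
    """Return the contiguous row range [start, stop) for one worker.

    Rows are split as evenly as possible; the first `num_rows % num_workers`
    workers each receive one extra row.
    """
    base, extra = divmod(num_rows, num_workers)
    start = worker_id * base + min(worker_id, extra)
    stop = start + base + (1 if worker_id < extra else 0)
    return range(start, stop)


# Split 10 rows across 3 workers.
shards = [shard_indices(10, 3, w) for w in range(3)]
print([list(s) for s in shards])  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

Because no shard depends on another, adding workers shortens the run roughly in proportion, as long as the shards stay large enough to keep each inference server busy.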

SLURM native | OpenAI API | vLLM / SGLang

View on GitHub

Publicly funded research projects

OpenEuroLLM

Transparent AI for Europe

ellamind is proud to be part of a consortium of 20 leading European research institutions, companies, and EuroHPC centres building a family of high-performing multilingual foundation models for commercial, industrial, and public-sector use. These transparent, compliant open-source models aim to democratize access to high-quality AI and strengthen Europe's competitiveness.

Learn more | OpenEuroLLM funding acknowledgement

LLMs4EU

Building Europe's AI future

As a partner in LLMs4EU, ellamind helps develop cutting-edge language models that put European languages, values, and innovation at the center. This EU-funded consortium combines expertise from across the continent to build AI technologies that genuinely serve European needs. Our open-source approach helps organizations of all sizes access and benefit from advanced AI.

Learn more | LLMs4EU funding acknowledgement

LLM4KMU

Optimized Use of Open-Source LLMs in SMEs

LLM4KMU brings together leading research institutions, companies, and innovation partners in North Rhine-Westphalia to broaden small and medium-sized enterprises' access to large language models. Through a collaborative experimentation platform, shared know-how, and prototypical use cases, the project helps SMEs apply open-source AI to real products and services.

Learn more | LLM4KMU funding acknowledgement

SOOFI

Sovereign & Open Models for Europe

ellamind is part of SOOFI (Sovereign Open Source Foundation Models), a German consortium of research institutions and start-ups developing open, sovereign AI language models as a European alternative to existing systems. SOOFI aims to build a powerful open-source foundation model aligned with European values and regulatory requirements from day one.

We actively collaborate with research communities and partners such as LAION, Open-Sci, ontocord.ai, EuroLLM, Hessian.AI, AlignmentLab AI, DFKI, and others to pool resources, share insights and knowledge, and advance the collective understanding of LLMs.

Where we come from

Built on a foundation of open-source AI research

ellamind grew out of the open-source AI community. Our team trained and released some of the earliest and most widely used open German large language models, downloaded more than 1,000,000 times on Hugging Face. That hands-on experience in training, evaluating, and applying LLMs across languages, domains, and use cases is the foundation everything at ellamind is built on.

Open German LLMs

We pioneered open-weight German language models at a time when high-quality non-English LLMs were scarce, helping establish a vibrant ecosystem for German AI.

From research to product

Our deep expertise in model training and evaluation directly informs how we build our products. We understand LLMs and agents from the inside out, not just as API consumers.

Pre-training & fine-tuning expertise

We have deep, hands-on experience in continued pre-training and fine-tuning of language models, allowing us to optimize and adapt models for specific tasks and languages.

Evaluation & open datasets

We develop advanced evaluation techniques and publish open-source datasets and benchmarks that the community uses to improve existing and new models.

Community-driven

From DiscoResearch to our collaborations with Hessian.AI, LAION, and DFKI, our work has always been guided by the belief that openness and collaboration drive the best AI outcomes.

FAQs

Frequently asked questions

Find answers to frequently asked questions about our research. If your question isn't answered here, feel free to contact us.

How can I collaborate with ellamind on research?

We're always open to new collaborations. Reach out through our contact form or get in touch with our research team directly.

Are ellamind's research outputs publicly available?

Yes. We publish models, datasets, benchmarks, and papers as open resources. You can find most of them on Hugging Face, GitHub, and arXiv.

What role does ellamind play in the EU-funded consortia?

We bring hands-on expertise across the full LLM lifecycle: data curation, model training, evaluation, and agentic applications. Across projects like OpenEuroLLM, LLMs4EU, SOOFI, and LLM4KMU, our work helps ensure that Europe's next generation of foundation models is open, high-quality, and built to meet European regulatory standards.

Can I use ellamind's open-source models in my own projects?

Yes. Many of our research outputs are released under permissive open-source licenses and can be used in both commercial and non-commercial projects. Please check the license for each model, dataset, or repository for the exact terms.
