Meetween

From OpenVerse Wiki
Revision as of 13:23, 22 April 2026 by Admin

Meetween Project

CORDIS Reference: https://cordis.europa.eu/project/id/101135798
Start date: 01/01/2024
End date: 31/12/2027
Coordinator: TRANSLATED SRL, Rome, Italy

Project description

In a world increasingly preoccupied with artefacts, interacting with our fellow human beings remains one of our most enjoyable and, in practice, most critical activities. We derive inspiration from each other, solve problems and chart our future together. Yet our interaction with fellow humans is far from seamless or frictionless: despite much greater worldwide reach, we suffer, perhaps more than ever, from isolation, barriers and separation due to language, culture, physical distance, time zones, scheduling conflicts and distractions to our attention. With greater freedom, reach and flexibility, our isolation and complexity also appear to increase. The Meetween project aims to find solutions to these problems. Rather than letting artificial intelligence (AI) get in the way of the human experience, we harness its power to make human-human interaction more seamless and natural, to eliminate language barriers, and to replace techno-clutter with genuine support.

The project aims to:

1. build the science-based technology solutions needed to power the next generation of videoconferencing platforms for Europe, supporting smooth, engaging, barrier-free collaboration across languages;
2. exploit the all-round, integrated algorithmic capabilities offered by foundation models and self-supervised training on large datasets to nimbly adapt to participant context and to cultural and regional specificities, including linguistic ones;
3. foster and facilitate business collaboration throughout the European Union by providing real-time, machine-learning-powered speech-to-speech translation, summarization and virtual assistant services for online meetings;
4. defend a European vision for AI with regard to safety, privacy, and social and ethical approaches, anchored in our regulations, data standards and shared initiatives and resources.

Project outputs

Publications

Domain | Type of output | Title | DOI / URL
Audio, Speech & NLP | Conference proceedings | Prepending or Cross-Attention for Speech-to-Text? An Empirical Comparison | https://doi.org/10.18653/V1/2025.NAACL-LONG.153
Audio, Speech & NLP | Conference proceedings | What the Harm? Quantifying the Tangible Impact of Gender Bias in Machine Translation with a Human-centered Study | https://doi.org/10.18653/V1/2024.EMNLP-MAIN.1002
Audio, Speech & NLP | Conference proceedings | Findings of the Quality Estimation Shared Task at WMT 2024: Are LLMs Closing the Gap in QE? | https://doi.org/10.18653/V1/2024.WMT-1.3
Audio, Speech & NLP | Conference proceedings | Findings of the IWSLT 2024 Evaluation Campaign | https://doi.org/10.18653/V1/2024.IWSLT-1.1
Audio, Speech & NLP | Conference proceedings | NUTSHELL: A Dataset for Abstract Generation from Scientific Talks | https://doi.org/10.18653/V1/2025.IWSLT-1.2
Audio, Speech & NLP | Conference proceedings | StreamAtt: Direct Streaming Speech-to-Text Translation with Attention-based Audio History Selection | https://doi.org/10.18653/V1/2024.ACL-LONG.202
Audio, Speech & NLP | Conference proceedings | Decoupled Vocabulary Learning Enables Zero-Shot Translation from Unseen Languages | https://doi.org/10.5445/IR/1000174872
Audio, Speech & NLP | Conference proceedings | FBK@IWSLT Test Suites Task: Gender Bias Evaluation with MuST-SHE | https://doi.org/10.18653/V1/2024.IWSLT-1.10
Audio, Speech & NLP | Conference proceedings | Speech Translation with Speech Foundation Models and Large Language Models: What is There and What is Missing? | https://doi.org/10.18653/V1/2024.ACL-LONG.789
Audio, Speech & NLP | Conference proceedings | SimulSeamless: FBK at IWSLT 2024 Simultaneous Speech Translation | https://doi.org/10.18653/V1/2024.IWSLT-1.11
Audio, Speech & NLP | Conference proceedings | Quality Estimation with k-nearest Neighbors and Automatic Evaluation for Model-specific Quality Estimation | https://doi.org/10.5445/IR/1000174743
Audio, Speech & NLP | Conference proceedings | From Speech to Summary: A Comprehensive Survey of Speech Summarization | https://doi.org/10.5445/IR/1000180972
Audio, Speech & NLP | Conference proceedings | Factorized-VITS: Decoupling Prosody and Text in End-to-End Speech Synthesis without External or Secondary Aligner | https://doi.org/10.1109/ICASSP49660.2025.10890003
Audio, Speech & NLP | Conference proceedings | MOSEL: 950,000 Hours of Speech Data for Open-Source Speech Foundation Model Training on EU Languages | https://doi.org/10.18653/V1/2024.EMNLP-MAIN.771
Audio, Speech & NLP | Conference proceedings | Optimizing Rare Word Accuracy in Direct Speech Translation with a Retrieval-and-Demonstration Approach | https://doi.org/10.48550/ARXIV.2409.09009
Audio, Speech & NLP | Peer-reviewed article | A Decade of Gender Bias in Machine Translation | https://doi.org/10.1016/J.PATTER.2025.101257
Audio, Speech & NLP | Peer-reviewed article | How "Real" is Your Real-Time Simultaneous Speech-to-Text Translation System? | https://doi.org/10.1162/TACL_A_00740
Computer Vision, 3D Modeling & Rendering | Other | Facial Attribute Based Text Guided Face Anonymization | https://doi.org/10.48550/ARXIV.2505.21002

Technological assets

Title | Type of asset | Link / DOI | Description
Speech LMM open release - V1 | AI Model | https://huggingface.co/meetween/Llama-speechlmm-1.0-l | Open release of the Speech Large Multimodal Model (SpeechLMM) created by the project.
Mumospee open release - V1 | Dataset | https://huggingface.co/datasets/meetween/mumospee | One of the largest open multimodal datasets, created to train the SpeechLMM.
MOSEL | Dataset | https://doi.org/10.18653/V1/2024.EMNLP-MAIN.771 | 950,000 hours of speech data used for open-source speech foundation model training on EU languages.
NUTSHELL | Dataset | https://doi.org/10.18653/V1/2025.IWSLT-1.2 | A dataset built specifically for abstract generation from scientific talks.
FAMA | Foundation Model | https://doi.org/10.48550/ARXIV.2505.22759 | The first large-scale open-science speech foundation model for Italian and English.
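The model and dataset assets above are hosted on the Hugging Face Hub under the meetween organization. As a minimal sketch of how the repositories can be addressed programmatically (the helper function and the commented-out loading calls are illustrative assumptions; consult each model or dataset card for the supported loading classes), one might write:

```python
# Repository IDs of the Meetween open releases listed in the asset table above.
SPEECHLMM_MODEL = "meetween/Llama-speechlmm-1.0-l"
MUMOSPEE_DATASET = "meetween/mumospee"


def hub_url(repo_id: str, repo_type: str = "model") -> str:
    """Build the public Hugging Face Hub URL for a model or dataset repository."""
    prefix = "datasets/" if repo_type == "dataset" else ""
    return f"https://huggingface.co/{prefix}{repo_id}"


print(hub_url(SPEECHLMM_MODEL))              # SpeechLMM model page
print(hub_url(MUMOSPEE_DATASET, "dataset"))  # Mumospee dataset page

# To actually download the assets, one would typically use the
# `huggingface_hub` or `datasets` libraries (not executed here,
# since it requires network access and local storage):
#   from datasets import load_dataset
#   ds = load_dataset("meetween/mumospee")
```

Note that dataset repositories on the Hub live under a `datasets/` URL prefix, while model repositories sit at the top level; the helper above only encodes that convention.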