Meetween Project
Project description
In a world increasingly preoccupied with artefacts, interacting with our fellow human beings remains one of our most enjoyable, and also one of our most practically critical, activities. We derive inspiration from each other, solve problems and chart our future together. Yet our interaction with fellow humans is far from seamless or frictionless: despite much greater worldwide reach, we suffer (perhaps more than ever) from isolation, barriers and separation due to language, culture, physical distance, time zones, scheduling conflicts, and distractions to our attention. With greater freedom, reach and flexibility, our isolation and complexities also appear to increase.

In the proposed project “Meetween”, we aim to find solutions to these problems. Rather than letting artificial intelligence (AI) get in the way of the human experience, we harness its power to make human-human interaction more seamless and natural, to eliminate language barriers, and to replace techno-clutter with support. The project aims to:

1) build the science-based technology solutions needed to power the next generation of videoconferencing platforms for Europe, supporting smooth, engaging, barrier-free collaboration across languages;
2) exploit the all-round, integrated algorithmic capabilities offered by foundation models and self-supervised training on large datasets to nimbly adapt to participant context and to cultural and regional specificities, including linguistic ones;
3) foster and facilitate business collaboration throughout the European Union by providing real-time, machine-learning-powered speech-to-speech translation, summarization and virtual assistant services for online meetings;
4) defend a European vision for AI with regard to safety, privacy, and social and ethical approaches, anchored in our regulations, data standards and shared initiatives and resources.
Project outputs
Publications
| Domain | Type of output | Title | DOI URL |
| --- | --- | --- | --- |
| Audio, Speech & NLP | Conference proceedings | Prepending or Cross-Attention for Speech-to-Text? An Empirical Comparison | https://doi.org/10.18653/V1/2025.NAACL-LONG.153 |
| Audio, Speech & NLP | Conference proceedings | What the Harm? Quantifying the Tangible Impact of Gender Bias in Machine Translation with a Human-centered Study | https://doi.org/10.18653/V1/2024.EMNLP-MAIN.1002 |
| Audio, Speech & NLP | Conference proceedings | Findings of the Quality Estimation Shared Task at WMT 2024: Are LLMs Closing the Gap in QE? | https://doi.org/10.18653/V1/2024.WMT-1.3 |
| Audio, Speech & NLP | Conference proceedings | Findings of the IWSLT 2024 Evaluation Campaign | https://doi.org/10.18653/V1/2024.IWSLT-1.1 |
| Audio, Speech & NLP | Conference proceedings | NUTSHELL: A Dataset for Abstract Generation from Scientific Talks | https://doi.org/10.18653/V1/2025.IWSLT-1.2 |
| Audio, Speech & NLP | Conference proceedings | StreamAtt: Direct Streaming Speech-to-Text Translation with Attention-based Audio History Selection | https://doi.org/10.18653/V1/2024.ACL-LONG.202 |
| Audio, Speech & NLP | Conference proceedings | Decoupled Vocabulary Learning Enables Zero-Shot Translation from Unseen Languages | https://doi.org/10.5445/IR/1000174872 |
| Audio, Speech & NLP | Conference proceedings | FBK@IWSLT Test Suites Task: Gender Bias evaluation with MuST-SHE | https://doi.org/10.18653/V1/2024.IWSLT-1.10 |
| Audio, Speech & NLP | Conference proceedings | Speech Translation with Speech Foundation Models and Large Language Models: What is There and What is Missing? | https://doi.org/10.18653/V1/2024.ACL-LONG.789 |
| Audio, Speech & NLP | Conference proceedings | SimulSeamless: FBK at IWSLT 2024 Simultaneous Speech Translation | https://doi.org/10.18653/V1/2024.IWSLT-1.11 |
| Audio, Speech & NLP | Conference proceedings | Quality Estimation with $k$-nearest Neighbors and Automatic Evaluation for Model-specific Quality Estimation | https://doi.org/10.5445/IR/1000174743 |
| Audio, Speech & NLP | Conference proceedings | From Speech to Summary: A Comprehensive Survey of Speech Summarization | https://doi.org/10.5445/IR/1000180972 |
| Audio, Speech & NLP | Conference proceedings | Factorized-VITS: Decoupling Prosody and Text in End-to-End Speech Synthesis without External or Secondary Aligner | https://doi.org/10.1109/ICASSP49660.2025.10890003 |
| Audio, Speech & NLP | Conference proceedings | MOSEL: 950,000 Hours of Speech Data for Open-Source Speech Foundation Model Training on EU Languages | https://doi.org/10.18653/V1/2024.EMNLP-MAIN.771 |
| Audio, Speech & NLP | Conference proceedings | Optimizing Rare Word Accuracy in Direct Speech Translation with a Retrieval-and-Demonstration Approach | https://doi.org/10.48550/ARXIV.2409.09009 |
| Audio, Speech & NLP | Peer reviewed articles | A decade of gender bias in machine translation | https://doi.org/10.1016/J.PATTER.2025.101257 |
| Audio, Speech & NLP | Peer reviewed articles | How “Real” is Your Real-Time Simultaneous Speech-to-Text Translation System? | https://doi.org/10.1162/TACL_A_00740 |
| Computer Vision, 3D Modeling & Rendering | Other | Facial Attribute Based Text Guided Face Anonymization | https://doi.org/10.48550/ARXIV.2505.21002 |
Technological assets