Meetween Project
Project description
In a world increasingly preoccupied with artefacts, interacting with our fellow human beings remains one of our most enjoyable, and also one of our most practically critical, activities. We derive inspiration from each other, solve problems and chart our future together. Yet our interaction with fellow humans is far from seamless or frictionless: despite much greater worldwide reach, we suffer (perhaps more than ever) from isolation, barriers and separation due to language, culture, physical distance, time zones, scheduling conflicts, and distractions to our attention. With greater freedom, reach and flexibility, our isolation and complexities also appear to increase.

In the proposed project “Meetween”, we aim to find solutions to these problems. Rather than letting artificial intelligence (AI) get in the way of the human experience, we harness its power to make human-human interaction more seamless and natural, to eliminate language barriers, and to replace techno-clutter with support. The project aims to:

1) build the science-based technology solutions needed to power the next generation of videoconferencing platforms for Europe, supporting smooth, engaging, barrier-free collaboration across languages;
2) exploit the all-round, integrated algorithmic capabilities offered by foundation models and self-supervised training on large datasets to nimbly adapt to participant context and to cultural and regional specificities, including linguistic ones;
3) foster and facilitate business collaboration throughout the European Union by providing real-time, machine-learning-powered speech-to-speech translation, summarization and virtual assistant services for online meetings;
4) defend a European vision for AI with regard to safety, privacy, and social and ethical approaches, anchored in our regulations, data standards and shared initiatives and resources.
Project outputs
Publications
| Domain | Type of output | Title | DOI URL |
| --- | --- | --- | --- |
| Audio, Speech & NLP | Conference proceedings | Prepending or Cross-Attention for Speech-to-Text? An Empirical Comparison | https://doi.org/10.18653/V1/2025.NAACL-LONG.153 |
| Audio, Speech & NLP | Conference proceedings | What the Harm? Quantifying the Tangible Impact of Gender Bias in Machine Translation with a Human-centered Study | https://doi.org/10.18653/V1/2024.EMNLP-MAIN.1002 |
| Audio, Speech & NLP | Conference proceedings | Findings of the Quality Estimation Shared Task at WMT 2024: Are LLMs Closing the Gap in QE? | https://doi.org/10.18653/V1/2024.WMT-1.3 |
| Audio, Speech & NLP | Conference proceedings | Findings of the IWSLT 2024 Evaluation Campaign | https://doi.org/10.18653/V1/2024.IWSLT-1.1 |
| Audio, Speech & NLP | Conference proceedings | NUTSHELL: A Dataset for Abstract Generation from Scientific Talks | https://doi.org/10.18653/V1/2025.IWSLT-1.2 |
| Audio, Speech & NLP | Conference proceedings | StreamAtt: Direct Streaming Speech-to-Text Translation with Attention-based Audio History Selection | https://doi.org/10.18653/V1/2024.ACL-LONG.202 |
| Audio, Speech & NLP | Conference proceedings | Decoupled Vocabulary Learning Enables Zero-Shot Translation from Unseen Languages | https://doi.org/10.5445/IR/1000174872 |
| Audio, Speech & NLP | Conference proceedings | FBK@IWSLT Test Suites Task: Gender Bias evaluation with MuST-SHE | https://doi.org/10.18653/V1/2024.IWSLT-1.10 |
| Audio, Speech & NLP | Conference proceedings | Speech Translation with Speech Foundation Models and Large Language Models: What is There and What is Missing? | https://doi.org/10.18653/V1/2024.ACL-LONG.789 |
| Audio, Speech & NLP | Conference proceedings | SimulSeamless: FBK at IWSLT 2024 Simultaneous Speech Translation | https://doi.org/10.18653/V1/2024.IWSLT-1.11 |
| Audio, Speech & NLP | Conference proceedings | Quality Estimation with $k$-nearest Neighbors and Automatic Evaluation for Model-specific Quality Estimation | https://doi.org/10.5445/IR/1000174743 |
| Audio, Speech & NLP | Conference proceedings | From Speech to Summary: A Comprehensive Survey of Speech Summarization | https://doi.org/10.5445/IR/1000180972 |
| Audio, Speech & NLP | Conference proceedings | Factorized-VITS: Decoupling Prosody and Text in End-to-End Speech Synthesis without External or Secondary Aligner | https://doi.org/10.1109/ICASSP49660.2025.10890003 |
| Audio, Speech & NLP | Conference proceedings | MOSEL: 950,000 Hours of Speech Data for Open-Source Speech Foundation Model Training on EU Languages | https://doi.org/10.18653/V1/2024.EMNLP-MAIN.771 |
| Audio, Speech & NLP | Conference proceedings | Optimizing Rare Word Accuracy in Direct Speech Translation with a Retrieval-and-Demonstration Approach | https://doi.org/10.48550/ARXIV.2409.09009 |
| Audio, Speech & NLP | Peer reviewed articles | A decade of gender bias in machine translation | https://doi.org/10.1016/J.PATTER.2025.101257 |
| Audio, Speech & NLP | Peer reviewed articles | How “Real” is Your Real-Time Simultaneous Speech-to-Text Translation System? | https://doi.org/10.1162/TACL_A_00740 |
| Computer Vision, 3D Modeling & Rendering | Other | Facial Attribute Based Text Guided Face Anonymization | https://doi.org/10.48550/ARXIV.2505.21002 |
Technological assets