GTH-UPM at DETOXIS-IberLEF 2021: automatic detection of toxic comments in social networks
Sadly, the presence of toxic messages on social networks, whether in the form of stereotypes, sarcasm, mockery, insult, inappropriate language, aggressiveness, intolerance, or typical of hate speech against immigrants and/or women, among others, is relatively frequent. This presence should not be ignored by the scientific community, since it is their responsibility to develop tools and systems that allow their automatic detection and elimination. In this paper, we present an exploratory analysis in which different deep learning (DL) models for the detection of toxic expressions have been evaluated on the DETOXIS-IberLEF 2021 challenge using the official release of the NewsCom-TOX corpus. Particularly, we compare traditional RNN and state-of-the-art transformer models. Our experiments confirmed that optimum performance can be obtained from transformer models. Specifically, top performance was achieved by fine-tuning a BETO model (the pre-trained BERT model for the Spanish language from the Universidad de Chile) for the toxicity detection tasks. Another contribution of this analysis is the validation of the proposed method for adding task-specific vocabulary (new tokens) that could help to effectively extend the original vocabulary of the pre-trained models.
Author: | Sergio Esteban Romero, Ricardo Kleinlein, Cristina Luna-Jiménez, Juan Manuel Montero, Fernando Fernández-Martínez |
---|---|
URN: | urn:nbn:de:bvb:384-opus4-1226912 |
Frontdoor URL | https://opus.bibliothek.uni-augsburg.de/opus4/122691 |
URL: | https://nbn-resolving.org/urn:nbn:de:0074-2943-3 |
ISSN: | 1613-0073 |
Parent Title (English): | IberLEF 2021 - Iberian Languages Evaluation Forum 2021: proceedings of the Iberian Languages Evaluation Forum (IberLEF 2021) co-located with the Conference of the Spanish Society for Natural Language Processing (SEPLN 2021), XXXVII International Conference of the Spanish Society for Natural Language Processing, Málaga, Spain, September, 2021 |
Publisher: | CEUR-WS |
Place of publication: | Aachen |
Editor: | Manuel Montes, Paolo Rosso, Julio Gonzalo, Mario Ezra Aragón, Rodrigo Agerri, Miguel Ángel Álvarez-Carmona, Elena Álvarez Mellado, Jorge Carrillo-de-Albornoz, Luis Chiruzzo, Larissa Freitas, Helena Gómez Adorno, Yoan Gutiérrez, Salud María Jiménez Zafra, Salvador Lima-López, Flor Miriam Plaza-de-Arco, Mariona Taulé |
Type: | Conference Proceeding |
Language: | English |
Date of Publication (online): | 2025/06/04 |
Year of first Publication: | 2021 |
Publishing Institution: | Universität Augsburg |
Release Date: | 2025/06/05 |
First Page: | 533 |
Last Page: | 546 |
Series: | CEUR Workshop Proceedings ; 2943 |
Institutes: | Fakultät für Angewandte Informatik |
 | Fakultät für Angewandte Informatik / Institut für Informatik |
 | Fakultät für Angewandte Informatik / Institut für Informatik / Lehrstuhl für Menschzentrierte Künstliche Intelligenz |
Dewey Decimal Classification: | 0 Computer science, information & general works / 00 Computer science, knowledge & systems / 004 Data processing & computer science |