
GTH-UPM at DETOXIS-IberLEF 2021: automatic detection of toxic comments in social networks

Sadly, the presence of toxic messages on social networks, whether in the form of stereotypes, sarcasm, mockery, insults, inappropriate language, aggressiveness, intolerance, or hate speech against immigrants and/or women, among others, is relatively frequent. This presence should not be ignored by the scientific community, since it is its responsibility to develop tools and systems that allow their automatic detection and elimination. In this paper, we present an exploratory analysis in which different deep learning (DL) models for the detection of toxic expressions have been evaluated on the DETOXIS-IberLEF 2021 challenge using the official release of the NewsCom-TOX corpus. In particular, we compare traditional RNN and state-of-the-art transformer models. Our experiments confirmed that optimum performance can be obtained from transformer models. Specifically, top performance was achieved by fine-tuning a BETO model (the pre-trained BERT model for the Spanish language from the Universidad de Chile) for the toxicity detection tasks. Another contribution of this analysis is the validation of the proposed method for adding task-specific vocabulary (new tokens), which can effectively extend the original vocabulary of the pre-trained models.
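The vocabulary-extension idea mentioned in the abstract can be illustrated with a minimal, library-free sketch. The token names and embedding size below are hypothetical; in practice this step is performed with a tokenizer's add-tokens API, after which the model's embedding matrix is resized so newly added rows can be learned during fine-tuning:

```python
import random

random.seed(0)

EMB_DIM = 8  # hypothetical embedding dimension

# Toy "pre-trained" vocabulary: token -> id
vocab = {"[PAD]": 0, "[UNK]": 1, "hola": 2, "mundo": 3}
# One embedding row per token (random floats stand in for trained weights)
embeddings = [[random.random() for _ in range(EMB_DIM)] for _ in vocab]

def add_tokens(new_tokens):
    """Append task-specific tokens and grow the embedding matrix to match."""
    added = 0
    for tok in new_tokens:
        if tok in vocab:
            continue  # skip tokens the model already knows
        vocab[tok] = len(vocab)
        # New rows are randomly initialised and tuned during fine-tuning
        embeddings.append([random.random() for _ in range(EMB_DIM)])
        added += 1
    return added

# Hypothetical task-specific tokens drawn from the target corpus
n = add_tokens(["inmigrantes", "odio", "hola"])
print(n, len(vocab), len(embeddings))  # → 2 6 6 (vocab and matrix stay in sync)
```

The key invariant the sketch demonstrates is that the vocabulary and the embedding matrix must grow together: every new token id indexes a freshly initialised embedding row.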

Metadata
Author:Sergio Esteban Romero, Ricardo Kleinlein, Cristina Luna-Jiménez, Juan Manuel Montero, Fernando Fernández-Martínez
URN:urn:nbn:de:bvb:384-opus4-1226912
Frontdoor URL:https://opus.bibliothek.uni-augsburg.de/opus4/122691
URL:https://nbn-resolving.org/urn:nbn:de:0074-2943-3
ISSN:1613-0073
Parent Title (English):IberLEF 2021 - Iberian Languages Evaluation Forum 2021: proceedings of the Iberian Languages Evaluation Forum (IberLEF 2021) co-located with the Conference of the Spanish Society for Natural Language Processing (SEPLN 2021), XXXVII International Conference of the Spanish Society for Natural Language Processing, Málaga, Spain, September, 2021
Publisher:CEUR-WS
Place of publication:Aachen
Editor:Manuel Montes, Paolo Rosso, Julio Gonzalo, Mario Ezra Aragón, Rodrigo Agerri, Miguel Ángel Álvarez-Carmona, Elena Álvarez Mellado, Jorge Carrillo-de-Albornoz, Luis Chiruzzo, Larissa Freitas, Helena Gómez Adorno, Yoan Gutiérrez, Salud María Jiménez Zafra, Salvador Lima-López, Flor Miriam Plaza-de-Arco, Mariona Taulé
Type:Conference Proceeding
Language:English
Date of Publication (online):2025/06/04
Year of first Publication:2021
Publishing Institution:Universität Augsburg
Release Date:2025/06/05
First Page:533
Last Page:546
Series:CEUR Workshop Proceedings ; 2943
Institutes:Fakultät für Angewandte Informatik
Fakultät für Angewandte Informatik / Institut für Informatik
Fakultät für Angewandte Informatik / Institut für Informatik / Lehrstuhl für Menschzentrierte Künstliche Intelligenz
Dewey Decimal Classification:0 Computer science, information & general works / 00 Computer science, knowledge & systems / 004 Data processing; computer science
Licence:CC-BY 4.0: Creative Commons: Attribution (with Print on Demand)