CCSpod
What is it about …?
Tune in to our guest talks with Emese & Mario
This podcast by Emese Domahidi and Mario Haim explores how ‘big data’ and algorithms affect communication. It examines challenges at the intersection of society and technology, with a focus on wellbeing in the smartphone era and the influence of large platforms on news. Blending perspectives from the social and computer sciences, it is meant to guide students and young scholars in Computational Communication Science.
In this episode, Prof. Jamal Abdul Nasir from the University of Galway explains why pre-processing is the backbone of all text analysis. He walks through the key steps: defining documents, tokenization, removing stop words, unification, and stemming vs. lemmatization. Jamal also explains the difference between unigrams and bigrams and how modern NLP techniques such as byte-pair encoding are changing how text is split into tokens. Finally, he shares practical tips for making your pre-processing transparent and reproducible, so that your research holds up to scrutiny and scales.
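For listeners who want to try the steps named in the episode description, here is a minimal Python sketch of a pre-processing pipeline. It is not taken from the episode: the function names, the tiny stop-word list, and the choice of lowercasing as the only unification step are all assumptions made for illustration. Stemming, lemmatization, and byte-pair encoding are left out because they normally rely on dedicated libraries.

```python
import re

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
# A real project would document the exact list it uses, for reproducibility.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to"}


def tokenize(text):
    """Lowercase the text (a simple form of unification) and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def ngrams(tokens, n):
    """Return all runs of n consecutive tokens as tuples (n=2 gives bigrams)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def preprocess(document):
    """Run the steps in a fixed, documented order and return unigrams and bigrams."""
    tokens = remove_stop_words(tokenize(document))
    return {"unigrams": tokens, "bigrams": ngrams(tokens, 2)}


if __name__ == "__main__":
    print(preprocess("The impact of big data"))
```

Keeping each step in its own small function, applied in a fixed order, is one simple way to make the pipeline transparent: every decision (which tokens count as words, which stop words are removed) is visible and can be reported alongside the results.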
