Information Extraction from Social Media: A Hands-On Tutorial on Tasks, Data, and Open Source Tools - Citegraph

Paper Info

Title
Information Extraction from Social Media: A Hands-On Tutorial on Tasks, Data, and Open Source Tools

Abstract
Information extraction (IE) is a common sub-area of natural language processing that focuses on identifying structured data from unstructured data. The community of Information Retrieval (IR) relies on accurate and high-performance IE to be able to retrieve high quality results from massive datasets. One example of IE is to identify named entities in a text, e.g., “Barack Obama served as the president of the USA”. Here, Barack Obama and USA are named entities of types of PERSON and LOCATION, respectively. Another example is to identify sentiment expressed in a text, e.g., “This movie was awesome”. Here, the sentiment expressed is positive. Finally, identifying various linguistic aspects of a text, e.g., part of speech tags, noun phrases, dependency parses, etc., which can serve as features for additional IE tasks. This tutorial introduces participants to a) the usage of Python based, open-source tools that support IE from social media data (mainly Twitter), and b) best practices for ensuring the reproducibility of research. Participants will learn and practice various semantic and syntactic IE techniques that are commonly used for analyzing tweets. Additionally, participants will be familiarized with the landscape of publicly available tweet data, and methods for collecting and preparing them for analysis. Finally, participants will be trained to use a suite of open source tools ( SAIL for active learning, TwitterNER for named entity recognition3, and SocialMediaIE for multi task learning), which utilize advanced machine learning techniques (e.g., deep learning, active learning with human-in-the-loop, multi-lingual, and multi-task learning) to perform IE on their own or existing datasets. Participants will also learn how social context can be integrated in Information Extraction systems to make them better. The tools introduced in the tutorial will focus on the three main stages of IE, namely, collection of data (including annotation), data processing and analytics, and visualization of the extracted information. More details can be found at: https://socialmediaie.github.io/tutorials/ .

Year	DOI	Venue
2022	10.1007/978-3-030-99739-7_74	Advances in Information Retrieval
Keywords	DocType	Volume
Information extraction, Multi-task learning, Natural language processing, Social media data, Twitter, Machine learning bias	Conference	13186
ISSN	Citations	PageRank
0302-9743	0	0.34
References	Authors
0	3

Authors (3 rows)

Cited by (0 rows)

References (0 rows)

Name	Order	Citations	PageRank
Shubhanshu Mishra	1	0	0.34
Rezvaneh Rezapour	2	0	0.34
Jana Diesner	3	22	5.86

1