Accepted Tutorials

TITLE | ORGANIZERS | DURATION
Big Data Analytics for Semantic Data (BigSem) | Charalampos Chelmis and Bedirhan Gergin | Half Day
Recent advances in Graph Data Management (GraphDat) | Domagoj Vrgoc | Half Day
Wikidata Wizardry 101: From Query Spells to Data Charms (WiWi 101) | Nicolas Ferranti, Daniil Dobriy and Axel Polleres | Half Day
Semantic Table Interpretation: from Heuristic to LLM-based approaches (TUTSTI) | Marco Cremaschi, Fabio D’Adda, Matteo Palmonari and Ernesto Jimenez-Ruiz | Half Day
Same Data; Different Models (SDDM) | Cogan Shimizu, Eva Blomqvist and Andrea Giovanni Nuzzolese | Half Day
Shaping Knowledge Graphs (ShapingKGs) | José Emilio Labra Gayo | Half Day
Streaming Linked Data Tutorial (SLDT) | Pieter Bonte and Riccardo Tommasini | Half Day
Neurosymbolic, Customized, and Compact CoPilots | Kaushik Roy, Megha Chakraborty, Yuxin Zi, Manas Gaur and Amit Sheth | Half Day
Ontology Engineering for Industry Adoption (OEIA) | Elisa Kendall and Pawel Garbacz | Half Day
Knowledge-Enhanced Retrieval Augmented Generation for Large Language Models (KELLM) | Stefan Decker, Jens Lehmann, Maria-Esther Vidal, Sahar Vahdati, Diego Collarana | Half Day

 

Big Data Analytics for Semantic Data


Website: https://www.cs.albany.edu/~cchelmis/tutorials/iswc/2024/

Organizers: Charalampos Chelmis and Bedirhan Gergin

Duration: Half Day

Researchers, scientists, and companies alike increasingly leverage semantically enriched, linked datasets to train machine learning models for tasks ranging from discovering new vaccines and materials, to recommending products and services, to building virtual personal assistants. At the same time, big data analytics engines are increasingly adopted to store and process ever-increasing volumes of data efficiently at scale. Until recently, however, the Semantic Web, big data analytics, and machine learning communities were separated, since big data analytics engines could not process Knowledge Graphs (KGs). This tutorial aims to raise awareness of the gap between the big data analytics and machine learning communities and the Semantic Web community. By providing an overview of the state of the art in scalable analytics for semantic data, it aims to promote synergy between these communities and to encourage discussion and exchange of ideas on this timely topic. Hands-on activities covering statistical analytics and inferencing over KGs, built around simple use cases, will be provided.
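As a taste of the kind of statistical analytics over KGs that the hands-on part covers, here is a minimal sketch (our own illustration, not tutorial material; the input file name is hypothetical) that profiles predicate usage in an RDF graph with rdflib:

```python
# A minimal sketch of simple statistical analytics over an RDF graph.
from collections import Counter

from rdflib import Graph

g = Graph()
g.parse("example_kg.ttl", format="turtle")  # hypothetical input file

# Count how often each predicate occurs -- a basic KG statistic.
predicate_counts = Counter(p for _, p, _ in g)
for predicate, count in predicate_counts.most_common(5):
    print(predicate, count)
```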

 

Recent advances in Graph Data Management


Website: https://domagojvrgoc.github.io/ISWC2024-GraphDB/

Organizer: Domagoj Vrgoc

Duration: Half Day

Graph databases have received a lot of attention in recent years, mostly due to their role as the underlying storage and query mechanism for knowledge graphs. This has led to a range of graph solutions being developed over the years: the RDF data format and the SPARQL query standard are widely used in large open knowledge graphs such as Wikidata and DBpedia, while commercial systems generally deploy the property graph data model, with the recent GQL ISO standard formalizing query languages in this setting. In this tutorial, we give a detailed overview of graph data models and query standards, followed by a deep dive into recent techniques for graph query processing, most notably the use of worst-case optimal algorithms and automata-guided path retrieval.
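For concreteness, the path queries that automata-guided retrieval targets are SPARQL property paths; the following self-contained sketch (ours, with invented data) evaluates one with rdflib:

```python
# Illustrative sketch: a regular path query (SPARQL property path) of the
# kind that automata-guided path retrieval evaluates, run here with rdflib.
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/")
g = Graph()
g.add((EX.a, EX.knows, EX.b))
g.add((EX.b, EX.knows, EX.c))

# "ex:knows+" matches paths of one or more :knows edges (Kleene plus).
results = g.query("""
    PREFIX ex: <http://example.org/>
    SELECT ?x ?y WHERE { ?x ex:knows+ ?y }
""")
for row in results:
    print(row.x, row.y)  # (a,b), (b,c), and the two-hop pair (a,c)
```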

 

Wikidata Wizardry 101: From Query Spells to Data Charms


Website: https://ww101.ai.wu.ac.at/

Organizers: Nicolas Ferranti, Daniil Dobriy and Axel Polleres

Duration: Half Day

Wikidata Wizardry 101 provides a comprehensive introduction to the Wikidata knowledge graph for enthusiasts and professionals. This tutorial explores essential concepts and practical applications of Wikidata from a Semantic Web practitioner’s point of view. Participants will gain foundational skills in querying and managing Wikidata, enabling them to effectively utilize this resource in their projects.
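As a flavor of the query skills the tutorial builds, here is a small self-contained sketch (not tutorial material; the user-agent string is a made-up placeholder) that asks the public Wikidata SPARQL endpoint for a few entities:

```python
# A minimal sketch of querying the public Wikidata SPARQL endpoint.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://query.wikidata.org/sparql",
                       agent="wiwi101-example/0.1 (educational)")
sparql.setQuery("""
    SELECT ?item ?itemLabel WHERE {
      ?item wdt:P31 wd:Q146 .  # instances of (P31) house cat (Q146)
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    }
    LIMIT 5
""")
sparql.setReturnFormat(JSON)
for binding in sparql.query().convert()["results"]["bindings"]:
    print(binding["item"]["value"], binding["itemLabel"]["value"])
```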

 

Semantic Table Interpretation: from Heuristic to LLM-based approaches


Website: https://unimib-datai.github.io/sti-website/tutorial/

Organizers: Marco Cremaschi, Fabio D’Adda, Matteo Palmonari and Ernesto Jimenez-Ruiz

Duration: Half Day

The tutorial introduces the topic of Semantic Table Interpretation (STI), covering theoretical and practical considerations. In particular, the tutorial will provide a comprehensive analysis of how approaches to STI have evolved from heuristic-based, to ML-based, to the most recent LLM-based approaches. The analysis will consider the specific characteristics of these different classes, providing insights into their respective advantages and limitations in order to identify their contexts of use. The final part will describe a case study demonstrating the application of two state-of-the-art approaches.
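To make the heuristic end of that spectrum concrete, here is a toy sketch (our illustration, not tutorial material) that annotates a single table cell by label lookup against the public Wikidata API:

```python
# A toy heuristic STI step: link a table cell to a Wikidata entity by label
# lookup via the public MediaWiki API. Error handling is omitted for brevity.
import requests

def annotate_cell(cell_text: str) -> str | None:
    """Return the Wikidata ID of the top label match for a cell, if any."""
    resp = requests.get(
        "https://www.wikidata.org/w/api.php",
        params={
            "action": "wbsearchentities",
            "search": cell_text,
            "language": "en",
            "format": "json",
        },
        timeout=10,
    )
    hits = resp.json().get("search", [])
    return hits[0]["id"] if hits else None

print(annotate_cell("Berlin"))  # e.g. 'Q64'
```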

 

Same Data; Different Models


Website: https://the-praxis-initiative.github.io/comparative-ontology-modeling/

Organizers: Cogan Shimizu, Eva Blomqvist and Andrea Giovanni Nuzzolese

Duration: Half Day

As knowledge graph development (and in particular, the development of their schemas) grows commensurately with the importance of knowledge graphs in industry and academia, choosing a development methodology that fits the application scenario and domain becomes correspondingly important. We have thus organized a 3-hour, hands-on tutorial to compare and contrast three distinct ontology modeling methodologies: Graphical Modular Ontology Modeling (GraphMOMo), Extreme Design for Ontology Engineering (XD), and LLM-assisted Knowledge Engineering (copilot). Attendees will have the opportunity to execute each methodology, and the tutorial will culminate in a retrospective of the different sub-tutorials.
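For concreteness, here is a tiny, methodology-neutral sketch (our illustration, not tutorial material; all names are invented) of the kind of artifact each sub-tutorial produces, a small ontology module, built here with rdflib:

```python
# A minimal ontology module: two classes connected by an object property.
from rdflib import Graph, Namespace, OWL, RDF, RDFS

EX = Namespace("http://example.org/onto#")
g = Graph()
g.bind("ex", EX)

g.add((EX.Event, RDF.type, OWL.Class))
g.add((EX.Place, RDF.type, OWL.Class))
g.add((EX.occursAt, RDF.type, OWL.ObjectProperty))
g.add((EX.occursAt, RDFS.domain, EX.Event))
g.add((EX.occursAt, RDFS.range, EX.Place))

print(g.serialize(format="turtle"))
```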

 

Shaping Knowledge Graphs


Website: https://www.validatingrdf.com/tutorial/iswc2024/

Organizer: José Emilio Labra Gayo

Duration: Half Day

Knowledge Graphs are increasingly being employed to improve data interoperability, search, and recommendation, and to foster the adoption of Semantic Web technologies. The quality of the data within these graphs is pivotal, and it is often validated against expected data models or shapes to improve accuracy. Knowledge Graphs are implemented with various technologies: RDF-based triplestores are canonical in the Semantic Web, while in the graph database world Property Graphs are also used for Knowledge Graphs. Wikidata, a popular Knowledge Graph, offers RDF through its SPARQL query service, but its data model, with qualifiers and references, aligns closely with Property Graphs; the recent RDF-star proposal can bridge the gap between RDF and Property Graphs.

Shape Expressions (ShEx) and the Shapes Constraint Language (SHACL) were proposed for RDF validation, while for Property Graphs, PGSchema has been proposed, along with other proposals such as PShEx and ProGS. Wikidata adopted Entity Schemas, which are based on ShEx, alongside its own property constraint system, and there is also a proposal called WShEx. This tutorial explores different types of Knowledge Graphs and approaches for their validation. We will also review practical applications such as inferring shapes from existing data and creating conforming subsets of Knowledge Graphs.
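To illustrate shape-based validation, here is a minimal sketch (ours; the shape and data are invented) using the pySHACL library:

```python
# Validate a tiny data graph against a SHACL shape with pySHACL.
from pyshacl import validate
from rdflib import Graph

data = Graph().parse(data="""
@prefix ex: <http://example.org/> .
ex:alice a ex:Person .
""", format="turtle")

shapes = Graph().parse(data="""
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix ex: <http://example.org/> .
ex:PersonShape a sh:NodeShape ;
    sh:targetClass ex:Person ;
    sh:property [ sh:path ex:name ; sh:minCount 1 ] .
""", format="turtle")

conforms, _, report_text = validate(data, shacl_graph=shapes)
print(conforms)      # False: ex:alice has no ex:name
print(report_text)
```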

 

Streaming Linked Data Tutorial


Website: https://streamreasoning.org/events/sldt2024/

Organizers: Pieter Bonte and Riccardo Tommasini

Duration: Half Day

This tutorial provides a comprehensive introduction to Streaming Linked Data, including some fundamental aspects of Stream Processing and Stream Reasoning. Moreover, the tutorial covers all the stages of the Streaming Linked Data lifecycle. Central to the tutorial are the recently published book “Streaming Linked Data: From Vision to Practice” and the recently renewed RSP4J library, which unifies interaction with existing Streaming Linked Data engines. In practice, the tutorial will include:

(i) a survey of existing research outcomes from Stream Reasoning and Streaming Linked Data, i.e., continuous querying and reactive reasoning over highly dynamic graph data;

(ii) an introduction to the Streaming Linked Data lifecycle for modelling, publishing, serving, and processing streaming data;

(iii) a positioning of existing Streaming Linked Data engines for building and maintaining Streaming Linked Data applications.

The tutorial will include several examples and exercises built around a relevant use case. Moreover, we plan to release the material together with a number of exercises for the attendees.
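As an engine-agnostic illustration of the continuous-querying idea at the heart of this lifecycle, the following sketch (ours, not from the book or the library) maintains a time-based sliding window over an incoming stream of triples:

```python
# A time-based sliding window with a continuous count, mimicking what
# dedicated Streaming Linked Data engines provide natively.
from collections import deque

WINDOW_SECONDS = 10
window: deque[tuple[float, tuple[str, str, str]]] = deque()

def on_triple(timestamp: float, triple: tuple[str, str, str]) -> int:
    """Register an arriving triple and report the current in-window count."""
    window.append((timestamp, triple))
    while window and window[0][0] <= timestamp - WINDOW_SECONDS:
        window.popleft()  # evict triples older than the window
    return len(window)

print(on_triple(0.0, ("ex:s1", "ex:observes", "ex:42")))   # 1
print(on_triple(5.0, ("ex:s2", "ex:observes", "ex:43")))   # 2
print(on_triple(12.0, ("ex:s3", "ex:observes", "ex:44")))  # 2 (first evicted)
```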

 

Neurosymbolic, Customized, and Compact CoPilots


Website: https://kauroy1994.github.io/ISWC-2024-Tutorial-Neurosymbolic-Customized-and-Compact-CoPilots/

Organizers: Kaushik Roy, Megha Chakraborty, Yuxin Zi, Manas Gaur and Amit Sheth

Duration: Half Day

Large Language Models (LLMs) perform credibly in open-domain interactions such as question answering, summarization, and explanation generation. However, LLM reasoning is based on parametrized knowledge, and as a consequence the models often produce absurdities and inconsistencies in their outputs (e.g., hallucinations and confirmation biases). In essence, they are fundamentally hard to control so as to prevent off-the-rails behaviors, hard to fine-tune and customize for tailored needs, hard to prompt effectively (due to the “tug-of-war” between external and parametric memory), and extremely resource-hungry due to their enormous parametric configurations. Significant challenges therefore arise when these models are required to perform in critical applications, in domains such as healthcare and finance, that need better guarantees and, in turn, support for grounding, alignment, and instructibility. AI models for such critical applications should be customizable, i.e., tailored as appropriate to support user assistance in various tasks; compact, to perform in real-world resource-constrained settings; and capable of controlled, robust, reliable, interpretable, and grounded reasoning (grounded in rules, guidelines, and protocols). This tutorial explores the development of compact, custom neurosymbolic AI models and their use through human-in-the-loop co-pilots in critical applications.
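The following deliberately tiny sketch (our illustration; the rule base, drug names, and mock model are all invented) shows the basic neurosymbolic pattern such co-pilots build on: checking a model's answer against explicit symbolic knowledge before accepting it.

```python
# Gate a (mock) LLM answer behind an explicit symbolic rule base.
ALLOWED_COMBINATIONS = {("ibuprofen", "paracetamol")}  # toy symbolic knowledge

def mock_llm(question: str) -> str:
    # Stand-in for a real LLM call; intentionally overconfident.
    return "Yes, you can combine ibuprofen and warfarin."

def grounded_answer(question: str, drugs: tuple[str, str]) -> str:
    allowed = {tuple(sorted(c)) for c in ALLOWED_COMBINATIONS}
    if tuple(sorted(drugs)) not in allowed:
        return "Cannot confirm: combination not in the approved rule base."
    return mock_llm(question)

print(grounded_answer("Can I combine ibuprofen and warfarin?",
                      ("ibuprofen", "warfarin")))
```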

 

Ontology Engineering for Industry Adoption


Organizers: Elisa Kendall and Pawel Garbacz

Duration: Half Day

Industry-wide collaborative ontology development efforts can distribute development costs across organizations, address a wider range of use cases, and have the potential to produce higher-quality results than many project- or application-specific ontologies. Based on our experience with several industry ontologies, we will present some of the most important lessons learned in developing ontologies for industry applications, ranging from establishing critical policies at the outset, to reusing standards-based patterns, to leveraging collaborative tools for integration and testing. Participants will select example use cases as the basis for an in-class ontology, reuse example patterns, and test their work using open-source tools for serializing ontologies, as well as tools that catch syntactic and semantic issues that well-known tools such as Protégé miss, gaining direct experience with capabilities that have proven essential for industry-standard ontology development.
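As a flavor of that kind of automated checking, here is a minimal sketch (ours, not the tutorial's toolchain; the ontology file name is hypothetical) that materializes OWL RL inferences with owlrl and round-trips a serialization with rdflib:

```python
# Two cheap automated checks: OWL RL materialization and a serialization
# round-trip, as a stand-in for fuller ontology test pipelines.
import owlrl
from rdflib import Graph

g = Graph().parse("industry_ontology.ttl", format="turtle")  # hypothetical file

before = len(g)
owlrl.DeductiveClosure(owlrl.OWLRL_Semantics).expand(g)  # OWL RL reasoning
print(f"Triples before/after closure: {before}/{len(g)}")

# Round-trip serialization as a basic syntactic sanity check.
Graph().parse(data=g.serialize(format="turtle"), format="turtle")
print("Serialization round-trip succeeded.")
```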

 

Knowledge-Enhanced Retrieval Augmented Generation for Large Language Models


Website: https://kellm-fit.github.io/

Organizers: Stefan Decker, Jens Lehmann, Maria-Esther Vidal, Sahar Vahdati, Diego Collarana

Duration: Half Day

Large Language Models (LLMs) have significantly advanced the field of Artificial Intelligence, enabling applications ranging from conversational assistants to automated content generation. Despite their impressive capabilities, LLMs often face challenges in scenarios that require high factual accuracy, provenance, and updateability. This tutorial addresses these limitations by integrating LLMs with Knowledge Graphs (KGs) to create Knowledge-enhanced Large Language Models (KELLMs). KELLMs combine the linguistic prowess of LLMs with the structured, factual knowledge provided by KGs, resulting in more reliable and context-aware AI systems.
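A minimal sketch of the underlying pattern (our illustration; the toy KG and prompt format are invented) retrieves facts for an entity and grounds the prompt in them before calling an LLM:

```python
# Knowledge-enhanced prompting: retrieve KG facts, then ground the prompt.
KG = {
    "Marie Curie": [("won", "Nobel Prize in Physics"),
                    ("born in", "Warsaw")],
}

def retrieve_facts(entity: str) -> list[str]:
    return [f"{entity} {p} {o}." for p, o in KG.get(entity, [])]

def build_prompt(question: str, entity: str) -> str:
    context = "\n".join(retrieve_facts(entity))
    return f"Use only these facts:\n{context}\n\nQuestion: {question}\nAnswer:"

print(build_prompt("Where was Marie Curie born?", "Marie Curie"))
# The resulting grounded prompt would then be passed to an LLM of your choice.
```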