Accepted Tutorials

TITLE | ORGANIZERS | DURATION
Big Data Analytics for Semantic Data (BigSem) | Charalampos Chelmis and Bedirhan Gergin | Half Day
Recent advances in Graph Data Management (GraphDat) | Domagoj Vrgoc | Half Day
Wikidata Wizardry 101: From Query Spells to Data Charms (WiWi 101) | Nicolas Ferranti, Daniil Dobriy and Axel Polleres | Half Day
Semantic Table Interpretation: from Heuristic to LLM-based approaches (TUTSTI) | Marco Cremaschi, Fabio D’Adda, Matteo Palmonari and Ernesto Jimenez-Ruiz | Half Day
Same Data; Different Models (SDDM) | Cogan Shimizu, Eva Blomqvist and Andrea Giovanni Nuzzolese | Half Day
Shaping Knowledge Graphs (ShapingKGs) | José Emilio Labra Gayo | Half Day
Streaming Linked Data Tutorial (SLDT) | Pieter Bonte and Riccardo Tommasini | Half Day
Neurosymbolic, Customized, and Compact CoPilots | Kaushik Roy, Megha Chakraborty, Yuxin Zi, Manas Gaur and Amit Sheth | Half Day
Ontology Engineering for Industry Adoption (OEIA) | Elisa Kendall and Pawel Garbacz | Half Day
Knowledge-Enhanced Retrieval Augmented Generation for Large Language Models (KELLM) | Stefan Decker, Jens Lehmann, Maria-Esther Vidal, Sahar Vahdati, Diego Collarana | Half Day

 

Big Data Analytics for Semantic Data


Website: https://www.cs.albany.edu/~cchelmis/tutorials/iswc/2024/

Organizers: Charalampos Chelmis and Bedirhan Gergin

Duration: Half Day

Researchers, scientists, and companies alike increasingly leverage semantically enriched, linked datasets to train machine learning models for tasks ranging from discovering new vaccines and materials, to recommending products and services, to building virtual personal assistants. At the same time, big data analytics engines are increasingly adopted to store and process ever-increasing volumes of data efficiently at scale. Until recently, however, the Semantic Web, big data analytics, and machine learning communities were separated, since big data analytics engines could not process Knowledge Graphs (KGs). This tutorial aims to raise awareness of the gap between the big data analytics and machine learning communities and the Semantic Web community. By providing an overview of the state of the art in scalable analytics for semantic data, it aims to promote synergy between these communities and to encourage discussion and exchange of ideas on this timely topic. Hands-on activities covering statistical analytics and inferencing over KGs, built around simple use cases, will be provided.
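As a taste of the kind of statistical analytics over KGs that the hands-on part covers, here is a minimal sketch (our own illustration, not tutorial material; the input file name is hypothetical) that profiles predicate usage in an RDF graph with rdflib:

```python
# A minimal sketch of simple statistical analytics over an RDF graph.
from collections import Counter

from rdflib import Graph

g = Graph()
g.parse("example_kg.ttl", format="turtle")  # hypothetical input file

# Count how often each predicate occurs -- a basic KG statistic.
predicate_counts = Counter(p for _, p, _ in g)
for predicate, count in predicate_counts.most_common(5):
    print(predicate, count)
```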

 

Recent advances in Graph Data Management


Website: https://domagojvrgoc.github.io/ISWC2024-GraphDB/

Organizer: Domagoj Vrgoc

Duration: Half Day

Graph databases have received a lot of attention in recent years, mostly due to their role as the underlying storage and query mechanism for knowledge graphs. This has led to a range of graph solutions being developed over the years: the RDF data format and the SPARQL query standard are widely used in large open knowledge graphs such as Wikidata and DBpedia, while commercial systems generally deploy the property graph data model, with the recent GQL ISO standard formalizing query languages in this setting. In this tutorial, we give a detailed overview of graph data models and query standards, followed by a deep dive into recent techniques for graph query processing, most notably the use of worst-case optimal algorithms and automata-guided path retrieval.
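For concreteness, the path queries that automata-guided retrieval targets are SPARQL property paths; the following self-contained sketch (ours, with invented data) evaluates one with rdflib:

```python
# Illustrative sketch: a regular path query (SPARQL property path) of the
# kind that automata-guided path retrieval evaluates, run here with rdflib.
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/")
g = Graph()
g.add((EX.a, EX.knows, EX.b))
g.add((EX.b, EX.knows, EX.c))

# "ex:knows+" matches paths of one or more :knows edges (Kleene plus).
results = g.query("""
    PREFIX ex: <http://example.org/>
    SELECT ?x ?y WHERE { ?x ex:knows+ ?y }
""")
for row in results:
    print(row.x, row.y)  # (a,b), (b,c), and the two-hop pair (a,c)
```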

 

Wikidata Wizardry 101: From Query Spells to Data Charms


Website: https://ww101.ai.wu.ac.at/

Organizers: Nicolas Ferranti, Daniil Dobriy and Axel Polleres

Duration: Half Day

Wikidata Wizardry 101 provides a comprehensive introduction to the Wikidata knowledge graph for enthusiasts and professionals. This tutorial explores essential concepts and practical applications of Wikidata from a Semantic Web practitioner’s point of view. Participants will gain foundational skills in querying and managing Wikidata, enabling them to effectively utilize this resource in their projects.
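As a flavor of the query skills the tutorial builds, here is a small self-contained sketch (not tutorial material; the user-agent string is a made-up placeholder) that asks the public Wikidata SPARQL endpoint for a few entities:

```python
# A minimal sketch of querying the public Wikidata SPARQL endpoint.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://query.wikidata.org/sparql",
                       agent="wiwi101-example/0.1 (educational)")
sparql.setQuery("""
    SELECT ?item ?itemLabel WHERE {
      ?item wdt:P31 wd:Q146 .  # instances of (P31) house cat (Q146)
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    }
    LIMIT 5
""")
sparql.setReturnFormat(JSON)
for binding in sparql.query().convert()["results"]["bindings"]:
    print(binding["item"]["value"], binding["itemLabel"]["value"])
```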

 

Semantic Table Interpretation: from Heuristic to LLM-based approaches


Website: https://unimib-datai.github.io/sti-website/tutorial/

Organizers: Marco Cremaschi, Fabio D’Adda, Matteo Palmonari and Ernesto Jimenez-Ruiz

Duration: Half Day

The tutorial introduces the topic of Semantic Table Interpretation (STI), covering theoretical and practical considerations. In particular, the tutorial will provide a comprehensive analysis of how approaches to STI have evolved from heuristic-based, to ML-based, to the most recent LLM-based approaches. The analysis will consider the specific characteristics of these different classes, providing insights into their respective advantages and limitations in order to identify their contexts of use. The final part will describe a case study demonstrating the application of two state-of-the-art approaches.
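To make the heuristic end of that spectrum concrete, here is a toy sketch (our illustration, not tutorial material) that annotates a single table cell by label lookup against the public Wikidata API:

```python
# A toy heuristic STI step: link a table cell to a Wikidata entity by label
# lookup via the public MediaWiki API. Error handling is omitted for brevity.
import requests

def annotate_cell(cell_text: str) -> str | None:
    """Return the Wikidata ID of the top label match for a cell, if any."""
    resp = requests.get(
        "https://www.wikidata.org/w/api.php",
        params={
            "action": "wbsearchentities",
            "search": cell_text,
            "language": "en",
            "format": "json",
        },
        timeout=10,
    )
    hits = resp.json().get("search", [])
    return hits[0]["id"] if hits else None

print(annotate_cell("Berlin"))  # e.g. 'Q64'
```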

 

Same Data; Different Models


Website: https://the-praxis-initiative.github.io/comparative-ontology-modeling/

Organizers: Cogan Shimizu, Eva Blomqvist and Andrea Giovanni Nuzzolese

Duration: Half Day

As knowledge graph development (and in particular, the development of their schemas) grows commensurately with the importance of knowledge graphs in industry and academia, choosing a development methodology that fits the application scenario and domain becomes correspondingly important. We have thus organized a 3-hour, hands-on tutorial to compare and contrast three distinct ontology modeling methodologies: Graphical Modular Ontology Modeling (GraphMOMo), Extreme Design for Ontology Engineering (XD), and LLM-assisted Knowledge Engineering (copilot). Attendees will have the opportunity to execute each methodology, and the tutorial will culminate in a retrospective of the different sub-tutorials.
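For concreteness, here is a tiny, methodology-neutral sketch (our illustration, not tutorial material; all names are invented) of the kind of artifact each sub-tutorial produces, a small ontology module, built here with rdflib:

```python
# A minimal ontology module: two classes connected by an object property.
from rdflib import Graph, Namespace, OWL, RDF, RDFS

EX = Namespace("http://example.org/onto#")
g = Graph()
g.bind("ex", EX)

g.add((EX.Event, RDF.type, OWL.Class))
g.add((EX.Place, RDF.type, OWL.Class))
g.add((EX.occursAt, RDF.type, OWL.ObjectProperty))
g.add((EX.occursAt, RDFS.domain, EX.Event))
g.add((EX.occursAt, RDFS.range, EX.Place))

print(g.serialize(format="turtle"))
```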

 

Shaping Knowledge Graphs


Website: https://www.validatingrdf.com/tutorial/iswc2024/

Organizer: José Emilio Labra Gayo

Duration: Half Day

Knowledge Graphs are increasingly being employed to improve data interoperability, search, and recommendation, and to foster the adoption of Semantic Web technologies. The quality of the data within these graphs is pivotal, and it is often validated against expected data models or shapes to improve accuracy. Knowledge Graphs are implemented with various technologies: RDF-based triplestores are canonical in the Semantic Web, while in the graph database world Property Graphs are also used for Knowledge Graphs. Wikidata, a popular Knowledge Graph, offers RDF through its SPARQL query service, but its data model, with qualifiers and references, aligns closely with Property Graphs; the recent RDF-star proposal can bridge the gap between RDF and Property Graphs.

Shape Expressions (ShEx) and the Shapes Constraint Language (SHACL) were proposed for RDF validation, while for Property Graphs, PGSchema has been proposed, along with other proposals such as PShEx and ProGS. Wikidata adopted Entity Schemas, which are based on ShEx, alongside its own property constraint system, and there is also a proposal called WShEx. This tutorial explores different types of Knowledge Graphs and approaches for their validation. We will also review practical applications such as inferring shapes from existing data and creating conforming subsets of Knowledge Graphs.
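To illustrate shape-based validation, here is a minimal sketch (ours; the shape and data are invented) using the pySHACL library:

```python
# Validate a tiny data graph against a SHACL shape with pySHACL.
from pyshacl import validate
from rdflib import Graph

data = Graph().parse(data="""
@prefix ex: <http://example.org/> .
ex:alice a ex:Person .
""", format="turtle")

shapes = Graph().parse(data="""
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix ex: <http://example.org/> .
ex:PersonShape a sh:NodeShape ;
    sh:targetClass ex:Person ;
    sh:property [ sh:path ex:name ; sh:minCount 1 ] .
""", format="turtle")

conforms, _, report_text = validate(data, shacl_graph=shapes)
print(conforms)      # False: ex:alice has no ex:name
print(report_text)
```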

 

Streaming Linked Data Tutorial


Website: https://streamreasoning.org/events/sldt2024/

Organizers: Pieter Bonte and Riccardo Tommasini

Duration: Half Day

This tutorial provides a comprehensive introduction to Streaming Linked Data, including some fundamental aspects of Stream Processing and Stream Reasoning. Moreover, the tutorial covers all the stages of the Streaming Linked Data lifecycle. Central to the tutorial are the recently published book “Streaming Linked Data: From Vision to Practice” and the recently renewed RSP4J library, which unifies interaction with existing Streaming Linked Data engines. In practice, the tutorial will include:

(i) a survey of existing research outcomes from Stream Reasoning and Streaming Linked Data, i.e., continuous querying and reactive reasoning over highly dynamic graph data;

(ii) an introduction to the Streaming Linked Data lifecycle for modelling, publishing, serving, and processing streaming data;

(iii) a positioning of existing Streaming Linked Data engines for building and maintaining Streaming Linked Data applications.

The tutorial will include several examples and exercises built around a relevant use case. Moreover, we plan to release the material together with a number of exercises for the attendees.
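As an engine-agnostic illustration of the continuous-querying idea at the heart of this lifecycle, the following sketch (ours, not from the book or the library) maintains a time-based sliding window over an incoming stream of triples:

```python
# A time-based sliding window with a continuous count, mimicking what
# dedicated Streaming Linked Data engines provide natively.
from collections import deque

WINDOW_SECONDS = 10
window: deque[tuple[float, tuple[str, str, str]]] = deque()

def on_triple(timestamp: float, triple: tuple[str, str, str]) -> int:
    """Register an arriving triple and report the current in-window count."""
    window.append((timestamp, triple))
    while window and window[0][0] <= timestamp - WINDOW_SECONDS:
        window.popleft()  # evict triples older than the window
    return len(window)

print(on_triple(0.0, ("ex:s1", "ex:observes", "ex:42")))   # 1
print(on_triple(5.0, ("ex:s2", "ex:observes", "ex:43")))   # 2
print(on_triple(12.0, ("ex:s3", "ex:observes", "ex:44")))  # 2 (first evicted)
```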

 

Neurosymbolic, Customized, and Compact CoPilots


Website: https://kauroy1994.github.io/ISWC-2024-Tutorial-Neurosymbolic-Customized-and-Compact-CoPilots/

Organizers: Kaushik Roy, Megha Chakraborty, Yuxin Zi, Manas Gaur and Amit Sheth

Duration: Half Day

Large Language Models (LLMs) perform credibly in open-domain interactions such as question answering, summarization, and explanation generation. However, LLM reasoning is based on parametrized knowledge, and as a consequence the models often produce absurdities and inconsistencies in their outputs (e.g., hallucinations and confirmation biases). In essence, they are fundamentally hard to control so as to prevent off-the-rails behaviors, hard to fine-tune and customize for tailored needs, hard to prompt effectively (due to the “tug-of-war” between external and parametric memory), and extremely resource-hungry due to their enormous parametric configurations. Significant challenges therefore arise when these models are required to perform in critical applications, in domains such as healthcare and finance, that need better guarantees and, in turn, support for grounding, alignment, and instructibility. AI models for such critical applications should be customizable, i.e., tailored as appropriate to support user assistance in various tasks; compact, to perform in real-world resource-constrained settings; and capable of controlled, robust, reliable, interpretable, and grounded reasoning (grounded in rules, guidelines, and protocols). This tutorial explores the development of compact, custom neurosymbolic AI models and their use through human-in-the-loop co-pilots in critical applications.
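The following deliberately tiny sketch (our illustration; the rule base, drug names, and mock model are all invented) shows the basic neurosymbolic pattern such co-pilots build on: checking a model's answer against explicit symbolic knowledge before accepting it.

```python
# Gate a (mock) LLM answer behind an explicit symbolic rule base.
ALLOWED_COMBINATIONS = {("ibuprofen", "paracetamol")}  # toy symbolic knowledge

def mock_llm(question: str) -> str:
    # Stand-in for a real LLM call; intentionally overconfident.
    return "Yes, you can combine ibuprofen and warfarin."

def grounded_answer(question: str, drugs: tuple[str, str]) -> str:
    allowed = {tuple(sorted(c)) for c in ALLOWED_COMBINATIONS}
    if tuple(sorted(drugs)) not in allowed:
        return "Cannot confirm: combination not in the approved rule base."
    return mock_llm(question)

print(grounded_answer("Can I combine ibuprofen and warfarin?",
                      ("ibuprofen", "warfarin")))
```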

 

Ontology Engineering for Industry Adoption


Organizers: Elisa Kendall and Pawel Garbacz

Duration: Half Day

Industry-wide collaborative ontology development efforts can distribute development costs across organizations, address a wider range of use cases, and have the potential to produce higher-quality results than many project- or application-specific ontologies. Based on our experience with several industry ontologies, we will present some of the most important lessons learned in developing ontologies for industry applications, ranging from establishing critical policies at the outset, to reusing standards-based patterns, to leveraging collaborative tools for integration and testing. Participants will select example use cases as the basis for an in-class ontology, reuse example patterns, and test their work using open-source tools for serializing ontologies, as well as tools that catch syntactic and semantic issues that well-known tools such as Protégé miss, gaining direct experience with capabilities that have proven essential for industry-standard ontology development.
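As a flavor of that kind of automated checking, here is a minimal sketch (ours, not the tutorial's toolchain; the ontology file name is hypothetical) that materializes OWL RL inferences with owlrl and round-trips a serialization with rdflib:

```python
# Two cheap automated checks: OWL RL materialization and a serialization
# round-trip, as a stand-in for fuller ontology test pipelines.
import owlrl
from rdflib import Graph

g = Graph().parse("industry_ontology.ttl", format="turtle")  # hypothetical file

before = len(g)
owlrl.DeductiveClosure(owlrl.OWLRL_Semantics).expand(g)  # OWL RL reasoning
print(f"Triples before/after closure: {before}/{len(g)}")

# Round-trip serialization as a basic syntactic sanity check.
Graph().parse(data=g.serialize(format="turtle"), format="turtle")
print("Serialization round-trip succeeded.")
```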

 

Knowledge-Enhanced Retrieval Augmented Generation for Large Language Models


Website: https://kellm-fit.github.io/

Organizers: Stefan Decker, Jens Lehmann, Maria-Esther Vidal, Sahar Vahdati, Diego Collarana

Duration: Half Day

Large Language Models (LLMs) have significantly advanced the field of Artificial Intelligence, enabling applications ranging from conversational assistants to automated content generation. Despite their impressive capabilities, LLMs often face challenges in scenarios that require high factual accuracy, provenance, and updateability. This tutorial addresses these limitations by integrating LLMs with Knowledge Graphs (KGs) to create Knowledge-enhanced Large Language Models (KELLMs). KELLMs combine the linguistic prowess of LLMs with the structured, factual knowledge provided by KGs, resulting in more reliable and context-aware AI systems.
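A minimal sketch of the underlying pattern (our illustration; the toy KG and prompt format are invented) retrieves facts for an entity and grounds the prompt in them before calling an LLM:

```python
# Knowledge-enhanced prompting: retrieve KG facts, then ground the prompt.
KG = {
    "Marie Curie": [("won", "Nobel Prize in Physics"),
                    ("born in", "Warsaw")],
}

def retrieve_facts(entity: str) -> list[str]:
    return [f"{entity} {p} {o}." for p, o in KG.get(entity, [])]

def build_prompt(question: str, entity: str) -> str:
    context = "\n".join(retrieve_facts(entity))
    return f"Use only these facts:\n{context}\n\nQuestion: {question}\nAnswer:"

print(build_prompt("Where was Marie Curie born?", "Marie Curie"))
# The resulting grounded prompt would then be passed to an LLM of your choice.
```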