New Database Back End for WDQS

Wikidata:SPARQL query service/WDQS backend update/Backend Replacement

From Wikidata

Proposal: New Database Backend for WDQS

This document is intended for community review. Please share all feedback on the WDQS Migration talk page.

Summary

This document outlines the proposed selection of QLever as the new backend database for the Wikidata Platform. Blazegraph's known limitations cause much of the volatility of the Wikidata Query Service (WDQS), and Blazegraph is unable to scale to meet the growth targets of Wikidata. We tested deployments on AWS and on on-premises infrastructure, ran repeated ingestion and index update cycles, measured latency and throughput on a limited set of publicly available and rewritten queries from October through March of FY26, and incorporated community feedback. Based on that work, we believe QLever will put us on the best path to ensuring reliable, sustainable, and scalable access to Wikidata now and into the future.

Context and Problem Statement

The Wikidata platform and the Wikidata Query Service (WDQS) have long been approaching the upper limits of their technical capacity, as demonstrated by the number of SLO-impacting incidents (~1 per week) and the erratic trends of performance indicators like query latency and throughput. The platform uses Blazegraph, an open source project that has gone unmaintained since being acquired by Amazon in 2018, as the database application for the knowledge graph infrastructure. Meanwhile, the number of users, requests, edits, and data points in the knowledge graph has continued to increase.

To ensure the sustainability of WDQS, the Wikidata Platform team was established to improve and maintain the performance and stability of the service. Previous stewards of the platform explored many solutions to the scaling issue, most notably a split of the graph to reduce the cost of query execution for WDQS. All solutions evaluated and implemented were acknowledged as insufficient for addressing the core issues with Blazegraph. They were designed to provide additional runway until a dedicated team could execute a migration to a new RDF database.

Assumptions

We used metrics in line with our migration goals to evaluate candidates. The tests we conducted evaluated a set of both quantitative (e.g. query latency) and qualitative (e.g. community support) metrics. We believe our methodology accurately captures the vendor qualities that will be most important in solving problems throughout the migration and beyond.

We were able to accurately assess the qualitative criteria for our choice. Some of our evaluation dimensions, as mentioned above, were focused on a vendor’s community activity and project governance. We believe we evaluated these dimensions as thoroughly as possible.

Replacing the backend database will drive meaningful improvement on our migration metrics and on the experience of users. The data ingestion, run-time benchmarking, and production-replay traffic analyses (as referenced above, even if limited) validated this assumption, demonstrating that QLever drove meaningful improvements across all identified criteria when compared to Blazegraph.

The implementation of our proposed technical architecture will protect against similar problems of abandonware in the future. As outlined in our design document for the technical architecture of our new endpoints, we plan to decouple the service and application layers of the Wikidata platform. Doing so will make future evaluations of backend replacements less dependent on large-scale changes.
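The decoupling described in the last assumption can be illustrated with a minimal sketch. It assumes a service layer that talks to the database only through a narrow interface, so a backend can be swapped without touching callers. Every name here (`SparqlBackend`, `InMemoryBackend`, `QueryService`) is hypothetical and is not taken from the design document; the stand-in backend returns canned rows instead of contacting a real database.

```python
from dataclasses import dataclass
from typing import Protocol


class SparqlBackend(Protocol):
    """Hypothetical interface: all the service layer knows about a backend."""

    name: str

    def run(self, sparql: str) -> list[dict[str, str]]:
        ...


@dataclass
class InMemoryBackend:
    """Illustrative stand-in; a real adapter would call the RDF database."""

    name: str
    rows: list[dict[str, str]]

    def run(self, sparql: str) -> list[dict[str, str]]:
        # Ignores the query text and returns a copy of the canned rows.
        return list(self.rows)


class QueryService:
    """Service layer that depends on the interface, not on one database."""

    def __init__(self, backend: SparqlBackend) -> None:
        self._backend = backend

    def swap_backend(self, backend: SparqlBackend) -> None:
        # Replacing the backend requires no change to callers of execute().
        self._backend = backend

    def execute(self, sparql: str) -> dict:
        rows = self._backend.run(sparql)
        return {"backend": self._backend.name, "count": len(rows), "rows": rows}
```

In this arrangement a future backend evaluation would only need a new adapter that satisfies the interface, which is one possible reading of "less dependent on large-scale changes".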

Recommendation

After thorough evaluation, we recommend QLever as the new RDF database for the Wikidata Platform. Benchmarking and initial production-replay testing conducted from October through March of FY26 showed that the system is capable of loading the entirety of the Wikidata dataset, supporting existing functionality of our platform, and meeting or exceeding all target performance indicators compared to Blazegraph.
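As a rough illustration of what a "meets or exceeds" comparison on latency and throughput could look like, the sketch below summarizes per-query latencies from a replay run and compares a candidate against a baseline. The chosen indicators (median latency, 95th percentile latency, queries per second) and all function names are assumptions made for this example; the proposal does not enumerate its target performance indicators or describe its tooling.

```python
import math
from statistics import median


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (pct between 0 and 100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(latencies_s, wall_clock_s):
    """Summarize one replay run: latencies in seconds, total run time in seconds."""
    return {
        "p50_s": median(latencies_s),
        "p95_s": percentile(latencies_s, 95),
        "throughput_qps": len(latencies_s) / wall_clock_s,
    }


def meets_or_exceeds(candidate, baseline):
    """True if the candidate is no slower and no lower in throughput than the baseline."""
    return (
        candidate["p50_s"] <= baseline["p50_s"]
        and candidate["p95_s"] <= baseline["p95_s"]
        and candidate["throughput_qps"] >= baseline["throughput_qps"]
    )
```

Both runs would need to replay the same query set under comparable load for the comparison to be meaningful.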

Risks and Mitigations

Internal Risks (Challenges to Assumptions)

We did not use the right metrics to evaluate candidates. Mitigation: If we discover a blind spot in our analyses or identify performance indicators that better capture our target changes, we will have a more dynamic platform infrastructure that will enable agile updates and iteration. Additionally, the QLever team has demonstrated a willingness to prioritize features that support the Wikidata use case.

We were not able to accurately assess the qualitative dimensions. Mitigation: In the event that our assessments of qualitative dimensions were inaccurate, we will have the...
