Benchmarking Graph Database Backends—What Works Well with Wikidata?
Abstract
Knowledge bases often use graphs as their logical model. RDF-based knowledge bases (KBs) are prime examples, since RDF (Resource Description Framework) uses a graph as its logical model. Graph databases are an emerging breed of NoSQL-type databases that also offer a graph as the logical model. Although specialized databases, the so-called triple stores, exist for storing RDF data, graph databases are also promising candidates for storing knowledge. In this paper, we benchmark different graph database implementations loaded with Wikidata, a real-life, large-scale knowledge base. Graph databases come in all shapes and sizes and offer different APIs and graph models. Hence, we used a measurement system that abstracts away the API differences. For the modeling aspect, we took measurements with different graph encodings previously suggested in the literature, in order to observe the impact of the encoding on overall performance.
References
Blazegraph products. https://web.archive.org/web/20171125161035/https://www.blazegraph.com/product/ Accessed: 2018-03-13.
Blazegraph TinkerPop implementation. https://web.archive.org/web/20180611150556/https://github.com/blazegraph/tinkerpop3 Accessed: 2018-09-09.
DB-Engines ranking of graph DBMS. https://web.archive.org/web/20180911002043/https://db-engines.com/en/ranking/graph+dbms Accessed: 2018-03-13.
Grakn.AI - the knowledge graph. https://web.archive.org/web/20180918085112/http://www.grakn.ai/grakn-core Accessed: 2018-08-02.
HypergraphDB - a graph database. https://web.archive.org/web/20180809121925/http://hypergraphdb.org/ Accessed: 2018-08-02.
Introduction to Azure Cosmos DB: Graph API. https://web.archive.org/web/20180911002034/https://docs.microsoft.com/en-us/azure/cosmos-db/graph-introduction Accessed: 2018-08-02.
JanusGraph. https://web.archive.org/web/20180919165133/http://janusgraph.org/ Accessed: 2018-03-13.
JanusGraph storage backends. https://web.archive.org/web/20180209145536/http://docs.janusgraph.org:80/latest/storage-backends.html Accessed: 2018-03-13.
Linked Data Benchmark Council. https://web.archive.org/web/20181228154821/http://ldbcouncil.org/ Accessed: 2018-08-06.
OrientDB. https://web.archive.org/web/20181016165245/https://orientdb.com Accessed: 2018-12-07.
Planner hints and the USING keyword. https://web.archive.org/web/20180206081105/http://neo4j.com:80/docs/developer-manual/current/cypher/query-tuning/using/ Accessed: 2018-08-03.
RDF schema 1.1 - reification vocabulary. https://web.archive.org/web/20180920184035/https://www.w3.org/TR/rdf-schema/ Accessed: 2018-08-06.
TinkerPop3 documentation. https://web.archive.org/web/20180923130832/http://tinkerpop.apache.org/docs/3.3.3/ Accessed: 2018-08-06.
Titan Graph Database. https://web.archive.org/web/20180910214447/http://titan.thinkaurelius.com/ Accessed: 2018-09-09.
What is RDF triplestore? https://web.archive.org/web/20170506152814/http://ontotext.com:80/knowledgehub/fundamentals/what-is-rdf-triplestore/ Accessed: 2018-09-11.
Wikidata Query Service. https://query.wikidata.org/ Accessed: 2018-09-09.
Wikidata Query Service - user manual. https://web.archive.org/web/20180917181601/https://www.mediawiki.org/wiki/Wikidata_Query_Service/User_Manual Accessed: 2018-09-09.
Wikidata statistics dashboard for references. https://grafana.wikimedia.org/d/000000182/wikidata-datamodel-references?orgId=1&from=1514836723618&to=1543694323619 Accessed: 2018-12-11.
Angles, Renzo. A comparison of current graph database models. In Proceedings of the 2012 IEEE 28th International Conference on Data Engineering Workshops, ICDEW '12, pages 171--177, Washington, DC, USA, 2012. IEEE Computer Society. DOI: 10.1109/ICDEW.2012.31.
Angles, Renzo, Prat-Pérez, Arnau, Dominguez-Sal, David, and Larriba-Pey, Josep-Lluis. Benchmarking database systems for social network applications. In First International Workshop on Graph Data Management Experiences and Systems, GRADES '13, pages 15:1--15:7, New York, NY, USA, 2013. ACM. DOI: 10.1145/2484425.2484440.
Cyganiak, Richard, Wood, David, Lanthaler, Markus, Klyne, Graham, Carroll, Jeremy J, and McBride, Brian. RDF 1.1 concepts and abstract syntax. W3C recommendation, 25(02), 2014.
Duan, Songyun, Kementsietsidis, Anastasios, Srinivas, Kavitha, and Udrea, Octavian. Apples and oranges: A comparison of RDF benchmarks and real RDF datasets. pages 145-156, 01 2011. DOI: 10.1145/1989323.1989340.
Ehrlinger, Lisa and Wöss, Wolfram. Towards a definition of knowledge graphs. In SEMANTiCS, 2016.
Erling, Orri, Averbuch, Alex, Larriba-Pey, Josep, Chafi, Hassan, Gubichev, Andrey, Prat, Arnau, Pham, Minh-Duc, and Boncz, Peter. The LDBC social network benchmark: Interactive workload. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, SIGMOD '15, pages 619--630, New York, NY, USA, 2015. ACM. DOI: 10.1145/2723372.2742786.
Erxleben, Fredo, Günther, Michael, Krötzsch, Markus, Mendez, Julian, and Vrandecic, Denny. Introducing Wikidata to the linked data web. In Proceedings of the 13th International Semantic Web Conference (ISWC'14), volume 8796 of LNCS, pages 50--65. Springer, 2014.
Färber, Michael, Bartscherer, Frederic, Menne, Carsten, and Rettinger, Achim. Linked data quality of DBpedia, Freebase, OpenCyc, Wikidata, and Yago. Semantic Web, pages 1--53, 2016.
Harris, Steve, Seaborne, Andy, and Prud’hommeaux, Eric. SPARQL 1.1 query language. W3C recommendation, 21(10), 2013.
Hartig, O. and Thompson, B. Foundations of an Alternative Approach to Reification in RDF. ArXiv e-prints, June 2014.
Hernández, Daniel, Hogan, Aidan, and Krötzsch, Markus. Reifying RDF: what works well with Wikidata? In Proceedings of the 11th International Workshop on Scalable Semantic Web Knowledge Base Systems (SSWS 2015), volume 1457 of CEUR Workshop Proceedings. CEUR-WS.org, 2015.
Hernández, Daniel, Hogan, Aidan, Riveros, Cristian, Rojas, Carlos, and Zerega, Enzo. Querying Wikidata: Comparing SPARQL, relational and graph databases. In International Semantic Web Conference, pages 88--103. Springer, 2016.
Iosup, Alexandru, Hegeman, Tim, Ngai, Wing Lung, Heldens, Stijn, Prat-Pérez, Arnau, Manhardt, Thomas, Chafi, Hassan, Capota, Mihai, Sundaram, Narayanan, Anderson, Michael, Tanase, Ilie Gabriel, Xia, Yinglong, Nai, Lifeng, and Boncz, Peter. LDBC graphalytics: A benchmark for large-scale graph analysis on parallel and distributed platforms. Proc. VLDB Endow., 9(13):1317--1328, September 2016. DOI: 10.14778/3007263.3007270.
Jouili, S. and Vansteenberghe, V. An empirical comparison of graph databases. In 2013 International Conference on Social Computing, pages 708-715, Sept 2013. DOI: 10.1109/SocialCom.2013.106.
Kotsev, Venelin, Minadakis, Nikos, Papakonstantinou, Vassilis, Erling, Orri, Fundulaki, Irini, and Kiryakov, Atanas. Benchmarking RDF query engines: The LDBC semantic publishing benchmark. In BLINK@ISWC, 2016.
Kovács, Tibor. Nagyméretű szemantikus adathalmazok tárolási megoldásainak teljesítményközpontú összehasonlítása. In BME-VIK TDK, 2017.
Morsey, Mohamed, Lehmann, Jens, Auer, Sören, and Ngonga Ngomo, Axel-Cyrille. DBpedia SPARQL benchmark -- performance assessment with real queries on real data. In Aroyo, Lora, Welty, Chris, Alani, Harith, Taylor, Jamie, Bernstein, Abraham, Kagal, Lalana, Noy, Natasha, and Blomqvist, Eva, editors, The Semantic Web -- ISWC 2011, pages 454--469, Berlin, Heidelberg, 2011. Springer Berlin Heidelberg.
Nguyen, Vinh, Bodenreider, Olivier, and Sheth, Amit. Don't like RDF reification? Proceedings of the 23rd International Conference on World Wide Web - WWW '14, pages 759--770, Apr 2014. DOI: 10.1145/2566486.2567973.
Noy, Natasha, Rector, Alan, Hayes, Pat, and Welty, Chris. Defining n-ary relations on the semantic web. W3C working group note, 12(4), 2006.
Pacaci, Anil, Zhou, Alice, Lin, Jimmy, and Özsu, M. Tamer. Do we need specialized graph databases?: Benchmarking real-time social networking applications. pages 1-7, 05 2017. DOI: 10.1145/3078447.3078459.
Pan, Zhengyu, Zhu, Tao, Liu, Hong, and Ning, Huansheng. A survey of RDF management technologies and benchmark datasets. Journal of Ambient Intelligence and Humanized Computing, 9(5):1693--1704, Oct 2018. DOI: 10.1007/s12652-018-0876-2.
Robinson, Ian, Webber, Jim, and Eifrem, Emil. Graph Databases. O'Reilly Media, 2015.
Rodriguez, Marko A. A letter regarding native graph databases. https://web.archive.org/web/20180828112004/https://www.datastax.com/dev/blog/a-letter-regarding-native-graph-databases, 2013.
Rodriguez, Marko A. and Neubauer, Peter. Constructions from dots and lines. CoRR, abs/1006.2361, 2010.