The website is now located at http://graphlearning.io
This page contains collected benchmark data sets for the evaluation of graph kernels. The data sets were collected by Kristian Kersting, Nils M. Kriege, Christopher Morris, Petra Mutzel, and Marion Neumann with partial support of the German Science Foundation (DFG) within the Collaborative Research Center SFB 876 “Providing Information by Resource-Constrained Data Analysis”, project A6 “Resource-efficient Graph Mining”.
Name | Source | Num. of Graphs | Num. of Classes | Avg. Number of Nodes | Avg. Number of Edges | Node Labels | Edge Labels | Node Attr. (Dim.) | Edge Attr. (Dim.) | Download (ZIP) |
---|---|---|---|---|---|---|---|---|---|---|
AIDS | [16,17] | 2000 | 2 | 15.69 | 16.20 | + | + | + (4) | – | AIDS |
alchemy_dev | [29] | 99776 | R (12) | 9.71 | 10.02 | + | + | – | – | alchemy_dev |
alchemy_test | [29] | 15760 | – | 11.25 | 11.76 | + | + | – | – | alchemy_test |
alchemy_valid | [29] | 3951 | R (12) | 11.25 | 11.77 | + | + | – | – | alchemy_valid |
BZR | [7] | 405 | 2 | 35.75 | 38.36 | + | – | + (3) | – | BZR |
BZR_MD | [7,23] | 306 | 2 | 21.30 | 225.06 | + | + | – | + (1) | BZR_MD |
COIL-DEL | [16,18] | 3900 | 100 | 21.54 | 54.24 | – | + | + (2) | – | COIL-DEL |
COIL-RAG | [16,18] | 3900 | 100 | 3.01 | 3.02 | – | – | + (64) | + (1) | COIL-RAG |
COLLAB | [14] | 5000 | 3 | 74.49 | 2457.78 | – | – | – | – | COLLAB |
COLORS-3 | [27] | 10500 | 11 | 61.31 | 91.03 | – | – | + (4) | – | COLORS-3 |
COX2 | [7] | 467 | 2 | 41.22 | 43.45 | + | – | + (3) | – | COX2 |
COX2_MD | [7,23] | 303 | 2 | 26.28 | 335.12 | + | + | – | + (1) | COX2_MD |
Cuneiform | [25] | 267 | 30 | 21.27 | 44.80 | + | + | + (3) | + (2) | Cuneiform |
DBLP_v1 | [26] | 19456 | 2 | 10.48 | 19.65 | + | + | – | – | DBLP_v1 |
DHFR | [7] | 467 | 2 | 42.43 | 44.54 | + | – | + (3) | – | DHFR |
DHFR_MD | [7,23] | 393 | 2 | 23.87 | 283.01 | + | + | – | + (1) | DHFR_MD |
ER_MD | [7,23] | 446 | 2 | 21.33 | 234.85 | + | + | – | + (1) | ER_MD |
DD | [6,22] | 1178 | 2 | 284.32 | 715.66 | + | – | – | – | DD |
ENZYMES | [4,5] | 600 | 6 | 32.63 | 62.14 | + | – | + (18) | – | ENZYMES |
Fingerprint | [16,19] | 2800 | 4 | 5.42 | 4.42 | – | – | + (2) | + (2) | Fingerprint |
FIRSTMM_DB | [11,12,13] | 41 | 11 | 1377.27 | 3074.10 | + | – | + (1) | + (2) | FIRSTMM_DB |
FRANKENSTEIN | [15] | 4337 | 2 | 16.90 | 17.88 | – | – | + (780) | – | FRANKENSTEIN |
IMDB-BINARY | [14] | 1000 | 2 | 19.77 | 96.53 | – | – | – | – | IMDB-BINARY |
IMDB-MULTI | [14] | 1500 | 3 | 13.00 | 65.94 | – | – | – | – | IMDB-MULTI |
KKI | [26] | 83 | 2 | 26.96 | 48.42 | + | – | – | – | KKI |
Letter-high | [16] | 2250 | 15 | 4.67 | 4.50 | – | – | + (2) | – | Letter-high |
Letter-low | [16] | 2250 | 15 | 4.68 | 3.13 | – | – | + (2) | – | Letter-low |
Letter-med | [16] | 2250 | 15 | 4.67 | 4.50 | – | – | + (2) | – | Letter-med |
MCF-7 | [28] | 27770 | 2 | 26.39 | 28.52 | + | + | – | – | MCF-7 |
MCF-7H | [28] | 27770 | 2 | 47.30 | 49.43 | + | + | – | – | MCF-7H |
MOLT-4 | [28] | 39765 | 2 | 26.09 | 28.13 | + | + | – | – | MOLT-4 |
MOLT-4H | [28] | 39765 | 2 | 46.70 | 48.73 | + | + | – | – | MOLT-4H |
Mutagenicity | [16,20] | 4337 | 2 | 30.32 | 30.77 | + | + | – | – | Mutagenicity |
MSRC_9 | [13] | 221 | 8 | 40.58 | 97.94 | + | – | – | – | MSRC_9 |
MSRC_21 | [13] | 563 | 20 | 77.52 | 198.32 | + | – | – | – | MSRC_21 |
MSRC_21C | [13] | 209 | 20 | 40.28 | 96.60 | + | – | – | – | MSRC_21C |
MUTAG | [1,23] | 188 | 2 | 17.93 | 19.79 | + | + | – | – | MUTAG |
NCI1 | [8,9,22] | 4110 | 2 | 29.87 | 32.30 | + | – | – | – | NCI1 |
NCI109 | [8,9,22] | 4127 | 2 | 29.68 | 32.13 | + | – | – | – | NCI109 |
NCI-H23 | [28] | 40353 | 2 | 26.07 | 28.10 | + | + | – | – | NCI-H23 |
NCI-H23H | [28] | 40353 | 2 | 46.67 | 48.69 | + | + | – | – | NCI-H23H |
OHSU | [26] | 79 | 2 | 82.01 | 199.66 | + | – | – | – | OHSU |
OVCAR-8 | [28] | 40516 | 2 | 26.07 | 28.10 | + | + | – | – | OVCAR-8 |
OVCAR-8H | [28] | 40516 | 2 | 46.67 | 48.70 | + | + | – | – | OVCAR-8H |
P388 | [28] | 41472 | 2 | 22.11 | 23.55 | + | + | – | – | P388 |
P388H | [28] | 41472 | 2 | 40.44 | 41.88 | + | + | – | – | P388H |
PC-3 | [28] | 27509 | 2 | 26.35 | 28.49 | + | + | – | – | PC-3 |
PC-3H | [28] | 27509 | 2 | 47.19 | 49.32 | + | + | – | – | PC-3H |
Peking_1 | [26] | 85 | 2 | 39.31 | 77.35 | + | – | – | – | Peking_1 |
PTC_FM | [2,23] | 349 | 2 | 14.11 | 14.48 | + | + | – | – | PTC_FM |
PTC_FR | [2,23] | 351 | 2 | 14.56 | 15.00 | + | + | – | – | PTC_FR |
PTC_MM | [2,23] | 336 | 2 | 13.97 | 14.32 | + | + | – | – | PTC_MM |
PTC_MR | [2,23] | 344 | 2 | 14.29 | 14.69 | + | + | – | – | PTC_MR |
PROTEINS | [4,6] | 1113 | 2 | 39.06 | 72.82 | + | – | + (1) | – | PROTEINS |
PROTEINS_full | [4,6] | 1113 | 2 | 39.06 | 72.82 | + | – | + (29) | – | PROTEINS_full |
REDDIT-BINARY | [14] | 2000 | 2 | 429.63 | 497.75 | – | – | – | – | REDDIT-BINARY |
REDDIT-MULTI-5K | [14] | 4999 | 5 | 508.52 | 594.87 | – | – | – | – | REDDIT-MULTI-5K |
REDDIT-MULTI-12K | [14] | 11929 | 11 | 391.41 | 456.89 | – | – | – | – | REDDIT-MULTI-12K |
SF-295 | [28] | 40271 | 2 | 26.06 | 28.08 | + | + | – | – | SF-295 |
SF-295H | [28] | 40271 | 2 | 46.65 | 48.68 | + | + | – | – | SF-295H |
SN12C | [28] | 40004 | 2 | 26.08 | 28.11 | + | + | – | – | SN12C |
SN12CH | [28] | 40004 | 2 | 46.69 | 48.71 | + | + | – | – | SN12CH |
SW-620 | [28] | 40532 | 2 | 26.05 | 28.08 | + | + | – | – | SW-620 |
SW-620H | [28] | 40532 | 2 | 46.62 | 48.65 | + | + | – | – | SW-620H |
SYNTHETIC | [3] | 300 | 2 | 100.00 | 196.00 | – | – | + (1) | – | SYNTHETIC |
SYNTHETICnew | [3,10] | 300 | 2 | 100.00 | 196.25 | – | – | + (1) | – | SYNTHETICnew |
Synthie | [21] | 400 | 4 | 95.00 | 172.93 | – | – | + (15) | – | Synthie |
Tox21_AhR_training | [24] | 8169 | 2 | 18.09 | 18.50 | + | + | – | – | Tox21_AhR_training |
Tox21_AhR_testing | [24] | 272 | 2 | 22.13 | 23.05 | + | + | – | – | Tox21_AhR_testing |
Tox21_AhR_evaluation | [24] | 607 | 2 | 17.64 | 18.06 | + | + | – | – | Tox21_AhR_evaluation |
Tox21_AR_training | [24] | 9362 | 2 | 18.39 | 18.84 | + | + | – | – | Tox21_AR_training |
Tox21_AR_testing | [24] | 292 | 2 | 22.35 | 23.32 | + | + | – | – | Tox21_AR_testing |
Tox21_AR_evaluation | [24] | 585 | 2 | 17.99 | 18.45 | + | + | – | – | Tox21_AR_evaluation |
Tox21_AR-LBD_training | [24] | 8599 | 2 | 17.77 | 18.16 | + | + | – | – | Tox21_AR-LBD_training |
Tox21_AR-LBD_testing | [24] | 253 | 2 | 21.85 | 22.73 | + | + | – | – | Tox21_AR-LBD_testing |
Tox21_AR-LBD_evaluation | [24] | 580 | 2 | 17.09 | 17.42 | + | + | – | – | Tox21_AR-LBD_evaluation |
Tox21_ARE_training | [24] | 7167 | 2 | 16.28 | 16.52 | + | + | – | – | Tox21_ARE_training |
Tox21_ARE_testing | [24] | 234 | 2 | 21.99 | 22.91 | + | + | – | – | Tox21_ARE_testing |
Tox21_ARE_evaluation | [24] | 552 | 2 | 17.01 | 17.33 | + | + | – | – | Tox21_ARE_evaluation |
Tox21_aromatase_training | [24] | 7226 | 2 | 17.50 | 17.79 | + | + | – | – | Tox21_aromatase_training |
Tox21_aromatase_testing | [24] | 214 | 2 | 21.65 | 22.36 | + | + | – | – | Tox21_aromatase_testing |
Tox21_aromatase_evaluation | [24] | 528 | 2 | 16.74 | 16.99 | + | + | – | – | Tox21_aromatase_evaluation |
Tox21_ATAD5_training | [24] | 9091 | 2 | 17.89 | 18.30 | + | + | – | – | Tox21_ATAD5_training |
Tox21_ATAD5_testing | [24] | 272 | 2 | 21.99 | 22.89 | + | + | – | – | Tox21_ATAD5_testing |
Tox21_ATAD5_evaluation | [24] | 619 | 2 | 17.68 | 18.11 | + | + | – | – | Tox21_ATAD5_evaluation |
Tox21_ER_training | [24] | 7697 | 2 | 17.58 | 17.94 | + | + | – | – | Tox21_ER_training |
Tox21_ER_testing | [24] | 265 | 2 | 22.16 | 23.13 | + | + | – | – | Tox21_ER_testing |
Tox21_ER_evaluation | [24] | 515 | 2 | 17.66 | 18.10 | + | + | – | – | Tox21_ER_evaluation |
Tox21_ER-LBD_training | [24] | 8753 | 2 | 18.06 | 18.47 | + | + | – | – | Tox21_ER-LBD_training |
Tox21_ER-LBD_testing | [24] | 287 | 2 | 22.28 | 23.23 | + | + | – | – | Tox21_ER-LBD_testing |
Tox21_ER-LBD_evaluation | [24] | 599 | 2 | 17.75 | 18.17 | + | + | – | – | Tox21_ER-LBD_evaluation |
Tox21_HSE_training | [24] | 8150 | 2 | 16.72 | 17.04 | + | + | – | – | Tox21_HSE_training |
Tox21_HSE_testing | [24] | 267 | 2 | 22.07 | 23.00 | + | + | – | – | Tox21_HSE_testing |
Tox21_HSE_evaluation | [24] | 607 | 2 | 17.61 | 18.01 | + | + | – | – | Tox21_HSE_evaluation |
Tox21_MMP_training | [24] | 7320 | 2 | 17.49 | 17.83 | + | + | – | – | Tox21_MMP_training |
Tox21_MMP_testing | [24] | 238 | 2 | 21.68 | 22.55 | + | + | – | – | Tox21_MMP_testing |
Tox21_MMP_evaluation | [24] | 541 | 2 | 16.67 | 16.88 | + | + | – | – | Tox21_MMP_evaluation |
Tox21_p53_training | [24] | 8634 | 2 | 17.79 | 18.19 | + | + | – | – | Tox21_p53_training |
Tox21_p53_testing | [24] | 269 | 2 | 22.14 | 23.04 | + | + | – | – | Tox21_p53_testing |
Tox21_p53_evaluation | [24] | 613 | 2 | 17.34 | 17.72 | + | + | – | – | Tox21_p53_evaluation |
Tox21_PPAR-gamma_training | [24] | 8184 | 2 | 17.23 | 17.55 | + | + | – | – | Tox21_PPAR-gamma_training |
Tox21_PPAR-gamma_testing | [24] | 267 | 2 | 22.04 | 22.93 | + | + | – | – | Tox21_PPAR-gamma_testing |
Tox21_PPAR-gamma_evaluation | [24] | 602 | 2 | 17.38 | 17.77 | + | + | – | – | Tox21_PPAR-gamma_evaluation |
TRIANGLES | [27] | 45000 | 10 | 20.85 | 32.74 | – | – | – | – | TRIANGLES |
TWITTER-Real-Graph-Partial | [26] | 144033 | 2 | 4.03 | 4.98 | + | – | – | + (1) | TWITTER-Real-Graph-Partial |
UACC257 | [28] | 39988 | 2 | 26.09 | 28.12 | + | + | – | – | UACC257 |
UACC257H | [28] | 39988 | 2 | 46.68 | 48.71 | + | + | – | – | UACC257H |
Yeast | [28] | 79601 | 2 | 21.54 | 22.84 | + | + | – | – | Yeast |
YeastH | [28] | 79601 | 2 | 39.44 | 40.74 | + | + | – | – | YeastH |
All Data Sets | DS_all |
R (N) denotes a regression data set with N tasks per graph.
The data sets have the following format (replace DS by the name of the data set):
Let
There are optional files if the respective information is available:
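As a rough illustration of how such a data set could be read, the sketch below assumes a plain-text layout with three files: DS_A.txt (one "row, col" pair of node ids per line, each undirected edge listed in both directions), DS_graph_indicator.txt (line i gives the graph id of node i), and DS_graph_labels.txt (line g gives the class label of graph g), all 1-indexed. These file names and conventions, the helper `read_dataset`, and the toy data set `TOY` are assumptions made for illustration and should be checked against the downloaded archive.

```python
from collections import defaultdict
from pathlib import Path
import tempfile


def read_dataset(folder, ds):
    """Parse the three assumed core files of data set `ds` into a list of graphs."""
    folder = Path(folder)
    # Line i (1-based) holds the graph id of the node with id i.
    indicator = [int(x) for x in (folder / f"{ds}_graph_indicator.txt").read_text().split()]
    # Line g (1-based) holds the class label of the graph with id g.
    labels = [int(x) for x in (folder / f"{ds}_graph_labels.txt").read_text().split()]

    nodes = defaultdict(list)
    for node_id, graph_id in enumerate(indicator, start=1):
        nodes[graph_id].append(node_id)

    edges = defaultdict(set)
    for line in (folder / f"{ds}_A.txt").read_text().splitlines():
        if not line.strip():
            continue
        u, v = (int(t) for t in line.split(","))
        # Both directions of an undirected edge are assumed present; store each edge once.
        edges[indicator[u - 1]].add((min(u, v), max(u, v)))

    return [
        {"label": labels[g - 1], "nodes": nodes[g], "edges": sorted(edges[g])}
        for g in range(1, len(labels) + 1)
    ]


# Toy data set (hypothetical): a triangle on nodes 1-3 and a single edge on nodes 4-5.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / "TOY_A.txt").write_text("1, 2\n2, 1\n2, 3\n3, 2\n1, 3\n3, 1\n4, 5\n5, 4\n")
    (tmp / "TOY_graph_indicator.txt").write_text("1\n1\n1\n2\n2\n")
    (tmp / "TOY_graph_labels.txt").write_text("1\n-1\n")
    graphs = read_dataset(tmp, "TOY")

avg_nodes = sum(len(g["nodes"]) for g in graphs) / len(graphs)
avg_edges = sum(len(g["edges"]) for g in graphs) / len(graphs)
print(len(graphs), avg_nodes, avg_edges)
```

The last lines compute the average number of nodes and edges per graph, the same kind of statistics as reported in the table. Counting each undirected edge once is again an assumption about how the table values were obtained.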
The datasets can also be accessed using PyTorch Geometric and the Deep Graph Library.
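For PyTorch Geometric, a minimal sketch could look as follows. It relies on the `TUDataset` class from `torch_geometric.datasets`; the wrapper name `load_tu_dataset` and the default `root` directory are hypothetical choices. The import is deferred so that the snippet loads even where PyTorch Geometric is not installed.

```python
def load_tu_dataset(name, root="data/TUDataset"):
    """Return the data set `name` via PyTorch Geometric (sketch, not an official recipe)."""
    # Deferred import: PyTorch Geometric is only required when the function is called.
    from torch_geometric.datasets import TUDataset

    # `root` is the local cache directory (hypothetical default chosen for this sketch).
    return TUDataset(root=root, name=name)
```

Calling `load_tu_dataset("MUTAG")` would download and cache the data set under `root` on first use; the name is expected to match an entry in the Name column of the table.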
We encourage you to refer to our website at http://graphkernels.cs.tu-dortmund.de if you have used the data sets for your publication. Please use the following BibTeX citation:
@misc{KKMMN2016,
  title  = {Benchmark Data Sets for Graph Kernels},
  author = {Kristian Kersting and Nils M. Kriege and Christopher Morris and Petra Mutzel and Marion Neumann},
  year   = {2016},
  url    = {http://graphkernels.cs.tu-dortmund.de}
}
If your bibliography style does not support the url field, you may use this alternative:
@misc{KKMMN2016,
  title  = {Benchmark Data Sets for Graph Kernels},
  author = {Kristian Kersting and Nils M. Kriege and Christopher Morris and Petra Mutzel and Marion Neumann},
  year   = {2016},
  note   = {\url{http://graphkernels.cs.tu-dortmund.de}}
}
[1] Debnath, A.K., Lopez de Compadre, R.L., Debnath, G., Shusterman, A.J., and Hansch, C. Structure-activity relationship of mutagenic aromatic and heteroaromatic nitro compounds. Correlation with molecular orbital energies and hydrophobicity. J. Med. Chem. 34(2):786-797 (1991).
[2] Helma, C., King, R. D., Kramer, S., and Srinivasan, A. The Predictive Toxicology Challenge 2000–2001. Bioinformatics, 2001, 17, 107-108. URL: www.predictive-toxicology.org/ptc
[3] Feragen, A., Kasenburg, N., Petersen, J., de Bruijne, M., Borgwardt, K.M.: Scalable kernels for graphs with continuous attributes. In: C.J.C. Burges, L. Bottou, Z. Ghahramani, K.Q. Weinberger (eds.) NIPS, pp. 216-224 (2013).
[4] K. M. Borgwardt, C. S. Ong, S. Schoenauer, S. V. N. Vishwanathan, A. J. Smola, and H. P. Kriegel. Protein function prediction via graph kernels. Bioinformatics, 21(Suppl 1):i47–i56, Jun 2005.
[5] I. Schomburg, A. Chang, C. Ebeling, M. Gremse, C. Heldt, G. Huhn, and D. Schomburg. BRENDA, the enzyme database: updates and major new developments. Nucleic Acids Research, 32:D431–D433, 2004.
[6] P. D. Dobson and A. J. Doig. Distinguishing enzyme structures from non-enzymes without alignments. J. Mol. Biol., 330(4):771–783, Jul 2003.
[7] Sutherland, J. J.; O'Brien, L. A. & Weaver, D. F. Spline-fitting with a genetic algorithm: a method for developing classification structure-activity relationships. J. Chem. Inf. Comput. Sci., 2003, 43, 1906-1915.
[8] N. Wale and G. Karypis. Comparison of descriptor spaces for chemical compound retrieval and classification. In Proc. of ICDM, pages 678–689, Hong Kong, 2006.
[9] http://pubchem.ncbi.nlm.nih.gov
[10] http://image.diku.dk/aasa/papers/graphkernels_nips_erratum.pdf
[11] M. Neumann, P. Moreno, L. Antanas, R. Garnett, K. Kersting. Graph Kernels for Object Category Prediction in Task-Dependent Robot Grasping. Eleventh Workshop on Mining and Learning with Graphs (MLG-13), Chicago, Illinois, USA, 2013.
[12] http://www.first-mm.eu/data.html
[13] M. Neumann, R. Garnett, C. Bauckhage, and K. Kersting. Propagation kernels: efficient graph kernels from propagated information. Machine Learning, 102(2):209–245, 2016.
[14] Pinar Yanardag and S.V.N. Vishwanathan. 2015. Deep Graph Kernels. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, New York, NY, USA, 1365-1374.
[15] Francesco Orsini, Paolo Frasconi, and Luc De Raedt. 2015. Graph invariant kernels. In Proceedings of the 24th International Joint Conference on Artificial Intelligence (IJCAI'15), Qiang Yang and Michael Wooldridge (Eds.). AAAI Press, 3756-3762.
[16] Riesen, K. and Bunke, H.: IAM Graph Database Repository for Graph Based Pattern Recognition and Machine Learning. In: da Vitoria Lobo, N. et al. (Eds.), SSPR&SPR 2008, LNCS, vol. 5342, pp. 287-297, 2008.
[17] AIDS Antiviral Screen Data (2004)
[18] S. A. Nene, S. K. Nayar and H. Murase. Columbia Object Image Library (COIL-100), Technical Report, Department of Computer Science, Columbia University CUCS-006-96, Feb. 1996.
[20] Jeroen Kazius, Ross McGuire, and Roberta Bursi. Derivation and Validation of Toxicophores for Mutagenicity Prediction. Journal of Medicinal Chemistry, 2005, 48(1), 312-320.
[21] Christopher Morris, Nils M. Kriege, Kristian Kersting, Petra Mutzel. Faster Kernels for Graphs with Continuous Attributes via Hashing, IEEE International Conference on Data Mining (ICDM) 2016
[22] Nino Shervashidze, Pascal Schweitzer, Erik Jan van Leeuwen, Kurt Mehlhorn, and Karsten M. Borgwardt. 2011. Weisfeiler-Lehman Graph Kernels. J. Mach. Learn. Res. 12 (November 2011), 2539-2561.
[23] Nils Kriege, Petra Mutzel. 2012. Subgraph Matching Kernels for Attributed Graphs. International Conference on Machine Learning 2012.
[24] Tox21 Data Challenge 2014
[25] Nils M. Kriege, Matthias Fey, Denis Fisseler, Petra Mutzel, Frank Weichert. Recognizing Cuneiform Signs Using Graph Based Methods. International Workshop on Cost-Sensitive Learning (COST), SIAM International Conference on Data Mining (SDM) 2018, 31-44, arXiv:1802.05908.
[26] A Repository of Benchmark Graph Datasets for Graph Classification
[27] Boris Knyazev, Graham W. Taylor, Mohamed R. Amer. Understanding Attention and Generalization in Graph Neural Networks
[28] Chemical DataSets
[29] Alchemy: A Quantum Chemistry Dataset for Benchmarking AI Models