Open Datasets

Open Research Datasets

I am committed to open science and have released three high-quality datasets for the research community:

1. CrowdGleason Dataset

Type: Public prostate cancer dataset with crowdsourced annotations
Repository: Zenodo
Related Publication: The CrowdGleason dataset: Learning the Gleason grade from crowds and experts
Description: Prostate cancer histological images with multiple crowdsourced annotations from non-experts and ground truth from expert pathologists, intended for learning from crowdsourced labels.

2. Fusocelular Dataset

Type: Public skin cancer dataset with multiple annotators
Repository: Figshare
Related Publication: A fusocelular skin dataset with whole slide images for deep learning models
Description: Skin cancer dataset of fusocelular (spindle cell) tumor types annotated by multiple resident physicians, providing diverse perspectives on histological classification.

3. CR-AI4SkIN Dataset

Type: Public crowdsourced skin cancer dataset
Repository: Zenodo
Related Publication: Annotation protocol and crowdsourcing multiple instance learning classification of skin histological images: The CR-AI4SkIN dataset
Description: Skin cancer histological images annotated by multiple non-expert crowd annotators, designed for multiple instance learning approaches.


Dataset Characteristics

All released datasets feature:

  • Labels from multiple annotators, enabling assessment of annotation quality
  • High-resolution histological images suitable for deep learning
  • Comprehensive metadata including hospital source, tissue types, and pathology information
  • Open licenses for reproducible research
  • Detailed documentation and usage guidelines

These datasets have been instrumental in developing machine learning methods that can effectively learn from crowdsourced annotations and handle multiple expert opinions.
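
As a rough illustration of what multi-annotator labels make possible, the sketch below computes a majority-vote consensus per image and each annotator's agreement with an expert label. It is a simple baseline under stated assumptions, not the method used in the publications above: the image ids, annotator ids, class names, and dictionary layout are all hypothetical and do not reflect the actual file formats of the released datasets.

```python
from collections import Counter

# Hypothetical toy data: image id -> {annotator id -> class label}.
# The ids and class names are placeholders, not the datasets' real schema.
crowd_labels = {
    "img_001": {"a1": "G3", "a2": "G3", "a3": "G4"},
    "img_002": {"a1": "G4", "a2": "G4", "a3": "G4"},
    "img_003": {"a1": "NC", "a2": "G3", "a3": "NC"},
}
# Hypothetical expert ground truth for the same images.
expert_labels = {"img_001": "G3", "img_002": "G4", "img_003": "G3"}


def majority_vote(votes):
    """Return the most frequent label; ties are broken alphabetically."""
    counts = Counter(votes.values())
    top = max(counts.values())
    return min(label for label, n in counts.items() if n == top)


def annotator_accuracy(crowd, expert):
    """Fraction of each annotator's labels that match the expert label."""
    hits, totals = Counter(), Counter()
    for image_id, votes in crowd.items():
        for annotator, label in votes.items():
            totals[annotator] += 1
            hits[annotator] += int(label == expert[image_id])
    return {a: hits[a] / totals[a] for a in totals}


consensus = {image_id: majority_vote(v) for image_id, v in crowd_labels.items()}
accuracy = annotator_accuracy(crowd_labels, expert_labels)

print(consensus)
print(accuracy)
```

In this toy example the crowd consensus for the third image differs from the expert label, which is the kind of disagreement that motivates methods modeling annotator reliability rather than relying on majority voting alone.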