Help:WikiPathways SPARQL queries

Revision as of 09:10, 29 November 2012 by Andra (Talk | contribs)

The content of WikiPathways is replicated as RDF on sparql.wikipathways.org. This SPARQL endpoint is still under development and is updated irregularly.

Resources

WikiPathways internal vocabularies

http://vocabularies.wikipathways.org

WikiPathways data as RDF

http://rdf.wikipathways.org

WikiPathways SPARQL endpoint

http://sparql.wikipathways.org

Other sparql endpoints

Prefixes

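Below are example queries. For readability, the prefix declarations are omitted from most of them. The queries use the following prefixes. This list is not yet complete: the wpuser:, owl:, chembl: and bo: prefixes that appear in some queries below are not listed.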

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX schema: <http://schema.org/>
PREFIX wprdf:   <http://rdf.wikipathways.org/>
PREFIX wp:     <http://vocabularies.wikipathways.org/wp#>
PREFIX dcterms:   <http://purl.org/dc/terms/>
PREFIX biopax:    <http://www.biopax.org/release/biopax-level3.owl#>
PREFIX  xsd:     <http://www.w3.org/2001/XMLSchema#>
PREFIX gpml: <http://vocabularies.wikipathways.org/gpml#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
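
Because the example queries omit the prefix declarations, the declarations have to be put back before a query is sent to the endpoint. The following is a minimal Python sketch of a helper that does this. The names PREFIXES and with_prefixes are illustrative and not part of any WikiPathways tooling, and the prefix table is simply copied from the (incomplete) list above.

```python
# Prefix declarations copied from the list above (that list is not complete).
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "schema": "http://schema.org/",
    "wprdf": "http://rdf.wikipathways.org/",
    "wp": "http://vocabularies.wikipathways.org/wp#",
    "dcterms": "http://purl.org/dc/terms/",
    "biopax": "http://www.biopax.org/release/biopax-level3.owl#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "gpml": "http://vocabularies.wikipathways.org/gpml#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}


def with_prefixes(query, prefixes=PREFIXES):
    """Return the query with one PREFIX line per table entry prepended."""
    header = "\n".join(
        "PREFIX {}: <{}>".format(name, iri) for name, iri in prefixes.items()
    )
    return header + "\n\n" + query.strip() + "\n"


# Example: complete one of the datasource queries from this page.
print(with_prefixes("SELECT DISTINCT ?datasource WHERE { ?concept dc:source ?datasource }"))
```

Declaring prefixes that a query does not use is harmless, so the sketch prepends the whole table rather than detecting which prefixes are needed.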

Example queries

Queries marked with a * need more time to return results.

Data curation oriented queries

Get the pathway elements that have the erroneous data source "null", together with their pathways

SELECT DISTINCT  ?identifier ?pathway ?label
WHERE {
        ?concept dc:source "null"^^xsd:string .
        ?concept dc:identifier ?identifier .
        ?concept dcterms:isPartOf ?pathway .
        ?concept rdfs:label ?label
} 

Get all gene products that lack either a DataSource or an Identifier

prefix wp:      <http://vocabularies.wikipathways.org/wp#>
prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#>
prefix dcterms:  <http://purl.org/dc/terms/>

select distinct ?pathway ?label where {
      ?geneProduct a wp:GeneProduct .
      ?geneProduct rdfs:label ?label .
      ?geneProduct dcterms:isPartOf ?pathway .

      # Assumption: gene products without a DataSource or Identifier have a URI starting with "node"
      FILTER regex(str(?geneProduct), "^node") .
      FILTER regex(str(?pathway), "^http") .
}

Pathway oriented queries

Get the species currently in WikiPathways with their respective URIs

SELECT DISTINCT ?organism ?label
WHERE {
    ?concept wp:organism ?organism .
    ?organism rdfs:label ?label .
 } 

List pathways and their species

SELECT DISTINCT ?title ?label 
WHERE {
    ?pathway dc:title ?title .
    ?pathway wp:organism ?organism .
    ?organism rdfs:label ?label .
 } 

List the species captured in WikiPathways and the number of pathways per species

SELECT ?organism ?label (COUNT(?pathway) AS ?noPathways)
WHERE {
    ?pathway dc:title ?title .
    ?pathway wp:organism ?organism .
    ?organism rdfs:label ?label .
 }
GROUP BY ?organism ?label
ORDER BY DESC(?noPathways)

Count the pathways per pathway category

SELECT ?category (COUNT(?category) AS ?noCategories)
WHERE {
        ?pathway wp:category ?category .
        ?pathway dc:title ?title .
}
GROUP BY ?category
ORDER BY ?category

List all pathways of category Metabolic Process

SELECT DISTINCT  *
WHERE {
        ?pathway wp:category wp:MetabolicProcess .
        ?pathway dc:title ?title .
} 

Get all pathways that contain a CYP protein

   select distinct ?pathway ?label where {
     ?geneProduct a wp:GeneProduct . 
     ?geneProduct rdfs:label ?label .
     ?geneProduct dcterms:isPartOf ?pathway .
   
     FILTER regex(str(?label), "CYP"). 
     FILTER regex(str(?pathway), "^http"). 
   }

Datasource oriented queries

Get all datasources currently captured in WikiPathways

SELECT DISTINCT ?datasource 
WHERE {
         ?concept dc:source ?datasource
} 

Get the number of entries per datasource in WikiPathways

SELECT ?datasource (COUNT(?datasource) AS ?numberEntries)
WHERE {
        ?concept dc:source ?datasource
}
GROUP BY ?datasource
ORDER BY DESC(?numberEntries)

Count the identifiers per data source

SELECT ?datasource ?identifier (COUNT(?identifier) AS ?numberEntries)
WHERE {
        ?concept dc:source ?datasource .
        ?concept dc:identifier ?identifier
}
GROUP BY ?datasource ?identifier

Count the identifiers per data source and order them from high to low

SELECT ?datasource ?identifier (COUNT(?identifier) AS ?numberEntries)
WHERE {
        ?concept dc:source ?datasource .
        ?concept dc:identifier ?identifier
}
GROUP BY ?datasource ?identifier
ORDER BY DESC(?numberEntries)

Return all Chembl compounds in WikiPathways and the pathways they are in

SELECT DISTINCT ?identifier ?pathway
WHERE {
        ?concept dcterms:isPartOf ?pathway .
        ?concept dc:source "ChEMBL compound"^^xsd:string .
        ?concept dc:identifier ?identifier .
        
} 

Curator oriented queries

Extract contributors

SELECT DISTINCT ?contributor  
WHERE {
       ?pathway dc:contributor ?contributor
}

Extract the number of pathways edited per contributor

SELECT ?contributor (COUNT(?pathway) AS ?pathwaysEdited)
WHERE {
       ?pathway dc:contributor ?contributor
}
GROUP BY ?contributor
ORDER BY DESC(?pathwaysEdited)

Find the pathways a user has edited so far

# The wpuser: prefix is not in the prefix list above; declare it before running this query.
SELECT DISTINCT ?pathway ?pathwayLabel
WHERE {
       ?pathway dc:contributor wpuser:Andra .
       ?pathway rdfs:label ?pathwayLabel .
}

Federated queries

WikiPathways with GeneWiki

SELECT DISTINCT ?wplabel ?identifier ?snp WHERE {
    ?s dc:identifier <http://identifiers.org/ncbigene/53975> .
    ?s dc:identifier ?identifier .
    ?s rdfs:label ?wplabel .
    ?s dc:source ?source .
    SERVICE <http://genewiki.semwebinsi.de/> {
        ?gws dc:identifier ?identifier .
        ?gws rdf:type ?gwtype .
        ?gws <http://genewikiplus.org/wiki/Special:URIResolver/Property-3AHasSNP> ?snp .
    }
}


prefix dc: <http://purl.org/dc/elements/1.1/>
prefix dcterms:  <http://purl.org/dc/terms/>

select distinct * where { 
            ?pwEntity dc:identifier ?identifier . 
            ?pwEntity dcterms:isPartOf ?pathway .
        SERVICE <http://genewiki.semwebinsi.de/> {
            ?concept dc:identifier <http://identifiers.org/ncbigene/12189> .
            ?concept dc:identifier ?identifier .
            ?concept <http://genewikiplus.org/wiki/Special:URIResolver/Property-3AIs_associated_with_disease> ?disease .
        }
}

WikiPathways with ChEMBL: ChEMBL compounds in WikiPathways (without BridgeDB)

SELECT *
  WHERE {{
        SELECT DISTINCT ?pathway ?concept iri(bif:concat("http://linkedchemistry.info/chembl/chemblid/", bif:regexp_substr('http://identifiers.org/chembl.compound/(.*)',?identifier, 1))) as ?ChEMBLId where {
                        ?concept dcterms:isPartOf ?pathway .
                        ?concept dc:source "ChEMBL compound"^^xsd:string .
                        ?concept dc:identifier ?identifier .     
                        FILTER regex(str(?identifier), "^http").      
        }
} SERVICE <http://rdf.farmbio.uu.se/chembl/sparql/>{
        ?ChEMBLId ?p ?o .
} }
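
The subquery above uses the Virtuoso built-ins bif:regexp_substr and bif:concat to rewrite an identifiers.org IRI into the IRI form used by the ChEMBL RDF endpoint. As an illustration of that mapping only, here is a minimal Python sketch. The function name to_chembl_rdf_iri and the identifier CHEMBL25 are illustrative assumptions, and the pattern is anchored and escapes the dots, so it is slightly stricter than the pattern in the query.

```python
import re

# Same capture group as in the query; anchored and with escaped dots.
IDENTIFIERS_ORG = re.compile(r"^http://identifiers\.org/chembl\.compound/(.*)$")
CHEMBL_RDF_BASE = "http://linkedchemistry.info/chembl/chemblid/"


def to_chembl_rdf_iri(identifier):
    """Map an identifiers.org ChEMBL compound IRI to the ChEMBL RDF IRI.

    Returns None when the identifier does not have the expected form.
    """
    match = IDENTIFIERS_ORG.match(identifier)
    if match is None:
        return None
    return CHEMBL_RDF_BASE + match.group(1)


# CHEMBL25 is only an example identifier.
print(to_chembl_rdf_iri("http://identifiers.org/chembl.compound/CHEMBL25"))
```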

WikiPathways with ChEMBL: all ChEMBL assays for pathways

SELECT ?pathway ?target ?assay WHERE {
{
  SELECT DISTINCT
    ?pathway ?uniprot
    iri(
      bif:concat("http://bio2rdf.org/uniprot:",
      bif:regexp_substr('http://identifiers.org/uniprot/(.*)',?uniprot, 1))
    ) as ?chembluniprot
  WHERE {
    ?s ?p ?uniprot .
    ?s dcterms:isPartOf ?pathway .
    FILTER regex(str(?uniprot), "uniprot")
  }
}
  SERVICE <http://rdf.farmbio.uu.se/chembl/sparql/> {
    ?target owl:sameAs ?chembluniprot .
    ?score chembl:forTarget ?target .
    ?assay chembl:hasTargetScore ?score .
}
}

WikiPathways with ChEMBL: all molecules targeting pathways

SELECT ?pathway ?target ?assay ?smiles WHERE {
{
  SELECT DISTINCT
    ?pathway ?uniprot
    iri(
      bif:concat("http://bio2rdf.org/uniprot:",
      bif:regexp_substr('http://identifiers.org/uniprot/(.*)',?uniprot, 1))
    ) as ?chembluniprot
  WHERE {
    ?s ?p ?uniprot .
    ?s dcterms:isPartOf ?pathway .
    FILTER regex(str(?uniprot), "uniprot")
  }
}
  SERVICE <http://rdf.farmbio.uu.se/chembl/sparql/> {
    ?target owl:sameAs ?chembluniprot .
    ?score chembl:forTarget ?target .
    ?assay chembl:hasTargetScore ?score .
    ?activity chembl:onAssay ?assay ;
      chembl:forMolecule ?molecule .
    ?molecule bo:smiles ?smiles .
    
}
}

Code examples

Java

For Java we recommend the Jena framework.

import com.hp.hpl.jena.query.Query;
import com.hp.hpl.jena.query.QueryExecution;
import com.hp.hpl.jena.query.QueryExecutionFactory;
import com.hp.hpl.jena.query.QueryFactory;
import com.hp.hpl.jena.query.QuerySolution;
import com.hp.hpl.jena.query.ResultSet;

// Note: Jena 3 and later provide these classes in the org.apache.jena.query package.
public class JavaCodeExample {

	public static void main(String[] args) {
		// Fetch ten arbitrary triples from the WikiPathways endpoint.
		String sparqlQueryString = "SELECT * WHERE {?s ?p ?o} LIMIT 10";
		Query query = QueryFactory.create(sparqlQueryString);
		QueryExecution queryExecution = QueryExecutionFactory.sparqlService("http://sparql.wikipathways.org", query);
		try {
			ResultSet resultSet = queryExecution.execSelect();
			// Print each result row as tab-separated subject, predicate and object.
			while (resultSet.hasNext()) {
				QuerySolution solution = resultSet.next();
				System.out.print(solution.get("s"));
				System.out.print("\t"+solution.get("p"));
				System.out.println("\t"+solution.get("o"));
			}
		} finally {
			// Release the connection to the endpoint.
			queryExecution.close();
		}
	}
}

PHP

For PHP we recommend ARC2 (Easy RDF and SPARQL for LAMP systems).

R

   library(rrdf)
   sparql.remote(
     "http://sparql.wikipathways.org/",
     "SELECT DISTINCT ?p WHERE { ?s ?p ?o }"
   )

Bioclipse

The code below works in both the JavaScript and the Groovy console:

   rdf.sparqlRemote(
     "http://sparql.wikipathways.org/",
     "SELECT DISTINCT ?p WHERE { ?s ?p ?o }"
   )

SPARQL from the command line

For quick and easy querying, we recommend using curl (Linux and OS X):

curl -F "query=SELECT * WHERE {?s ?p ?o} LIMIT 10" http://sparql.wikipathways.org
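
Python

The same request can be made with the Python standard library alone. The sketch below is not an official client. It assumes that the endpoint accepts a standard SPARQL protocol GET request with a query parameter and returns the SPARQL JSON results format when asked for it through the Accept header. The function names build_request, rows_from_json and run are illustrative. Only run contacts the endpoint; the example at the bottom parses a small hand-written sample response, so it works offline.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ENDPOINT = "http://sparql.wikipathways.org"


def build_request(query, endpoint=ENDPOINT):
    """Build a GET request that asks for results in the SPARQL JSON format."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def rows_from_json(text):
    """Turn a SPARQL JSON result document into a list of rows of strings."""
    data = json.loads(text)
    variables = data["head"]["vars"]
    rows = []
    for binding in data["results"]["bindings"]:
        # Unbound variables are absent from a binding; use an empty string.
        rows.append([binding.get(v, {}).get("value", "") for v in variables])
    return rows


def run(query, endpoint=ENDPOINT, timeout=60):
    """Send the query and return the result rows (needs network access)."""
    with urlopen(build_request(query, endpoint), timeout=timeout) as response:
        return rows_from_json(response.read().decode("utf-8"))


# Hand-written sample response, used here instead of a live request.
SAMPLE = (
    '{"head": {"vars": ["s", "p"]}, "results": {"bindings": '
    '[{"s": {"type": "uri", "value": "http://example.org/a"}}]}}'
)
for row in rows_from_json(SAMPLE):
    print("\t".join(row))
```

With network access, run("SELECT * WHERE {?s ?p ?o} LIMIT 10") would send the same query as the curl command above.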


