Help:WikiPathways SPARQL queries

From WikiPathways

Revision as of 17:34, 27 November 2012

The WikiPathways content is replicated on sparql.wikipathways.org. This SPARQL endpoint is still under development and is updated at irregular intervals.

Prefixes

The example queries below omit the prefix declarations for readability. They use the following prefixes (the list is not yet complete):

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX schema: <http://schema.org/>
PREFIX wprdf: <http://rdf.wikipathways.org/>
PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX biopax: <http://www.biopax.org/release/biopax-level3.owl#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX gpml: <http://vocabularies.wikipathways.org/gpml#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
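
Because the example queries omit their prefixes, they have to be put back before a query is sent from a script. The following Python sketch shows one way to do that. The dictionary holds the prefixes listed above; the helper name with_prefixes is hypothetical and not part of any WikiPathways tooling.

```python
# Prefixes copied from the list on this page.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "schema": "http://schema.org/",
    "wprdf": "http://rdf.wikipathways.org/",
    "wp": "http://vocabularies.wikipathways.org/wp#",
    "dcterms": "http://purl.org/dc/terms/",
    "biopax": "http://www.biopax.org/release/biopax-level3.owl#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "gpml": "http://vocabularies.wikipathways.org/gpml#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}


def with_prefixes(query_body: str) -> str:
    """Return query_body preceded by one PREFIX line per known prefix."""
    header = "\n".join(f"PREFIX {name}: <{iri}>" for name, iri in PREFIXES.items())
    return header + "\n\n" + query_body.strip() + "\n"


# One of the example queries from this page, completed with its prefixes.
full_query = with_prefixes("""
SELECT DISTINCT ?organism ?label
WHERE {
    ?concept wp:organism ?organism .
    ?organism rdfs:label ?label .
}
""")
print(full_query)
```

Declaring a prefix that a query does not use is harmless, so the sketch always emits the whole list.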

Example queries

Queries marked with a * take somewhat longer to return results.

Data curation oriented queries

Get the pathways with the erroneous data source "null"

SELECT DISTINCT  ?identifier ?pathway ?label
WHERE {
        ?concept dc:source "null"^^xsd:string .
        ?concept dc:identifier ?identifier .
        ?concept dcterms:isPartOf ?pathway .
        ?concept rdfs:label ?label
} 

Get all gene products that lack either a DataSource or an Identifier

prefix wp:      <http://vocabularies.wikipathways.org/wp#>
prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#>
prefix dcterms:  <http://purl.org/dc/terms/>

select distinct ?pathway ?label where {
      ?geneProduct a wp:GeneProduct .
      ?geneProduct rdfs:label ?label .
      ?geneProduct dcterms:isPartOf ?pathway .

      FILTER regex(str(?geneProduct), "^node") .
      FILTER regex(str(?pathway), "^http") .
}

Pathway oriented queries

Get the species currently in WikiPathways with their respective URIs

SELECT DISTINCT ?organism ?label
WHERE {
    ?concept wp:organism ?organism .
    ?organism rdfs:label ?label .
 } 

List pathways and their species

SELECT DISTINCT ?title ?label 
WHERE {
    ?pathway dc:title ?title .
    ?pathway wp:organism ?organism .
    ?organism rdfs:label ?label .
 } 

List the species captured in WikiPathways and the number of pathways per species

SELECT DISTINCT ?organism ?label count(?pathway) as ?noPathways
WHERE {
    ?pathway dc:title ?title .
    ?pathway wp:organism ?organism .
    ?organism rdfs:label ?label .
 }
ORDER BY DESC(?noPathways)
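
The query above has no GROUP BY clause, so it relies on the endpoint grouping by the non-aggregated variables; standard SPARQL 1.1 would require an explicit GROUP BY. The Python sketch below mimics the same grouping and ordering on the client side. The rows are made up for illustration and are not results from the endpoint.

```python
from collections import Counter

# Made-up (pathway, species label) rows standing in for real query results.
rows = [
    ("pathway-1", "Homo sapiens"),
    ("pathway-2", "Homo sapiens"),
    ("pathway-3", "Mus musculus"),
]

# Equivalent of count(?pathway) per ?label.
pathways_per_species = Counter(label for _pathway, label in rows)

# Equivalent of ORDER BY DESC(?noPathways).
ranked = pathways_per_species.most_common()

for label, no_pathways in ranked:
    print(label, no_pathways, sep="\t")
```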

Count the pathways per pathway category

SELECT DISTINCT  ?category count(?category) as ?noCategories
WHERE {
        ?pathway wp:category ?category .
        ?pathway dc:title ?title .
} 
ORDER BY ?category

List all pathways of category Metabolic Process

SELECT DISTINCT  *
WHERE {
        ?pathway wp:category wp:MetabolicProcess .
        ?pathway dc:title ?title .
} 

Get all pathways with CYP protein

select distinct ?pathway ?label where {

     ?geneProduct a wp:GeneProduct . 
     ?geneProduct rdfs:label ?label .
     ?geneProduct dcterms:isPartOf ?pathway .
     FILTER regex(str(?label), "CYP"). 
     FILTER regex(str(?pathway), "^http"). 

}
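
The two FILTER lines behave differently: "CYP" is unanchored and case-sensitive, while "^http" only matches at the start of the string. The Python sketch below reproduces that behaviour with the re module, whose semantics are close to SPARQL regex for patterns this simple. All labels and values are hypothetical examples, not data from the endpoint.

```python
import re

# Hypothetical gene product labels.
labels = ["CYP3A4", "Cyp1a1", "POR", "CYP2D6"]

# regex(str(?label), "CYP"): matches anywhere in the label, case-sensitive.
case_sensitive = [label for label in labels if re.search("CYP", label)]

# regex(str(?label), "CYP", "i"): the optional flag "i" ignores case.
case_insensitive = [label for label in labels if re.search("CYP", label, re.IGNORECASE)]

# regex(str(?pathway), "^http"): only values that start with "http" pass.
candidates = ["http://example.org/pathway/1", "node123"]
http_only = [value for value in candidates if re.search("^http", value)]

print(case_sensitive)    # ['CYP3A4', 'CYP2D6']
print(case_insensitive)  # ['CYP3A4', 'Cyp1a1', 'CYP2D6']
print(http_only)         # ['http://example.org/pathway/1']
```

If labels written as "Cyp..." should be found as well, passing "i" as the third argument of regex in the query is the usual way to do it.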

Datasource oriented queries

Get all datasources currently captured in WikiPathways

SELECT DISTINCT ?datasource 
WHERE {
         ?concept dc:source ?datasource
} 

Get the number of entries per datasource in WikiPathways

SELECT DISTINCT ?datasource count(?datasource) as ?numberEntries 
WHERE {
        ?concept dc:source ?datasource
} 
ORDER BY DESC(?numberEntries)

Count the identifiers per data source

SELECT DISTINCT ?datasource ?identifier count(?identifier) AS ?numberEntries 
WHERE {
        ?concept dc:source ?datasource .
        ?concept dc:identifier ?identifier
} 

Count the identifiers per data source and order them from high to low

SELECT DISTINCT ?datasource ?identifier count(?identifier) AS ?numberEntries 
WHERE {
        ?concept dc:source ?datasource .
        ?concept dc:identifier ?identifier
} 
ORDER BY DESC(?numberEntries)

Return all ChEMBL compounds in WikiPathways and the pathways they are in

SELECT DISTINCT ?identifier ?pathway
WHERE {
        ?concept dcterms:isPartOf ?pathway .
        ?concept dc:source "ChEMBL compound"^^xsd:string .
        ?concept dc:identifier ?identifier .
        
} 

Curator oriented queries

Extract contributors

SELECT DISTINCT ?contributor  
WHERE {
       ?pathway dc:contributor ?contributor
}

Extract the number of pathways edited per contributor

SELECT DISTINCT ?contributor, count(?pathway) as ?pathwaysEdited  
WHERE {
       ?pathway dc:contributor ?contributor
}
ORDER BY DESC(?pathwaysEdited)

Find the pathways a user has edited so far

SELECT DISTINCT ?pathway, ?pathwayLabel
WHERE {
       ?pathway dc:contributor wpuser:Andra .
       ?pathway dc:contributor ?contributor .
       ?pathway rdfs:label ?pathwayLabel .
}

Federated queries

WikiPathways with GeneWiki

SELECT DISTINCT ?wplabel ?identifier ?snp WHERE {
        ?s dc:identifier <http://identifiers.org/ncbigene/53975> .
        ?s dc:identifier ?identifier .
        ?s rdfs:label ?wplabel .
        ?s dc:source ?source .
        SERVICE <http://genewiki.semwebinsi.de/> {
                ?gws dc:identifier ?identifier .
                ?gws rdf:type ?gwtype .
                ?gws <http://genewikiplus.org/wiki/Special:URIResolver/Property-3AHasSNP> ?snp .
        }
}


prefix dc: <http://purl.org/dc/elements/1.1/>
prefix dcterms:  <http://purl.org/dc/terms/>

select distinct * where {
        ?pwEntity dc:identifier ?identifier .
        ?pwEntity dcterms:isPartOf ?pathway .
        SERVICE <http://genewiki.semwebinsi.de/> {
                ?concept dc:identifier <http://identifiers.org/ncbigene/12189> .
                ?concept dc:identifier ?identifier .
                ?concept <http://genewikiplus.org/wiki/Special:URIResolver/Property-3AIs_associated_with_disease> ?disease .
        }
}

WikiPathways with ChEMBL: ChEMBL compounds in WikiPathways (without BridgeDB)

SELECT *
WHERE {
  {
    SELECT DISTINCT
      ?pathway ?concept
      iri(
        bif:concat("http://linkedchemistry.info/chembl/chemblid/",
        bif:regexp_substr('http://identifiers.org/chembl.compound/(.*)',?identifier, 1))
      ) as ?ChEMBLId
    WHERE {
      ?concept dcterms:isPartOf ?pathway .
      ?concept dc:source "ChEMBL compound"^^xsd:string .
      ?concept dc:identifier ?identifier .
      FILTER regex(str(?identifier), "^http").
    }
  }
  SERVICE <http://rdf.farmbio.uu.se/chembl/sparql/> {
    ?ChEMBLId ?p ?o .
  }
}
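
The inner SELECT of this query rewrites each identifiers.org IRI into the IRI scheme used by the ChEMBL endpoint: bif:regexp_substr extracts the part after the last slash and bif:concat puts it behind the new base. The Python sketch below performs the same rewrite outside the query. The function name and the sample identifier are illustrative assumptions, and the pattern escapes the dots, which makes it slightly stricter than the one in the query.

```python
import re
from typing import Optional

# Source pattern and target base taken from the query above.
IDENTIFIERS_ORG = re.compile(r"http://identifiers\.org/chembl\.compound/(.*)")
LINKED_CHEMISTRY = "http://linkedchemistry.info/chembl/chemblid/"


def to_chembl_iri(identifier: str) -> Optional[str]:
    """Rewrite an identifiers.org ChEMBL compound IRI; return None if it does not match."""
    match = IDENTIFIERS_ORG.search(identifier)
    if match is None:
        return None
    # group(1) plays the role of the third argument (1) of bif:regexp_substr.
    return LINKED_CHEMISTRY + match.group(1)


# Sample identifier chosen for illustration only.
print(to_chembl_iri("http://identifiers.org/chembl.compound/CHEMBL25"))
# http://linkedchemistry.info/chembl/chemblid/CHEMBL25
```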

WikiPathways with ChEMBL: all ChEMBL assays for pathways

SELECT ?pathway ?target ?assay WHERE {
{
  SELECT DISTINCT
    ?pathway ?uniprot
    iri(
      bif:concat("http://bio2rdf.org/uniprot:",
      bif:regexp_substr('http://identifiers.org/uniprot/(.*)',?uniprot, 1))
    ) as ?chembluniprot
  WHERE {
    ?s ?p ?uniprot .
    ?s dcterms:isPartOf ?pathway .
    FILTER regex(str(?uniprot), "uniprot")
  }
}
  SERVICE <http://rdf.farmbio.uu.se/chembl/sparql/> {
    ?target owl:sameAs ?chembluniprot .
    ?score chembl:forTarget ?target .
    ?assay chembl:hasTargetScore ?score .
}
}

WikiPathways with ChEMBL: all molecules targeting pathways

SELECT ?pathway ?target ?assay ?smiles WHERE {
{
  SELECT DISTINCT
    ?pathway ?uniprot
    iri(
      bif:concat("http://bio2rdf.org/uniprot:",
      bif:regexp_substr('http://identifiers.org/uniprot/(.*)',?uniprot, 1))
    ) as ?chembluniprot
  WHERE {
    ?s ?p ?uniprot .
    ?s dcterms:isPartOf ?pathway .
    FILTER regex(str(?uniprot), "uniprot")
  }
}
  SERVICE <http://rdf.farmbio.uu.se/chembl/sparql/> {
    ?target owl:sameAs ?chembluniprot .
    ?score chembl:forTarget ?target .
    ?assay chembl:hasTargetScore ?score .
    ?activity chembl:onAssay ?assay ;
      chembl:forMolecule ?molecule .
    ?molecule bo:smiles ?smiles .
    
}
}

Code examples

Java

For Java we recommend the Jena framework.

// Package names are those of Jena 2.x; Jena 3 and later use org.apache.jena.query instead.
import com.hp.hpl.jena.query.Query;
import com.hp.hpl.jena.query.QueryExecution;
import com.hp.hpl.jena.query.QueryExecutionFactory;
import com.hp.hpl.jena.query.QueryFactory;
import com.hp.hpl.jena.query.QuerySolution;
import com.hp.hpl.jena.query.ResultSet;

public class JavaCodeExample {

	public static void main(String[] args) {
		// Fetch ten arbitrary triples from the WikiPathways endpoint.
		String sparqlQueryString = "SELECT * WHERE {?s ?p ?o} LIMIT 10";
		Query query = QueryFactory.create(sparqlQueryString);
		QueryExecution queryExecution = QueryExecutionFactory.sparqlService("http://sparql.wikipathways.org", query);
		try {
			ResultSet resultSet = queryExecution.execSelect();
			// Print each solution as a tab-separated subject, predicate, object row.
			while (resultSet.hasNext()) {
				QuerySolution solution = resultSet.next();
				System.out.print(solution.get("s"));
				System.out.print("\t" + solution.get("p"));
				System.out.println("\t" + solution.get("o"));
			}
		} finally {
			// Release the connection held by the query execution.
			queryExecution.close();
		}
	}
}

PHP

For PHP we recommend ARC2: Easy RDF and SPARQL for LAMP systems.

R

   library(rrdf)
   sparql.remote(
     "http://sparql.wikipathways.org/",
     "SELECT DISTINCT ?p WHERE { ?s ?p ?o }"
   )

Bioclipse

The code below works in both the JavaScript and the Groovy console:

   rdf.sparqlRemote(
     "http://sparql.wikipathways.org/",
     "SELECT DISTINCT ?p WHERE { ?s ?p ?o }"
   )

SPARQL from the command line

For quick and easy querying, we recommend using curl (Linux and OS X):

curl -F "query=SELECT * WHERE {?s ?p ?o} LIMIT 10" http://sparql.wikipathways.org
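
Where curl is not available, a similar request can be sketched with the Python standard library. The sketch below sends the query as a GET parameter named query, whereas the curl call posts it as a form field. It also assumes that the endpoint returns SPARQL JSON results when asked via a format parameter. Both points are assumptions about the endpoint, and the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "http://sparql.wikipathways.org"  # endpoint URL used on this page


def build_url(query: str, endpoint: str = ENDPOINT) -> str:
    """Encode the query as a GET request URL."""
    # Assumption: the endpoint honours "format" and can return JSON results.
    params = {"query": query, "format": "application/sparql-results+json"}
    return endpoint + "/?" + urllib.parse.urlencode(params)


def rows_from_json(text: str) -> list:
    """Flatten a SPARQL JSON result document into a list of {variable: value} dicts."""
    bindings = json.loads(text)["results"]["bindings"]
    return [{var: cell["value"] for var, cell in row.items()} for row in bindings]


def run_query(query: str) -> list:
    """Send the query over HTTP; needs network access and a reachable endpoint."""
    with urllib.request.urlopen(build_url(query), timeout=30) as response:
        return rows_from_json(response.read().decode("utf-8"))


# Offline demonstration with a hand-written result document.
sample = (
    '{"head": {"vars": ["p"]}, "results": {"bindings": '
    '[{"p": {"type": "uri", "value": "http://purl.org/dc/elements/1.1/title"}}]}}'
)
print(rows_from_json(sample))
```

Against a live endpoint, run_query("SELECT * WHERE {?s ?p ?o} LIMIT 10") would return one dictionary per result row, with the keys s, p and o.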


