Help:WikiPathways SPARQL queries
Revision as of 12:52, 26 February 2015
WikiPathways content is replicated at the SPARQL endpoint http://sparql.wikipathways.org/. The endpoint is still under development and is updated very irregularly.
Resources
- WikiPathways internal vocabularies: http://vocabularies.wikipathways.org
- WikiPathways SPARQL endpoint: http://sparql.wikipathways.org
- Identifiers.org: http://identifiers.org
- Sparqlbin: http://sparqlbin.org
- Prefix lookup: http://prefix.cc
Submit ideas
Prefixes
Below are example queries. For readability, the prefix declarations have been omitted from most of them. The queries rely on the following prefixes (the list is not yet complete):
PREFIX gpml: <http://vocabularies.wikipathways.org/gpml#>
PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
PREFIX wprdf: <http://rdf.wikipathways.org/>
PREFIX biopax: <http://www.biopax.org/release/biopax-level3.owl#>
PREFIX cas: <http://identifiers.org/cas/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX ncbigene: <http://identifiers.org/ncbigene/>
PREFIX pubmed: <http://www.ncbi.nlm.nih.gov/pubmed/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
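Because the example queries omit their prefixes, a client has to prepend the declarations before sending a query. A minimal Python sketch of that step follows. The PREFIXES dictionary holds only a subset of the declarations listed above, and the helper name with_prefixes is hypothetical, not part of any WikiPathways tooling.

```python
import re

# Subset of the prefix declarations listed above (extend as needed).
PREFIXES = {
    "wp": "http://vocabularies.wikipathways.org/wp#",
    "gpml": "http://vocabularies.wikipathways.org/gpml#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}

def with_prefixes(query, prefixes=PREFIXES):
    """Prepend a PREFIX line for every known prefix the query body uses.

    The usage check is a simple text search, not a SPARQL parser.
    """
    header = [
        f"PREFIX {name}: <{uri}>"
        for name, uri in prefixes.items()
        if re.search(rf"(?<![\w-]){re.escape(name)}:", query)
    ]
    return "\n".join(header + [query])

print(with_prefixes("SELECT DISTINCT ?organism WHERE { ?concept wp:organism ?organism }"))
```

Only the prefixes that actually occur in the query are emitted, which keeps the request short.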
Example queries
Queries with a * require a bit more time for results.
Pathway oriented queries
Get the species currently in WikiPathways with their respective URIs
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
SELECT DISTINCT ?organism ?label
WHERE {
?concept wp:organism ?organism .
?organism rdfs:label ?label .
}
List pathways and their species
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
SELECT DISTINCT ?title ?label
WHERE {
?pathway dc:title ?title .
?pathway wp:organism ?organism .
?organism rdfs:label ?label .
}
List the species captured in WikiPathways and the number of pathways per species
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
SELECT DISTINCT ?organism ?label count(?pathway) as ?noPathways
WHERE {
?pathway dc:title ?title .
?pathway wp:organism ?organism .
?organism rdfs:label ?label .
}
ORDER BY DESC(?noPathways)
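The count above is computed by the endpoint. For comparison, the same tally can be reproduced on the client from the rows returned by the "List pathways and their species" query. This is a minimal Python sketch; the sample rows are invented for illustration and do not reflect real endpoint output.

```python
from collections import Counter

def pathways_per_species(rows):
    """Count rows per species label, sorted from high to low,
    mirroring count(?pathway) with ORDER BY DESC(?noPathways)."""
    counts = Counter(label for _title, label in rows)
    return counts.most_common()

# Hypothetical (title, species label) rows, for illustration only.
rows = [
    ("Pathway A", "Homo sapiens"),
    ("Pathway B", "Mus musculus"),
    ("Pathway C", "Homo sapiens"),
]
print(pathways_per_species(rows))
```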
List all pathways for species "Mus musculus"
The following query lists all mouse pathways. ?wpIdentifier is the link through identifiers.org, ?pathway points to the RDF version of WikiPathways, and ?page is the revision that is loaded in the SPARQL endpoint.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
SELECT DISTINCT ?wpIdentifier ?pathway ?page
WHERE {
?pathway dc:title ?title .
?pathway wp:organism ?organism .
?pathway foaf:page ?page .
?pathway dc:identifier ?wpIdentifier .
?organism rdfs:label "Mus musculus"^^<http://www.w3.org/2001/XMLSchema#string> .
}
ORDER BY ?wpIdentifier
List all mouse pathways that require curation attention
The following query lists all mouse pathways that contain elements requiring curation attention. It returns the canonical identifier of each pathway (i.e. the identifier that always points to the latest revision), the element that needs attention, and the label of that element.
PREFIX gpml: <http://vocabularies.wikipathways.org/gpml#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT DISTINCT ?wpIdentifier ?elementneedsattention ?elementLabel
WHERE {
?pathway dc:title ?title .
?elementneedsattention a gpml:requiresCurationAttention .
?elementneedsattention dcterms:isPartOf ?pathway .
?elementneedsattention rdfs:label ?elementLabel .
?pathway wp:organism ?organism .
?pathway foaf:page ?page .
?pathway dc:identifier ?wpIdentifier .
?organism rdfs:label "Mus musculus"^^<http://www.w3.org/2001/XMLSchema#string> .
}
ORDER BY ?wpIdentifier
Count the pathways per pathway category
SELECT DISTINCT ?category count(?category) as ?noCategories
WHERE {
?pathway wp:category ?category .
?pathway dc:title ?title .
}
ORDER BY ?category
List all pathways of category Metabolic Process
SELECT DISTINCT *
WHERE {
?pathway wp:category wp:MetabolicProcess .
?pathway dc:title ?title .
}
Get all pathways with a particular gene
List all pathways per instance of a particular gene or protein (wp:GeneProduct)
select distinct ?pathway ?label where {
?geneProduct a wp:GeneProduct .
?geneProduct rdfs:label ?label .
?geneProduct dcterms:isPartOf ?pathway .
FILTER regex(str(?label), "CYP").
}
Get all groups and complexes containing a particular gene
List all groups and complexes per instance of a particular gene or protein (wp:GeneProduct)
select distinct ?pathway ?label where {
?geneProduct a wp:GeneProduct .
?geneProduct rdfs:label ?label .
?geneProduct dcterms:isPartOf ?pathway .
FILTER regex(str(?label), "CYP").
FILTER regex(str(?pathway), "group")
}
Get all the genes on a particular pathway
List all the genes and proteins (wp:GeneProduct) associated with a particular pathway WPID.
select distinct ?pathway ?label where {
?geneProduct a wp:GeneProduct .
?geneProduct rdfs:label ?label .
?geneProduct dcterms:isPartOf ?pathway .
FILTER regex(str(?pathway), "WP615").
FILTER (! regex(str(?pathway), "group"))
}
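Note that regex(str(?pathway), "WP615") is a substring match, so it would also accept identifiers that merely begin with the same digits, such as a hypothetical WP6150. The Python sketch below illustrates the difference between the unanchored pattern and a stricter one. The URIs are made up for the example, and Python's re module is used only as a stand-in for the endpoint's regex engine.

```python
import re

# Made-up pathway URIs, for illustration only.
uris = [
    "http://example.org/pathway/WP615_r1",
    "http://example.org/pathway/WP6150_r1",
    "http://example.org/pathway/WP61_r1",
]

loose = re.compile(r"WP615")             # substring match, as in the query above
strict = re.compile(r"WP615([^0-9]|$)")  # WP615 not followed by another digit

print([u for u in uris if loose.search(u)])
print([u for u in uris if strict.search(u)])
```

The stricter pattern avoids lookahead syntax, so it should carry over to regex engines that do not support it.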
Count the number of pathways per ontology term
In WikiPathways, pathways can be tagged with terms from the Pathway, Cell Line and Disease ontologies. The following query returns a pathway count for each term from any of the available ontologies. These terms are collectively modeled as wp:pathwayOntology, which covers all of these ontologies, not just the "Pathway" ontology.
SELECT DISTINCT ?pwOntologyTerm count(?pwOntologyTerm) as ?pathwayCount
WHERE {
?pathwayRDF wp:pathwayOntology ?pwOntologyTerm .
}
ORDER BY DESC(?pathwayCount)
Get all pathways with a particular ontology term
In WikiPathways, pathways can be tagged with terms from the Pathway, Cell Line and Disease ontologies. The following query returns a list of pathways tagged with a given term from any of the supported ontologies. These terms are collectively modeled as wp:pathwayOntology, which covers all of these ontologies, not just the "Pathway" ontology.
SELECT ?label as ?pwOntologyTerm ?pathway
WHERE {
?pathwayRDF wp:pathwayOntology ?o .
?pathwayRDF foaf:page ?pathway .
?pathwayRDF dc:title ?title .
?o <http://www.w3.org/2000/01/rdf-schema#label> ?label .
?o rdfs:subClassOf ?superClass .
?superClass rdfs:label ?superClassLabel .
FILTER regex(str(?label), "^cancer$")
}
Get all ontology terms for a particular pathway
List all the ontology terms tagged on a particular pathway.
SELECT ?o as ?pwOntologyTerm ?pathway
WHERE {
?pathwayRDF wp:pathwayOntology ?o .
?pathwayRDF foaf:page ?pathway .
?pathwayRDF dc:title ?title .
FILTER regex(str(?pathway), "WP615").
FILTER (! regex(str(?pathway), "group"))
}
Get all pathways with Pubmed references
SELECT DISTINCT ?pathway ?pubmed
WHERE
{?pubmed a wp:PublicationReference .
?pubmed dcterms:isPartOf ?pathway }
ORDER BY ?pathway
Get all pathways with a particular Pubmed reference
SELECT DISTINCT ?pathway ?pubmed
WHERE {
?pubmed a wp:PublicationReference .
?pubmed dcterms:isPartOf ?pathway .
FILTER regex(str(?pubmed), "14769483")
}
ORDER BY ?pathway
Get all pathways and the number of references per pathway
SELECT DISTINCT ?pathway COUNT(?pubmed) AS ?numberOfReferences
WHERE
{?pubmed a wp:PublicationReference .
?pubmed dcterms:isPartOf ?pathway }
ORDER BY DESC(?numberOfReferences)
Get a full dump of all pathways from the analytical set and their pathway ontology terms
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX schema: <http://schema.org/>
PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT DISTINCT ?depicts ?title ?speciesLabel ?identifier ?ontology ?label
WHERE {
?pathway foaf:page ?depicts .
?pathway dc:title ?title .
?pathway wp:organism ?species .
?species rdfs:label ?speciesLabel .
?pathway dc:identifier ?identifier .
OPTIONAL {?pathway wp:pathwayOntology ?ontology .}
OPTIONAL {?ontology rdfs:label ?label .}
}
Get a list of Reactome pathways
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX schema: <http://schema.org/>
PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT DISTINCT ?depicts ?title ?speciesLabel ?identifier ?ontology ?label
WHERE {
?pathway foaf:page ?depicts .
?pathway dc:title ?title .
?pathway wp:organism ?species .
?species rdfs:label ?speciesLabel .
?pathway dc:identifier ?identifier .
OPTIONAL {?pathway wp:pathwayOntology ?ontology .
?ontology rdfs:label ?label .}
}
Interaction oriented queries
Get all interactions of a particular datanode
Find all datanodes (GeneProducts, Metabolites, Pathways) that are connected to a particular datanode via any type of interaction (gpml:Interaction).
PREFIX gpml: <http://vocabularies.wikipathways.org/gpml#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT DISTINCT ?wpIdentifier ?dn2Identifier WHERE {
?pathway dc:identifier ?wpIdentifier .
{SELECT DISTINCT * WHERE {
?datanode2 dc:identifier ?dn2Identifier .
?datanode2 a gpml:DataNode .
?datanode2 dcterms:isPartOf ?pathway .
?datanode2 gpml:graphid ?dn2GraphId .
?line gpml:graphref ?dn2GraphId .
FILTER (?datanode2 != ?datanode1)
FILTER (?datanode2 != <http://commonchemistry.org/ChemicalDetail.aspx?ref=noIdentifier>)
{SELECT DISTINCT * WHERE {
?datanode1 dc:identifier <http://identifiers.org/hmdb/HMDB01586> .
?datanode1 gpml:graphid ?dn1GraphId .
?datanode1 a gpml:DataNode .
?datanode1 dcterms:isPartOf ?pathway .
?line gpml:graphref ?dn1GraphId .
?line a gpml:Interaction .
?line gpml:graphid ?lineGraphId .
?line dcterms:isPartOf ?pathway .}}
}}
}
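The nested query above joins two data nodes through a shared line: a line (gpml:Interaction) points at graph ids via gpml:graphref, and two distinct data nodes whose gpml:graphid values are referenced by the same line count as connected. The following Python sketch mirrors that join on a small in-memory structure. It is simplified to one graph id per data node, and the node identifiers, graph ids, and function name are all hypothetical.

```python
def connected_nodes(start_id, graphid_of, graphrefs_of_line):
    """Return identifiers of data nodes that share a line with start_id.

    graphid_of: data node identifier -> graph id (gpml:graphid)
    graphrefs_of_line: line id -> set of graph ids it references (gpml:graphref)
    """
    start_graphid = graphid_of[start_id]
    node_of = {gid: node for node, gid in graphid_of.items()}
    neighbours = set()
    for refs in graphrefs_of_line.values():
        if start_graphid in refs:
            # Collect every other referenced graph id that belongs to a data node.
            neighbours.update(
                node_of[gid]
                for gid in refs
                if gid in node_of and node_of[gid] != start_id
            )
    return sorted(neighbours)

# Toy pathway with two lines and three data nodes (all values invented).
graphid_of = {"hmdb:A": "g1", "ncbigene:B": "g2", "ncbigene:C": "g3"}
graphrefs_of_line = {"line1": {"g1", "g2"}, "line2": {"g2", "g3"}}
print(connected_nodes("hmdb:A", graphid_of, graphrefs_of_line))
```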
Get all interactions per pathway
Limited to first 1000 interactions
PREFIX gpml: <http://vocabularies.wikipathways.org/gpml#>
SELECT DISTINCT ?wpIdentifier ?dn1Identifier ?dn2Identifier WHERE {
?pathway dc:identifier ?wpIdentifier .
{SELECT DISTINCT * WHERE {
?datanode2 dc:identifier ?dn2Identifier .
?datanode2 a gpml:DataNode .
?datanode2 dcterms:isPartOf ?pathway .
?datanode2 gpml:graphid ?dn2GraphId .
?line gpml:graphref ?dn2GraphId .
FILTER (?datanode2 != ?datanode1)
FILTER (!regex(str(?datanode2), "noIdentifier")) .
{SELECT DISTINCT * WHERE {
?datanode1 dc:identifier ?dn1Identifier .
?datanode1 gpml:graphid ?dn1GraphId .
?datanode1 a gpml:DataNode .
?datanode1 dcterms:isPartOf ?pathway .
?line gpml:graphref ?dn1GraphId .
?line a gpml:Interaction .
?line gpml:graphid ?lineGraphId .
?line dcterms:isPartOf ?pathway .}}
FILTER (!regex(str(?datanode1), "noIdentifier")) .
}}
}
LIMIT 1000
Get all Interactions
Limited to first 1000 interactions
PREFIX gpml: <http://vocabularies.wikipathways.org/gpml#>
SELECT DISTINCT ?dn1Identifier ?dn2Identifier WHERE {
?pathway dc:identifier ?wpIdentifier .
{SELECT DISTINCT * WHERE {
?datanode2 dc:identifier ?dn2Identifier .
?datanode2 a gpml:DataNode .
?datanode2 dcterms:isPartOf ?pathway .
?datanode2 gpml:graphid ?dn2GraphId .
?line gpml:graphref ?dn2GraphId .
FILTER (?datanode2 != ?datanode1)
FILTER (!regex(str(?datanode2), "noIdentifier")) .
{SELECT DISTINCT * WHERE {
?datanode1 dc:identifier ?dn1Identifier .
?datanode1 gpml:graphid ?dn1GraphId .
?datanode1 a gpml:DataNode .
?datanode1 dcterms:isPartOf ?pathway .
?line gpml:graphref ?dn1GraphId .
?line a gpml:Interaction .
?line gpml:graphid ?lineGraphId .
?line dcterms:isPartOf ?pathway .}}
FILTER (!regex(str(?datanode1), "noIdentifier")) .
}}
}
LIMIT 1000
Datasource oriented queries
Get all datasources currently captured in WikiPathways
SELECT DISTINCT ?datasource
WHERE {
?concept dc:source ?datasource
}
Get the number of entries per datasource in WikiPathways
SELECT DISTINCT ?datasource count(?datasource) as ?numberEntries
WHERE {
?concept dc:source ?datasource
}
ORDER BY DESC(?numberEntries)
Count the identifiers per data source
SELECT DISTINCT ?datasource ?identifier count(?identifier) AS ?numberEntries
WHERE {
?concept dc:source ?datasource .
?concept dc:identifier ?identifier
}
Count the identifiers per data source and order them from high to low
SELECT DISTINCT ?datasource ?identifier count(?identifier) AS ?numberEntries
WHERE {
?concept dc:source ?datasource .
?concept dc:identifier ?identifier
}
ORDER BY DESC(?numberEntries)
Return all Chembl compounds in WikiPathways and the pathways they are in
SELECT DISTINCT ?identifier ?pathway
WHERE {
?concept dcterms:isPartOf ?pathway .
?concept dc:source "ChEMBL compound"^^xsd:string .
?concept dc:identifier ?identifier .
}
Curators oriented queries
Get the pathway with the erroneous data source "null"
SELECT DISTINCT ?identifier ?pathway ?label
WHERE {
?concept dc:source "null"^^xsd:string .
?concept dc:identifier ?identifier .
?concept dcterms:isPartOf ?pathway .
?concept rdfs:label ?label
}
Get all geneproducts that lack either a DataSource or an Identifier
prefix wp: <http://vocabularies.wikipathways.org/wp#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix dcterms: <http://purl.org/dc/terms/>
select distinct ?pathway ?label where {?geneProduct a wp:GeneProduct .
?geneProduct rdfs:label ?label .
?geneProduct dcterms:isPartOf ?pathway .
FILTER regex(str(?geneProduct), "^node").
FILTER regex(str(?pathway), "^http").
}
Get entities with more than one identifier
select ?entity count(?identifier) as ?count where {
?entity <http://purl.org/dc/terms/identifier> ?identifier .
} order by desc(?count)
PubChem-compound 1004
PubChem-compound CID 1004 is sometimes wrongly used for phosphate. CID 1004 is the uncharged compound (phosphoric acid). Phosphate, and particularly things labeled "Pi", should instead use CID 1061 for orthophosphate, i.e. [PO4]3-.
prefix wp: <http://vocabularies.wikipathways.org/wp#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix dcterms: <http://purl.org/dc/terms/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
select ?pathway ?source
where {
?mb dc:source ?source ;
dcterms:isPartOf ?pathway ;
dcterms:identifier "1004"^^xsd:string .
}
Outdated HMDB identifiers
This query lists HMDB identifiers that are used in WikiPathways but have been revoked or have become secondary identifiers.
prefix wp: <http://vocabularies.wikipathways.org/wp#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix dcterms: <http://purl.org/dc/terms/>
select distinct ?identifier
where {
?mb a wp:Metabolite ;
dc:source "HMDB"^^xsd:string ;
dc:identifier ?identifier .
OPTIONAL { ?mb wp:bdbHmdb ?bridgedb . }
FILTER (!BOUND(?bridgedb))
} order by ?identifier
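The OPTIONAL { ... } FILTER (!BOUND(...)) combination above keeps only the metabolites for which no wp:bdbHmdb mapping exists. In plain terms this is a set difference, which the following Python sketch spells out. The identifiers are placeholders, not real HMDB entries, and the function name is hypothetical.

```python
def unmapped_identifiers(used, mapped):
    """Identifiers used in pathways that have no BridgeDb mapping,
    i.e. the rows that survive FILTER (!BOUND(?bridgedb))."""
    return sorted(set(used) - set(mapped))

# Placeholder identifiers, for illustration only.
used_in_pathways = ["HMDB-X1", "HMDB-X2", "HMDB-X3"]
has_bridgedb_mapping = ["HMDB-X1", "HMDB-X3"]
print(unmapped_identifiers(used_in_pathways, has_bridgedb_mapping))
```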
Metabolites not classified as such
One can list all data sources for non-metabolites with this query.
prefix wp: <http://vocabularies.wikipathways.org/wp#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix dcterms: <http://purl.org/dc/terms/>
select ?datasource count(?identifier) as ?count
where {
?mb dc:source ?datasource ;
dcterms:identifier ?identifier .
FILTER NOT EXISTS { ?mb a wp:Metabolite }
} order by desc(?count)
The result mostly lists gene identifier sources, but watch for metabolite identifier data sources among them: metabolites that are not marked as such but carry a metabolite identifier can be found this way. Further down the list is CAS (but genes are chemicals too...), followed by a few minor ones:
"CTD Gene"^^<http://www.w3.org/2001/XMLSchema#string> 5
"HMDB"^^<http://www.w3.org/2001/XMLSchema#string> 4
"ChEBI"^^<http://www.w3.org/2001/XMLSchema#string> 3
"GLYCAN"^^<http://www.w3.org/2001/XMLSchema#string> 3
"COMPOUND"^^<http://www.w3.org/2001/XMLSchema#string> 3
"PubChem"^^<http://www.w3.org/2001/XMLSchema#string> 2
I would expect GLYCAN and COMPOUND to be misnomers of the matching KEGG subsets.
Non-Metabolites with CAS identifier
Note that a CAS identifier can also refer to mixtures, compound classes, etc.
prefix wp: <http://vocabularies.wikipathways.org/wp#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix dcterms: <http://purl.org/dc/terms/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
select distinct ?pathway ?mb str(?label) as ?name str(?identifier) as ?id
where {
?mb dc:source "CAS"^^xsd:string ;
rdfs:label ?label ;
dcterms:identifier ?identifier ;
dcterms:isPartOf ?pathway .
FILTER NOT EXISTS { ?mb a wp:Metabolite }
} order by ?pathway
Non-Metabolites with PubChem identifier
At the time of writing, this results in an empty set.
prefix wp: <http://vocabularies.wikipathways.org/wp#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix dcterms: <http://purl.org/dc/terms/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
select distinct ?pathway ?mb ?label ?identifier
where {
?mb dc:source "PubChem-compound"^^xsd:string ;
dcterms:identifier ?identifier ;
dcterms:isPartOf ?pathway .
OPTIONAL { ?mb rdfs:label ?label . }
FILTER NOT EXISTS { ?mb a wp:Metabolite }
} order by ?pathway
Metabolites sometimes marked as DataNode@Type Metabolite
Based on label comparisons, we can find data nodes that are not typed as metabolite but share their label with a data node that is. Of course, this can give false positives, because genes can be incorrectly marked as metabolite in some pathway, but that is another SPARQL query.
prefix wp: <http://vocabularies.wikipathways.org/wp#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix dcterms: <http://purl.org/dc/terms/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
select ?pathway ?nonmb ?mb ?label
where {
?nonmb rdfs:label ?label .
?mb rdfs:label ?label .
OPTIONAL { ?nonmb dcterms:isPartOf ?pathway . }
FILTER ( ?nonmb != ?mb )
FILTER NOT EXISTS { ?nonmb a wp:Metabolite }
FILTER EXISTS { ?mb a wp:Metabolite }
FILTER (!regex(str(?nonmb), "noIdentifier", "i"))
FILTER (!regex(str(?mb), "noIdentifier", "i"))
}
Metabolites with an identifier but undefined data source
prefix wp: <http://vocabularies.wikipathways.org/wp#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix dcterms: <http://purl.org/dc/terms/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
select distinct ?pathway ?mb ?identifier
where {
?mb a wp:Metabolite ;
dc:source ""^^xsd:string ;
dc:identifier ?identifier ;
dcterms:isPartOf ?pathway .
FILTER (!isIRI(?identifier))
FILTER (str(?identifier) != "")
} order by ?pathway
Metabolites with a data source but no identifier
prefix wp: <http://vocabularies.wikipathways.org/wp#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix dcterms: <http://purl.org/dc/terms/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
select distinct ?pathway ?mb ?source
where {
?mb a wp:Metabolite ;
dcterms:identifier ""^^xsd:string ;
dc:source ?source ;
dcterms:isPartOf ?pathway .
FILTER (str(?source) != "")
FILTER (!regex(str(?pathway), "internal.wikipathways.org", "i"))
} order by ?pathway
Metabolites with too many labels
This is mainly caused by metabolite URIs being based on a non-existent identifier:
prefix wp: <http://vocabularies.wikipathways.org/wp#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix dcterms: <http://purl.org/dc/terms/>
select distinct count(?label) as ?count ?pathway ?mb
where {
?mb a wp:Metabolite ;
rdfs:label ?label ;
dcterms:isPartOf ?pathway .
} order by desc(?count) ?pathway ?mb limit 410
An example of such an entity, which has many labels and is at once a metabolite, a gene, a complex, etc.:
prefix wp: <http://vocabularies.wikipathways.org/wp#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix dcterms: <http://purl.org/dc/terms/>
select distinct str(?label) ?type
where {
<http://bio2rdf.org/geneid:noIdentifier> a ?type ; rdfs:label ?label .
} order by ?label
Metabolites with an Entrez Gene identifier
prefix wp: <http://vocabularies.wikipathways.org/wp#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix dcterms: <http://purl.org/dc/terms/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
select distinct ?pathway ?mb ?label ?identifier
where {
?mb a wp:Metabolite ;
rdfs:label ?label ;
dc:source "Entrez Gene"^^xsd:string ;
dcterms:identifier ?identifier ;
dcterms:isPartOf ?pathway .
FILTER (str(?identifier) != "")
} order by ?pathway
Federated queries - !Under Construction!
Other SPARQL endpoints used in the federated queries
- ArrayExpress Atlas: http://wwwdev.ebi.ac.uk/fgpt/gxa-sparql/index.jsp
- Gene Wiki: http://genewiki.semwebinsi.de
- ChEMBL: http://rdf.farmbio.uu.se/chembl/sparql/ (This should be replaced by the new RDF endpoint from ChEMBL itself)
- Text mining from Fraunhofer Institutes at University of Bonn: http://ops-virtuoso.scai.fraunhofer.de:8893/sparql
WikiPathways with GeneWiki
SELECT DISTINCT ?wplabel ?identifier ?snp where {
?s dc:identifier <http://identifiers.org/ncbigene/53975> .
?s dc:identifier ?identifier .
?s rdfs:label ?wplabel .
?s dc:source ?source .
SERVICE <http://genewiki.semwebinsi.de/> {
?gws dc:identifier ?identifier .
?gws rdf:type ?gwtype .
?gws <http://genewikiplus.org/wiki/Special:URIResolver/Property-3AHasSNP> ?snp .
}
}
prefix dc: <http://purl.org/dc/elements/1.1/>
prefix dcterms: <http://purl.org/dc/terms/>
select distinct * where {
?pwEntity dc:identifier ?identifier .
?pwEntity dcterms:isPartOf ?pathway .
SERVICE <http://genewiki.semwebinsi.de/> {
?concept dc:identifier <http://identifiers.org/ncbigene/12189> .
?concept dc:identifier ?identifier .
?concept <http://genewikiplus.org/wiki/Special:URIResolver/Property-3AIs_associated_with_disease> ?disease .
}
}
WikiPathways with ChEMBL: ChEMBL compounds in WikiPathways (without BridgeDb)
SELECT *
WHERE {{
SELECT DISTINCT ?pathway ?concept iri(bif:concat("http://linkedchemistry.info/chembl/chemblid/", bif:regexp_substr('http://identifiers.org/chembl.compound/(.*)',?identifier, 1))) as ?ChEMBLId where {
?concept dcterms:isPartOf ?pathway .
?concept dc:source "ChEMBL compound"^^xsd:string .
?concept dc:identifier ?identifier .
FILTER regex(str(?identifier), "^http").
}
} SERVICE <http://rdf.farmbio.uu.se/chembl/sparql/>{
?ChEMBLId ?p ?o .
} }
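The bif:concat and bif:regexp_substr calls above are Virtuoso built-ins that rewrite an identifiers.org URI into the URI scheme used by the remote ChEMBL endpoint. The rewrite itself can be sketched in Python as follows. The function name is hypothetical, and the ChEMBL id in the example call is only a sample value.

```python
import re

SOURCE = re.compile(r"^http://identifiers\.org/chembl\.compound/(.*)$")
TARGET = "http://linkedchemistry.info/chembl/chemblid/"

def to_chembl_uri(identifier):
    """Map an identifiers.org ChEMBL compound URI to the linkedchemistry.info
    form; return None when the identifier does not match the pattern."""
    match = SOURCE.match(identifier)
    return TARGET + match.group(1) if match else None

print(to_chembl_uri("http://identifiers.org/chembl.compound/CHEMBL25"))
```

Returning None for non-matching input plays the role of the FILTER regex(str(?identifier), "^http") guard in the query, which skips identifiers that are not URIs.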
WikiPathways with ChEMBL: all ChEMBL assays for pathways
SELECT ?pathway ?target ?assay WHERE {
{
SELECT DISTINCT
?pathway ?uniprot
iri(
bif:concat("http://bio2rdf.org/uniprot:",
bif:regexp_substr('http://identifiers.org/uniprot/(.*)',?uniprot, 1))
) as ?chembluniprot
WHERE {
?s ?p ?uniprot .
?s dcterms:isPartOf ?pathway .
FILTER regex(str(?uniprot), "uniprot")
}
}
SERVICE <http://rdf.farmbio.uu.se/chembl/sparql/> {
?target owl:sameAs ?chembluniprot .
?score chembl:forTarget ?target .
?assay chembl:hasTargetScore ?score .
}
}
WikiPathways with ChEMBL: all molecules targeting pathways
SELECT ?pathway ?target ?assay ?smiles WHERE {
{
SELECT DISTINCT
?pathway ?uniprot
iri(
bif:concat("http://bio2rdf.org/uniprot:",
bif:regexp_substr('http://identifiers.org/uniprot/(.*)',?uniprot, 1))
) as ?chembluniprot
WHERE {
?s ?p ?uniprot .
?s dcterms:isPartOf ?pathway .
FILTER regex(str(?uniprot), "uniprot")
}
}
SERVICE <http://rdf.farmbio.uu.se/chembl/sparql/> {
?target owl:sameAs ?chembluniprot .
?score chembl:forTarget ?target .
?assay chembl:hasTargetScore ?score .
?activity chembl:onAssay ?assay ;
chembl:forMolecule ?molecule .
?molecule bo:smiles ?smiles .
}
}
WikiPathways with EBI Atlas RDF - !Under Construction!
Genes differentially expressed in asthma and Pathways
For the genes differentially expressed in asthma, get the gene products associated with a WikiPathways pathway. (Built upon example query 5 in: http://www.ebi.ac.uk/rdf/services/atlas/sparql ). You can replace the EFO number with the code of another disease.
PREFIX identifiers:<http://identifiers.org/ensembl/>
PREFIX atlas: <http://rdf.ebi.ac.uk/resource/atlas/>
PREFIX atlasterms: <http://rdf.ebi.ac.uk/terms/atlas/>
PREFIX efo: <http://www.ebi.ac.uk/efo/>
SELECT DISTINCT ?wpURL ?pwTitle ?expressionValue ?pvalue where {
SERVICE <http://www.ebi.ac.uk/rdf/services/atlas/sparql> {
?factor rdf:type efo:EFO_0000270 .
?value atlasterms:hasFactorValue ?factor .
?value atlasterms:isMeasurementOf ?probe .
?value atlasterms:pValue ?pvalue .
?value rdfs:label ?expressionValue .
?probe atlasterms:dbXref ?dbXref .
}
?pwElement dcterms:isPartOf ?pathway .
?pathway dc:title ?pwTitle .
?pathway dc:identifier ?wpURL .
?pwElement wp:bdbEnsembl ?dbXref .
}
ORDER BY ASC(?pvalue)
Genes differentially expressed in type II diabetes mellitus and Pathways
PREFIX identifiers:<http://identifiers.org/ensembl/>
PREFIX atlas: <http://rdf.ebi.ac.uk/resource/atlas/>
PREFIX atlasterms: <http://rdf.ebi.ac.uk/terms/atlas/>
PREFIX efo: <http://www.ebi.ac.uk/efo/>
SELECT DISTINCT ?wpURL ?pwTitle ?expressionValue ?pvalue where {
SERVICE <http://www.ebi.ac.uk/rdf/services/atlas/sparql> {
?factor rdf:type efo:EFO_0001360 .
?value atlasterms:hasFactorValue ?factor .
?value atlasterms:isMeasurementOf ?probe .
?value atlasterms:pValue ?pvalue .
?value rdfs:label ?expressionValue .
?probe atlasterms:dbXref ?dbXref .
}
?pwElement dcterms:isPartOf ?pathway .
?pathway dc:title ?pwTitle .
?pathway dc:identifier ?wpURL .
?pwElement wp:bdbEnsembl ?dbXref .
}
ORDER BY ASC(?pvalue)
Genes differentially expressed in obesity and Pathways
PREFIX identifiers:<http://identifiers.org/ensembl/>
PREFIX atlas: <http://rdf.ebi.ac.uk/resource/atlas/>
PREFIX atlasterms: <http://rdf.ebi.ac.uk/terms/atlas/>
PREFIX efo: <http://www.ebi.ac.uk/efo/>
SELECT DISTINCT ?wpURL ?pwTitle ?expressionValue ?pvalue where {
SERVICE <http://www.ebi.ac.uk/rdf/services/atlas/sparql> {
?factor rdf:type efo:EFO_0001073 .
?value atlasterms:hasFactorValue ?factor .
?value atlasterms:isMeasurementOf ?probe .
?value atlasterms:pValue ?pvalue .
?value rdfs:label ?expressionValue .
?probe atlasterms:dbXref ?dbXref .
}
?pwElement dcterms:isPartOf ?pathway .
?pathway dc:title ?pwTitle .
?pathway dc:identifier ?wpURL .
?pwElement wp:bdbEnsembl ?dbXref .
}
ORDER BY ASC(?pvalue)
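The asthma, type II diabetes mellitus and obesity queries differ only in the EFO term, so the query text can be generated from a template. A minimal Python sketch follows. The query body is copied from the examples above, while the function name and the check that the term looks like EFO_ followed by digits are assumptions made for this illustration.

```python
import re

# Query text copied from the examples above, with the EFO term left open (%s).
ATLAS_QUERY_TEMPLATE = """PREFIX identifiers:<http://identifiers.org/ensembl/>
PREFIX atlas: <http://rdf.ebi.ac.uk/resource/atlas/>
PREFIX atlasterms: <http://rdf.ebi.ac.uk/terms/atlas/>
PREFIX efo: <http://www.ebi.ac.uk/efo/>
SELECT DISTINCT ?wpURL ?pwTitle ?expressionValue ?pvalue where {
SERVICE <http://www.ebi.ac.uk/rdf/services/atlas/sparql> {
?factor rdf:type efo:%s .
?value atlasterms:hasFactorValue ?factor .
?value atlasterms:isMeasurementOf ?probe .
?value atlasterms:pValue ?pvalue .
?value rdfs:label ?expressionValue .
?probe atlasterms:dbXref ?dbXref .
}
?pwElement dcterms:isPartOf ?pathway .
?pathway dc:title ?pwTitle .
?pathway dc:identifier ?wpURL .
?pwElement wp:bdbEnsembl ?dbXref .
}
ORDER BY ASC(?pvalue)"""

def atlas_pathway_query(efo_id):
    """Fill the EFO term into the federated query, rejecting anything
    that does not look like an EFO identifier (assumed shape: EFO_<digits>)."""
    if not re.fullmatch(r"EFO_\d+", efo_id):
        raise ValueError(f"not an EFO identifier: {efo_id!r}")
    return ATLAS_QUERY_TEMPLATE % efo_id

print(atlas_pathway_query("EFO_0000270"))  # the asthma term used above
```

Validating the term before substitution keeps arbitrary text from being spliced into the query.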
Contributors - !Under Construction!
Currently, tracking contributions made by users is not possible.
Extract contributors
SELECT DISTINCT ?contributor
WHERE {
?pathway dc:contributor ?contributor
}
Extract the number of pathways edited per contributor
SELECT DISTINCT ?contributor count(?pathway) as ?pathwaysEdited
WHERE {
?pathway dc:contributor ?contributor
}
ORDER BY DESC(?pathwaysEdited)
Find the pathways a user has edited so far
SELECT DISTINCT ?pathway ?pathwayLabel
WHERE {
?pathway dc:contributor wpuser:Andra .
?pathway dc:contributor ?contributor .
?pathway rdfs:label ?pathwayLabel .
}
Code examples
Perl
An RDF API is available. The example below encodes the query into a URL and retrieves the results as CSV.
#!/usr/bin/perl
use LWP::Simple;
use URI::Escape;
my $sparql = "SELECT DISTINCT ?wpIdentifier ?elementneedsattention ?elementLabel
WHERE {
?pathway dc:title ?title .
?elementneedsattention a gpml:requiresCurationAttention .
?elementneedsattention dcterms:isPartOf ?pathway .
?elementneedsattention rdfs:label ?elementLabel .
?pathway wp:organism ?organism .
?pathway foaf:page ?page .
?pathway dc:identifier ?wpIdentifier .
?organism rdfs:label \"Mus musculus\"^^<http://www.w3.org/2001/XMLSchema#string> .
}
ORDER BY ?wpIdentifier";
my $url = 'http://sparql.wikipathways.org/?default-graph-uri=&query='.uri_escape($sparql).'&format=text%2Fcsv&timeout=0&debug=on';
my $content = get $url;
die "Couldn't get $url" unless defined $content;
print $content;
Java
For Java we recommend the Jena framework.
import com.hp.hpl.jena.query.Query;
import com.hp.hpl.jena.query.QueryExecution;
import com.hp.hpl.jena.query.QueryExecutionFactory;
import com.hp.hpl.jena.query.QueryFactory;
import com.hp.hpl.jena.query.QuerySolution;
import com.hp.hpl.jena.query.ResultSet;
public class javaCodeExample {
public static void main(String[] args) {
String sparqlQueryString = "SELECT * WHERE {?s ?p ?o} LIMIT 10";
Query query = QueryFactory.create(sparqlQueryString);
QueryExecution queryExecution = QueryExecutionFactory.sparqlService("http://sparql.wikipathways.org", query);
ResultSet resultSet = queryExecution.execSelect();
while (resultSet.hasNext()) {
QuerySolution solution = resultSet.next();
System.out.print(solution.get("s"));
System.out.print("\t"+solution.get("p"));
System.out.println("\t"+solution.get("o"));
}
}
}
PHP
For PHP we recommend ARC2: Easy RDF and SPARQL for LAMP systems.
R
library(rrdf)
sparql.remote(
"http://sparql.wikipathways.org/",
"SELECT DISTINCT ?p WHERE { ?s ?p ?o }"
)
Bioclipse
The code below works in both the JavaScript and the Groovy console:
rdf.sparqlRemote(
"http://sparql.wikipathways.org/",
"SELECT DISTINCT ?p WHERE { ?s ?p ?o }"
)
SPARQL from the command line
For quick and easy querying, we recommend using curl (Linux and OS X):
curl -F "query=SELECT * WHERE {?s ?p ?o} LIMIT 10" http://sparql.wikipathways.org
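A comparable request can also be sketched in Python using only the standard library. The sketch follows the URL pattern of the Perl example (query and format parameters on http://sparql.wikipathways.org/). The function names are hypothetical, and whether the endpoint is reachable and honours format=text/csv is an assumption.

```python
import csv
import io
import urllib.parse
import urllib.request

ENDPOINT = "http://sparql.wikipathways.org/"

def build_url(query, result_format="text/csv"):
    """Encode the query into a GET URL, as the Perl example does."""
    params = {"default-graph-uri": "", "query": query, "format": result_format}
    return ENDPOINT + "?" + urllib.parse.urlencode(params)

def parse_csv(text):
    """Turn a CSV result document into a list of dicts keyed by variable name."""
    return list(csv.DictReader(io.StringIO(text)))

def run_query(query):
    """Send the query and return the parsed rows (requires network access)."""
    with urllib.request.urlopen(build_url(query)) as response:
        return parse_csv(response.read().decode("utf-8"))

# Example call, not executed here:
# rows = run_query("SELECT * WHERE {?s ?p ?o} LIMIT 10")
```

Keeping URL construction and CSV parsing separate from the network call makes both steps easy to check without contacting the endpoint.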

