sparql - How to handle Wikipedia Named Entities that have the same Category name


I am trying to extract companies, and I ran the following query:

prefix cat: <http://dbpedia.org/resource/Category:>
prefix dcterms: <http://purl.org/dc/terms/>
prefix skos: <http://www.w3.org/2004/02/skos/core#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>

select distinct ?page ?subcat {
  ?subcat skos:broader* cat:Companies_of_the_United_States_by_industry .
  ?page dcterms:subject ?subcat .
  ?page rdfs:label ?pagename .
}

The results mix companies with pages that are not companies.

Amgen and Pfizer are both companies and categories, so I end up collecting everything filed under Pfizer and Amgen (people, products). I found out that these entries belong to a Wikipedia category called Category:Wikipedia_categories_named_after_companies_of_the_United_States or Category:Wikipedia_categories_named_after_pharmaceutical_companies_of_the_United_States. I tried to filter out these categories like this:

select distinct ?page ?subcat {
  ?subcat skos:broader* cat:Companies_of_the_United_States_by_industry .
  ?page dcterms:subject ?subcat .
  ?page rdfs:label ?pagename .
  filter( !regex(?subcat, "Wikipedia_categories_named_after_pharmaceutical_companies_of_the_United_States") )
}

But no luck, they are still there. Any idea how to avoid this problem?

The problem doesn't have anything to do with them having the same name. Wikipedia categories don't form a type hierarchy, so it doesn't make sense to treat them as one. The reason you see the results you're seeing is that there's a category Pfizer, whose broader values include the company listings, and which is the dcterms:subject of dbpedia:Alprazolam, dbpedia:Cetirizine, etc. That doesn't make sense as a type hierarchy, but it's fine for organizing article topics. If you only want companies back, ask for things that are companies:

select distinct ?page ?subcat {
  ?subcat skos:broader* category:Companies_of_the_United_States_by_industry .
  ?page dcterms:subject ?subcat .
  ?page rdfs:label ?pagename .
  ?page a dbpedia-owl:Company
}
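To see why the type restriction helps, here is a minimal Python sketch over a made-up toy graph. The dictionaries BROADER, SUBJECT and TYPES, the helper functions, and the tiny data set are all hypothetical stand-ins for skos:broader, dcterms:subject and rdf:type; this is not DBpedia data and not a SPARQL engine, just an illustration of the reasoning.

```python
# Hypothetical toy data standing in for skos:broader, dcterms:subject, rdf:type.
BROADER = {  # category -> its broader categories
    "Pharmaceutical_companies_of_the_United_States": {
        "Companies_of_the_United_States_by_industry"
    },
    # The eponymous category "Pfizer" sits under the company listing.
    "Pfizer": {"Pharmaceutical_companies_of_the_United_States"},
}
SUBJECT = {  # page -> categories it is filed under
    "Pfizer": {"Pharmaceutical_companies_of_the_United_States"},
    "Alprazolam": {"Pfizer"},  # a drug filed under the Pfizer category
}
TYPES = {"Pfizer": {"Company"}, "Alprazolam": {"Drug"}}


def reaches(category, top):
    """True if `top` is reachable from `category` in zero or more broader steps."""
    seen, stack = set(), [category]
    while stack:
        current = stack.pop()
        if current == top:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(BROADER.get(current, ()))
    return False


def pages_under(top, required_type=None):
    """Pages filed under any category below `top`, optionally restricted by type."""
    found = set()
    for page, categories in SUBJECT.items():
        if any(reaches(c, top) for c in categories):
            if required_type is None or required_type in TYPES.get(page, ()):
                found.add(page)
    return sorted(found)


TOP = "Companies_of_the_United_States_by_industry"
print(pages_under(TOP))             # category walk alone also returns the drug
print(pages_under(TOP, "Company"))  # the type restriction keeps only the company
```

Walking the categories alone returns both pages, because the drug is filed under a category that is itself below the company listing. Only the type check separates the two.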

We can clean this up a bit, though. You're not using the label (?pagename), so you can remove it. You can use some of the shorter syntaxes to make things a little bit cleaner. You can also note that "Companies … by industry" has the skos:broader value "Companies of the United States", which makes the intent of the query a bit clearer.

select distinct ?company ?subcategory {
  ?company dcterms:subject ?subcategory ;
           a dbpedia-owl:Company .
  ?subcategory skos:broader* category:Companies_of_the_United_States .
}
limit 1000


As a final note, the category hierarchy doesn't mean that each company has a single path to the top category. That is, a company may be listed multiple times, e.g.:

company   subcategory
------------------------------------
companyx  textile_companies
companyx  companies_in_new_hampshire

Unless you need the listing of subcategories, you might consider eliminating it from the query, in which case you can have (using property paths):

select distinct ?company {
  ?company a dbpedia-owl:Company ;
           dcterms:subject/skos:broader* category:Companies_of_the_United_States .
}
limit 1000
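The effect of dropping the subcategory column can be sketched in a few lines of Python. The rows below reuse the companyx example from the table; companyy and the two helper functions are hypothetical additions for illustration only.

```python
# Hypothetical (company, subcategory) bindings, as a query with both
# variables in the select clause might return them.
ROWS = [
    ("companyx", "textile_companies"),
    ("companyx", "companies_in_new_hampshire"),
    ("companyy", "textile_companies"),
]


def distinct_pairs(rows):
    """Mimics `select distinct ?company ?subcategory`: one row per pair."""
    return sorted(set(rows))


def distinct_companies(rows):
    """Mimics `select distinct ?company`: one row per company."""
    return sorted({company for company, _ in rows})


print(len(distinct_pairs(ROWS)))  # companyx is counted once per subcategory
print(distinct_companies(ROWS))   # each company appears exactly once
```

With both variables projected, distinct cannot collapse the two companyx rows because they differ in the subcategory. Projecting only the company removes the repetition.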


