Querying the Knowledge Graph

Setting up a connection

Communication between fairgraph metadata objects and the Knowledge Graph (KG) web service is through a client object, for which an access token associated with an EBRAINS account is needed. To obtain an EBRAINS account, please see https://ebrains.eu/register.

The default way to connect to the KG is:

from fairgraph import KGClient
client = KGClient()

This will connect to the “PPD” KG environment, which should be used for testing and experimentation. When you need to connect to the production KG, specify the host, as follows:

client = KGClient(host="core.kg.ebrains.eu")

If you are working in an EBRAINS Lab Jupyter notebook, you are already authenticated, and so the client will find your access token automatically.

If you are working outside the EBRAINS environment, the client will print the URL of a log-in page. Open this URL in a web browser, log in to your EBRAINS account, then close the tab and return to your Python prompt or notebook.

More advanced users can alternatively obtain a token from another application (for example from the KG Editor, or by using clb_oauth.get_token() in the EBRAINS Lab) and provide it directly:

client = KGClient(token="<token>")

If you save the token as an environment variable named KG_AUTH_TOKEN, e.g., at a shell prompt:

export KG_AUTH_TOKEN=eyJhbGci...nPq

then the client will take the token from the environment.

Finally, if you are developing an application that uses fairgraph to access the KG, you can request a service account from EBRAINS Support and then connect with its client id and secret:

client = KGClient(client_id="my_client_id", client_secret="<secret value>")
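
The authentication options above can be selected from the environment by a small helper. The following is a minimal sketch, not part of fairgraph: KG_AUTH_TOKEN is the variable named above, whereas KG_CLIENT_ID and KG_CLIENT_SECRET are hypothetical names chosen for this example, as are the two function names.

```python
import os


def client_kwargs_from_env(environ=None):
    """Choose keyword arguments for KGClient from environment variables.

    KG_AUTH_TOKEN is the variable described in the text. KG_CLIENT_ID and
    KG_CLIENT_SECRET are hypothetical names used only in this sketch.
    """
    env = os.environ if environ is None else environ
    client_id = env.get("KG_CLIENT_ID")
    client_secret = env.get("KG_CLIENT_SECRET")
    if client_id and client_secret:
        # Service account credentials take priority in this sketch.
        return {"client_id": client_id, "client_secret": client_secret}
    token = env.get("KG_AUTH_TOKEN")
    if token:
        # Passed explicitly here; the client would also find it on its own.
        return {"token": token}
    # Nothing configured: fall back to the Lab token or the log-in page.
    return {}


def make_client(environ=None):
    """Build a KGClient using whichever credentials were found."""
    # Imported lazily so that the helper above works without fairgraph.
    from fairgraph import KGClient

    return KGClient(**client_kwargs_from_env(environ))
```

Calling make_client() with no credentials in the environment behaves like KGClient() with no arguments.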

Listing the available metadata types

Each type of metadata node in the Knowledge Graph is represented by a Python class. These classes are organized into modules according to the openMINDS schemas. For a full list of modules, see Metadata domains.

To get a list of classes in a given module, import the module and then run list_kg_classes(), e.g.:

>>> import fairgraph.openminds.core as omcore

>>> omcore.list_kg_classes()
[fairgraph.openminds.core.research.behavioral_protocol.BehavioralProtocol,
fairgraph.openminds.core.actors.contact_information.ContactInformation,
fairgraph.openminds.core.data.content_type.ContentType,
fairgraph.openminds.core.miscellaneous.doi.DOI,
fairgraph.openminds.core.products.dataset.Dataset,
fairgraph.openminds.core.products.dataset_version.DatasetVersion,
fairgraph.openminds.core.data.file.File,
fairgraph.openminds.core.data.file_bundle.FileBundle,
fairgraph.openminds.core.data.file_repository.FileRepository,
fairgraph.openminds.core.data.file_repository_structure.FileRepositoryStructure,
fairgraph.openminds.core.miscellaneous.funding.Funding,
fairgraph.openminds.core.miscellaneous.gridid.GRIDID,
fairgraph.openminds.core.miscellaneous.isbn.ISBN,
fairgraph.openminds.core.data.license.License,
fairgraph.openminds.core.products.meta_data_model.MetaDataModel,
fairgraph.openminds.core.products.meta_data_model_version.MetaDataModelVersion,
fairgraph.openminds.core.products.model.Model,
fairgraph.openminds.core.products.model_version.ModelVersion,
fairgraph.openminds.core.miscellaneous.orcid.ORCID,
fairgraph.openminds.core.actors.organization.Organization,
fairgraph.openminds.core.actors.person.Person,
fairgraph.openminds.core.products.project.Project,
fairgraph.openminds.core.research.protocol.Protocol,
fairgraph.openminds.core.research.protocol_execution.ProtocolExecution,
fairgraph.openminds.core.miscellaneous.rorid.RORID,
fairgraph.openminds.core.miscellaneous.swhid.SWHID,
fairgraph.openminds.core.data.service_link.ServiceLink,
fairgraph.openminds.core.products.software.Software,
fairgraph.openminds.core.products.software_version.SoftwareVersion,
fairgraph.openminds.core.research.stimulation.Stimulation,
fairgraph.openminds.core.research.subject.Subject,
fairgraph.openminds.core.research.subject_group.SubjectGroup,
fairgraph.openminds.core.research.subject_group_state.SubjectGroupState,
fairgraph.openminds.core.research.subject_state.SubjectState,
fairgraph.openminds.core.research.tissue_sample.TissueSample,
fairgraph.openminds.core.research.tissue_sample_collection.TissueSampleCollection,
fairgraph.openminds.core.research.tissue_sample_collection_state.TissueSampleCollectionState,
fairgraph.openminds.core.research.tissue_sample_state.TissueSampleState,
fairgraph.openminds.core.miscellaneous.web_resource.WebResource]
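
A long flat list like this is easier to scan when grouped by subpackage (actors, data, products and so on). The following sketch assumes that each class's __module__ attribute matches the dotted path shown in the output above; group_by_subpackage is a hypothetical helper, not a fairgraph function.

```python
from collections import defaultdict


def group_by_subpackage(classes):
    """Map subpackage name to a sorted list of class names."""
    groups = defaultdict(list)
    for cls in classes:
        # e.g. "fairgraph.openminds.core.actors.person" gives "actors"
        parts = cls.__module__.split(".")
        subpackage = parts[-2] if len(parts) >= 2 else parts[0]
        groups[subpackage].append(cls.__name__)
    return {name: sorted(members) for name, members in sorted(groups.items())}
```

Under that assumption, group_by_subpackage(omcore.list_kg_classes()) would return a dictionary with one entry per subpackage.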

Listing all metadata nodes of a given type

To obtain a list of all the metadata nodes of a given type, import the associated class and use the list() method, passing the client object you created previously, e.g., to get a list of software or open access document licences:

from fairgraph.openminds.core import License

licenses = License.list(client)

By default, this gives you the first 100 results. You can change the number of results retrieved and the starting point, e.g.:

licenses = License.list(client, from_index=15, size=10)

This returns 10 nodes, starting at index 15. To see how many nodes there are in total:

License.count(client)
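
To retrieve every node rather than a single page, the from_index and size arguments can be advanced in a loop. The sketch below is a generic pager, not a fairgraph feature: iter_all and fetch_page are hypothetical names, and it assumes that from_index counts from 0 and that a page shorter than the requested size marks the end of the results.

```python
def iter_all(fetch_page, page_size=100):
    """Yield every item, requesting consecutive pages until one comes back short.

    fetch_page(from_index, size) must return a list of at most `size` items.
    """
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    from_index = 0
    while True:
        page = fetch_page(from_index, page_size)
        yield from page
        if len(page) < page_size:
            break  # a short (or empty) page means there is nothing left
        from_index += page_size


# Intended use with the client created earlier (not executed here):
# all_licenses = list(
#     iter_all(lambda start, n: License.list(client, from_index=start, size=n))
# )
```

Comparing the number of items collected with License.count(client) is a simple check that nothing was missed.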

Note

If you consistently retrieve an empty list, it is probably because you do not yet have the necessary permissions. See Access permissions for more information.

Filtering/searching

To obtain only metadata nodes that have certain properties, you can filter the list of nodes. For example, to see only datasets whose names contain the phrase ‘patch-clamp’:

from fairgraph.openminds.core import Dataset, Organization

datasets = Dataset.list(client, full_name="patch-clamp")

Warning

The filtering system is currently primitive and unaware of hierarchies, e.g. filtering by “hippocampus” will not return cells with the brain region set to “hippocampus CA1”. This is on our list of things to fix soon!

To see a list of possible search terms, use the property_names attribute, e.g., DatasetVersion.property_names, or consult the inline help (help(omcore.DatasetVersion)).

To search across multiple links in the graph, join property names with “__”. For example, to find all datasets whose authors are affiliated with the Karolinska Institute:

karolinska = Organization.by_name("Karolinska", client)
datasets = Dataset.list(client, authors__affiliations__member_of=karolinska)
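
A misspelled property name is easy to miss in a joined filter, so it can help to check filter names before sending the query. The following sketch is a hypothetical helper, not part of fairgraph. It checks only the first segment of each “__” path, because later segments belong to other classes, and it assumes the allowed names are available as a collection of strings (for example from property_names).

```python
def unknown_filter_names(filters, allowed_names):
    """Return the filter keys whose first path segment is not an allowed name."""
    allowed = set(allowed_names)
    return sorted(
        key for key in filters
        # "authors__affiliations__member_of" is checked as "authors"
        if key.split("__", 1)[0] not in allowed
    )
```

An empty list means every filter starts with a known property of the class being queried.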

Retrieving a specific node based on its name or id

If you know the name or unique id of a node in the Knowledge Graph, you can retrieve it directly:

from fairgraph.openminds.core import DatasetVersion

dataset_of_interest = DatasetVersion.by_name("Whole cell patch-clamp recordings of cerebellar Golgi cells", client)
dataset_of_interest = DatasetVersion.from_id("17196b79-04db-4ea4-bb69-d20aab6f1d62", client)

Viewing metadata and connections

Once you have retrieved a node of interest, the associated metadata are available as attributes of the Python object, e.g.:

>>> dataset_of_interest.id
'https://kg.ebrains.eu/api/instances/17196b79-04db-4ea4-bb69-d20aab6f1d62'

>>> dataset_of_interest.uuid
'17196b79-04db-4ea4-bb69-d20aab6f1d62'

>>> dataset_of_interest.description[:100] + "..."
'The Golgi cells, together with granule cells and mossy fibers, form a neuronal microcircuit regulati...'
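
In the example above, the uuid is the final path segment of the id. If you only have a full id URL (copied from a browser, say) and need the short form expected by from_id(), a helper along the following lines can extract it. This is a sketch that assumes ids always end with the UUID, as in the output shown; uuid_from_id is a hypothetical name.

```python
import uuid


def uuid_from_id(node_id):
    """Return the trailing UUID of a KG instance URL.

    Raises ValueError if the last path segment is not a valid UUID.
    """
    last_segment = node_id.rstrip("/").rsplit("/", 1)[-1]
    return str(uuid.UUID(last_segment))
```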

Connections between graph nodes are also available as attributes:

>>> dataset_of_interest.license
KGProxy([License], '6ebce971-7f99-4fbc-9621-eeae47a70d85')

By default, for performance reasons, connections are not followed; instead you will see either a KGQuery or a KGProxy object. In both cases, follow the connection using the resolve() method, e.g.:

>>> license = dataset_of_interest.license.resolve(client)

>>> license.full_name
'Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International'
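
When looping over several attributes, it is not always known in advance whether a value is a plain value, an unresolved link, or a list of links. The sketch below handles all three. It is a hypothetical helper, not a fairgraph function, and for brevity it recognizes unresolved links by class name; in real code an isinstance() check against fairgraph's own KGProxy and KGQuery classes would be more robust.

```python
UNRESOLVED_TYPE_NAMES = {"KGProxy", "KGQuery"}


def resolve_if_needed(value, client):
    """Resolve unresolved links, pass everything else through unchanged."""
    if isinstance(value, (list, tuple)):
        # Attributes may hold several links; handle each item in turn.
        return [resolve_if_needed(item, client) for item in value]
    if type(value).__name__ in UNRESOLVED_TYPE_NAMES:
        return value.resolve(client)
    return value
```

With this helper, resolve_if_needed(dataset_of_interest.license, client) gives the same result as the explicit resolve() call above.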

It is rather cumbersome to have to follow all these connections manually. You can ask fairgraph to resolve the connections for you, using the follow_links argument, e.g.:

dataset_of_interest.resolve(
    client,
    follow_links={
        "license": {},
        "is_version_of": {
            "authors": {}
        }
    }
)
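
Nested follow_links dictionaries become tedious to write by hand as they grow. The following sketch builds one from a list of paths, borrowing the “__” separator used for filters. The path notation and the build_follow_links name are conventions of this example only, not something fairgraph itself accepts for follow_links.

```python
def build_follow_links(paths):
    """Turn paths such as "is_version_of__authors" into a nested dict."""
    tree = {}
    for path in paths:
        node = tree
        for name in path.split("__"):
            # Create the level if needed, then descend into it.
            node = node.setdefault(name, {})
    return tree
```

Here build_follow_links(["license", "is_version_of__authors"]) produces the same dictionary as the one written out above.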

Error handling

If you create a node without all of the expected metadata attributes, or with values of the wrong data type, fairgraph will warn you.

If you wish to be certain that all required attributes have been provided, you can turn on strict checking for a given node type as follows:

DatasetVersion.set_error_handling("error")

This will then raise an Exception if an attribute is missing or of the wrong data type.

If you wish to turn off all warnings for a given node type:

DatasetVersion.set_error_handling(None)

You can also turn warnings on and off at the level of individual modules, or for all modules. For example, the following turns warnings off for all modules, then sets the “core” module to send warning messages to the Python logging system:

import fairgraph.openminds.core  # also makes the top-level fairgraph package available
fairgraph.set_error_handling(None)
fairgraph.openminds.core.set_error_handling("log")
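
Messages sent to the Python logging system are only visible once a handler is attached. The sketch below uses only the standard logging module. The logger name "fairgraph" and the WARNING level are assumptions made for this example; check which logger fairgraph actually writes to in your installed version.

```python
import logging


def attach_stream_handler(logger_name="fairgraph", level=logging.WARNING, stream=None):
    """Attach a stream handler so that records sent to logger_name are shown.

    The default logger name is an assumption, not confirmed by the text.
    """
    logger = logging.getLogger(logger_name)
    handler = logging.StreamHandler(stream)  # stream=None means sys.stderr
    handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(level)
    return handler
```

Keeping the returned handler lets you detach it later with logging.getLogger(logger_name).removeHandler(handler).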