
Working with Metadata Entities

Learn how to find, retrieve & update entities comprising your Metadata Graph programmatically.

Reading an Entity: Queries

DataHub provides the following GraphQL queries for retrieving entities in your Metadata Graph.

Getting a Metadata Entity

To retrieve a Metadata Entity by primary key (urn), simply use the <entityName>(urn: String!) GraphQL Query.

For example, to retrieve a dataset entity, you can issue the following GraphQL Query:

As GraphQL

{
  dataset(urn: "urn:li:dataset:(urn:li:dataPlatform:kafka,SampleKafkaDataset,PROD)") {
    urn
    properties {
      name
    }
  }
}

As CURL

curl --location --request POST 'http://localhost:8080/api/graphql' \
--header 'Authorization: Bearer <my-access-token>' \
--header 'Content-Type: application/json' \
--data-raw '{ "query":"{ dataset(urn: \"urn:li:dataset:(urn:li:dataPlatform:kafka,SampleKafkaDataset,PROD)\") { urn properties { name } } }", "variables":{}}'
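
The same request can also be assembled in Python. The following is a minimal sketch using only the standard library. The helper name build_graphql_request is hypothetical and not part of any DataHub client library, and passing the urn through a GraphQL variable (instead of inlining it in the query string, as the curl example does) is an illustrative choice. The endpoint and token placeholder are taken from the curl example above. The request is only constructed here, not sent.

```python
import json
import urllib.request

GRAPHQL_URL = "http://localhost:8080/api/graphql"  # endpoint from the curl example

# Same fields as the GraphQL example, with the urn supplied as a variable.
DATASET_QUERY = """
query getDataset($urn: String!) {
  dataset(urn: $urn) {
    urn
    properties {
      name
    }
  }
}
"""


def build_graphql_request(query, variables, token, url=GRAPHQL_URL):
    """Build (but do not send) a POST request for a GraphQL operation."""
    body = json.dumps({"query": query, "variables": variables}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_graphql_request(
    DATASET_QUERY,
    {"urn": "urn:li:dataset:(urn:li:dataPlatform:kafka,SampleKafkaDataset,PROD)"},
    token="<my-access-token>",
)
# To send it, pass `request` to urllib.request.urlopen and json.load the response.
```

Using a variable keeps the urn out of the query text, so no manual escaping of quotes is needed.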

In the following examples, we'll look at how to fetch specific types of metadata for an asset.

Querying for Owners of an entity

As GraphQL:

query {
  dataset(urn: "urn:li:dataset:(urn:li:dataPlatform:hdfs,SampleHdfsDataset,PROD)") {
    ownership {
      owners {
        owner {
          ... on CorpUser {
            urn
            type
          }
          ... on CorpGroup {
            urn
            type
          }
        }
      }
    }
  }
}

Querying for Tags of an asset

As GraphQL:

query {
  dataset(urn: "urn:li:dataset:(urn:li:dataPlatform:hdfs,SampleHdfsDataset,PROD)") {
    tags {
      tags {
        tag {
          name
        }
      }
    }
  }
}

Querying for Domain of an asset

As GraphQL:

query {
  dataset(urn: "urn:li:dataset:(urn:li:dataPlatform:hdfs,SampleHdfsDataset,PROD)") {
    domain {
      domain {
        urn
      }
    }
  }
}

Querying for Glossary Terms of an asset

As GraphQL:

query {
  dataset(urn: "urn:li:dataset:(urn:li:dataPlatform:hdfs,SampleHdfsDataset,PROD)") {
    glossaryTerms {
      terms {
        term {
          urn
        }
      }
    }
  }
}

Querying for Deprecation of an asset

As GraphQL:

query {
  dataset(urn: "urn:li:dataset:(urn:li:dataPlatform:hdfs,SampleHdfsDataset,PROD)") {
    deprecation {
      deprecated
      decommissionTime
    }
  }
}

Relevant Queries

Searching for a Metadata Entity

To perform full-text search against an Entity of a particular type, use the search(input: SearchInput!) GraphQL Query.

As GraphQL:

{
  search(input: { type: DATASET, query: "my sql dataset", start: 0, count: 10 }) {
    start
    count
    total
    searchResults {
      entity {
        urn
        type
        ... on Dataset {
          name
        }
      }
    }
  }
}

As CURL:

curl --location --request POST 'http://localhost:8080/api/graphql' \
--header 'Authorization: Bearer <my-access-token>' \
--header 'Content-Type: application/json' \
--data-raw '{ "query":"{ search(input: { type: DATASET, query: \"my sql dataset\", start: 0, count: 10 }) { start count total searchResults { entity { urn type ...on Dataset { name } } } } }", "variables":{}}'
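
Because a GraphQL response mirrors the shape of the query, the results of the search above can be unpacked with a few dictionary lookups. The sketch below does this for a hand-written sample response. The sample values (urn and name) and the function name entity_urns are made up for illustration and are not output from a real DataHub instance.

```python
import json

# Hypothetical response body, shaped like the search query above.
SAMPLE_RESPONSE = """
{
  "data": {
    "search": {
      "start": 0,
      "count": 1,
      "total": 1,
      "searchResults": [
        {
          "entity": {
            "urn": "urn:li:dataset:(urn:li:dataPlatform:mysql,my_db.my_table,PROD)",
            "type": "DATASET",
            "name": "my_table"
          }
        }
      ]
    }
  }
}
"""


def entity_urns(response_body):
    """Return the urn of every entity in a search response."""
    search = json.loads(response_body)["data"]["search"]
    return [result["entity"]["urn"] for result in search["searchResults"]]


urns = entity_urns(SAMPLE_RESPONSE)
```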

Note that by default Elasticsearch only allows pagination through 10,000 entities via the search API. If you need to paginate through more, you can raise the index.max_result_window setting in Elasticsearch, or use the scroll API to read from the index directly.
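
To make the paging arithmetic concrete, here is a small sketch that generates successive start and count values for the search input while staying within the 10,000-result window described above. The function name search_pages and its max_window parameter are hypothetical; only the default limit of 10,000 comes from the note.

```python
def search_pages(total, count=10, max_window=10_000):
    """Yield (start, count) pairs covering `total` results.

    Paging stops at `max_window`, because start + count must not exceed
    the Elasticsearch result window (10,000 by default).
    """
    reachable = min(total, max_window)
    start = 0
    while start < reachable:
        # The last page may be shorter than the requested page size.
        yield start, min(count, reachable - start)
        start += count


pages = list(search_pages(total=25, count=10))
# pages == [(0, 10), (10, 10), (20, 5)]
```

Each pair can be dropped into the start and count fields of the search input. Results beyond the window are simply not requested by this sketch.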

Relevant Queries

Modifying an Entity: Mutations

Authorization

Mutations which change Entity metadata are subject to DataHub Access Policies. This means that DataHub's server will check whether the requesting actor is authorized to perform the action.

Updating a Metadata Entity

To update an existing Metadata Entity, use the update<entityName>(urn: String!, input: EntityUpdateInput!) GraphQL Mutation.

For example, to update a Dashboard entity, you can issue the following GraphQL mutation:

As GraphQL

mutation updateDashboard {
  updateDashboard(
    urn: "urn:li:dashboard:(looker,baz)",
    input: {
      editableProperties: {
        description: "My new description"
      }
    }
  ) {
    urn
  }
}

As CURL

curl --location --request POST 'http://localhost:8080/api/graphql' \
--header 'Authorization: Bearer <my-access-token>' \
--header 'Content-Type: application/json' \
--data-raw '{ "query": "mutation updateDashboard { updateDashboard(urn:\"urn:li:dashboard:(looker,baz)\", input: { editableProperties: { description: \"My new description\" } } ) { urn } }", "variables":{}}'

Be careful: these APIs allow you to make significant changes to a Metadata Entity, often including updating the entire set of Owners & Tags.

Relevant Mutations

Adding & Removing Tags

To attach Tags to a Metadata Entity, you can use the addTags or batchAddTags mutations. To remove them, you can use the removeTag or batchRemoveTags mutations.

For example, to add a Tag to a Pipeline entity, you can issue the following GraphQL mutation:

As GraphQL

mutation addTags {
  addTags(input: { tagUrns: ["urn:li:tag:NewTag"], resourceUrn: "urn:li:dataFlow:(airflow,dag_abc,PROD)" })
}

As CURL

curl --location --request POST 'http://localhost:8080/api/graphql' \
--header 'Authorization: Bearer <my-access-token>' \
--header 'Content-Type: application/json' \
--data-raw '{ "query": "mutation addTags { addTags(input: { tagUrns: [\"urn:li:tag:NewTag\"], resourceUrn: \"urn:li:dataFlow:(airflow,dag_abc,PROD)\" }) }", "variables":{}}'

Pro-Tip! You can also add or remove Tags from Dataset Schema Fields (or Columns) by providing 2 additional fields in your mutation input:

  • subResourceType
  • subResource

Where subResourceType is set to DATASET_FIELD and subResource is the field path of the column to change.
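
As an illustration of the tip above, the sketch below builds an addTags mutation that targets a single column. The helper names (EnumValue, to_graphql_literal) are hypothetical, and the dataset and column are borrowed from the updateDescription example later on this page. The exact set of fields accepted by the input should be checked against the GraphQL schema of your DataHub version.

```python
import json


class EnumValue(str):
    """Marks a string that must be rendered as a bare GraphQL enum value."""


def to_graphql_literal(value):
    """Render a Python value as an inline GraphQL input literal."""
    if isinstance(value, EnumValue):
        return str(value)  # enums are written without quotes
    if isinstance(value, dict):
        fields = ", ".join(f"{k}: {to_graphql_literal(v)}" for k, v in value.items())
        return "{ " + fields + " }"
    if isinstance(value, (list, tuple)):
        return "[" + ", ".join(to_graphql_literal(v) for v in value) + "]"
    # For ordinary strings, numbers, booleans and None, the JSON form
    # is also a valid GraphQL literal.
    return json.dumps(value)


input_fields = {
    "tagUrns": ["urn:li:tag:NewTag"],
    "resourceUrn": "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_deleted,PROD)",
    "subResourceType": EnumValue("DATASET_FIELD"),
    "subResource": "user_name",  # field path of the column to tag
}
mutation = "mutation addTags { addTags(input: %s) }" % to_graphql_literal(input_fields)
payload = json.dumps({"query": mutation, "variables": {}})
```

The resulting payload string can be used as the --data-raw body of the curl command shown above.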

Relevant Mutations

Adding & Removing Glossary Terms

To attach Glossary Terms to a Metadata Entity, you can use the addTerms or batchAddTerms mutations. To remove them, you can use the removeTerm or batchRemoveTerms mutations.

For example, to add a Glossary Term to a Pipeline entity, you can issue the following GraphQL mutation:

As GraphQL

mutation addTerms {
  addTerms(input: { termUrns: ["urn:li:glossaryTerm:NewTerm"], resourceUrn: "urn:li:dataFlow:(airflow,dag_abc,PROD)" })
}

As CURL

curl --location --request POST 'http://localhost:8080/api/graphql' \
--header 'Authorization: Bearer <my-access-token>' \
--header 'Content-Type: application/json' \
--data-raw '{ "query": "mutation addTerms { addTerms(input: { termUrns: [\"urn:li:glossaryTerm:NewTerm\"], resourceUrn: \"urn:li:dataFlow:(airflow,dag_abc,PROD)\" }) }", "variables":{}}'

Pro-Tip! You can also add or remove Glossary Terms from Dataset Schema Fields (or Columns) by providing 2 additional fields in your mutation input:

  • subResourceType
  • subResource

Where subResourceType is set to DATASET_FIELD and subResource is the field path of the column to change.

Relevant Mutations

Adding & Removing Domain

To add an entity to a Domain, you can use the setDomain and batchSetDomain mutations. To remove entities from a Domain, you can use the unsetDomain mutation or the batchSetDomain mutation.

For example, to add a Pipeline entity to the "Marketing" Domain, you can issue the following GraphQL mutation:

As GraphQL

mutation setDomain {
  setDomain(domainUrn: "urn:li:domain:Marketing", entityUrn: "urn:li:dataFlow:(airflow,dag_abc,PROD)")
}

As CURL

curl --location --request POST 'http://localhost:8080/api/graphql' \
--header 'Authorization: Bearer <my-access-token>' \
--header 'Content-Type: application/json' \
--data-raw '{ "query": "mutation setDomain { setDomain(domainUrn: \"urn:li:domain:Marketing\", entityUrn: \"urn:li:dataFlow:(airflow,dag_abc,PROD)\") }", "variables":{}}'

Relevant Mutations

Adding & Removing Owners

To attach Owners to a Metadata Entity, you can use the addOwners or batchAddOwners mutations. To remove them, you can use the removeOwner or batchRemoveOwners mutations.

For example, to add an Owner to a Pipeline entity, you can issue the following GraphQL mutation:

As GraphQL

mutation addOwners {
  addOwners(input: { owners: [ { ownerUrn: "urn:li:corpuser:datahub", ownerEntityType: CORP_USER, type: TECHNICAL_OWNER } ], resourceUrn: "urn:li:dataFlow:(airflow,dag_abc,PROD)" })
}

As CURL

curl --location --request POST 'http://localhost:8080/api/graphql' \
--header 'Authorization: Bearer <my-access-token>' \
--header 'Content-Type: application/json' \
--data-raw '{ "query": "mutation addOwners { addOwners(input: { owners: [ { ownerUrn: \"urn:li:corpuser:datahub\", ownerEntityType: CORP_USER, type: TECHNICAL_OWNER } ], resourceUrn: \"urn:li:dataFlow:(airflow,dag_abc,PROD)\" }) }", "variables":{}}'

Relevant Mutations

Updating Deprecation

To update deprecation for a Metadata Entity, you can use the updateDeprecation or batchUpdateDeprecation mutations.

For example, to mark a Pipeline entity as deprecated, you can issue the following GraphQL mutation:

As GraphQL

mutation updateDeprecation {
  updateDeprecation(input: { urn: "urn:li:dataFlow:(airflow,dag_abc,PROD)", deprecated: true })
}

As CURL

curl --location --request POST 'http://localhost:8080/api/graphql' \
--header 'Authorization: Bearer <my-access-token>' \
--header 'Content-Type: application/json' \
--data-raw '{ "query": "mutation updateDeprecation { updateDeprecation(input: { urn: \"urn:li:dataFlow:(airflow,dag_abc,PROD)\", deprecated: true }) }", "variables":{}}'

Note that deprecation is NOT currently supported for assets of type container.

Relevant Mutations

Editing Description (i.e. Documentation)

Note that this API is still evolving and is currently experimental. It supports the following entities today:

  • dataset
  • container
  • domain
  • glossary term
  • glossary node
  • tag
  • group
  • notebook
  • all ML entities

To edit the documentation for an entity, you can use the updateDescription mutation. updateDescription currently supports Dataset Schema Fields and Containers.

For example, to edit the documentation for a Dataset Schema Field, you can issue the following GraphQL mutation:

As GraphQL

mutation updateDescription {
  updateDescription(
    input: {
      description: "Name of the user who was deleted. This description is updated via GraphQL.",
      resourceUrn: "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_deleted,PROD)",
      subResource: "user_name",
      subResourceType: DATASET_FIELD
    }
  )
}

As CURL

curl --location --request POST 'http://localhost:8080/api/graphql' \
--header 'Authorization: Bearer <my-access-token>' \
--header 'Content-Type: application/json' \
--data-raw '{ "query": "mutation updateDescription { updateDescription ( input: { description: \"Name of the user who was deleted. This description is updated via GraphQL.\", resourceUrn: \"urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_deleted,PROD)\", subResource: \"user_name\", subResourceType:DATASET_FIELD }) }", "variables":{}}'

Relevant Mutations

Soft Deleting

DataHub allows you to soft-delete entities. This will effectively hide them from the search, browse, and lineage experiences.

To mark an entity as soft-deleted, you can use the batchUpdateSoftDeleted mutation.

For example, to mark a Pipeline as soft-deleted, you can issue the following GraphQL mutation:

As GraphQL

mutation batchUpdateSoftDeleted {
  batchUpdateSoftDeleted(input: { urns: ["urn:li:dataFlow:(airflow,dag_abc,PROD)"], deleted: true })
}

Similarly, you can "un-delete" an entity by setting deleted to false.

As CURL

curl --location --request POST 'http://localhost:8080/api/graphql' \
--header 'Authorization: Bearer <my-access-token>' \
--header 'Content-Type: application/json' \
--data-raw '{ "query": "mutation batchUpdateSoftDeleted { batchUpdateSoftDeleted(input: { deleted: true, urns: [\"urn:li:dataFlow:(airflow,dag_abc,PROD)\"] }) }", "variables":{}}'
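
Since deleting and restoring differ only in the value of deleted, both request bodies can come from one helper. This is a sketch that assumes the mutation shape shown above. The function name soft_delete_payload is hypothetical, and json.dumps is used to quote each urn so it is escaped correctly inside the query string.

```python
import json


def soft_delete_payload(urns, deleted=True):
    """Build the JSON body for a batchUpdateSoftDeleted mutation."""
    urn_list = ", ".join(json.dumps(urn) for urn in urns)
    flag = "true" if deleted else "false"  # GraphQL boolean literal
    query = (
        "mutation batchUpdateSoftDeleted { "
        f"batchUpdateSoftDeleted(input: {{ urns: [{urn_list}], deleted: {flag} }}) "
        "}"
    )
    return json.dumps({"query": query, "variables": {}})


pipeline = "urn:li:dataFlow:(airflow,dag_abc,PROD)"
delete_body = soft_delete_payload([pipeline])                  # hide the entity
restore_body = soft_delete_payload([pipeline], deleted=False)  # make it visible again
```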

Relevant Mutations

Handling Errors

In GraphQL, requests that have errors do not always result in a non-200 HTTP status code. Instead, errors will be present in the response body inside a top-level errors field.

This enables situations in which the client is able to deal gracefully with partial data returned by the application server. To verify that no error was returned after making a GraphQL request, make sure you check both the data and errors fields that are returned.

To catch a GraphQL error, simply check the errors field inside the GraphQL response. It will contain a message, a path, and a set of extensions which contain a standard error code.

{
  "errors": [
    {
      "message": "Failed to change ownership for resource urn:li:dataFlow:(airflow,dag_abc,PROD). Expected a corp user urn.",
      "locations": [
        {
          "line": 1,
          "column": 22
        }
      ],
      "path": [
        "addOwners"
      ],
      "extensions": {
        "code": 400,
        "type": "BAD_REQUEST",
        "classification": "DataFetchingException"
      }
    }
  ]
}

The following error codes are officially supported:

| Code | Type | Description |
| --- | --- | --- |
| 400 | BAD_REQUEST | The query or mutation was malformed. |
| 403 | UNAUTHORIZED | The current actor is not authorized to perform the requested action. |
| 404 | NOT_FOUND | The resource is not found. |
| 500 | SERVER_ERROR | An internal error has occurred. Check your server logs or contact your DataHub administrator. |
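
Put together, a client-side check might look like the sketch below. It parses a response body, raises if the top-level errors field is non-empty, and otherwise returns the data field. The exception class GraphQLError and the function check_response are hypothetical names, not part of any DataHub library, and the sample body is a shortened copy of the error shown above.

```python
import json


class GraphQLError(Exception):
    """Raised when a GraphQL response carries a top-level errors field."""

    def __init__(self, errors):
        first = errors[0]
        extensions = first.get("extensions", {})
        self.errors = errors
        self.code = extensions.get("code")  # e.g. 400
        self.type = extensions.get("type")  # e.g. "BAD_REQUEST"
        super().__init__(f"{self.type} ({self.code}): {first.get('message')}")


def check_response(body):
    """Return the data field, or raise GraphQLError if any error was reported."""
    response = json.loads(body)
    errors = response.get("errors")
    if errors:
        raise GraphQLError(errors)
    return response.get("data")


# Shortened copy of the error response shown above.
SAMPLE_ERROR = json.dumps({
    "errors": [{
        "message": "Failed to change ownership for resource "
                   "urn:li:dataFlow:(airflow,dag_abc,PROD). Expected a corp user urn.",
        "path": ["addOwners"],
        "extensions": {"code": 400, "type": "BAD_REQUEST"},
    }]
})

try:
    check_response(SAMPLE_ERROR)
except GraphQLError as err:
    failure = (err.code, err.type)
```

Raising on any error is the strict choice. A client that wants to work with partial data could instead return both the data and errors fields and let the caller decide.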

Feedback, Feature Requests, & Support

Visit our Slack channel to ask questions, tell us what we can do better, & make requests for what you'd like to see in the future. Or just stop by to say 'Hi'.