Create and Use a Microsoft Search Graph Connector Step by Step


Our Desired Outcome

In this blog post we will accomplish the below goals:

  1. Outline what problem a Graph Connector solves
  2. Explain why you should use a Graph Connector and how
  3. Explain what External Data is and how to create it
  4. Explain what an External Data Schema is and how to create it
  5. Explain what an Indexer component is and how to create it
  6. Show you how to surface the External Data that has been indexed and ACL’d to your users
  7. Outline next steps and a Call to Action (CTA)

Problem Statement

Why should you consider a Microsoft Graph Connector? What problem is it solving for you? Is it the only way to solve this problem? These are questions that I want to answer from the beginning of this journey we are about to take.

  • Primer: My organization has data that is distributed, which could be solely on premises, a combination of on premises and in the cloud (SaaS), or across multiple SaaS providers, and I also have/support Microsoft Office 365 as part of that same network of available services.
  • Scenario 1: One main problem here is the ‘findability’ or, better yet, ‘discoverability’ of information, i.e. where do I look for it and how do I find it?
  • Scenario 2: My organization is concerned about Security and Compliance, i.e. DLP and Legal Holds. Based on the primer above, my problem is how to narrow my gaze and still ensure that I am casting a wide net across all the data I need to manage and/or secure.
  • Scenario 3: My workforce is distributed; it could be just ‘how we work’ or it could be a fact of life as we live it now under a pandemic. Sprawl can be a problem when it comes to data, which can not only affect performance but also create anxiety. How do I make sure that my workforce has access to the right data at the right time?

There are probably many other scenarios you can think of in this problem statement, but for me these are top of mind. If you do have more, please leave me comments or ping me on Twitter @fabianwilliams

Why should you be interested in Graph Connectors

Simple: a Graph Connector is an answer to the problem statements above. How does it do that? By creating a connection between your data sources, adopting a schema for the data that will be made available, determining who should have access to that data once surfaced, and keeping that information fresh in Microsoft 365. This may be accomplished through the Microsoft 365 Search Admin Center and/or the Microsoft Graph API.

Let’s look a bit deeper under the covers though. As taken from our docs “With Microsoft Graph connectors, your organization can index third-party data so it appears in Microsoft Search results. This feature expands the types of content sources that are searchable in your Microsoft 365 productivity apps and the broader Microsoft ecosystem. The third-party data can be hosted on-premises or in the public or private clouds.” Connectors are built both by Microsoft and by our 3rd Party partners; a list of them may be found in our connectors gallery here.

In this post we will focus our attention on what may be accomplished via the Microsoft Graph API, in particular what you are able to do under the Search Endpoint, through Indexing which is currently still in beta.

Tell me more

I will have samples for my Step by Step which will guide you through the details below:

  1. External Connection
  2. Schema
  3. Creating the External Item (Index)
  4. Security Model / Access Control List (ACL)
  5. Surfacing the External Data
    1. Office Hub
    2. SharePoint Online

The next sections will go into the detail above, where I will tell the story around each item: what it is, why you need it, how you use it, as well as a sample that you should be able to just pick up and run with as I have done here in my demo. The location for this code is found in my GitHub repository here. I also have a public GitHub Gist bundle with all the files here, but I will be calling them out independently as we go.

At the time of this writing, Microsoft Graph connectors are in public preview status. To gain access to connectors functionality, you must turn on the Targeted release option in your tenant. See more details on the connectors preview program.

There are also some known limitations that I will link to here, nothing too crazy, just limits on amount, size, security, and sorting.

The Set Up

Here is our scenario and story…. A long time ago.. JK…

You are an organization or an ISV that has data that spans On Premises and possibly Cloud; in addition, you are also using Microsoft 365. I am imagining that you have a data store that you can liken to a SQL table, Oracle table, NoSQL JSON container, or even a SaaS [Salesforce, Oracle NetSuite, ServiceNow, Jira] dataset. What I am doing here, for the sake of simplicity and so that this can be repeatable, is using a format that you can all work with in a demo, that is… a comma separated value (CSV) file with a header row that represents the properties or fields, and rows of data. In our case we are going to use a freely available Product data set from Kaggle for FlipKart.

Our Data

Our data is neatly tucked away in my GitHub Repo Here

datastoreexcel

and a view of what it looks like is above.

Our User(s)

For our scenario we are setting up ONLY ONE ACL. You will see more of this in the Security section below, and in our test, when we show another user performing a Search, you will notice the results honoring the security we specified.

Our Use Case

Our use case is rather simple, and it anchors around different sources you can target with Graph Connectors, in our case Files, see our guidance here for others.  We want to surface our Data that is external to Microsoft 365 to be treated JUST LIKE Microsoft 365 data i.e. 1st Party == 3rd Party such that I have only one place to go to get a 360 view of my Data regardless of its origin.

Register your App (for your Graph Connector) in Azure Active Directory (AAD)

The 1st step is to register your App as you would any other Graph App Registration, paying attention to the Graph permissions needed, as shown in the screenshot:

appregconnectors
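Once the app is registered, your calls to Graph need a token. The sketch below is one way to build that token request; it assumes the app uses application permissions with a client secret (the OAuth 2.0 client credentials flow), which this post does not spell out. The tenant ID, client ID, and secret are placeholders, and the function only builds the request so you can inspect it before sending.

```python
import urllib.parse
import urllib.request


def build_token_request(tenant_id, client_id, client_secret):
    """Build (but do not send) a client-credentials token request for Microsoft Graph."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    form = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        # .default requests the application permissions already granted to the app.
        "scope": "https://graph.microsoft.com/.default",
    }).encode("utf-8")
    return urllib.request.Request(
        url,
        data=form,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


if __name__ == "__main__":
    # Placeholder values: replace with the ones from your own app registration.
    req = build_token_request("your-tenant-id", "your-client-id", "your-client-secret")
    print(req.get_method(), req.full_url)
    # Sending it with urllib.request.urlopen(req) returns JSON containing "access_token".
```

The access token from the response then goes into the Authorization header of every call that follows.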

External Connection

An External Connection is a logical container for adding content from an external source into Microsoft Graph. With it you are able to:

  • Create a Connection
  • Read/List a Connection
  • Update a Connection
  • Delete a Connection

So you have full CRUD capabilities. In this section, for which you can see more guidance from Microsoft here, we will show you how to create said connection. Pay attention to the ID you create with this HTTP POST call, as you will need it for other calls related to this connection and its data.

Below you will see how to create that connection with POST /external/connections

{
    "name": "FabsDemoFilesDriveAlpha",
    "description": "Index my hard drive network share demo",
    "id": "fabsdemofilesdrivealpha"
}

That request goes to the https://graph.microsoft.com/beta/external/connections endpoint.
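If you would rather script this than use Postman, here is a minimal sketch of the same POST in Python. The function name and the token argument are my own placeholders; the endpoint and body fields are the ones shown above.

```python
import json
import urllib.request

GRAPH_BETA = "https://graph.microsoft.com/beta"


def build_create_connection_request(token, connection_id, name, description):
    """Build (but do not send) the POST that creates an external connection."""
    body = json.dumps({
        "id": connection_id,
        "name": name,
        "description": description,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{GRAPH_BETA}/external/connections",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    req = build_create_connection_request(
        "your-access-token",  # placeholder
        "fabsdemofilesdrivealpha",
        "FabsDemoFilesDriveAlpha",
        "Index my hard drive network share demo",
    )
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would send it to Microsoft Graph.
```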

Below is a screenshot of the call that I did using Postman against the same endpoint

createconnection_w_returncall

and at this point if you went to the Search Admin Center, this is what you will see

createconnection_adminctrafter

What you have above is a connection, currently in Draft mode, now created inside the tenant where your Application is registered.

Schema

Now that you have the connection, the next thing you will do is define and create the schema that you want applied to the External Data that you are going to be surfacing. As I tell my partners and customers, this can be:

  • The full Entity object representation of what you want to surface, i.e. everything
  • A partial representation of what you want to surface, i.e. perhaps you want to send JUST what your users will need and, for everything else, have them click a link to either use an immersive experience inside a Microsoft 365 product, open up a modal, or pop them back into a web browser or application view.
  • Also, an entity may be a “REAL” thing, like a file, or it can be an abstraction, something that is arbitrary or has no file type but exists in your world. Consider that inside your product you may have widgets; these widgets are not files, but you want to surface the information inside them.

Your call to create the Schema is also a POST: POST /external/connections/{id}/schema

{
  "baseType": "microsoft.graph.externalItem",
  "properties": [
    {
      "name": "uniqid",
      "type": "String",
      "isSearchable": "false",
      "isRetrievable": "false",
      "isQueryable": "false"
    },
    {
      "name": "producturl",
      "type": "String",
      "isSearchable": "false",
      "isRetrievable": "true",
      "isQueryable": "false"
    },
    {
      "name": "productname",
      "type": "String",
      "isSearchable": "true",
      "isRetrievable": "true",
      "isQueryable": "false"
    },
    {
      "name": "retailprice",
      "type": "String",
      "isSearchable": "false",
      "isRetrievable": "true",
      "isQueryable": "false"
    },
    {
      "name": "discountedprice",
      "type": "String",
      "isSearchable": "false",
      "isRetrievable": "true",
      "isQueryable": "false"
    },    
    {
      "name": "image",
      "type": "String",
      "isSearchable": "false",
      "isRetrievable": "true",
      "isQueryable": "false"
    },
    {
      "name": "description",
      "type": "String",
      "isSearchable": "true",
      "isRetrievable": "true",
      "isQueryable": "false"
    },
    {
      "name": "brand",
      "type": "String",
      "isSearchable": "false",
      "isRetrievable": "true",
      "isQueryable": "true"
    }
  ]
}

and that will call against the ID previously created. Please see my screenshots below:

This 1st item is where I am getting ready to make the call

createschemafullwithaccepted_loc

This 2nd item is the result from the call. Take note of the Location property in the response header, which is used in the GET and shows you the status of the Schema operation.

createschemafullwithaccepted_loc_201
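Since the schema is created asynchronously, you end up repeating that GET against the Location URL until it finishes. Below is a rough sketch of such a polling loop. I am assuming the operation reports a status such as inprogress, completed, or failed (check what your own response returns), and fetch_status is a hypothetical function you supply that performs the GET and returns that status string.

```python
import time


def wait_for_schema(fetch_status, interval_s=10.0, max_attempts=30, sleep=time.sleep):
    """Poll until the schema operation ends; returns "completed" or "failed".

    fetch_status: caller-supplied function that GETs the Location URL and
    returns the operation's status string (assumed values, see above).
    """
    for _ in range(max_attempts):
        status = fetch_status()
        if status in ("completed", "failed"):
            return status
        # Anything else is treated as still running, so wait and try again.
        sleep(interval_s)
    raise TimeoutError("schema operation did not finish in time")


if __name__ == "__main__":
    # Simulated responses so the sketch runs without calling Microsoft Graph.
    fake = iter(["inprogress", "inprogress", "completed"])
    print(wait_for_schema(lambda: next(fake), sleep=lambda s: None))
```

With the defaults this waits up to about five minutes, which comfortably covers the roughly 3 minutes it took in my case.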

Once this action is completed, which in my case took approximately 3 minutes, we are two thirds done and now we just need to PUT, that is, push the data from our local system up to Microsoft 365. Before doing this, however, you would have had to consider your security model, because it is inside the Index (which we are about to create) that you define your ACL. More to come when we talk about security; logically you would have done this first, but I felt it easier to show you something and then explain in more detail afterwards.

The call would look like this: PUT /external/connections/{connection-id}/items/{item-id}

index_loaduprunner

Let’s unpack what you are seeing here and explain what this is versus what you will be doing in real life.

Demo Scenario

In this demo scenario I am imagining a data store that you can liken to a SQL table, Oracle table, NoSQL JSON container, or even a SaaS [Salesforce, Oracle NetSuite, ServiceNow, Jira] dataset. What I am doing here for the sake of simplicity and so that this can be repeatable is use a format that you can all work with in a demo, that is… a comma separated value (CSV) file with a header row that represents the properties or fields, and rows of data.

index_loaduprunner_inflight

What you see above is Postman Runner, a tool that you can use to automate, stress test, or simply run a command (in this case a REST call) by feeding it the field values as variables from a file containing the data to iterate through. In the case above you can see multiple PUT calls to the endpoint, each with a unique ID, and 200 responses being returned. Next I will open one up and show you what’s inside a payload.

index_loaduprunner_inflight_w_payload

Finally, if we open it up, you can see in lines 3 through 9 the Security that’s bound to this indexed item, and in lines 11 through 19 the schema along with the associated data we are pushing up to Microsoft 365 from our local environment, i.e. the External Data.

Real Life

If this were real life, let me bullet point what would be happening here:

  1. You have an external system that is your data store/source
    1. When changes happen in that system you will have a record of them, and you can make #2 below event based or batched at end of day
  2. On some frequency you take the data that’s new or updated in 1.1 above, whether it is in memory or persisted elsewhere, and you can either
    1. In real time, store those changes as variables and pass them to PUT /external/connections/{connection-id}/items/{item-id} as you see in the above screenshot, where {item-id} is the unique identifier for the row in your HTTP PUT and the properties and their values are sent in the Message Body.
    2. Batch this at end of day, which could actually make my DEMO scenario a real life scenario, if you persisted the information in an external file and just read from it.
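The batch option can be sketched in a few lines of Python that do roughly what Postman Runner did for me: read the CSV and build one PUT per row. A couple of assumptions here: the CSV header row uses the same names as the schema properties, and I reuse the description as the item content, which is my own choice for illustration. The token, connection ID, and user GUID are placeholders.

```python
import csv
import io
import json
import urllib.request

GRAPH_BETA = "https://graph.microsoft.com/beta"
SCHEMA_FIELDS = ["uniqid", "producturl", "productname", "retailprice",
                 "discountedprice", "image", "description", "brand"]


def build_item_request(token, connection_id, row, user_id):
    """Build (but do not send) the PUT request that indexes one CSV row."""
    item = {
        "@odata.type": "microsoft.graph.externalItem",
        "acl": [{
            "type": "user",
            "value": user_id,
            "accessType": "grant",
            "identitySource": "azureActiveDirectory",
        }],
        "properties": {field: row[field] for field in SCHEMA_FIELDS},
        # Assumption: reuse the description as the full-text content.
        "content": {"value": row["description"], "type": "text"},
    }
    url = f"{GRAPH_BETA}/external/connections/{connection_id}/items/{row['uniqid']}"
    return urllib.request.Request(
        url,
        data=json.dumps(item).encode("utf-8"),
        method="PUT",
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
    )


def requests_from_csv(csv_file, token, connection_id, user_id):
    """Yield one PUT request per data row; the header row supplies field names."""
    for row in csv.DictReader(csv_file):
        yield build_item_request(token, connection_id, row, user_id)


if __name__ == "__main__":
    # Made-up sample row so the sketch runs without the real data set.
    sample = io.StringIO(
        "uniqid,producturl,productname,retailprice,discountedprice,image,description,brand\n"
        "abc123,https://example.com/p/1,Sample Shirt,999,499,"
        "https://example.com/1.png,A sample shirt,SampleBrand\n"
    )
    for req in requests_from_csv(sample, "your-access-token",
                                 "fabsdemofilesdrivealpha", "your-user-GUID"):
        print(req.get_method(), req.full_url)
        # urllib.request.urlopen(req) would push the item to Microsoft 365.
```

For the real time option you would call build_item_request directly whenever your source system reports a change.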

Regardless of the approach you use, what you will see if you look in the Admin Center now would be the following

index_loaduprunner_inflight_adminctr

which shows you that items are being indexed into Microsoft 365.

Security

When creating an externalItem (Index), the following fields are required: @odata.type, acl, and properties. The properties object must contain at least one property.

Property | Type | Description
acl | acl collection | An array of access control entries. Each entry specifies the access granted to a user or group. Required.
content | externalItemContent | A plain-text representation of the contents of the item. The text in this property is full-text indexed. Optional.
id | String | Developer-provided unique ID of the item within the containing externalConnection. Must be alphanumeric and a maximum of 128 characters. Required.
properties | Object | A property bag with the properties of the item. The properties MUST conform to the schema defined for the externalConnection. Required.
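It can save a failed PUT to check these rules locally before you send anything. The sketch below is only my reading of the rules above, not an official validator: I treat “alphanumeric” as ASCII letters and digits, and the function names are made up for illustration.

```python
import re

# Assumption: "alphanumeric" means ASCII letters and digits, 1 to 128 characters.
_ITEM_ID = re.compile(r"[A-Za-z0-9]{1,128}")
REQUIRED_FIELDS = ("@odata.type", "acl", "properties")


def is_valid_item_id(item_id):
    """True when the whole ID is alphanumeric and at most 128 characters."""
    return _ITEM_ID.fullmatch(item_id) is not None


def find_problems(item_id, item):
    """Return a list of human-readable problems; an empty list means it looks fine."""
    problems = []
    if not is_valid_item_id(item_id):
        problems.append("id must be alphanumeric and at most 128 characters")
    for field in REQUIRED_FIELDS:
        if field not in item:
            problems.append(f"missing required field: {field}")
    if "properties" in item and not item["properties"]:
        problems.append("properties must contain at least one property")
    return problems


if __name__ == "__main__":
    item = {
        "@odata.type": "microsoft.graph.externalItem",
        "acl": [],
        "properties": {},
    }
    for problem in find_problems("not-valid-id", item):
        print(problem)
```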

The ACL itself must be either AAD or External. I think the AAD portion is easy: the entry can be an AAD User or Group, as you will see in my demo. When I explain the External ACL, I usually say: imagine you do not use AAD as your Identity Provider (IdP). What you would need to do is map a relationship between your External Item Index and the security model in AAD in order to honor the security context. If you would like more details on this, please reference the guidance here for External Item and here for ACL.

In our Demo example my Index (External Item) is shown below

{
  "@odata.type": "microsoft.graph.externalItem",
  "acl": [
    {
      "type": "user",
      "value": "your-user-GUID-here-347aac675901",
      "accessType": "grant",
      "identitySource": "azureActiveDirectory"
    }
  ],
  "properties": {
    "uniqid": "{{uniqid}}",
    "producturl": "{{producturl}}",
    "productname": "{{productname}}",
    "retailprice": "{{retailprice}}",
    "discountedprice": "{{discountedprice}}",
    "image": "{{image}}",
    "description": "{{description}}",
    "brand": "{{brand}}"
  },
  "content": {
    "value": "Error in gateway...",
    "type": "text"
  }
}

Note that in the acl entry the type is user and the value is the GUID of my user. This can also be a Group, as seen in the sample code below

HTTP/1.1 200 OK
Content-type: application/json

{
  "@odata.type": "microsoft.graph.externalItem",
  "acl": [
    {
      "type": "user",
      "value": "e811976d-83df-4cbd-8b9b-5215b18aa874",
      "accessType": "grant",
      "identitySource": "azureActiveDirectory"
    },
    {
      "type": "group",
      "value": "14m1b9c38qe647f6a",
      "accessType": "deny",
      "identitySource": "external"
    }
  ],
  "properties": {
    "title": "Error in the payment gateway",
    "priority": 1,
    "assignee": "john@contoso.com"
  },
  "content": {
    "value": "Error in payment gateway...",
    "type": "text"
  }
}

So that does it for security. Once the Index (External Item) is all up in Microsoft 365 and you take a look at the Admin Center again, I would expect it to look like the below

index_loaduprunner_postflight_adminctr

So, now we have almost one thousand items in our index…what’s next?

Surfacing the External Data

Most of this experience is documented in Microsoft guidance and it’s not done in the API, so I don’t want to make this long post even longer, but for completeness I will show you some screenshots as I give you the docs here from Microsoft. The 1st thing you would have noticed while the Index was in flight was some “Required Actions”; let us go back to that screenshot.

createVertical

So the 1st thing you will need to do is create a Vertical, and after that you will need to create a Result Type. I can already hear you asking.. What is that?

Search Vertical

As taken from the docs here “At the top of the Microsoft Search results page, there’s a row of tabs. These are the search verticals. A search vertical only shows results of a certain type or from certain content. Examples are Files or News. By default, Microsoft Search shows the verticals All, People, Files, Sites, and News.

You can add search verticals that are relevant to your organization. These will appear on the Microsoft Search results page in SharePoint, Office, and Bing. For example, you could create a vertical for marketing-related content and another for sales, based on the type of information that each group needs. You can add verticals to show results only from content indexed via connectors.” In our case, our Vertical is this fictitious data we got from Kaggle about products.

Here are the steps to create a Search Vertical; all you will need to do differently is use the connection we created here

Result Type

Again, as taken from the docs “You can define how results are displayed in the vertical by designing the layout using result types. The result layout lets you show important information directly in the search results, so users don’t have to select each result to see if they found what they’re looking for.”

Here are the steps to create a Result Type of your own.

My Demo Experience

In my demo experience we first created a Search Vertical, which as you can see from the above is very straightforward.

vertical1

and then followed a few steps that are pretty simple to figure out, most of which are optional and serve to limit the data you get back in your results. The key below is picking the correct Connector you created earlier

vertical2

Then it’s next, next, next, Finish. Off to the Result Type: for that we follow the instructions above to create it, and the steps are equally straightforward

resulttype1

and here again selecting the correct source

resulttype2

The screenshot below is where it does get a bit nuanced, as you need to determine “HOW” you would like to see the results coming back. Fortunately there is a Layout Designer: when you click the button indicated by #1 below, it opens a designer where you can select from some predefined templates. The template I chose is below

{
    "type": "AdaptiveCard",
    "version": "1.0",
    "body": [
        {
            "type": "ColumnSet",
            "columns": [
                {
                    "type": "Column",
                    "width": 1,
                    "items": [
                        {
                            "type": "Image",
                            "url": "{image}",
                            "size": "Large",
                            "horizontalAlignment": "Left"
                        }
                    ],
                    "spacing": "None"
                },
                {
                    "type": "Column",
                    "width": 9,
                    "items": [
                        {
                            "type": "TextBlock",
                            "text": "[{productname}]({producturl})",
                            "color": "Accent",
                            "size": "Medium",
                            "weight": "Bolder",
                            "maxLines": 3
                        },
                        {
                            "type": "TextBlock",
                            "text": "{description}",
                            "wrap": true,
                            "maxLines": 3,
                            "spacing": "Medium"
                        }
                    ],
                    "horizontalAlignment": "Center",
                    "spacing": "Medium"
                }
            ],
            "spacing": "None"
        }
    ],
    "$schema": "http://adaptivecards.io/schemas/adaptive-card.json",
    "$data": {
        "description": "Marketing team at Contoso.., and looking at the Contoso Marketing documents on the team site. This contains the data from FY20 and will taken over to FY21...Marketing Planning is ongoing for FY20..",
        "image": "https://searchuxcdn.blob.core.windows.net/designerapp/images/long-stock-image.png",
        "producturl": "https://modernacdesigner.azurewebsites.net",
        "productname": "Contoso Research Memo"
    }
}

and after pasting it in, you see the below…

resulttype3_adaptivecard

Once you have settled on a template, you are given the option to copy the JSON content that represents your Adaptive Card and paste it in the space you see at #2. I also draw your attention to #3, as it is basically the properties and data types that you have in your Index that marry up to your Schema.. or at least should. If they do not, it will let you know your JSON does not map. How do I know this? I had a capital letter in my property for my Schema and a lower case in my Index, and yes, I paid dearly.
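A quick local check can catch that kind of casing mismatch before the designer does. The sketch below pulls every {name} placeholder out of an Adaptive Card and lists the ones that are not in your schema, comparing case-sensitively. It assumes placeholders are simple {name} tokens like the ones in my template, and the function names are my own.

```python
import re

# Assumption: placeholders are simple {name} tokens, as in the template above.
PLACEHOLDER = re.compile(r"\{([A-Za-z_][A-Za-z0-9_]*)\}")


def placeholders(node):
    """Collect every {name} placeholder from the string values of a card."""
    found = set()
    if isinstance(node, str):
        found.update(PLACEHOLDER.findall(node))
    elif isinstance(node, list):
        for child in node:
            found |= placeholders(child)
    elif isinstance(node, dict):
        for key, child in node.items():
            if key != "$data":  # $data holds sample values, not template text
                found |= placeholders(child)
    return found


def unmapped(card, schema_names):
    """Placeholders with no schema property of exactly the same name."""
    return sorted(placeholders(card) - set(schema_names))


if __name__ == "__main__":
    # Tiny hypothetical card with one deliberately mis-cased placeholder.
    card = {
        "type": "AdaptiveCard",
        "body": [
            {"type": "TextBlock", "text": "[{productname}]({producturl})"},
            {"type": "TextBlock", "text": "{Brand}"},
        ],
        "$data": {"productname": "Contoso Research Memo"},
    }
    print(unmapped(card, ["productname", "producturl", "brand"]))
```

An empty list means every placeholder in the card has a matching schema property.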

resulttype4complete

So in the end we go to two out of the 3 places mentioned above to consume the External Data and do a search, let’s say for “women”, in an attempt to find products for women:

Office Hub

Below you will see us inside the Office Hub, which is where you go when you type in portal.office.com. We are conducting a search for Women in callout #2, and in callout #3 you can see we are under the FlipCartCatalog Vertical with our result sets showing up.

result_OfficeHub

SharePoint Hub

Below you will see us inside the SharePoint Hub, which is where you land when you open SharePoint in your tenant. We are conducting the same search for Women in callout #2, and in callout #3 you can see we are under the FlipCartCatalog Vertical with our result sets showing up.

result_SharePointHub

Show me how the ACL prevents Unauthorized Access

Of course… anyone EXCEPT my user should not gain access, as you can see from the below user conducting a search

result_OfficeHubMeganNoResult

and in those simple steps we now have a working REAL WORLD Graph Connector Step by Step.

Resources

Overview of Microsoft Graph connectors

License requirements and pricing

Use the Microsoft Search API to index data

Summary

In our Demo scenario I created my Vertical and Result Type, and it was pretty much straightforward. There was one GOTCHA, however: I could never get the Thumbnail view in my Result Type to show the image, even though I know it is coming through [you can verify by looking at the Index screenshot to see it there in the payload]. I think it’s perhaps too big in size; when I go to the image directly it’s a full page image, but I did not confirm my suspicions.

Finally, I hope this was of value to you. I have provided you with enough information, and hopefully detail, that you can take this and run with it, but should you have questions, please let me know; the best way is to fire off a Tweet to @fabianwilliams and/or a message on LinkedIn.
