Dynamically Invoking REST API with Data Factory
January 19, 2023 | By Ricardo Rangel | Tutorials | Azure Data Factory, Azure SQL Server, Data Pipeline, Zip Code API

In this entry we look at dynamically creating a REST API call based on the source data within a Data Factory pipeline. We'll use the Zip Code API from Metadapi to showcase how this is done. Please note that this pattern can be used to create a data pipeline between a file and a file, a file and any database table, a Data Lake and SQL Server, a Data Lake and Azure Synapse, and so on; simply swap the source and/or target endpoints and you can support many different integrations in your pipeline.

Prerequisites
- Knowledge of data pipeline concepts, Data Factory and SQL Server.
- Azure SQL Server DB and Azure Data Factory resources running on your Azure account.
- A US Zip Code API subscription key.

Setup
Two tables are used in this tutorial. The first, dbo.sourceCRM, contains the customerId and zipCode from a sample leads database. Our goal is to take this basic zip code information, append additional data that will be used by our analytical platforms, and load it into the table dbo.targetCRM, which has the following additional columns (the DDL and table load script are in our github repository):
- stateCode
- stateName
- titleCaseCountyName
- titleCaseCityName
- latitude
- longitude
- msaCode
- msaName
If you would like to see all the data points available on the US Zip Code API, please look at the JSON schema definition.

Build Azure Data Factory Pipeline
From your Azure Data Factory Studio, create a new pipeline. This pipeline will contain the data flow that implements the ETL pattern; in this example we'll name the pipeline "CRMLeads". From the Activities menu, add a new Data Flow activity. In the activity's settings, create a new data flow (again, because this is a net new data flow we are not re-using an existing one). Our data flow consists of six transformations, built in the steps below.

Step 1. Add Source
The source table, which resides in an Azure SQL Server database, needs to be added to the flow. Click the "Add Source" drop-down and select "Add Source". On the source settings tab, click to create a new dataset (this configures the SQL Server database and table used by this pipeline). Select Azure SQL Database from the list of options and click Continue. Next we create the linked service for our database: click "New" to set it up and enter the database connection information. Since our database runs in the same account as the Data Factory service, we simply select the database and provide the connection details (this configuration will differ in each organization). With the linked service in place, select the source table for this pipeline: look for dbo.sourceCRM (created in the Setup section of this tutorial), then click OK. The source transformation is now complete.

Step 2. Add derived column transformation
The next transformation we add to the data flow is a derived column transformation.
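The derived column expression will reference columns from dbo.sourceCRM, so as a quick reference, here is a rough sketch of the two tables described in the Setup section. Treat the column names beyond those listed above and all column types as assumptions; the authoritative DDL and load script are in the github repository mentioned earlier.

    -- Rough sketch only; types are assumptions, see the github repository for the actual DDL.
    CREATE TABLE dbo.sourceCRM (
        customerId INT NOT NULL,
        zipCode    VARCHAR(5) NOT NULL
    );

    CREATE TABLE dbo.targetCRM (
        customerId          INT NOT NULL,
        zipCode             VARCHAR(5) NOT NULL,
        stateCode           VARCHAR(2),
        stateName           VARCHAR(50),
        titleCaseCountyName VARCHAR(100),
        titleCaseCityName   VARCHAR(100),
        latitude            FLOAT,
        longitude           FLOAT,
        msaCode             VARCHAR(10),
        msaName             VARCHAR(200)
    );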
We need to take the zip code string from the data source and "build" a REST API request for each row (this transformation doesn't make the API call; it only builds the "endpoint" needed to get the zip code details). The endpoint we'll use from the API service is https://global.metadapi.com/zipc/v1/zipcodes/{zipcode}. The last part of the endpoint needs to be replaced with the actual zip code from the source database. To do this we'll create a new column in our data flow called zipCodeDetailsRequest with the following expression:

'/zipcodes/' + zipCode

We will use this column in the next transformation.

Step 3. Add external call transformation
The external call transformation lets the data flow call an external service (i.e. a REST API) and capture the response for use in the pipeline. When adding the external call, select REST for the inline dataset type. Then configure the linked service that will be used for the API call (this holds the connection information for the API service). We name the linked service "ZipCodeAPI" and set the base URL to https://global.metadapi.com/zipc/v1 (as a best practice, point the linked service at the root of the API so you can re-use it for every endpoint the service exposes).

The US Zip Code API uses an API key to authenticate valid requests, and this key must be passed in the header of the API request. To do this, configure the ADF linked service as follows:
- Set "Authentication type" to "Anonymous".
- Enable server certificate validation (for added security).
- Add an auth header with Name: Ocp-Apim-Subscription-Key and Value: your API key.
Click Apply to save the configuration of the REST API linked service.

On the external call transformation settings, set the "Row relative url" to the input column zipCodeDetailsRequest. This is what creates the dynamic call to the zip code API for each zip code value from the source table. You can check "Skip writing relative url" (it writes the url to the body of the request, which is not needed here) and "Cache for repeat requests" (if multiple rows share the same zip code, ADF can cache those results to speed up the process).

On the Output tab we configure how ADF will process the JSON response (it needs ADF syntax). Under the body definition, the type box must contain the ADF syntax definition of the JSON response; we can uncheck Header and Status (we don't need them in this example). In summary, this creates a new "column" called "body" that holds the JSON response from the API call. Download the updated ADF syntax definition of the Metadapi Zip Code API here.

Step 4. Add derived column 2 transformation
We don't want to pass the body object created in the step above to the database; we only need to extract 8 selected fields from the response. To do this we again use a derived column transformation to create the 8 columns we need and parse the values out of the API response object. We create those columns on the derived column's settings, using the same names as the JSON response, which are also the column names used by the target database table.
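The expressions in this step navigate the "body" column produced by the external call in step 3. As a point of reference, a minimal sketch of that type definition, trimmed to just the fields used here, might look like the following; the nesting under data is an assumption inferred from the body.data.* expressions used below, and the definition downloadable from Metadapi remains the authoritative one.

    (data as (zipCode as string,
              stateCode as string,
              stateName as string,
              titleCaseCountyName as string,
              titleCaseCityName as string,
              latitude as string,
              longitude as string,
              msaCode as string,
              msaName as string))

The full definition from Metadapi includes every data point exposed by the API, so if you need more than these 8 fields, use that one instead of a trimmed-down sketch.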
For each of the columns, in the expression box you can either build the expression with the expression builder (which lets you select the fields you need) or type the expression value directly (for example: body.data.msaCode). The only exceptions are latitude and longitude: since the target table stores these columns as numbers with decimals, we take the latitude and longitude strings and convert them to doubles with toDouble(), for example toDouble(body.data.latitude). Repeat this for all 8 columns to complete the configuration.

Step 5. Add select transformation
We now use the select transformation to remove unwanted columns from what we send to the target. We do this because, as of this writing, the complex object contained in the "body" column can't be written to a target SQL Server table. Find "body" in the list of columns and click the trash icon so it isn't passed to the next step.

Step 6. Add sink (target)
The last step in the data flow is the sink (the target table). Add a sink object and leverage the same linked service created in step 1 (our source and target tables reside in the same database); if your target table is in a different database, you'll have to create a new linked service for it. We use the "Auto mapping" option in the "Mapping" tab of the sink configuration (ADF maps the columns by name, and in this tutorial the names are all consistent). We are now ready to run the pipeline.

Run pipeline
To run the pipeline, select the CRMLeads pipeline tab and, from the options menu, select Add trigger / Trigger now to run it manually. You can then go to the "Monitor" tab to view the progress of the run. Once the status is "Succeeded", the execution of the pipeline is done. A quick check of the target table using any SQL tool shows the results of the pipeline execution:

select * from dbo.targetCRM

We have successfully created a Data Factory pipeline that calls a dynamically generated REST API endpoint to enhance the data from our CRM system.

Additional resources:
- US Zip Code API Homepage
- Azure Data Factory Documentation
- ADF Source Object Documentation
- ADF External Call Documentation
- ADF Select Documentation
- ADF Derived Column Documentation
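As one last sanity check beyond the select * above, a query along the following lines (using the column names assumed in the earlier table sketch) lists any source leads that did not make it into the enriched table; an empty result means every row was processed.

    -- Assumed column names; adjust to match your actual tables.
    SELECT s.customerId, s.zipCode
    FROM dbo.sourceCRM AS s
    LEFT JOIN dbo.targetCRM AS t
        ON t.customerId = s.customerId
    WHERE t.customerId IS NULL;  -- source rows missing from the target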