Dynamically Invoking REST API with Data Factory
January 19, 2023 | By Ricardo Rangel | Tutorials | Azure Data Factory, Azure SQL Server, Data Pipeline, Zip Code API

In this entry we look at dynamically creating a REST API call based on the source data within a Data Factory pipeline. We'll use the Zip Code API from Metadapi to showcase how this is done. Please note that this pattern can be used to create a data pipeline between a file and a file, a file and any database table, a Data Lake and SQL Server, a Data Lake and Azure Synapse, and so on; simply swap the source and/or target endpoints and you can support many different integrations in your pipeline.

Prerequisites
- Knowledge of data pipeline concepts, Data Factory and SQL Server.
- Azure SQL Server DB and Azure Data Factory resources running on your Azure account.
- A US Zip Code API subscription key.

Setup
Two tables are used in this tutorial. The first, dbo.sourceCRM, contains the customerId and zipCode from a sample leads database. Our goal is to take this basic zip code information, append additional data that will be used by our analytical platforms, and load it into the table dbo.targetCRM, which has the following additional columns (the DDL and table load script are in our github repository):
- stateCode
- stateName
- titleCaseCountyName
- titleCaseCityName
- latitude
- longitude
- msaCode
- msaName
If you would like to see all the data points available on the US Zip Code API, please look at the JSON schema definition.

Build Azure Data Factory Pipeline
From your Azure Data Factory Studio, create a new pipeline. This pipeline will contain the data flow that implements the ETL pattern; in this example we'll name the pipeline "CRMLeads". From the Activities menu, add a new Data Flow activity. In the activity's settings, create a new data flow (again, because this is a net new data flow we are not re-using an existing one). Our data flow consists of six transformations, built in the steps below.

Step 1. Add Source
The source table, which resides in an Azure SQL Server database, needs to be added to the flow. Click the "Add Source" drop-down and select "Add Source". On the source settings tab, click to create a new dataset (this configures the SQL Server database and table used by this pipeline). Select Azure SQL Database from the list of options and click Continue. Next we create the linked service for our database: click "New" to set it up and enter the database connection information. Since our database runs in the same account as the Data Factory service, we simply select the database and provide the connection details (this configuration will differ in each organization). With the linked service in place, select the source table for this pipeline: look for dbo.sourceCRM (created in the Setup section of this tutorial), then click OK. The source transformation is now complete.

Step 2. Add derived column transformation
The next transformation we add to the data flow is a derived column transformation.
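The derived column expression will reference columns from dbo.sourceCRM, so as a quick reference, here is a rough sketch of the two tables described in the Setup section. Treat the column names beyond those listed above and all column types as assumptions; the authoritative DDL and load script are in the github repository mentioned earlier.

    -- Rough sketch only; types are assumptions, see the github repository for the actual DDL.
    CREATE TABLE dbo.sourceCRM (
        customerId INT NOT NULL,
        zipCode    VARCHAR(5) NOT NULL
    );

    CREATE TABLE dbo.targetCRM (
        customerId          INT NOT NULL,
        zipCode             VARCHAR(5) NOT NULL,
        stateCode           VARCHAR(2),
        stateName           VARCHAR(50),
        titleCaseCountyName VARCHAR(100),
        titleCaseCityName   VARCHAR(100),
        latitude            FLOAT,
        longitude           FLOAT,
        msaCode             VARCHAR(10),
        msaName             VARCHAR(200)
    );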
We need to take the zip code string from the data source and "build" a REST API request for each row (this transformation doesn't make the API call; it only builds the "endpoint" needed to get the zip code details). The endpoint we'll use from the API service is https://global.metadapi.com/zipc/v1/zipcodes/{zipcode}. The last part of the endpoint needs to be replaced with the actual zip code from the source database. To do this we'll create a new column in our data flow called zipCodeDetailsRequest with the following expression:

'/zipcodes/' + zipCode

We will use this column in the next transformation.

Step 3. Add external call transformation
The external call transformation lets the data flow call an external service (i.e. a REST API) and capture the response for use in the pipeline. When adding the external call, select REST for the inline dataset type. Then configure the linked service that will be used for the API call (this holds the connection information for the API service). We name the linked service "ZipCodeAPI" and set the base URL to https://global.metadapi.com/zipc/v1 (as a best practice, point the linked service at the root of the API so you can re-use it for every endpoint the service exposes).

The US Zip Code API uses an API key to authenticate valid requests, and this key must be passed in the header of the API request. To do this, configure the ADF linked service as follows:
- Set "Authentication type" to "Anonymous".
- Enable server certificate validation (for added security).
- Add an auth header with Name: Ocp-Apim-Subscription-Key and Value: your API key.
Click Apply to save the configuration of the REST API linked service.

On the external call transformation settings, set the "Row relative url" to the input column zipCodeDetailsRequest. This is what creates the dynamic call to the zip code API for each zip code value from the source table. You can check "Skip writing relative url" (it writes the url to the body of the request, which is not needed here) and "Cache for repeat requests" (if multiple rows share the same zip code, ADF can cache those results to speed up the process).

On the Output tab we configure how ADF will process the JSON response (it needs ADF syntax). Under the body definition, the type box must contain the ADF syntax definition of the JSON response; we can uncheck Header and Status (we don't need them in this example). In summary, this creates a new "column" called "body" that holds the JSON response from the API call. Download the updated ADF syntax definition of the Metadapi Zip Code API here.

Step 4. Add derived column 2 transformation
We don't want to pass the body object created in the step above to the database; we only need to extract 8 selected fields from the response. To do this we again use a derived column transformation to create the 8 columns we need and parse the values out of the API response object. We create those columns on the derived column's settings, using the same names as the JSON response, which are also the column names used by the target database table.
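The expressions in this step navigate the "body" column produced by the external call in step 3. As a point of reference, a minimal sketch of that type definition, trimmed to just the fields used here, might look like the following; the nesting under data is an assumption inferred from the body.data.* expressions used below, and the definition downloadable from Metadapi remains the authoritative one.

    (data as (zipCode as string,
              stateCode as string,
              stateName as string,
              titleCaseCountyName as string,
              titleCaseCityName as string,
              latitude as string,
              longitude as string,
              msaCode as string,
              msaName as string))

The full definition from Metadapi includes every data point exposed by the API, so if you need more than these 8 fields, use that one instead of a trimmed-down sketch.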
For each of the columns, in the expression box you can either build the expression with the expression builder (which lets you select the fields you need) or type the expression value directly (for example: body.data.msaCode). The only exceptions are latitude and longitude: since the target table stores these columns as numbers with decimals, we take the latitude and longitude strings and convert them to doubles with toDouble(), for example toDouble(body.data.latitude). Repeat this for all 8 columns to complete the configuration.

Step 5. Add select transformation
We now use the select transformation to remove unwanted columns from what we send to the target. We do this because, as of this writing, the complex object contained in the "body" column can't be written to a target SQL Server table. Find "body" in the list of columns and click the trash icon so it isn't passed to the next step.

Step 6. Add sink (target)
The last step in the data flow is the sink (the target table). Add a sink object and leverage the same linked service created in step 1 (our source and target tables reside in the same database); if your target table is in a different database, you'll have to create a new linked service for it. We use the "Auto mapping" option in the "Mapping" tab of the sink configuration (ADF maps the columns by name, and in this tutorial the names are all consistent). We are now ready to run the pipeline.

Run pipeline
To run the pipeline, select the CRMLeads pipeline tab and, from the options menu, select Add trigger / Trigger now to run it manually. You can then go to the "Monitor" tab to view the progress of the run. Once the status is "Succeeded", the execution of the pipeline is done. A quick check of the target table using any SQL tool shows the results of the pipeline execution:

select * from dbo.targetCRM

We have successfully created a Data Factory pipeline that calls a dynamically generated REST API endpoint to enhance the data from our CRM system.

Additional resources:
- US Zip Code API Homepage
- Azure Data Factory Documentation
- ADF Source Object Documentation
- ADF External Call Documentation
- ADF Select Documentation
- ADF Derived Column Documentation
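As one last sanity check beyond the select * above, a query along the following lines (using the column names assumed in the earlier table sketch) lists any source leads that did not make it into the enriched table; an empty result means every row was processed.

    -- Assumed column names; adjust to match your actual tables.
    SELECT s.customerId, s.zipCode
    FROM dbo.sourceCRM AS s
    LEFT JOIN dbo.targetCRM AS t
        ON t.customerId = s.customerId
    WHERE t.customerId IS NULL;  -- source rows missing from the target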