Azure Data Factory Triggers Tutorial | On-demand, Scheduled and Event-Based Execution

When designing data movement workflows, it is very important to also design for how those workflows will be executed. This is Adam, and I will show you four ways that you can execute and trigger your Data Factory pipelines, so stay tuned.

Azure Data Factory can be triggered in multiple ways. Currently there are four ways that you can trigger Data Factory. There are two time-based triggers, the schedule trigger and the tumbling window trigger, there's a way for you to trigger pipelines through events, and you can trigger them manually.

The first type of trigger is the schedule trigger. This is a trigger where you associate a pipeline with a date and a recurrence, so that it fires every time that recurrence happens. An important thing is that this is a many-to-many relationship, so many triggers can kick off a single pipeline, or a single trigger can kick off multiple pipelines. In case you are wondering how this works, it is just a simple interval-based trigger: you say run every day, every week, every month, or maybe every hour, and of course the runs can overlap, so be careful about this. When specifying specific dates you can also specify specific hours and minutes to get finer granularity for your pipeline execution. When specifying weeks you can pick specific days of the week, and in the case of a month, specific days during the month.

Next you have tumbling window triggers. Tumbling window triggers are, simply said, also time-based triggers, but they use fixed-size, non-overlapping intervals. So you can say trigger my pipeline every 15 minutes, and for the non-overlapping part you can specify how many of those windows can run concurrently. The distinction here is that this is a one-to-one relationship, so if you want to use this for your pipelines, you need to create a tumbling window trigger for each pipeline that should use this functionality. For example, if your interval is one hour and your max concurrency is two or more, the executions can overlap, because two concurrent runs are allowed to happen during the same hour; but if you set max concurrency to 1, the second run will need to wait until the first one finishes. So make sure to tune this property well so you don't end up with an endless queue of executions.

The third one is the event-based trigger. Data Factory is very well integrated with Blob Storage, and it does that integration through something called Azure Event Grid. If you don't know what Event Grid is, check my video on the Event Grid introduction, but very quickly said, it is just a service to trigger and route events. This integration allows you to trigger pipelines whenever new files arrive or are deleted on a storage account, specifically Blob Storage. It works in such a way that when a file is put on Blob Storage, an event is sent to Event Grid, Event Grid propagates it through a subscription to Data Factory, and Data Factory executes a pipeline based on that event information. Note that this is only information about the event itself, so the file name and file path; the content, of course, you have to get yourself. You are not getting the content of the file as an input to your pipeline; you actually have to use that event information to fetch the file yourself, and you can do that through expressions. You will get two properties from your trigger body, a file name and a folder path, which you can use within the pipeline to get the file.
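To make that last point concrete, here is a rough sketch, written as a Python dictionary mirroring the JSON that Data Factory generates, of how an event trigger hands those two properties to the pipeline; the pipeline and parameter names are just placeholders for illustration:

```python
# Sketch only: the pipeline/parameter names are placeholders, not taken verbatim from the video.
# An event trigger hands the pipeline two expressions taken from the Event Grid payload:
#   @triggerBody().fileName   -> name of the blob that fired the event
#   @triggerBody().folderPath -> container/folder the blob landed in
event_trigger_pipeline_binding = {
    "pipelineReference": {"referenceName": "copy-pipeline", "type": "PipelineReference"},
    "parameters": {
        "fileName": "@triggerBody().fileName",      # resolved per event, e.g. "demo.csv"
        "folderPath": "@triggerBody().folderPath",  # e.g. "input"
    },
}
```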
Lastly, you have the manual trigger. With a manual trigger you can use the user interface, but underneath it is just a REST API, so you can also use something like a Logic App. In a Logic App it's as simple as using a single block and supplying a couple of properties.
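For context, here is a minimal sketch of that REST call: the documented createRun endpoint that the user interface and the Logic App connector effectively use. All the names below are placeholders, and you would need a valid Azure AD access token for the management endpoint (for example, from a service principal):

```python
import requests

# Minimal sketch of triggering a pipeline run through the Data Factory REST API.
# Every value below is a placeholder; TOKEN must be an Azure AD access token
# issued for https://management.azure.com (e.g. via a service principal).
SUBSCRIPTION = "<subscription-id>"
RESOURCE_GROUP = "<resource-group>"
FACTORY = "<data-factory-name>"
PIPELINE = "<pipeline-name>"
TOKEN = "<bearer-token>"

url = (
    f"https://management.azure.com/subscriptions/{SUBSCRIPTION}"
    f"/resourceGroups/{RESOURCE_GROUP}/providers/Microsoft.DataFactory"
    f"/factories/{FACTORY}/pipelines/{PIPELINE}/createRun?api-version=2018-06-01"
)

# The request body carries the pipeline parameters; the response contains a runId.
response = requests.post(
    url,
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"fileName": "demo.csv"},
)
print(response.json())  # e.g. {"runId": "..."}
```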

and hit create and that’s it let’s wait a couple of minutes about 10 seconds later we got both of our resources so you can start creating but what I will need is two containers on my storage account one for input and one for the output input is where I will have my input files that pipelines will trigger on an output is where I will put a copy of the file that’s a very simple pipeline so we’re just gonna copy a file from input to output using data factory so let me input one file and upload something so I’m gonna upload a demo CSV file hit upload and that should be fine so what we need to do is go to data factory go to outer and monitor and create a very simple pipeline so I’m gonna create a pipeline first of all as always create a link service it’s gonna be a blob storage link service leave everything as default I’m just going to select subscription IDF demo it create so connection is established and we need two datasets so a blob store data set CSV based and that’s gonna be called input input CSV using our connectivity first role as header and a browse for the input and then demo CSV file I don’t need a schema so I’m gonna leave this as none and I need second dataset called output and this is using the same link service I want to export with headers and I also want to do it for the output container so I’m gonna hit OK actually I need to select none in the import schema section and I have two data sets I need to create a very simple pipeline in which I’m gonna copy the data from this input file into the output file this was very fast in case you didn’t follow please check my videos on a data factory but this is a very simple pipeline that will grab this input file and put it into output so what we need to do right now is first of all parameterize those data sets sangmi the file name as a parameter which i’m gonna use in a connection here so instead of demo csv i will allow this to be parameterize and the same for the output file i want to parameterize the output file names i also will create a parameter called filename and use that file name here as a dynamic property so now within our copy data we can specify demo csv and as an output that was an output that’s gonna be also called demo CSV and input also called demo CSV if everything works we can publish everything and in the meanwhile we can also hit the bug to see if this pipeline is working right now what it should do is should pick up demo CSV file from our input container and put the same file into output container and it worked after 4 second it finished we can confirm our demo wart by going to our research group storage account and a blob and the output so it works so since we have a very simple pipeline quickest start working on triggers one of the important thing about triggers is that they need to work on already published pipelines therefore I needed a very small pipeline which took me Freeman is to create so let’s use that pipeline to create triggers first honest and time-based trigger a schedule page trigger to do create new trigger you hit on add trigger hit new from the drop-down you pick a new trigger and give it a name I’m going to call it schedule since it’s the type of schedule and you can specify first of all a start I than you can say start running from tomorrow at 8:14 o’clock actually I don’t want to hit create yet next what you can do you can say what is the recurrence so one every minute hour you can say run every day in case of days you are already able to specify
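As a quick aside, here is roughly the shape of the parameterized input dataset we just built, written as a Python dictionary that mirrors the dataset JSON; the names (InputCSV, AzureBlobStorage1, fileName) are placeholders for this sketch rather than the exact ones from the demo:

```python
# Sketch of a parameterized DelimitedText (CSV) dataset; names are placeholders.
# The key idea: the file name is no longer hard-coded to "demo.csv" but comes
# from the dataset parameter via the @dataset().fileName expression.
input_csv_dataset = {
    "name": "InputCSV",
    "properties": {
        "type": "DelimitedText",
        "linkedServiceName": {
            "referenceName": "AzureBlobStorage1",
            "type": "LinkedServiceReference",
        },
        "parameters": {"fileName": {"type": "string"}},
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": "input",
                "fileName": {"value": "@dataset().fileName", "type": "Expression"},
            },
            "firstRowAsHeader": True,
        },
    },
}
```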

Since we have a very simple pipeline, we can start working on triggers. One important thing about triggers is that they need to work on already published pipelines; that is why I needed a very small pipeline, which took me three minutes to create. So let's use that pipeline to create the triggers.

The first one is a time-based trigger, the schedule trigger. To create a new trigger, you hit Add trigger, hit New from the drop-down, pick a new trigger, and give it a name; I'm going to call it Schedule, since that's its type. You can specify, first of all, a start date, so you can say start running from tomorrow at 8:14 (I don't want to hit Create yet). Next you can say what the recurrence is: run every minute or every hour, or you can say run every day. In the case of days you are also able to specify hours, so at one o'clock and at four o'clock, and then you can specify minutes, for instance 30, which means this pipeline will run at 1:30 and 4:30 every day. If you go further and specify weeks, you are also able to pick the days of the week, so it will run every Tuesday and Thursday at 1:30 and 4:30. There's no end date here, so it runs continuously, forever. And of course you can change it to months; in the case of months you can specify the specific days of the month you want to execute it on. Then hit Create, and hit OK. Of course a trigger is just a pipeline object, so you need to publish for this to take effect. Once you publish, you will be able to find all your triggers within the Triggers tab on the bottom here, and you can edit them as specified. In case you have more than one pipeline, you can also associate that trigger by hitting Add trigger, New/Edit on the other pipeline and picking the existing trigger: simply go to the drop-down, select the trigger, and hit OK. Now, when you publish it... sorry, for this pipeline I forgot to add an activity; remember that a pipeline has to have at least one activity, and the simplest way to test that is by adding a Wait activity, because it doesn't need any input or output. So let's publish this, and after publishing, if you go to the Triggers tab, you see that the Schedule trigger we created is associated with two pipelines, and you can check which pipelines those are. That was the most basic schedule trigger.

So let's create a new one; this time we're gonna create a tumbling window trigger (let me change the name to TumblingBased). Tumbling window triggers are a bit more complex: while they are very simple in that you just specify run this pipeline every 15 minutes, they have a few more advanced options, which, as I said, you can find in the Advanced tab. So while the basic configuration is very simple, those advanced features allow you to specify some cool stuff, like a delay (run every 15 minutes, but with some sort of delay), or you can set max concurrency to one, so you make sure the pipeline will never be executed by this trigger while a previous run is still going. You can specify a retry policy and retry interval here, and, although that's not the topic for today, there is also a way for you to create dependencies between tumbling window triggers, which is a very cool way to create a chain of triggers that depend on each other. Without the advanced options you just specify run it every 15 minutes, and that's it.
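For reference, here is a rough sketch of what these two trigger definitions look like underneath, as Python dictionaries mirroring the trigger JSON; the names, dates and pipeline references are placeholders, so treat the exact property values as illustrative:

```python
# Sketch of a schedule trigger: run the attached pipeline(s) every Tuesday and
# Thursday at 1:30 and 4:30. Names and dates are placeholders.
schedule_trigger = {
    "name": "Schedule",
    "properties": {
        "type": "ScheduleTrigger",
        "typeProperties": {
            "recurrence": {
                "frequency": "Week",
                "interval": 1,
                "startTime": "2020-01-01T08:00:00Z",
                "timeZone": "UTC",
                "schedule": {
                    "weekDays": ["Tuesday", "Thursday"],
                    "hours": [1, 4],
                    "minutes": [30],
                },
            }
        },
        # Many-to-many: one schedule trigger can reference several pipelines.
        "pipelines": [
            {"pipelineReference": {"referenceName": "copy-pipeline", "type": "PipelineReference"}}
        ],
    },
}

# Sketch of a tumbling window trigger: fixed 15-minute windows, one at a time,
# with an optional delay and retry policy. One trigger maps to one pipeline.
tumbling_trigger = {
    "name": "TumblingBased",
    "properties": {
        "type": "TumblingWindowTrigger",
        "typeProperties": {
            "frequency": "Minute",
            "interval": 15,
            "startTime": "2020-01-01T00:00:00Z",
            "delay": "00:05:00",
            "maxConcurrency": 1,
            "retryPolicy": {"count": 2, "intervalInSeconds": 30},
        },
        "pipeline": {
            "pipelineReference": {"referenceName": "copy-pipeline", "type": "PipelineReference"}
        },
    },
}
```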
So let's move to the third example: triggering your pipelines using events. To do that, I will actually go to our triggers and delete the existing ones, so I have a cleaner canvas, and I will go back to the pipeline. Since we're gonna be triggering on events, our file name will come in as a parameter, so we need to change our pipeline to accept parameters. I'm gonna hit New and create a pipeline parameter called fileName. Then I need to bubble that parameter down into the source, so instead of giving a static value of demo.csv I'm using the pipeline parameter called fileName, and I'm gonna do the same for our output: hit Add dynamic content and use the pipeline parameter fileName. That means that if I try to trigger this pipeline now (actually, I need to publish it first), it will ask me to supply a value, and when I do that, I will actually be able to supply the event information. So if I trigger it now, it will ask me to provide a file name, like demo.csv.

So let's add a new trigger; in this case it's gonna be a new event-based trigger, and I'm gonna call it NewBlobEvent. You need to specify a storage account, because, as we already said, this works with Event Grid and Event Grid is attached to the storage account, so you need to specify that storage account and a container name. You can also do some filtering here if you want.

You can use "Blob path begins with" to trigger on a subset of folders, or "Blob path ends with" to filter only part of the files; for instance, if you're processing CSV files, you can use this to filter only for CSV files. I'm gonna be triggering on new files only, since I'm copying files, so that's gonna be the Blob created event, and I'm gonna hit Continue. Notice something interesting here: you are actually able to see which files this trigger would normally fire on, so it finds the files that match your criteria. In my case it would already match demo.csv; of course it won't fire in this case, because that file was already uploaded, so that event already happened, and it doesn't trigger for historical events, only for new ones. So we hit Create, and when finishing this trigger you need to specify the pipeline parameters: because we specified that our pipeline takes one parameter called fileName, we need to supply a value here. Of course you could just type demo.csv and it would run, but that doesn't make sense; you need to actually pass that trigger body property I mentioned previously, using the triggerBody function: give me what came from the Event Grid service, the event body, and from that event body give me the property called fileName. When you hit Create and publish this, in just a couple of seconds it will have this pipeline ready and triggered on every new file uploaded.

Since this is working now and was published successfully, we can go back to our subscription, open the dashboard, open our resource group, go to our demo storage account, go to the blob containers, and in the input container upload another CSV file; maybe this time I'm gonna upload both the demo and the movies files, hit Overwrite, and that's it. If we did this correctly, we can actually go to Monitor and see two pipelines running. This was super fast, because Event Grid itself is super fast; it's already triggering two pipelines, and they ran; we can refresh the status to see the two succeeded pipelines. If we go back to our Blob Storage and go to the output container, you can see the two files copied successfully with the current date. So you see how easy it is to set up event-based triggers.
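Here is a rough sketch of the resulting event trigger definition, again as a Python dictionary mirroring the trigger JSON; the scope resource ID, trigger name and pipeline name are placeholders, while the @triggerBody().fileName expression is the same one used in the demo:

```python
# Sketch of a blob event trigger: fire on new blobs in the "input" container and
# pass the blob name into the pipeline's fileName parameter. Names/IDs are placeholders.
blob_event_trigger = {
    "name": "NewBlobEvent",
    "properties": {
        "type": "BlobEventsTrigger",
        "typeProperties": {
            # Resource ID of the storage account that Event Grid watches.
            "scope": "/subscriptions/<sub-id>/resourceGroups/<rg>"
                     "/providers/Microsoft.Storage/storageAccounts/<storage-account>",
            "blobPathBeginsWith": "/input/blobs/",
            "blobPathEndsWith": ".csv",
            "events": ["Microsoft.Storage.BlobCreated"],
        },
        "pipelines": [
            {
                "pipelineReference": {"referenceName": "copy-pipeline", "type": "PipelineReference"},
                "parameters": {"fileName": "@triggerBody().fileName"},
            }
        ],
    },
}
```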
So let's do the last demo. For the last demo, let's create a Logic App, and with that Logic App we're gonna be, for instance, submitting some file name, maybe always demo.csv. I'm gonna use the existing resource group and leave North Europe as default. I will use this Logic App just to show you how to execute the Data Factory from a Logic App, because it's super simple, yet it allows you to build complex logic that you would not be able to do within Data Factory. I'm gonna start from a blank Logic App and use a Request trigger; in case you don't know how Logic Apps work, feel free to check my video on the Logic Apps introduction, but for now this will just be fired once by me, manually. Of course you can change this, for instance delete this trigger and make it a schedule: as you see, you can extend it with Recurrence or Sliding Window triggers, so maybe every three minutes go to our Data Factory and create a pipeline run. When you add that action you need to sign in; I will sign in using my account, but of course you can sign in using a service principal. A service principal is simply an application account that you can use instead of your personal one, and of course that should be the recommended way to do it. Then you choose the subscription where your Data Factory is, the resource group, the name of your Data Factory, and lastly the name of the pipeline. We have two pipelines; I will execute the first one, since the second one is empty. Remember, I could probably just save it and run it now, but that would fail; what you need to do is specify the parameters. You need to pass the parameter name: our pipeline takes one parameter called fileName, so you need to supply a JSON object (actually, I'm missing a quote mark here) with the list of parameters that we're going to execute this with.
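That parameters field of the action is just a small JSON object mapping parameter names to values; as a sketch, it is equivalent to passing something like this (shown here as a Python dictionary, with the value hard-coded the way it is in the demo):

```python
# Body passed as the "parameters" of the pipeline run: parameter name -> value.
# In the demo the value is hard-coded, but it could just as well come from the
# Logic App's own trigger (an HTTP request body, an email attachment name, etc.).
pipeline_parameters = {"fileName": "demo.csv"}
```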

So I'm just gonna hit Run, and let's see if this worked. If it worked, in just a couple of seconds a new run should appear in our Data Factory. Let's first check here: the call was submitted successfully, and we got a response with a run ID from our Data Factory. If we go to the Data Factory and refresh, we see our run was successfully submitted. It was actually submitted twice, I probably clicked something twice, but it doesn't matter: we were able to very easily submit a run from the Logic App itself. With this you can actually create much more complex logic, or react to many more events than you would be able to from Data Factory alone. So Logic Apps are a very nice extension to Data Factory, and this is not the only scenario; we very often do this at our company, because Logic Apps are much better suited for powerful, complex business logic workflows than Data Factory, and we just use Data Factory to copy and transform data.

As you see, there are many ways to trigger Data Factory workflows. It is up to you to design how your business logic will look and how to trigger the workflows to fit your business needs the best way. That's it for today. This was Adam. If you liked the video, hit thumbs up, leave a comment, and subscribe if you want to see more. See you next time!