In this post I am going to talk about the EventHubTrigger and data reliability in Azure Functions, and how to handle data loss when something goes wrong inside the function.
A common scenario for the EventHubTrigger is receiving device telemetry from IoT Hub, transforming that data, and passing it to a downstream system such as a database or a REST API for storage, where it can later be used for display or analytics.
The current implementation of the EventHubTrigger does not provide any fault-handling mechanism. Once data is received in the Azure Function, the checkpoint maintained in the function's storage account for each partition is advanced regardless of whether your function completes successfully. Even if the function throws an exception, Azure treats the host as completed and increments the checkpoint.
This is a critical issue for data reliability. Consider a scenario where the database or REST API is down but your Azure Function keeps running: the data is ultimately lost unless you go back and manually reset the checkpoint files to the previous successful checkpoint.

The content of the checkpoint blob looks like the example below. Note the offset and sequence-number pointers for the partition. The file is updated automatically once the function instance completes execution, regardless of errors.
{"PartitionId":"1","Owner":"asfasfasf","Token":"asafsa","Epoch":259,"Offset":"266318360104","SequenceNumber":2432227}
How to handle issues during Data Processing
- If the failure from the database or REST API is transient, you can add retry logic with a small delay (for example, 5 seconds) between retries.
- If the database or REST API is down due to network issues, load, or any other problem, the other option currently available is killing the function instance using standard process libraries.
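The retry idea from the first bullet can be sketched as a small helper. This is illustrative only: `SendToDownstreamAsync` is a hypothetical call standing in for your database write or REST API call, and the attempt count and 5-second delay are example values.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Extensions.Logging;

public static class RetryHelper
{
    // Retry a downstream call a few times with a fixed delay before giving up.
    public static async Task ProcessWithRetryAsync(byte[] messageBody, ILogger log)
    {
        const int maxAttempts = 3;
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            try
            {
                await SendToDownstreamAsync(messageBody); // hypothetical downstream call
                return; // success: it is now safe for the checkpoint to advance
            }
            catch (Exception ex) when (attempt < maxAttempts)
            {
                // Transient failure: wait 5 seconds and try again.
                log.LogWarning(ex, "Attempt {Attempt} failed, retrying in 5s...", attempt);
                await Task.Delay(TimeSpan.FromSeconds(5));
            }
            // On the final attempt the exception filter no longer matches,
            // so the exception propagates to the caller's fault handling.
        }
    }

    // Placeholder for the real database/REST call.
    private static Task SendToDownstreamAsync(byte[] body) => Task.CompletedTask;
}
```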
Killing the function execution: you can call Process.GetCurrentProcess().Kill() from the exception-handling block. This indicates to Azure that the instance has stopped, and it will not increment the checkpoint.
Process.GetCurrentProcess().Kill();
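Put together, a minimal sketch of an EventHubTrigger function using this technique might look like the following. The function name, event hub name, connection setting, and `SendToDownstreamAsync` are placeholders, not values from a real deployment:

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;
using Microsoft.Azure.EventHubs;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class TelemetryProcessor
{
    [FunctionName("TelemetryProcessor")]
    public static async Task Run(
        [EventHubTrigger("my-eventhub", Connection = "EventHubConnection")] EventData[] events,
        ILogger log)
    {
        try
        {
            foreach (var e in events)
            {
                await SendToDownstreamAsync(e); // hypothetical database/REST call
            }
        }
        catch (Exception ex)
        {
            // This log entry may never be flushed, since we kill the process
            // immediately afterwards (see the drawbacks below).
            log.LogError(ex, "Downstream system unavailable; killing host.");

            // Killing the process prevents the trigger from advancing the
            // checkpoint, so the same batch is redelivered after restart.
            Process.GetCurrentProcess().Kill();
        }
    }

    private static Task SendToDownstreamAsync(EventData e) => Task.CompletedTask;
}
```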
Drawbacks of this Approach:
- You will not see anything in the logs in the Azure portal or Application Insights.
- You will still pay for the function executions, because the function is triggered again and again with the old data plus the new data.
The logging issue can be handled with alerting, for example sending an email to the team that something went wrong and data is not being processed, so the team can decide to stop the Azure Function until the downstream systems (database or REST API) are restored.
To avoid the cost, you can disable the Azure Function until the downstream systems come back. I have another blog post that talks about programmatically disabling a function: https://amarplayground.home.blog/2019/01/03/enable-disable-azure-function-using-rest-api/
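As a rough sketch of the idea (the linked post covers the full flow), a function can be disabled by setting the well-known app setting AzureWebJobs.&lt;FunctionName&gt;.Disabled to "true" through the Azure management REST API. The subscription, resource group, site name, and token acquisition below are all placeholders; note also that the app-settings PUT replaces the entire settings collection, so in practice you must read the existing settings first and merge:

```csharp
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;

public static class FunctionDisabler
{
    // Placeholder identifiers - substitute your own values.
    private const string Url =
        "https://management.azure.com/subscriptions/{subscription-id}" +
        "/resourceGroups/{resource-group}/providers/Microsoft.Web" +
        "/sites/{function-app-name}/config/appsettings?api-version=2018-02-01";

    public static async Task DisableAsync(string bearerToken, string mergedSettingsJson)
    {
        // mergedSettingsJson must contain ALL existing app settings plus
        // "AzureWebJobs.TelemetryProcessor.Disabled": "true",
        // because this PUT overwrites the whole collection.
        using var client = new HttpClient();
        client.DefaultRequestHeaders.Authorization =
            new AuthenticationHeaderValue("Bearer", bearerToken);

        var body = new StringContent(mergedSettingsJson, Encoding.UTF8, "application/json");
        var response = await client.PutAsync(Url, body);
        response.EnsureSuccessStatusCode();
    }
}
```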
If the downstream system implements a heartbeat of some kind, the enable/disable functionality can be hooked into it to start/stop the function automatically, avoiding both manual intervention and data loss.
Until Microsoft Azure provides built-in support for handling data-processing failures in Azure Functions and avoiding the automatic checkpoint increments, this can be used as a solution to avoid data loss.
Another approach: managing the checkpoint files yourself. The last successful checkpoint can be maintained in a separate database outside the Azure Function. Whenever such a situation occurs, you can restore that checkpoint in the storage account so the data is delivered again. Note that you must restore the checkpoint files before the IoT Hub data retention period expires, which is 7 days by default. I will talk about this in another blog post.
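A sketch of the restore step, assuming the classic EventProcessorHost blob layout (container azure-webjobs-eventhub, path &lt;namespace&gt;/&lt;event-hub&gt;/&lt;consumer-group&gt;/&lt;partition-id&gt;) and the WindowsAzure.Storage SDK. LoadCheckpointFromDatabaseAsync is a hypothetical helper returning the checkpoint JSON you saved earlier; the function must be stopped before overwriting the blob, or the running host will checkpoint over your restore:

```csharp
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

public static class CheckpointRestorer
{
    public static async Task RestoreAsync(
        string storageConnectionString,
        string eventHubNamespace,   // e.g. "my-iothub-namespace" (placeholder)
        string eventHubName,        // placeholder
        string consumerGroup,       // e.g. "$Default"
        string partitionId)         // e.g. "1"
    {
        var account = CloudStorageAccount.Parse(storageConnectionString);
        var client = account.CreateCloudBlobClient();

        // Checkpoints live in this well-known container.
        var container = client.GetContainerReference("azure-webjobs-eventhub");
        var blobPath = $"{eventHubNamespace}/{eventHubName}/{consumerGroup}/{partitionId}";
        var blob = container.GetBlockBlobReference(blobPath);

        // Hypothetical: fetch the last known-good checkpoint JSON
        // (the {"PartitionId":...,"Offset":...,"SequenceNumber":...} document)
        // from wherever you persisted it.
        string lastGoodCheckpoint = await LoadCheckpointFromDatabaseAsync(partitionId);

        // Overwrite the live checkpoint so the trigger re-reads from that offset.
        await blob.UploadTextAsync(lastGoodCheckpoint);
    }

    private static Task<string> LoadCheckpointFromDatabaseAsync(string partitionId) =>
        Task.FromResult("{}"); // placeholder
}
```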
Thank you for reading, and happy coding. If you have any questions or suggestions, please do not hesitate to post a comment on this blog. I would be happy to discuss them with you.
Ref: https://hackernoon.com/reliable-event-processing-in-azure-functions-37054dc2d0fc

