Data Imports in Optimizely: Part 6 - Parallelize - if it makes sense

Parallelizing a scheduled job can be a useful strategy for increasing the performance of an Optimizely import, and .NET exposes some simple APIs for doing so. There are, however, some caveats that may make parallelizing less beneficial for an Optimizely import than you might expect.

Parallelizing an Optimizely Scheduled Job

To parallelize a scheduled job, you can use the Parallel class, namely the Parallel.ForEach() and Parallel.ForEachAsync() methods. If your processing uses async code (see part 5), use Parallel.ForEachAsync(), which is available from .NET 6. If there is no async code within the processing, you can use Parallel.ForEach().
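For the non-async case, here is a minimal sketch assuming purely CPU-bound work with no async calls. It is plain .NET rather than Optimizely code: CountPrimes is a hypothetical stand-in for the per-item processing, and the same body runs first in a plain foreach and then in Parallel.ForEach(). Because several iterations write results at the same time, the parallel version uses a ConcurrentDictionary instead of a plain Dictionary.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical CPU-bound work standing in for per-item processing:
// counts the primes below the given limit by trial division.
static int CountPrimes(int limit)
{
    int count = 0;
    for (int n = 2; n < limit; n++)
    {
        bool isPrime = true;
        for (int d = 2; d * d <= n; d++)
        {
            if (n % d == 0)
            {
                isPrime = false;
                break;
            }
        }
        if (isPrime)
        {
            count++;
        }
    }
    return count;
}

// Hypothetical input: 1,000, 2,000, ..., 20,000.
int[] items = Enumerable.Range(1, 20).Select(i => i * 1_000).ToArray();

// Sequential version: a plain foreach loop.
var sequentialResults = new Dictionary<int, int>();
foreach (var item in items)
{
    sequentialResults[item] = CountPrimes(item);
}

// Parallel version: the same body, run across multiple threads by Parallel.ForEach.
// The token plays the same role as the one the job cancels in Stop().
using var cancellationTokenSource = new CancellationTokenSource();
var options = new ParallelOptions { CancellationToken = cancellationTokenSource.Token };
var parallelResults = new ConcurrentDictionary<int, int>();

Parallel.ForEach(items, options, item =>
{
    parallelResults[item] = CountPrimes(item);
});

Console.WriteLine($"Processed {parallelResults.Count} items in parallel.");
```

If the token is cancelled, Parallel.ForEach() stops starting new iterations and throws an OperationCanceledException, so a synchronous job would handle cancellation the same way as the async version.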

Implementing the parallel methods is mostly a matter of replacing the foreach with the appropriate parallel method, as shown below:

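using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using EPiServer.Scheduler;
using Nito.AsyncEx;

public class MyJob : ScheduledJobBase
{
    private CancellationTokenSource? _cancellationTokenSource;

    public MyJob()
    {
        // Optimizely only calls Stop() if the job is marked as stoppable.
        IsStoppable = true;
    }

    public override void Stop()
    {
        _cancellationTokenSource?.Cancel();
    }

    public override string Execute()
    {
        _cancellationTokenSource = new CancellationTokenSource();
        var cancellationToken = _cancellationTokenSource.Token;

        // Run the async work from the synchronous Execute() method (see part 5).
        return AsyncContext.Run(async () =>
        {
            return await MyAsyncTask(cancellationToken);
        });
    }

    public async Task<string> MyAsyncTask(CancellationToken cancellationToken)
    {
        try
        {
            // GetItemsAsync and ProcessSubItemAsync are your own import methods (not shown).
            var itemsToProcess = await GetItemsAsync(cancellationToken);

            await Parallel.ForEachAsync(itemsToProcess, cancellationToken, async (item, token) =>
            {
                var tasks = new List<Task>();

                // Process the sub-items of the current item, not of the whole collection.
                foreach (var subItem in item.Subitems)
                {
                    tasks.Add(ProcessSubItemAsync(subItem, token));
                }

                await Task.WhenAll(tasks);
            });
        }
        catch (OperationCanceledException) when (cancellationToken.IsCancellationRequested)
        {
            // Once the token is cancelled, awaiting Parallel.ForEachAsync throws instead of returning.
            return "Job was cancelled.";
        }

        return "Job completed successfully.";
    }
}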

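Note that in this example, we pass the cancellation token to Parallel.ForEachAsync() so that it stops starting new iterations once the job is stopped, and then return an appropriate message from the job depending on whether it was cancelled. Keep in mind that after cancellation, awaiting Parallel.ForEachAsync() throws an OperationCanceledException instead of returning normally, so the cancellation message has to come from catching that exception.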
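To see the cancellation behaviour in isolation, here is a minimal sketch in plain .NET (no Optimizely types). RunImportAsync is a hypothetical method shaped like MyAsyncTask, Task.Delay stands in for a database call, and a CancellationTokenSource with a 200 ms timeout simulates Stop() being called while the job is running.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical method shaped like MyAsyncTask: processes items in parallel
// and turns cancellation into a job message.
static async Task<string> RunImportAsync(int[] items, CancellationToken cancellationToken, Action onItemProcessed)
{
    try
    {
        await Parallel.ForEachAsync(items, cancellationToken, async (item, token) =>
        {
            // Hypothetical per-item work: Task.Delay stands in for a database call.
            await Task.Delay(50, token);
            onItemProcessed();
        });
    }
    catch (OperationCanceledException) when (cancellationToken.IsCancellationRequested)
    {
        // Awaiting Parallel.ForEachAsync throws after cancellation instead of returning normally.
        return "Job was cancelled.";
    }

    return "Job completed successfully.";
}

int processedCount = 0;
int[] allItems = Enumerable.Range(1, 10_000).ToArray();

// Simulate Stop() being called 200 ms after the job starts.
using var stopSignal = new CancellationTokenSource(TimeSpan.FromMilliseconds(200));
string cancelledResult = await RunImportAsync(allItems, stopSignal.Token, () => Interlocked.Increment(ref processedCount));

// With a token that is never cancelled, the same method completes normally.
string completedResult = await RunImportAsync(allItems.Take(10).ToArray(), CancellationToken.None, () => { });

Console.WriteLine($"{cancelledResult} Processed {processedCount} of {allItems.Length} items.");
Console.WriteLine(completedResult);
```

The `when (cancellationToken.IsCancellationRequested)` filter only treats the exception as a stop request when the job's own token was cancelled; any other OperationCanceledException still propagates and fails the job.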

When to parallelize an Optimizely scheduled job

Parallelizing only helps when the bottleneck is something that extra concurrency can relieve, so before parallelizing an Optimizely scheduled job, consider when it is useful and when it is not.

When to consider parallelizing

  • When your work is CPU-bound and would benefit from running across multiple threads

  • When your work does not saturate the database resources

  • When you have a large number of items to process

When not to parallelize

  • When you are able to saturate the database resources without parallelizing

  • When you have a small number of items to process

In an Optimizely context, the biggest consideration is whether the database resources are already saturated without parallelizing. If they are, no amount of parallelizing will make the import go faster; the extra concurrent requests just queue up at the database, and you are more likely to see database timeout exceptions.

If you are using Optimizely DXP, keep in mind that the DXP environments (especially Integration) may have tighter limits on database resources than you see in your own testing. It is recommended to start with a non-parallel approach and test on DXP to see whether it already saturates the database resources (e.g. the database reaches 100% of its DTUs).
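One way to act on this, assuming you want a single place to tune concurrency after testing on DXP, is to cap it explicitly. The sketch below is plain .NET rather than Optimizely code: ParallelOptions.MaxDegreeOfParallelism limits how many items run at once, and a SemaphoreSlim shared across all iterations (a common throttling technique, not something the job above uses) limits concurrent calls to a hypothetical SimulatedDbCallAsync. The semaphore matters because of the sub-item fan-out in the job above: each parallel item starts all of its sub-items at once, so without it up to 4 × 5 = 20 calls could overlap in this example.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical tuning knobs: start at 1 (effectively sequential), test on DXP,
// and only raise them while the database still has headroom.
const int maxParallelItems = 4;
const int maxConcurrentDbCalls = 4;

int activeDbCalls = 0;
int peakDbCalls = 0;
int completedDbCalls = 0;

// Shared by every iteration, so it limits the total number of concurrent database calls,
// including the sub-item fan-out inside each iteration.
using var dbThrottle = new SemaphoreSlim(maxConcurrentDbCalls);

// Hypothetical stand-in for a database call; records how many calls overlap.
async Task SimulatedDbCallAsync(CancellationToken token)
{
    await dbThrottle.WaitAsync(token);
    try
    {
        int active = Interlocked.Increment(ref activeDbCalls);

        // Record the highest number of overlapping calls seen so far.
        int peak;
        while (active > (peak = Volatile.Read(ref peakDbCalls)))
        {
            Interlocked.CompareExchange(ref peakDbCalls, active, peak);
        }

        await Task.Delay(10, token);
        Interlocked.Increment(ref completedDbCalls);
    }
    finally
    {
        Interlocked.Decrement(ref activeDbCalls);
        dbThrottle.Release();
    }
}

// Hypothetical input: 20 items with 5 sub-items each.
int[][] items = Enumerable.Range(1, 20)
    .Select(_ => Enumerable.Range(1, 5).ToArray())
    .ToArray();

var options = new ParallelOptions
{
    // Limits how many items run at once, but not how many sub-items each item fans out to.
    MaxDegreeOfParallelism = maxParallelItems,
};

await Parallel.ForEachAsync(items, options, async (subItems, token) =>
{
    // Same fan-out as the job above: all sub-items of one item are started together.
    await Task.WhenAll(subItems.Select(_ => SimulatedDbCallAsync(token)));
});

Console.WriteLine($"Completed {completedDbCalls} calls; peak concurrency was {peakDbCalls}.");
```

Setting both constants to 1 gives the non-parallel starting point recommended above; raising them one step at a time while watching DTU usage on DXP shows whether the extra concurrency actually helps. Note that when MaxDegreeOfParallelism is not set, Parallel.ForEachAsync() defaults to Environment.ProcessorCount, which may be more than the database can handle.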
