Data Imports in Optimizely: Part 3 - Query data efficiently
One of the more time-consuming parts of an import is looking up the data to update. Naively, it is possible to use the PageCriteriaQueryService to query for each page, one at a time, as it is needed. While this does work, it adds significant overhead because it can require thousands of individual queries.
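For reference, the naive lookup looks something like this. This is a sketch, not code from the import itself: it assumes an ImportedPage type with a LegacyID string property, an injected IPageCriteriaQueryService, an importFolder PageReference to search under, and importedData as the current source record.

// Naive approach: one FindPagesWithCriteria query per imported record.
var criterias = new PropertyCriteriaCollection();
criterias.Add(new PropertyCriteria
{
    Name = nameof(ImportedPage.LegacyID),
    Type = PropertyDataType.String,   // assumption: LegacyID is stored as a string
    Value = importedData.Id,
    Condition = CompareCondition.Equal,
    Required = true
});
var existingPage = _pageCriteriaQueryService
    .FindPagesWithCriteria(importFolder, criterias)
    .OfType<ImportedPage>()
    .FirstOrDefault();

Run once per record, each of these calls hits the database, which is exactly the overhead the batch approach below avoids.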
Instead, since virtually all of the pages will need to be accessed anyway, it is more efficient to preload them all using the batch APIs and then look them up in memory.
The key here is using the ListContentOfContentType method of IContentModelUsage to get all of the content references of the content type, and the GetItems method of IContentRepository to batch-load the items.
var allPageRefs = _contentModelUsage
    .ListContentOfContentType(communityContentType)
    // ListContentOfContentType returns one usage record per saved version,
    // so strip the version info and de-duplicate before batch loading.
    .Select(x => x.ContentLink.ToReferenceWithoutVersion())
    .Distinct();

var allPages = _contentRepository
    .GetItems(allPageRefs, new LoaderOptions { LanguageLoaderOption.MasterLanguage() })
    .OfType<ImportedPage>()
    // Key by the legacy identifier for O(1) lookups during the import.
    .GroupBy(x => x.LegacyID)
    .ToDictionary(x => x.Key, x => x.First());
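Here communityContentType is the ContentType of the imported page model. One way to resolve it, assuming the strongly typed model is ImportedPage and an injected IContentTypeRepository, is:

// Sketch: resolve the ContentType used by the import.
var communityContentType = _contentTypeRepository.Load(typeof(ImportedPage));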
Then, to load a page for editing, it’s just a matter of looking up the ID in the dictionary, falling back to creating a new page when there is no match.
if (allPages.TryGetValue(importedData.Id, out var page))
{
    // Existing page: create a writable clone before modifying it.
    page = (ImportedPage)page.CreateWritableClone();
}
else
{
    // New page: GetDefault already returns a fresh, writable instance,
    // so no clone is needed.
    page = _contentRepository.GetDefault<ImportedPage>(importFolder);
}
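Once the clone has been updated with the imported values, it is saved back through the same repository. A minimal sketch follows; the property mapping shown is hypothetical.

// Map the imported values onto the writable page (hypothetical properties).
page.Name = importedData.Title;
page.LegacyID = importedData.Id;
// Publish immediately; AccessLevel.NoAccess skips the permission check,
// which is typical for code running in a scheduled import job.
_contentRepository.Save(page, SaveAction.Publish, AccessLevel.NoAccess);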
Some considerations:
This approach can use a significant amount of memory, since it loads a potentially large segment of the database; in my testing it reached multiple gigabytes. In practice, though, RAM is plentiful, and that stayed well within the memory available on Optimizely DXP.
Loading all of the items up front takes a couple of seconds. The alternative costs far more: each PageCriteriaQueryService query takes a noticeable amount of time, and across thousands of lookups that adds up to many minutes spent just querying pages.