“The page seems to have no END”
“Heck, how many more pages does this website have?”
“Why am I seeing this content again even though I moved on to the next page?”
These are a few questions you may have asked yourself while browsing Google, Twitter, or the FB news feed. They point to a major issue in indexing real time data: the data keeps changing while we page through it, yet we still need to analyse it and return accurate, optimized search results.
We can break large record sets into smaller portions called pages. This process is called pagination. As a developer (unlike me), you may be familiar with implementing pagination. But for most of us, implementing pagination for real time data is tricky.
Here, we are going to discuss two approaches:
- Offset based – real time data pagination
- Cursor based pagination
Real time data: information delivered immediately after collection, with no delay in its timeliness. In such applications, it’s difficult to provide accurately paginated data because of the frequent updates.
Offset based – real time data pagination
Let’s take a look at the issues with offset pagination when managing real time data.
- Assumes the data is static and doesn’t change frequently – In default pagination, a retrieved record set is split into a number of pages. The results become inaccurate when new data is added or existing data is removed. When the data rarely changes, users feel that the pagination is working accurately.
- Pagination only considers record count, instead of each individual record – The records are broken into pages using only the total record count. It doesn’t consider whether each record still falls on the right page, so records can be displayed redundantly.
Let us assume that initially we have 18 records (auto arranged in alphabetical order), and that we have set a limit of 6 records per page.
Now assume that the result set is updated with 3 new records while we are still on the first page (again auto arranged in alphabetical order).
When we navigate to the second page, based on our first image, it should retrieve the 6 records that followed the first page in the original list. However, a different set of records is retrieved. In it, we can see records that were already displayed on the first page.
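The drift described above can be reproduced with a small simulation. This is only a sketch under assumptions: the record names (`rec-02`, `rec-04`, ...) and the helper `offset_page` are made up for illustration, and the three new records are chosen so that they sort into the first page.

```python
def offset_page(records, page, page_size=6):
    """Return one page by skipping (page - 1) * page_size records of the sorted set."""
    ordered = sorted(records)
    start = (page - 1) * page_size
    return ordered[start:start + page_size]


# 18 hypothetical records; sorting the zero-padded names stands in for alphabetical order.
records = [f"rec-{n:02d}" for n in range(2, 38, 2)]

page_one = offset_page(records, page=1)           # what the user sees first
expected_page_two = offset_page(records, page=2)  # page 2 if nothing had changed

# Three new records arrive while the user is still reading page 1.
records += ["rec-01", "rec-03", "rec-05"]

actual_page_two = offset_page(records, page=2)
repeated = [r for r in actual_page_two if r in page_one]

print("expected:", expected_page_two)
print("actual:  ", actual_page_two)
print("repeated:", repeated)  # records the user has already seen on page 1
```

Because the offset only counts rows, the three inserted records push three records from the first page onto the second one.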
Cursor based pagination
Cursor based pagination eliminates the problems faced in offset based pagination. To get a better understanding, let’s see how cursor based pagination works using our earlier example.
We had 18 records broken into 3 pages. Assume again that we are on the first page and 3 new records are added to the list.
In cursor based pagination we rely mainly on the content of the records rather than on their position. With each page, we keep the last record of the page. Then we use it as the starting point to generate the next result set.
For a clearer understanding have a look at these images
- Image 1 previews the previous scenario.
- Image 2 previews the cursor based pagination scenario
We can now see accurate results for the second page: the 9 records that sort at or before the cursor are skipped, instead of the fixed 6 that offset based pagination skips. In cursor based pagination there is no fixed concept of pages. Because the data changes rapidly, results are simply considered either previous or next relative to the cursor.
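The same scenario can be sketched with a cursor. As before, the record names and the helper `cursor_page` are hypothetical, and an in-memory sorted list stands in for the real index.

```python
from bisect import bisect_right


def cursor_page(records, after=None, page_size=6):
    """Return the records that sort strictly after the cursor, plus the next cursor."""
    ordered = sorted(records)
    start = 0 if after is None else bisect_right(ordered, after)
    page = ordered[start:start + page_size]
    next_cursor = page[-1] if page else None  # last record of this page
    return page, next_cursor


# 18 hypothetical records in alphabetical order.
records = [f"rec-{n:02d}" for n in range(2, 38, 2)]

page_one, cursor = cursor_page(records)

# Three new records arrive while the user is still reading page 1.
records += ["rec-01", "rec-03", "rec-05"]

page_two, cursor = cursor_page(records, after=cursor)
skipped = sorted(records).index(page_two[0])  # records that sort before page 2

print("page two:", page_two)
print("skipped: ", skipped)
```

Here the second page starts right after the last record the user saw, so 9 records are skipped (6 already seen plus 3 new ones) and nothing is repeated.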
People often work with offset based pagination and rarely get a chance to work with cursor based pagination. So it’s important to identify the differences and benefits of each technique for using them in appropriate scenarios.
- Offset pagination contains page numbers in addition to next and previous links. But due to the highly dynamic nature of the data, we can’t provide page numbers for cursor based pagination.
- Both offset based pagination and cursor based pagination allow us to navigate in both directions.
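To illustrate navigating in both directions with a cursor, here is a minimal sketch. The helpers `next_page` and `previous_page` are hypothetical names, and the assumption is that the first record of the current page serves as the backward cursor and the last record as the forward cursor.

```python
from bisect import bisect_left, bisect_right


def next_page(ordered, after, page_size=6):
    """Records that sort strictly after the given cursor value."""
    start = bisect_right(ordered, after)
    return ordered[start:start + page_size]


def previous_page(ordered, before, page_size=6):
    """Records that sort strictly before the given cursor value."""
    end = bisect_left(ordered, before)
    return ordered[max(0, end - page_size):end]


# 18 hypothetical records, already sorted.
ordered = [f"rec-{n:02d}" for n in range(1, 19)]

current = next_page(ordered, after="rec-06")          # rec-07 .. rec-12
back = previous_page(ordered, before=current[0])      # rec-01 .. rec-06
forward = next_page(ordered, after=current[-1])       # rec-13 .. rec-18

print(back, current, forward, sep="\n")
```

There are still no page numbers here, only a "previous" and a "next" relative to the records currently on screen.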
Let us now consider a realistic case, where we have millions of records and thousands of automatic updates per second. Using offset based pagination, we would miss a lot of relevant data without ever knowing about it, and we would also receive many duplicate results. With cursor based pagination, by contrast, we obtain the data as it stands in real time, and no records are lost between pages. This value based approach uses the last row of the previous page as an “offset”, which enables us to retrieve the next page from the database.
Use whichever you deem right, but if you have even the slightest doubt, feel free to talk to the experts. Oh, oh!! Let me rephrase: SolrExperts 😀
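Against a database, the value based approach might look like the sketch below. It uses Python's built-in `sqlite3` module with an in-memory table; the table name `records`, the column `name`, and the helper `fetch_after` are assumptions made for illustration, not part of any particular product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (name TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO records (name) VALUES (?)",
    [(f"rec-{n:02d}",) for n in range(2, 38, 2)],  # 18 hypothetical rows
)


def fetch_after(conn, last_seen=None, page_size=6):
    """Value based page: rows that sort after the last row of the previous page."""
    if last_seen is None:
        rows = conn.execute(
            "SELECT name FROM records ORDER BY name LIMIT ?", (page_size,)
        )
    else:
        rows = conn.execute(
            "SELECT name FROM records WHERE name > ? ORDER BY name LIMIT ?",
            (last_seen, page_size),
        )
    return [row[0] for row in rows]


page_one = fetch_after(conn)

# New rows arrive while page 1 is on screen.
conn.executemany(
    "INSERT INTO records (name) VALUES (?)",
    [("rec-01",), ("rec-03",), ("rec-05",)],
)

# Cursor style: continue from the last row we actually saw.
page_two = fetch_after(conn, last_seen=page_one[-1])

# Offset style, for comparison: skip a fixed number of rows.
offset_page_two = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM records ORDER BY name LIMIT ? OFFSET ?", (6, 6)
    )
]

print("cursor:", page_two)
print("offset:", offset_page_two)
```

The `WHERE name > ?` filter depends only on the last value seen, so rows inserted earlier in the ordering do not shift the next page, whereas the `OFFSET` query returns three rows that were already on page 1.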