We often use the 8-4-1 paradigm to compare two blocks of memory:
1. First compare by 8-byte blocks in a loop
2. Then compare by 4-byte blocks in a loop
3. Last compare by 1-byte blocks in a loop
It can be proved that the second loop runs at most once: when the first loop exits, fewer than 8 bytes remain, so at most one 4-byte block fits. We can therefore replace the second loop with an if statement, which saves a comparison and two jump operations.
According to the discussion in https://github.com/apache/arrow/pull/5508#discussion_r343973982, loops can be expensive.
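
As an illustration of the proposed change, here is a minimal sketch in C of the 8-4-1 comparison with the second loop replaced by an if statement. The function name `blocks_equal` and its signature are hypothetical and chosen only for this example; this is not the code from the pull request.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the first len bytes of a and b
 * are equal, 0 otherwise. */
static int blocks_equal(const void *a, const void *b, size_t len) {
    const unsigned char *p = (const unsigned char *)a;
    const unsigned char *q = (const unsigned char *)b;

    /* Step 1: compare 8 bytes at a time in a loop. */
    while (len >= 8) {
        uint64_t x, y;
        memcpy(&x, p, sizeof x); /* memcpy avoids unaligned reads */
        memcpy(&y, q, sizeof y);
        if (x != y) return 0;
        p += 8; q += 8; len -= 8;
    }

    /* Step 2: len < 8 here, so at most one 4-byte block remains.
     * An if statement is enough; no loop is needed. */
    if (len >= 4) {
        uint32_t x, y;
        memcpy(&x, p, sizeof x);
        memcpy(&y, q, sizeof y);
        if (x != y) return 0;
        p += 4; q += 4; len -= 4;
    }

    /* Step 3: compare the remaining 0 to 3 bytes one at a time. */
    while (len > 0) {
        if (*p != *q) return 0;
        ++p; ++q; --len;
    }
    return 1;
}
```

For a 15-byte input, for example, this runs one 8-byte comparison, one 4-byte comparison, and three 1-byte comparisons.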