Question Details

No question body available.

Tags

c assembly bit-manipulation x86-64 simd

Answers (1)

March 9, 2026 Score: 3 Rep: 56 Quality: Medium Completeness: 60%

What you are doing is essentially a bitwise substring search (cross-correlation). Your code is slow because it compares bit by bit inside nested loops (worst case ≈ 32×32 operations for the 32-bit example, and 128×128 for the real problem).

The key to speeding this up on x86-64 is to work on whole registers (64/128 bits) instead of individual bits, use XOR to detect mismatches, and use bit masks to test only the overlapping region.

This alone is already ~20–50× faster than your code:

#include <stdint.h>
#include 

int CrossCorrelate128(uint64_t hi, uint64_t lo, uint64_t nhi, uint64_t nlo, int NeedleLen)
{
    for (int shift = 0; shift < 128; shift++) {
        /* Number of needle bits that still overlap the haystack at this shift. */
        int overlap = NeedleLen < (128 - shift) ? NeedleLen : (128 - shift);

        __uint128_t H = ((__uint128_t)hi
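
As a separate, minimal sketch of the XOR-and-mask idea described above (not the answer's own code), here is a 64-bit version. The function name `find_bits64`, the convention that the needle sits in the low `needle_len` bits, and the choice to report only fully contained matches are all assumptions made for illustration. The partial-overlap handling in `CrossCorrelate128` is not reproduced here.

```c
#include <stdint.h>

/* Hypothetical helper (not from the original answer).
 * Returns the smallest shift at which the low `needle_len` bits of
 * `needle` equal the corresponding bits of `haystack >> shift`,
 * or -1 if there is no such shift. Only positions where the needle
 * fits entirely inside the 64-bit word are tested. */
int find_bits64(uint64_t haystack, uint64_t needle, int needle_len)
{
    if (needle_len <= 0 || needle_len > 64)
        return -1;

    /* Mask with the low needle_len bits set. The 64-bit case is
     * handled separately because shifting by 64 is undefined in C. */
    uint64_t mask = (needle_len == 64) ? ~UINT64_C(0)
                                       : ((UINT64_C(1) << needle_len) - 1);

    for (int shift = 0; shift + needle_len <= 64; shift++) {
        /* XOR leaves a 1 bit wherever the two words differ; the mask
         * discards every bit outside the needle. One comparison per
         * shift replaces the inner bit-by-bit loop. */
        if ((((haystack >> shift) ^ needle) & mask) == 0)
            return shift;
    }
    return -1;
}
```

For example, `find_bits64(0xB0, 0xB, 4)` returns 4, because the pattern `1011` first appears in `1011 0000` starting at bit 4. On GCC and Clang for x86-64, the same loop could be written with `unsigned __int128` to cover the 128-bit case.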