Answer to: How to choose between brute force and efficient solution that has overhead?
Score: 12
... I could implement the brute force approach and monitor in production so I'm not prematurely optimizing...
Do this, except not in production. The brute force approach is likely faster to implement and test, but I wouldn't advise putting it in production straight away. You'll want a safe pre-production environment with data realistic enough to give you confidence that the brute force solution won't cause problems in production.
Once you have confidence in this solution, deploy to production and monitor.
The challenge is predicting how large the dataset will become, because dataset size ends up being the limiting factor for brute-force approaches. Production, as it stands today, is your best point of comparison.
So, let's say you deploy this and start monitoring. When is there a problem? You need to define a performance threshold and proactively notify developers when runtime exceeds it, while the system still responds quickly enough that nobody has to rush an emergency patch to production.
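A minimal sketch of the threshold idea in Python, assuming the brute-force search can be wrapped in a function. `watch_runtime`, the two thresholds, and the `perf` logger name are hypothetical, and logging stands in for whatever alerting actually notifies your developers. The warning threshold sits below the hard limit so developers hear about slowdowns before users do.

```python
import logging
import time
from functools import wraps

# Hypothetical logger; route it to whatever alerting system notifies developers.
logger = logging.getLogger("perf")


def watch_runtime(warn_seconds: float, limit_seconds: float):
    """Time each call and log when it crosses the warning or hard threshold.

    warn_seconds should sit well below limit_seconds so developers are
    notified while the system is still responding acceptably.
    """
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Runs after the call completes (or raises), so timing is always recorded.
                elapsed = time.perf_counter() - start
                if elapsed >= limit_seconds:
                    logger.error("%s took %.3fs (limit %.3fs)",
                                 func.__name__, elapsed, limit_seconds)
                elif elapsed >= warn_seconds:
                    logger.warning("%s took %.3fs (warn at %.3fs)",
                                   func.__name__, elapsed, warn_seconds)
        return wrapper
    return decorator
```

You would decorate the brute-force search function, e.g. `@watch_runtime(warn_seconds=0.5, limit_seconds=2.0)`, with values taken from your own performance requirements rather than these example numbers.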
I would run some experiments to get a feel for how well the brute force approach scales: at what point does it become untenable? If all you have is production data, consider simply copying and pasting that data to make the dataset bigger. As a first rough test, I like to make order-of-magnitude jumps in size, increasing the amount of data 10x with each jump.
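Copying and pasting data up to a target size takes only a few lines of Python. `inflate` is a hypothetical helper that assumes the embeddings are held in memory as a list of vectors; it cycles through the existing rows until the requested size is reached.

```python
from itertools import cycle, islice
from typing import List, Sequence

Vector = Sequence[float]


def inflate(rows: Sequence[Vector], target_size: int) -> List[Vector]:
    """Return target_size rows by cycling through the existing rows (copy and paste)."""
    if not rows:
        raise ValueError("need at least one row to copy")
    return list(islice(cycle(rows), target_size))


# Example: grow a 3-row sample to 10 rows.
sample = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
bigger = inflate(sample, 10)
print(len(bigger))  # 10
```

Duplicated rows are fine for a rough timing test of a brute-force scan, which does the same work per row regardless of content, but they can distort results for anything that benefits from repeated values, such as caches.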
So, test 1 involves 100 vector embeddings, test 2 uses 1,000, then 10,000, and so on, until you hit an obviously terrible, panic-inducing level of performance.
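The 10x progression is easy to automate. This sketch assumes the brute-force approach is an exhaustive cosine-similarity scan for the top-k nearest embeddings; `brute_force_top_k`, `scaling_run`, the random 64-dimension vectors, and the one-second budget are all hypothetical stand-ins. In practice you would feed it your (copied and inflated) production embeddings instead of random ones.

```python
import heapq
import math
import random
import time
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def brute_force_top_k(query: Vector, embeddings: Sequence[Vector], k: int = 5) -> List[int]:
    """Compare the query against every embedding; return indices of the k most similar."""
    scores = [cosine_similarity(query, e) for e in embeddings]
    return heapq.nlargest(k, range(len(scores)), key=scores.__getitem__)


def random_embeddings(n: int, dim: int, seed: int = 0) -> List[List[float]]:
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n)]


def scaling_run(sizes=(100, 1_000, 10_000), dim: int = 64,
                budget_seconds: float = 1.0) -> List[Tuple[int, float]]:
    """Time the brute-force search at each size, stopping once the budget is blown."""
    query = random_embeddings(1, dim, seed=42)[0]
    results = []
    for n in sizes:
        data = random_embeddings(n, dim)
        start = time.perf_counter()
        brute_force_top_k(query, data)
        elapsed = time.perf_counter() - start
        results.append((n, elapsed))
        print(f"{n:>8} embeddings: {elapsed:.4f}s")
        if elapsed > budget_seconds:
            break  # already unacceptable; larger sizes will only be slower
    return results
```

Calling `scaling_run()` prints one timing line per size and stops at the first size that exceeds the budget, so you don't wait on runs you already know are too slow.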
If the brute force approach handles 1,000 embeddings reasonably well, that indicates it will suffice. If 1,000 vector embeddings causes your computer to fall over, dial things back: how does 5x perform? How about 3x or 2x?
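Dialing back can follow the same pattern: try progressively smaller multiples of the starting size and keep the largest one that stays within budget. `largest_tolerable_multiplier` and `time_call` are hypothetical helpers; the measurement is passed in as a function so the search logic doesn't depend on any particular workload.

```python
import time
from typing import Callable, Iterable, Optional


def time_call(workload: Callable[[int], object]) -> Callable[[int], float]:
    """Wrap a workload that takes a dataset size into a function returning elapsed seconds."""
    def measure(n: int) -> float:
        start = time.perf_counter()
        workload(n)
        return time.perf_counter() - start
    return measure


def largest_tolerable_multiplier(
    measure: Callable[[int], float],
    base_size: int,
    multipliers: Iterable[int] = (10, 5, 3, 2),
    budget_seconds: float = 1.0,
) -> Optional[int]:
    """Try multiples of base_size from largest to smallest; return the first within budget.

    Returns None if even the smallest multiplier exceeds the budget.
    """
    for m in sorted(multipliers, reverse=True):
        if measure(base_size * m) <= budget_seconds:
            return m
    return None
```

With `base_size=100`, a result of 3 means 1,000 and 500 embeddings exceeded the budget but 300 fit; `None` means even 2x was too slow.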
If the brute force approach really can't handle much more than production already has, then this could indicate you need to invest time upfront in a more complex but efficient solution.
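That last decision boils down to headroom: how many times larger than today's production dataset the brute force approach can go before it misses its budget. A minimal sketch, where the 2x minimum headroom is a hypothetical policy you would set from your own growth expectations:

```python
def headroom(max_tolerable_size: int, production_size: int) -> float:
    """Ratio of the largest dataset that met the budget to the current production size."""
    if production_size <= 0:
        raise ValueError("production_size must be positive")
    return max_tolerable_size / production_size


def should_invest_in_efficient_solution(max_tolerable_size: int,
                                        production_size: int,
                                        min_headroom: float = 2.0) -> bool:
    """True when brute force can't handle much more than production already has."""
    return headroom(max_tolerable_size, production_size) < min_headroom


# Example: brute force stays within budget up to 1,200 items; production has 1,000.
print(should_invest_in_efficient_solution(1_200, 1_000))  # True
```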
Score: 9 • Views: 3,561
Site: softwareengineering