mirror of
https://github.com/hwchase17/langchain
synced 2024-11-04 06:00:26 +00:00
Optimize the cosine_similarity_top_k function performance (#8151)
Optimizing important numerical code and making it run faster. Performance improved by ~2.48x (a 148% speedup): runtime went down from 138715us to 56020us.

Optimization explanation: the `cosine_similarity_top_k` function is where we made the most significant optimization. Instead of sorting the entire `score_array`, which is O(n log n) over all elements, `np.argpartition` is used to find the indices of the `top_k` largest scores; this runs in O(n), which outperforms a full sort. Note that `np.argpartition` does not guarantee any ordering among the selected values, so we follow it with `argsort()` on just the top-k values after partitioning — much cheaper, because it sorts only the k selected elements rather than the entire array. Finally, `np.unravel_index` converts the sorted flat indices into row and column indices in the original score array, which is both more efficient and cleaner than a list comprehension.

The code has been tested for correctness by running the following snippet on both the original function and the optimized function, averaged over 5 runs:

```python
import gc
import time

import numpy as np


def test_cosine_similarity_top_k_large_matrices():
    X = np.random.rand(1000, 1000)
    Y = np.random.rand(1000, 1000)
    top_k = 100
    score_threshold = 0.5

    gc.disable()
    counter = time.perf_counter_ns()
    return_value = cosine_similarity_top_k(X, Y, top_k, score_threshold)
    duration = time.perf_counter_ns() - counter
    gc.enable()
```

@hwaking @hwchase17 @jerwelborn Unit tests pass; I also generated more regression tests, which all passed.
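The argpartition-then-argsort pattern described above can be sketched on a small array. This is a minimal standalone example, not the langchain function itself; `top_k_flat_indices` is a hypothetical helper name used only for illustration:

```python
import numpy as np


def top_k_flat_indices(score_array: np.ndarray, top_k: int) -> np.ndarray:
    """Return flat indices of the top_k scores, largest first."""
    # np.argpartition places the indices of the top_k largest values
    # in the last top_k slots in O(n); their order is unspecified.
    idxs = np.argpartition(score_array, -top_k, axis=None)[-top_k:]
    # Sort only those top_k values (descending), not the whole array.
    return idxs[np.argsort(score_array.ravel()[idxs])][::-1]


scores = np.array([[0.1, 0.9], [0.5, 0.3]])
flat = top_k_flat_indices(scores, 2)
# flat -> [1, 2]: 0.9 first, then 0.5
rows, cols = np.unravel_index(flat, scores.shape)
# (rows, cols) -> ([0, 1], [1, 0]), i.e. entries (0, 1) and (1, 0)
```

`np.unravel_index` then maps the flat indices back to (row, column) pairs, exactly as the commit does for its return value.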
This commit is contained in:
parent
ddc353a768
commit
db9d5b213a
```diff
@@ -46,11 +46,11 @@ def cosine_similarity_top_k(
     if len(X) == 0 or len(Y) == 0:
         return [], []
     score_array = cosine_similarity(X, Y)
-    sorted_idxs = score_array.flatten().argsort()[::-1]
-    top_k = top_k or len(sorted_idxs)
-    top_idxs = sorted_idxs[:top_k]
     score_threshold = score_threshold or -1.0
-    top_idxs = top_idxs[score_array.flatten()[top_idxs] > score_threshold]
-    ret_idxs = [(x // score_array.shape[1], x % score_array.shape[1]) for x in top_idxs]
-    scores = score_array.flatten()[top_idxs].tolist()
-    return ret_idxs, scores
+    score_array[score_array < score_threshold] = 0
+    top_k = min(top_k or len(score_array), np.count_nonzero(score_array))
+    top_k_idxs = np.argpartition(score_array, -top_k, axis=None)[-top_k:]
+    top_k_idxs = top_k_idxs[np.argsort(score_array.ravel()[top_k_idxs])][::-1]
+    ret_idxs = np.unravel_index(top_k_idxs, score_array.shape)
+    scores = score_array.ravel()[top_k_idxs].tolist()
+    return list(zip(*ret_idxs)), scores  # type: ignore
```
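As a sanity check on the technique (a hypothetical snippet, not part of the commit), the old full-sort selection and the new partition-based selection can be compared on random data; with distinct random floats, both should pick the same indices in the same order:

```python
import numpy as np

rng = np.random.default_rng(0)
score_array = rng.random((50, 40))
top_k = 10

# Old approach: sort every element, keep the k largest.
full_sort = score_array.flatten().argsort()[::-1][:top_k]

# New approach: O(n) partition, then sort only the k survivors.
part = np.argpartition(score_array, -top_k, axis=None)[-top_k:]
part = part[np.argsort(score_array.ravel()[part])][::-1]

# Ties are practically impossible with continuous random values,
# so the two selections should match exactly.
assert np.array_equal(full_sort, part)
```

The equality holds because partitioning changes only how the top k are found, not which elements they are.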