Skip to content

In the non-fast mode of accum_or_assign and when buckets are full, premature unlocking results in data corruption during writes #239

@zhangyachen

Description

@zhangyachen

related to #232

#232 mentions that when max_hbm_for_vectors is set to a relatively small value, unit tests fail. I debugged the code and found the root cause: when max_hbm_for_vectors is set to a relatively small value and the number of inserted elements B is greater than max_capacity, this results in a high load factor for the hash table. At this point, find_and_lock_when_full is invoked. find_and_lock_when_full locks the key, but in accum_or_assign_kernel, the key is unlocked, which occurs before the vector is actually written via write_with_accum_kernel. This creates a race condition.

template <class K, class V, class S, int Strategy, uint32_t TILE_SIZE = 4>
__global__ void accum_or_assign_kernel(...) {
  for (size_t t = (blockIdx.x * blockDim.x) + threadIdx.x; t < N; t += blockDim.x * gridDim.x) {
    // ... some code    

    const int bucket_size = buckets_size[bkt_idx];
    do {
      if (bucket_size < bucket_max_size) {
        occupy_result = find_and_lock_when_vacant<K, V, S, TILE_SIZE>(
            g, bucket, insert_key, insert_score, evicted_key, start_idx,
            key_pos, src_lane, bucket_max_size);
      } else {
        start_idx = (start_idx / TILE_SIZE) * TILE_SIZE;

        // When buckets are full , invoke this
        occupy_result = find_and_lock_when_full<K, V, S, TILE_SIZE,    
                                                ScoreFunctor::LOCK_MEM_ORDER,
                                                ScoreFunctor::UNLOCK_MEM_ORDER>(
            g, bucket, insert_key, insert_score, evicted_key, start_idx,
            key_pos, src_lane, bucket_max_size);
      }

      occupy_result = g.shfl(occupy_result, src_lane);
    } while (occupy_result == OccupyResult::CONTINUE);

    // ... some code    
    
    if (g.thread_rank() == src_lane) {
      *(value_or_deltas + key_idx) = (bucket->vectors + key_pos * dim);
      *(founds + key_idx) = is_accum;
      bucket->digests(key_pos)[0] = get_digest<K>(insert_key);
      ScoreFunctor::update(bucket, key_pos, scores, key_idx, insert_score, (occupy_result != OccupyResult::DUPLICATE));
       //  This leads to premature unlocking of the key.
      (bucket->keys(key_pos))->store(insert_key, ScoreFunctor::UNLOCK_MEM_ORDER);    


    }
  }
}

I have drawn a simple diagram to illustrate this.
Image

After testing, I found that any place where unlocking occurs prematurely has the potential to cause issues:

  • insert_or_assign
  • find_or_insert
  • upsert_and_evict

Solution

I performed a quick verification: by directly writing vectors inside accum_or_assign_kernel, rather than saving the value pointer and writing in write_with_accum_kernel, the unit test passes.

Indeed, a more efficient approach is to maintain the lock until the vector is written, and then unlock it. This is similar to what the upsert_kernel_lock_key_hybrid and write_kernel_unlock_key functions do in the insert_or_assign interface.

So I'm curious: why doesn't accum_or_assign follow the same approach as the insert_or_assign interface, which uses upsert_kernel_lock_key_hybrid paired with write_kernel_unlock_key?

Moreover, the insert_or_assign interface also has upsert_kernel and write_by_cpu/write_kernel, which unlock prematurely as well. I’m not clear why the same interface includes these two different implementations.

@jiashuy

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions