What's the issue?
MapReduce is currently implemented in such a way that if you partition a dataset into a number of partitions that does not evenly divide the data, you get partitions of unequal sizes. The current implementation repeats the last data point to fill the short partition. Would it make more sense to use a random data point instead?
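For illustration, here is a minimal sketch of the two padding strategies (the function name, parameters, and padding logic below are hypothetical, not the actual MapReduce implementation): the final partition is filled either by repeating the last data point (current behaviour) or by duplicating a randomly chosen data point (the proposal).

```python
import random

def partition(data, num_partitions, pad="repeat_last", seed=None):
    """Split `data` into `num_partitions` equally sized partitions.

    When len(data) is not divisible by num_partitions, extra slots are
    padded either by repeating the last data point ("repeat_last") or
    by duplicating a randomly sampled data point ("random").
    """
    size = -(-len(data) // num_partitions)  # ceiling division: points per partition
    padded = list(data)
    rng = random.Random(seed)
    while len(padded) < size * num_partitions:
        if pad == "repeat_last":
            padded.append(data[-1])          # current behaviour: duplicate the last point
        else:
            padded.append(rng.choice(data))  # proposal: duplicate a random point
    return [padded[i * size:(i + 1) * size] for i in range(num_partitions)]

# 10 points into 4 partitions of 3 each -> 2 padding points are needed
print(partition(list(range(10)), 4, pad="repeat_last"))
print(partition(list(range(10)), 4, pad="random", seed=0))
```

With repeated-last padding, the padded points are always copies of the same sample, which skews the final partition toward it; random padding spreads that duplication across the dataset.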