Add Tensor.copyDataInto to Java API #19171
Conversation
…put reads

The existing Tensor.getDataAsFloatArray() allocates a fresh float[] on every call and copies from the underlying off-heap buffer into it. In a steady-state inference loop this is a per-frame allocation proportional to the output tensor size — for a 144x192 single-channel depth output that's ~110 KB per call, multiplied by however many output tensors a model returns. Profiling on Android (Perfetto) showed this driving substantial ART GC pressure that ended up stealing CPU back from the inference thread itself via run-queue contention.

The off-heap FloatBuffer that backs the output Tensor is already owned by the native side and exposed internally via the package-private getRawDataBuffer(). copyDataInto(FloatBuffer dst) lets callers reuse a pre-allocated destination across calls, eliminating the per-call float[] allocation while keeping ownership unambiguous (the destination is caller-owned, so it's safe to hand off to async consumers without racing against the next forward() overwriting native memory).

Implemented for float32 (zero-copy bulk put) and float16 (per-element half->float widening, mirroring getDataAsFloatArray). Other dtypes inherit the base-class IllegalStateException default.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
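The float16 path widens each raw 16-bit half value to a 32-bit float. Below is a standalone sketch of that widening, not the actual executorch implementation (which may rely on a platform helper such as android.util.Half); the class and method names here are illustrative only.

```java
import java.nio.FloatBuffer;
import java.nio.ShortBuffer;

public class HalfWidening {
    // Widen one IEEE 754 binary16 value (raw bits in a short) to float.
    // Standalone reimplementation for illustration only.
    static float halfToFloat(short h) {
        int bits = h & 0xFFFF;
        int sign = bits >>> 15;
        int exp = (bits >>> 10) & 0x1F;
        int mant = bits & 0x3FF;
        float val;
        if (exp == 0) {
            val = mant * 0x1p-24f;                      // subnormal (or zero)
        } else if (exp == 0x1F) {
            val = mant == 0 ? Float.POSITIVE_INFINITY : Float.NaN;
        } else {
            val = (1f + mant / 1024f) * (float) Math.scalb(1.0, exp - 15);
        }
        return sign == 1 ? -val : val;
    }

    // Per-element widening copy, mirroring what the fp16 path of a
    // copyDataInto(FloatBuffer) would do: read raw half bits from src,
    // write widened floats at dst's current position.
    static void widenInto(ShortBuffer src, FloatBuffer dst) {
        for (int i = src.position(); i < src.limit(); i++) {
            dst.put(halfToFloat(src.get(i)));
        }
    }
}
```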
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19171
Note: Links to docs will display an error until the docs builds have been completed.
❗ 1 Active SEV: there is 1 currently active SEV. If your PR is affected, please view it below.
❌ 1 New Failure, 1 Cancelled Job, 1 Unrelated Failure as of commit 068212e with merge base 79b799d.
NEW FAILURE - The following job has failed:
CANCELLED JOB - The following job was cancelled. Please retry:
FLAKY - The following job failed but was likely due to flakiness present on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@claude review
Claude finished @kirklandsign's task in 1m 17s

Code Review: Add Tensor.copyDataInto to Java API
Overall: This is a clean, well-motivated PR. The API design is consistent with existing patterns, and test coverage is thorough. A few observations below, mostly minor.

What's good
Issues
Nits
Summary

Looks good to land. The only suggestion worth considering before merge is the upfront capacity check in
@pytorchbot label "release notes: none"
Extends the previous commit (which added copyDataInto(FloatBuffer) for float32 / float16) to cover every dtype that the Tensor class supports:

copyDataInto(ByteBuffer): int8
copyDataIntoUnsigned(ByteBuffer): uint8
copyDataInto(IntBuffer): int32
copyDataInto(LongBuffer): int64
copyDataInto(DoubleBuffer): float64
copyDataInto(ShortBuffer): float16 (raw fp16 bits, no widening)

Mirrors the asymmetry of getDataAsByteArray (int8) vs getDataAsUnsignedByteArray (uint8): calling copyDataInto(ByteBuffer) on a uint8 tensor throws, and copyDataIntoUnsigned(ByteBuffer) on int8 throws, just like the array accessors. The two methods are intentionally separate even though the underlying bits are identical, so a misuse caused by a dtype switch surfaces as an exception instead of silently producing values with the wrong sign interpretation.

The float16 ShortBuffer overload writes the raw 16-bit half-precision bits without widening; copyDataInto(FloatBuffer) on the same tensor still performs the half->float widening, matching the dual getDataAsShortArray / getDataAsFloatArray accessors that float16 already exposes.

Tests cover each new variant's happy path plus the asymmetric int8/uint8 rejection.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
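The asymmetric int8/uint8 behavior described above can be sketched with base-class defaults that throw and a per-dtype subclass that overrides only its own overload. The class names here (TensorSketch, Int8TensorSketch) are hypothetical stand-ins, not the actual executorch classes.

```java
import java.nio.ByteBuffer;

// Base-class defaults: every overload throws unless the dtype subclass
// provides it, so a dtype mismatch fails loudly instead of silently
// reinterpreting the bytes with the wrong sign.
abstract class TensorSketch {
    public void copyDataInto(ByteBuffer dst) {
        throw new IllegalStateException("tensor dtype is not int8");
    }
    public void copyDataIntoUnsigned(ByteBuffer dst) {
        throw new IllegalStateException("tensor dtype is not uint8");
    }
}

class Int8TensorSketch extends TensorSketch {
    private final ByteBuffer data;
    Int8TensorSketch(ByteBuffer data) { this.data = data; }

    @Override
    public void copyDataInto(ByteBuffer dst) {
        // Bulk copy; duplicate() leaves the source buffer's position intact.
        dst.put(data.duplicate());
    }
    // copyDataIntoUnsigned is deliberately NOT overridden: calling it on an
    // int8 tensor inherits the throwing default, mirroring the asymmetry of
    // the existing array accessors.
}
```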
Force-pushed from 4475cb5 to 068212e
@claude re-review
@claude Review this PR
Claude finished @mergennachin's task in 1m 1s

Code Review: Add Tensor.copyDataInto to Java API
Overall: This is a well-designed, well-tested addition. The API is consistent with existing patterns, the motivation (reducing GC pressure in inference loops) is sound, and the test coverage is thorough. Ready to land with one minor suggestion.

What's good
Minor suggestion
Summary

Clean PR. The API mirrors the existing
Summary
Adds Tensor.copyDataInto(... dst) to the Android Java API for the float32 and float16 dtypes. It copies the tensor's data into a caller-provided destination buffer instead of allocating a fresh float[] per call (as getDataAsFloatArray() does today). The same pattern is repeated for other types.

Motivation
While profiling depth inference on Android with Perfetto, output extraction was a meaningful contributor to ART GC pressure. Each call to output.toTensor().dataAsFloatArray allocates a new Java float[] sized to the tensor's element count and bulk-copies from the underlying off-heap buffer into it.

The native side already exposes the underlying FloatBuffer directly (a zero-copy view of the C++ tensor's data_ptr()), so the only thing missing was a public way for callers to drain it into a destination buffer they already own and reuse across calls.

API
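A sketch of the overload surface described in the commit messages above. The class name TensorCopyApi is a stand-in, not the actual org.pytorch.executorch.Tensor; in the real class, each dtype subclass presumably overrides its own overload while the base defaults throw IllegalStateException.

```java
import java.nio.*;

// Hypothetical reconstruction of the overloads this PR adds; comments name
// the dtype each one serves, per the commit messages.
abstract class TensorCopyApi {
    public void copyDataInto(FloatBuffer dst)        { throw new IllegalStateException(); } // float32 (and fp16, widened)
    public void copyDataInto(ByteBuffer dst)         { throw new IllegalStateException(); } // int8
    public void copyDataIntoUnsigned(ByteBuffer dst) { throw new IllegalStateException(); } // uint8
    public void copyDataInto(IntBuffer dst)          { throw new IllegalStateException(); } // int32
    public void copyDataInto(LongBuffer dst)         { throw new IllegalStateException(); } // int64
    public void copyDataInto(DoubleBuffer dst)       { throw new IllegalStateException(); } // float64
    public void copyDataInto(ShortBuffer dst)        { throw new IllegalStateException(); } // float16 (raw bits)
}
```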
Caller-side usage example
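A sketch of the intended reuse pattern using only java.nio; the executorch call appears as a comment because the module/forward names are assumptions here, and a stand-in write keeps the sketch runnable.

```java
import java.nio.FloatBuffer;

public class ReuseLoopSketch {
    // Run `frames` mock inferences into the same caller-owned buffer,
    // mirroring the steady-state loop from the motivation section.
    static FloatBuffer runFrames(int frames, int numel) {
        FloatBuffer dst = FloatBuffer.allocate(numel); // allocated once, reused every frame
        for (int f = 0; f < frames; f++) {
            dst.clear(); // rewind before each call
            // In the real loop (names assumed, not from the PR):
            //   module.forward(input)[0].toTensor().copyDataInto(dst);
            // Stand-in write so the sketch executes without executorch:
            for (int i = 0; i < numel; i++) dst.put(f);
            dst.flip(); // ready for the consumer; no per-frame float[] allocation
        }
        return dst;
    }
}
```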
Test plan
TensorTest.kt:

testCopyDataIntoFloat32 — round-trip with reuse across two calls
testCopyDataIntoFloat32_writesAtDstPosition — verifies the call writes at dst.position() and advances it (does not overwrite from index 0)
testCopyDataIntoFloat32_overflow — BufferOverflowException on undersized destination
testCopyDataIntoFloat16 — verifies fp16→fp32 widening matches getDataAsFloatArray
testCopyDataIntoFloat_unsupportedDtype — IllegalStateException from base default for non-float dtypes

This PR was authored with Claude.
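The overflow test leans on standard java.nio bulk-put semantics, which can be checked in isolation: a bulk put into a destination with fewer remaining elements than the source throws BufferOverflowException without transferring anything.

```java
import java.nio.BufferOverflowException;
import java.nio.FloatBuffer;

public class OverflowDemo {
    // Returns true if bulk-putting srcLen floats into a dstCap-sized
    // destination overflows, per the java.nio.FloatBuffer.put contract.
    static boolean overflows(int srcLen, int dstCap) {
        FloatBuffer src = FloatBuffer.wrap(new float[srcLen]);
        FloatBuffer dst = FloatBuffer.allocate(dstCap);
        try {
            dst.put(src);
            return false;
        } catch (BufferOverflowException e) {
            return true;
        }
    }
}
```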