Add Tensor.copyDataInto to Java API #19171
Conversation
…put reads

The existing Tensor.getDataAsFloatArray() allocates a fresh float[] on every call and copies from the underlying off-heap buffer into it. In a steady-state inference loop this is a per-frame allocation proportional to the output tensor size — for a 144x192 single-channel depth output that's ~110 KB per call, multiplied by however many output tensors a model returns. Profiling on Android (Perfetto) showed this driving substantial ART GC pressure that ended up stealing CPU back from the inference thread itself via run-queue contention.

The off-heap FloatBuffer that backs the output Tensor is already owned by the native side and exposed internally via the package-private getRawDataBuffer(). copyDataInto(FloatBuffer dst) lets callers reuse a pre-allocated destination across calls, eliminating the per-call float[] allocation while keeping ownership unambiguous (the destination is caller-owned, so it's safe to hand off to async consumers without racing against the next forward() overwriting native memory).

Implemented for float32 (zero-copy bulk put) and float16 (per-element half->float widening, mirroring getDataAsFloatArray). Other dtypes inherit the base-class IllegalStateException default.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
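The float16 path widens each raw 16-bit half value to a 32-bit float. Below is a standalone sketch of that widening, not the actual executorch implementation (which may rely on a platform helper such as android.util.Half); the class and method names here are illustrative only.

```java
import java.nio.FloatBuffer;
import java.nio.ShortBuffer;

public class HalfWidening {
    // Widen one IEEE 754 binary16 value (raw bits in a short) to float.
    // Standalone reimplementation for illustration only.
    static float halfToFloat(short h) {
        int bits = h & 0xFFFF;
        int sign = bits >>> 15;
        int exp = (bits >>> 10) & 0x1F;
        int mant = bits & 0x3FF;
        float val;
        if (exp == 0) {
            val = mant * 0x1p-24f;                      // subnormal (or zero)
        } else if (exp == 0x1F) {
            val = mant == 0 ? Float.POSITIVE_INFINITY : Float.NaN;
        } else {
            val = (1f + mant / 1024f) * (float) Math.scalb(1.0, exp - 15);
        }
        return sign == 1 ? -val : val;
    }

    // Per-element widening copy, mirroring what the fp16 path of a
    // copyDataInto(FloatBuffer) would do: read raw half bits from src,
    // write widened floats at dst's current position.
    static void widenInto(ShortBuffer src, FloatBuffer dst) {
        for (int i = src.position(); i < src.limit(); i++) {
            dst.put(halfToFloat(src.get(i)));
        }
    }
}
```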
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19171
Note: Links to docs will display an error until the docs builds have been completed.
❗ 1 Active SEV: there is 1 currently active SEV. If your PR is affected, please view it below.
❌ 1 New Failure, 1 Cancelled Job, 1 Unrelated Failure as of commit 068212e with merge base 79b799d.
NEW FAILURE - The following job has failed:
CANCELLED JOB - The following job was cancelled. Please retry:
FLAKY - The following job failed but was likely due to flakiness present on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@claude review
Claude finished @kirklandsign's task in 1m 17s

Code Review: Add Tensor.copyDataInto to Java API
Overall: This is a clean, well-motivated PR. The API design is consistent with existing patterns, and test coverage is thorough. A few observations below, mostly minor.

What's good
Issues
Nits
Summary

Looks good to land. The only suggestion worth considering before merge is the upfront capacity check in
@pytorchbot label "release notes: none"
Extends the previous commit (which added copyDataInto(FloatBuffer) for float32 / float16) to cover every dtype that the Tensor class supports:

copyDataInto(ByteBuffer): int8
copyDataIntoUnsigned(ByteBuffer): uint8
copyDataInto(IntBuffer): int32
copyDataInto(LongBuffer): int64
copyDataInto(DoubleBuffer): float64
copyDataInto(ShortBuffer): float16 (raw fp16 bits, no widening)

Mirrors the asymmetry of getDataAsByteArray (int8) vs getDataAsUnsignedByteArray (uint8): calling copyDataInto(ByteBuffer) on a uint8 tensor throws, and copyDataIntoUnsigned(ByteBuffer) on int8 throws, just like the array accessors. The two methods are intentionally separate even though the underlying bits are identical, so a misuse caused by a dtype switch surfaces as an exception instead of silently producing values with the wrong sign interpretation.

The float16 ShortBuffer overload writes the raw 16-bit half-precision bits without widening; copyDataInto(FloatBuffer) on the same tensor still performs the half->float widening, matching the dual getDataAsShortArray / getDataAsFloatArray accessors that float16 already exposes.

Tests cover each new variant's happy path plus the asymmetric int8/uint8 rejection.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
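The asymmetric int8/uint8 behavior described above can be sketched with base-class defaults that throw and a per-dtype subclass that overrides only its own overload. The class names here (TensorSketch, Int8TensorSketch) are hypothetical stand-ins, not the actual executorch classes.

```java
import java.nio.ByteBuffer;

// Base-class defaults: every overload throws unless the dtype subclass
// provides it, so a dtype mismatch fails loudly instead of silently
// reinterpreting the bytes with the wrong sign.
abstract class TensorSketch {
    public void copyDataInto(ByteBuffer dst) {
        throw new IllegalStateException("tensor dtype is not int8");
    }
    public void copyDataIntoUnsigned(ByteBuffer dst) {
        throw new IllegalStateException("tensor dtype is not uint8");
    }
}

class Int8TensorSketch extends TensorSketch {
    private final ByteBuffer data;
    Int8TensorSketch(ByteBuffer data) { this.data = data; }

    @Override
    public void copyDataInto(ByteBuffer dst) {
        // Bulk copy; duplicate() leaves the source buffer's position intact.
        dst.put(data.duplicate());
    }
    // copyDataIntoUnsigned is deliberately NOT overridden: calling it on an
    // int8 tensor inherits the throwing default, mirroring the asymmetry of
    // the existing array accessors.
}
```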
Force-pushed from 4475cb5 to 068212e
@claude re-review
@claude Review this PR
Claude finished @mergennachin's task in 1m 1s

Code Review: Add Tensor.copyDataInto to Java API
Overall: This is a well-designed, well-tested addition. The API is consistent with existing patterns, the motivation (reducing GC pressure in inference loops) is sound, and the test coverage is thorough. Ready to land with one minor suggestion.

What's good
Minor suggestion
Summary

Clean PR. The API mirrors the existing
Summary
Adds Tensor.copyDataInto(... dst) to the Android Java API for the float32 and float16 dtypes. It copies the tensor's data into a caller-provided destination buffer instead of allocating a fresh float[] per call (as getDataAsFloatArray() does today). The same pattern is repeated for other types.

Motivation
While profiling depth inference on Android with Perfetto, output extraction was a meaningful contributor to ART GC pressure. Each call to output.toTensor().dataAsFloatArray allocates a new Java float[] sized to the tensor's element count and bulk-copies from the underlying off-heap buffer into it.

The native side already exposes the underlying FloatBuffer directly (a zero-copy view of the C++ tensor's data_ptr()), so the only thing missing was a public way for callers to drain it into a destination buffer they already own and reuse across calls.

API
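A sketch of the overload surface described in the commit messages above. The class name TensorCopyApi is a stand-in, not the actual org.pytorch.executorch.Tensor; in the real class, each dtype subclass presumably overrides its own overload while the base defaults throw IllegalStateException.

```java
import java.nio.*;

// Hypothetical reconstruction of the overloads this PR adds; comments name
// the dtype each one serves, per the commit messages.
abstract class TensorCopyApi {
    public void copyDataInto(FloatBuffer dst)        { throw new IllegalStateException(); } // float32 (and fp16, widened)
    public void copyDataInto(ByteBuffer dst)         { throw new IllegalStateException(); } // int8
    public void copyDataIntoUnsigned(ByteBuffer dst) { throw new IllegalStateException(); } // uint8
    public void copyDataInto(IntBuffer dst)          { throw new IllegalStateException(); } // int32
    public void copyDataInto(LongBuffer dst)         { throw new IllegalStateException(); } // int64
    public void copyDataInto(DoubleBuffer dst)       { throw new IllegalStateException(); } // float64
    public void copyDataInto(ShortBuffer dst)        { throw new IllegalStateException(); } // float16 (raw bits)
}
```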
Caller-side usage example
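A sketch of the intended reuse pattern using only java.nio; the executorch call appears as a comment because the module/forward names are assumptions here, and a stand-in write keeps the sketch runnable.

```java
import java.nio.FloatBuffer;

public class ReuseLoopSketch {
    // Run `frames` mock inferences into the same caller-owned buffer,
    // mirroring the steady-state loop from the motivation section.
    static FloatBuffer runFrames(int frames, int numel) {
        FloatBuffer dst = FloatBuffer.allocate(numel); // allocated once, reused every frame
        for (int f = 0; f < frames; f++) {
            dst.clear(); // rewind before each call
            // In the real loop (names assumed, not from the PR):
            //   module.forward(input)[0].toTensor().copyDataInto(dst);
            // Stand-in write so the sketch executes without executorch:
            for (int i = 0; i < numel; i++) dst.put(f);
            dst.flip(); // ready for the consumer; no per-frame float[] allocation
        }
        return dst;
    }
}
```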
Test plan
TensorTest.kt:

testCopyDataIntoFloat32 — round-trip with reuse across two calls
testCopyDataIntoFloat32_writesAtDstPosition — verifies the call writes at dst.position() and advances it (does not overwrite from index 0)
testCopyDataIntoFloat32_overflow — BufferOverflowException on undersized destination
testCopyDataIntoFloat16 — verifies fp16→fp32 widening matches getDataAsFloatArray
testCopyDataIntoFloat_unsupportedDtype — IllegalStateException from base default for non-float dtypes

This PR was authored with Claude.
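The overflow test leans on standard java.nio bulk-put semantics, which can be checked in isolation: a bulk put into a destination with fewer remaining elements than the source throws BufferOverflowException without transferring anything.

```java
import java.nio.BufferOverflowException;
import java.nio.FloatBuffer;

public class OverflowDemo {
    // Returns true if bulk-putting srcLen floats into a dstCap-sized
    // destination overflows, per the java.nio.FloatBuffer.put contract.
    static boolean overflows(int srcLen, int dstCap) {
        FloatBuffer src = FloatBuffer.wrap(new float[srcLen]);
        FloatBuffer dst = FloatBuffer.allocate(dstCap);
        try {
            dst.put(src);
            return false;
        } catch (BufferOverflowException e) {
            return true;
        }
    }
}
```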