Fork Sync: Update from parent repository by github-actions[bot] · Pull Request #36 · MultiMx/tailscale

github-actions · 2025-06-29T12:16:57Z

No description provided.

If a DNS query for a domain that should be routed through a connector results in CNAME records in the response, collapse the CNAME chain to an A/AAAA record for the domain -> magic IP. Fixes tailscale/corp#39978 Signed-off-by: Fran Bull <fran@tailscale.com>
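The collapsing step described above can be sketched in Go as follows. This is only an illustration under stated assumptions: the `record` type, `resolvesToAddress`, and `collapse` are hypothetical names, not the connector's real code, and the rule that the chain must end in an A/AAAA record before it is rewritten is an assumption.

```go
package main

import (
	"fmt"
	"net/netip"
	"strings"
)

// record is a simplified DNS answer (hypothetical type, not Tailscale's).
type record struct {
	Name   string // owner name
	Type   string // "CNAME", "A" or "AAAA"
	Target string // CNAME target, or the address in text form
}

// resolvesToAddress follows CNAMEs starting at name and reports whether
// an A/AAAA record is reached. The hop limit guards against CNAME loops.
func resolvesToAddress(answers []record, name string) bool {
	cur := strings.ToLower(name)
	for hops := 0; hops <= len(answers); hops++ {
		next := ""
		for _, r := range answers {
			if strings.ToLower(r.Name) != cur {
				continue
			}
			switch r.Type {
			case "A", "AAAA":
				return true
			case "CNAME":
				next = strings.ToLower(r.Target)
			}
		}
		if next == "" {
			return false
		}
		cur = next
	}
	return false
}

// collapse returns a single name -> magicIP record when the answers for
// name lead to an address (directly or through a CNAME chain). Otherwise
// the answers are returned unchanged. That condition is an assumption.
func collapse(answers []record, name, magicIP string) ([]record, error) {
	ip, err := netip.ParseAddr(magicIP)
	if err != nil {
		return nil, err
	}
	if !resolvesToAddress(answers, name) {
		return answers, nil
	}
	typ := "AAAA"
	if ip.Is4() {
		typ = "A"
	}
	return []record{{Name: name, Type: typ, Target: ip.String()}}, nil
}

func main() {
	answers := []record{
		{Name: "app.example.com", Type: "CNAME", Target: "edge.example.net"},
		{Name: "edge.example.net", Type: "A", Target: "203.0.113.7"},
	}
	out, err := collapse(answers, "app.example.com", "100.64.0.5")
	fmt.Println(out, err)
}
```

The client then sees only the queried domain mapped to the magic IP and never learns the intermediate CNAME targets.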

…9660) When a peer is not able to connect to control after a restart and is using a cached netmap, that node should be able to connect to another peer in its tailnet (given that the home DERP of that peer has not changed in the meantime). Add a test that starts two peers and connects them to a tailnet with caching enabled. Then blackhole traffic to control from one peer and restart it. Verify that the connection between the two ends up direct. Adds facilities for expecting a certain path type between nodes. Updates: #19597 Signed-off-by: Claus Lensbøl <claus@tailscale.com>

Updates tailscale/corp#39975 Signed-off-by: Fran Bull <fran@tailscale.com>

Make it possible to remove the least recently used expired address assignment from addrAssignments. Before checking out a new address from the IP pools, return a handful of expired addresses. Updates tailscale/corp#39975 Signed-off-by: Fran Bull <fran@tailscale.com>
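A rough sketch of "return a handful of expired addresses, least recently used first" might look like the following. The `assignment` fields, the integer timestamps, and the linear scan are all assumptions made to keep the example short; the real addrAssignments structure is not shown in this commit message.

```go
package main

import "fmt"

// assignment is a hypothetical record of one checked-out address.
type assignment struct {
	addr     string
	lastUsed int64 // unix seconds
	expires  int64 // unix seconds
}

// addrAssignments is a simplified stand-in for the real type.
type addrAssignments struct {
	byAddr map[string]assignment
}

// popLRUExpired removes and returns the expired assignment with the
// oldest lastUsed time, if any assignment has expired by now.
func (a *addrAssignments) popLRUExpired(now int64) (assignment, bool) {
	var oldest assignment
	found := false
	for _, as := range a.byAddr {
		if as.expires > now {
			continue // still valid
		}
		if !found || as.lastUsed < oldest.lastUsed {
			oldest, found = as, true
		}
	}
	if found {
		delete(a.byAddr, oldest.addr)
	}
	return oldest, found
}

// reclaim frees up to max expired addresses, least recently used first,
// and returns them so the caller can hand them back to the IP pool.
func (a *addrAssignments) reclaim(now int64, max int) []string {
	var freed []string
	for len(freed) < max {
		as, ok := a.popLRUExpired(now)
		if !ok {
			break
		}
		freed = append(freed, as.addr)
	}
	return freed
}

func main() {
	a := &addrAssignments{byAddr: map[string]assignment{
		"100.64.0.1": {addr: "100.64.0.1", lastUsed: 30, expires: 50},
		"100.64.0.2": {addr: "100.64.0.2", lastUsed: 10, expires: 60},
		"100.64.0.3": {addr: "100.64.0.3", lastUsed: 5, expires: 500},
	}}
	fmt.Println(a.reclaim(100, 2))
}
```

A production version would likely keep an ordered structure instead of scanning the map; the scan is used here only because it is easy to verify.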

There is a 30-second timeout set on client TLS connections but the handshake was called on the wrong connection and so the timeout was never used in practice. Signed-off-by: Francois Marier <francois@fmarier.org>

…-generic core Splits SubscriberFunc[T] into: - SubscriberFunc[T]: a thin user-facing facade that holds only a pointer to a non-generic core. It exposes Close() to user code, which forwards to the core. - subscriberFuncCore: a non-generic struct that owns all the subscriber state (stop flag, unregister, logf, slow timer, cached reflect.Type) and implements the bus's package-private subscriber interface. Its dispatch() invokes a closure captured at construction time that performs the vals.Peek().Event.(T) type assertion and runs the user callback on the unboxed value. The bus's outputs map and subscriber-interface itab are parameterized only by *subscriberFuncCore, not by T, eliminating both the per-T itab and the per-T generic dictionary that previously scaled with the number of subscribed event types. Measured impact (util/eventbus/sizetest): total per-flow binary cost: linux/amd64: 3039.2 B/flow -> 2252.8 B/flow (-786.4 B / -25.9%) linux/arm64: 3145.7 B/flow -> 2228.2 B/flow (-917.5 B / -29.2%) SubscriberFunc per-receiver attribution: linux/amd64: 840.8 B/flow -> 300.8 B/flow (-540.0 B / -64.2%) linux/arm64: 849.9 B/flow -> 303.8 B/flow (-546.1 B / -64.3%) Dropped per-T symbols (200-flow eventbus binary): - (*SubscriberFunc[T]).dispatch was 26,639 B total (130 B/T) - (*SubscriberFunc[T]).subscribeType was 3,600 B total ( 18 B/T) - .dict.SubscriberFunc[T] was 14,400 B total ( 72 B/T) - go:itab.*SubscriberFunc[T],... was 9,600 B total ( 48 B/T) Of the original 913 B/flow attributed to SubscriberFunc, 540 B/flow is now gone, dropping the receiver to 300 B/flow. Behavior is unchanged: BenchmarkBasicThroughput is within noise (1955 -> 1941 ns/op on the test box) and all eventbus tests pass. Updates #12614 Change-Id: I646b3b05fd8d95f9afead59bfd0f69cd18b7a709 Signed-off-by: James Tucker <james@tailscale.com>
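The facade/core split described above can be illustrated with a much smaller toy bus. This is a sketch of the pattern only: `bus`, `subscribeFunc`, and `deliver` are hypothetical names, and the real eventbus also carries queues, a stop flag, logging, and a slow timer that are omitted here.

```go
package main

import (
	"fmt"
	"reflect"
)

// subscriberCore is non-generic: it owns the subscriber state and a
// dispatch closure captured at construction time.
type subscriberCore struct {
	typ      reflect.Type
	closed   bool
	dispatch func(event any) bool
}

// deliver runs the captured closure unless the subscriber is closed.
func (c *subscriberCore) deliver(event any) bool {
	if c.closed {
		return false
	}
	return c.dispatch(event)
}

// SubscriberFunc is the thin generic facade; it holds only a pointer to
// the non-generic core.
type SubscriberFunc[T any] struct {
	core *subscriberCore
}

// Close forwards to the core.
func (s SubscriberFunc[T]) Close() { s.core.closed = true }

// bus stores only *subscriberCore, so its own code is independent of T.
type bus struct {
	outputs []*subscriberCore
}

// subscribeFunc is the only per-T code: it records the type and builds
// the closure that performs the type assertion and calls fn.
func subscribeFunc[T any](b *bus, fn func(T)) SubscriberFunc[T] {
	core := &subscriberCore{typ: reflect.TypeOf((*T)(nil)).Elem()}
	core.dispatch = func(event any) bool {
		v, ok := event.(T)
		if !ok {
			return false
		}
		fn(v)
		return true
	}
	b.outputs = append(b.outputs, core)
	return SubscriberFunc[T]{core: core}
}

// publish offers the event to every core and counts the deliveries.
func (b *bus) publish(event any) int {
	n := 0
	for _, c := range b.outputs {
		if c.deliver(event) {
			n++
		}
	}
	return n
}

func main() {
	b := &bus{}
	sub := subscribeFunc(b, func(v int) { fmt.Println("int event:", v) })
	fmt.Println("subscribed type:", sub.core.typ)
	fmt.Println("delivered:", b.publish(42))
	sub.Close()
	fmt.Println("delivered after Close:", b.publish(43))
}
```

Because the bus only ever sees `*subscriberCore`, the compiler needs one interface table and no generic dictionary for the bus side, which is the kind of per-T cost the commit reports removing.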

…ic core Mirrors the same refactor previously applied to SubscriberFunc: - Publisher[T]: a thin user-facing facade. Holds a pointer to a non-generic publisherCore and exposes Publish/Close/ShouldPublish. - publisherCore: a non-generic struct that owns the *Client back- pointer, stop flag, and cached reflect.Type. It implements the package-private publisher interface (publishType, Close). The bus's per-Client publisher set is set.Set[publisher] keyed on this single non-generic type. The publisher interface only exists to support diagnostic introspection (Debugger.PublishTypes returning the list of types a client publishes). Previously, satisfying that diagnostic-only interface forced *Publisher[T] to be the implementor and cost a per-T itab, generic dictionary, and equality function on every event type ever passed through Publish[T]. Moving the implementation to a non-generic core lets the diagnostic surface work unchanged while charging zero per-T cost for the diagnostic-driven generic interface. Publisher[T].Publish is also slimmed: the channel/select/stopFlag loop is now a non-generic publish() helper that takes the value as 'any'. The per-T body is reduced to forwarding the boxed value to the helper. 
Measured impact (util/eventbus/sizetest): total per-flow binary cost: linux/amd64: 2252.8 B/flow -> 1900.5 B/flow (-352.3 B / -15.6%) linux/arm64: 2228.2 B/flow -> 1835.0 B/flow (-393.2 B / -17.6%) Publisher per-receiver attribution: linux/amd64: 635.2 B/flow -> 369.6 B/flow (-265.6 B / -41.8%) linux/arm64: 751.7 B/flow -> 373.2 B/flow (-378.5 B / -50.4%) Cumulative reduction from the original baseline (5167ff412): linux/amd64: 3096.6 B/flow -> 1900.5 B/flow (-1196.1 B / -38.6%) linux/arm64: 3145.7 B/flow -> 1835.0 B/flow (-1310.7 B / -41.7%) Dropped per-T symbols (200-flow eventbus binary): - .dict.Publisher[T] was 14,400 B (72 B/T) - type:.eq.Publisher[T] was 11,832 B (58 B/T) - go:itab.*Publisher[T],publisher was 8,000 B (40 B/T) - (*Publisher[T]).Close shape stencils collapsed to 1 Behavior is unchanged: BenchmarkBasicThroughput is within noise (2018 -> 2038 ns/op at -benchtime=2s) and all eventbus tests pass. Updates #12614 Change-Id: I61979c2bf95d2a711c2321e6e0b4b7d15980e9f5 Signed-off-by: James Tucker <james@tailscale.com>

The natlab vmtest suite (tstest/natlab/vmtest) and the integration nat tests are gated behind --run-vm-tests because they need KVM and are slow. Until now nothing in CI exercised them apart from a single canary TestEasyEasy run on every PR. Add .github/workflows/natlab-test.yml that runs the full opt-in suite on demand (workflow_dispatch), on PRs labeled "natlab", and on main every 12 hours via cron. The workflow has two phases: - "prepare" builds the gokrazy VM image, downloads the Ubuntu and FreeBSD cloud images once via the new natlabprep tool, and emits a dynamic JSON matrix of every TestX function it finds in the two opt-in packages. - "test" is a per-test matrix that depends on prepare. Each matrix job restores the shared caches and runs a single test, so adding a new TestFoo is automatically picked up on the next run without any workflow edits. Rename the existing natlab-integrationtest.yml to natlab-basic.yml since it's the small smoke variant (just TestEasyEasy on every PR); the new natlab-test.yml is the bigger suite. The job inside is renamed to EasyEasy for the same reason. Move the macOS arm64 host check from vmtest.Env.Start into vmtest.Env.AddNode so a test that adds a vmtest.MacOS node skips immediately on a non-macOS host, and add an explicit skipIfNotMacOSArm64 helper at the top of the two macOS-only tests so the platform requirement is obvious to readers. Quiet the takeAgentConnOne miss log in tstest/natlab/vnet by default (it was the overwhelming majority of bytes in CI logs, with no signal in healthy runs) and replace it with a periodic "still waiting" line that only fires after 10s, so a truly stuck agent connection still surfaces. Updates #13038 Change-Id: I4582098d8865200fd5a73a9b696942319ccf3bf0 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>

startCloudQEMU hardcoded -machine q35,accel=kvm and -cpu host, which fails on any host without KVM (notably macOS). Replace with a qemuAccelArgs helper that probes /dev/kvm and falls back to QEMU's TCG software emulation, matching the pattern already used by tstest/integration/nat. Also wire the helper into startGokrazyQEMU so gokrazy VMs pick up KVM when available. Updates #13038 Change-Id: I7745518db823279b1880957bb14ca2ffdaab4c50 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
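A probe-and-fall-back helper of this kind could look roughly like the sketch below. The KVM flags come from the commit message; the TCG flags (`accel=tcg`, `-cpu max`) and the `kvmPath` parameter are assumptions added for illustration, and the real qemuAccelArgs may differ.

```go
package main

import (
	"fmt"
	"os"
)

// qemuAccelArgs returns QEMU machine/CPU flags. It tries to open the KVM
// device read-write and falls back to TCG software emulation on failure.
// kvmPath is a parameter only so the probe can be exercised in a test.
func qemuAccelArgs(kvmPath string) []string {
	f, err := os.OpenFile(kvmPath, os.O_RDWR, 0)
	if err != nil {
		// No usable KVM (for example on macOS): software emulation.
		return []string{"-machine", "q35,accel=tcg", "-cpu", "max"}
	}
	f.Close()
	return []string{"-machine", "q35,accel=kvm", "-cpu", "host"}
}

func main() {
	fmt.Println(qemuAccelArgs("/dev/kvm"))
}
```

Opening the device, rather than only checking that it exists, also catches the case where the file is present but the current user lacks permission.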

macOS limits Unix socket paths to 104 bytes. The Go test TempDir path (e.g. /var/folders/.../TestDirectConnection...679197086/001/) easily exceeds that, causing "bind: invalid argument". Create a short /tmp/vmtest* directory for all socket files (vnet, QMP, dgram) so the paths stay well under the limit on every platform. Updates #13038 Change-Id: I721d24561d1766aaa964692bc77f40a131aa9455 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
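The short-directory approach can be sketched as below. The helper names `newSocketDir` and `socketPath` are hypothetical, and the explicit length check is an addition for illustration rather than something the commit message says the code does.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// maxUnixSocketPath is the macOS limit quoted in the commit message.
const maxUnixSocketPath = 104

// socketPath joins dir and name and rejects paths that would not fit,
// leaving one byte for the trailing NUL.
func socketPath(dir, name string) (string, error) {
	p := filepath.Join(dir, name)
	if len(p) >= maxUnixSocketPath {
		return "", fmt.Errorf("socket path %q is %d bytes, limit is %d", p, len(p), maxUnixSocketPath-1)
	}
	return p, nil
}

// newSocketDir creates a short /tmp/vmtest* directory for socket files
// and returns a cleanup function that removes it.
func newSocketDir() (dir string, cleanup func(), err error) {
	dir, err = os.MkdirTemp("/tmp", "vmtest")
	if err != nil {
		return "", nil, err
	}
	return dir, func() { os.RemoveAll(dir) }, nil
}

func main() {
	dir, cleanup, err := newSocketDir()
	if err != nil {
		fmt.Println("mkdir:", err)
		return
	}
	defer cleanup()
	p, err := socketPath(dir, "qmp.sock")
	fmt.Println(p, err)
}
```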

…d cache type name Two changes that share the same intent of reducing per-T duplication in code that doesn't actually depend on T: 1. Hoist the non-generic portion of newSubscriberFunc[T] into a newSubscriberFuncCore() helper. The hoisted work is the timer setup, the subscriberFuncCore allocation, and the unregister closure (which captures only the non-generic reflect.Type and *subscribeState). The generic body now does only the two T-bound things it has to: compute reflect.TypeFor[T] and create the dispatch closure. Effect on the per-shape-stencil body of newSubscriberFunc[T]: before: 523 B per shape (in synthetic test) after: 293 B per shape (-230 B per shape; -56% on this body) 2. Cache reflect.Type.String() once at construction (in core.typeName) instead of recomputing it every time the dispatch closure runs. The dispatch closure also now takes the *subscriberFuncCore directly rather than building an intermediate dispatchFuncState struct on every call. Effect on the dispatch closure body (newSubscriberFunc[T].func1): before: 581 B per shape after: 480 B per shape (-101 B per shape; -17%) Combined effect on tailscaled (linux/amd64): named-symbol savings via symcost: ~7 KB stripped binary delta: -8 KB (page-quantized) arm64 binary delta: 0 (page-quantized) cumulative reduction from baseline (5167ff412): linux/amd64: -110,592 bytes (-0.391%) linux/arm64: -131,072 bytes (-0.499%) Throughput is also improved by the typeName cache: BenchmarkBasic goes from 2018 ns/op to 1864 ns/op (-7.6%) because the dispatch hot path no longer allocates a string on every event. Updates #12614 Change-Id: Ib3a3d6796785e16506330ec034e1144580d467a3 Signed-off-by: James Tucker <james@tailscale.com>

…onnectivity (#19699) Add new clientmetric counters for establishing contact with peers while using cached network map data. To do this, instrument the magicsock.Conn with a bit to indicate whether its peer data came from a cached netmap. If so, there are two conditions we will count as establishing connectivity to a peer: - Receipt of a CallMeMaybe from a peer via disco. - Establishing a valid endpoint address for a peer. In vmtest, add Env.ClientMetrics to scrape metrics from the specified node. Use this to check that counters were updated in caching tests. Updates tailscale/projects#13 Updates #12639 Change-Id: Ie8cf3244ac8af4f5bcfe4d0d944078da2ba08990 Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>

Fixes #12778 Change-Id: If9f8b299cef0cb68f93b344845b5c6a5b7554d2c Signed-off-by: DeedleFake <deedlefake@users.noreply.github.com>

…services Adds two new cap resolution methods alongside the existing PeerCaps: PeerCapsForService(src netip.Addr, svcName tailcfg.ServiceName) resolves the service name to its VIP addresses via the node's service IP mappings and returns caps scoped to that service. Exposed on /v0/whois via the svc_name query parameter and on client/local.Client as WhoIsForService. PeerCapsForIP(src, dst netip.Addr) resolves caps against an arbitrary destination IP. Exposed on /v0/whois via the svc_addr query parameter and on client/local.Client as WhoIsForIP. svc_name takes priority over svc_addr when both are present. Invalid values for either return 400. The existing PeerCaps/WhoIs path is unchanged: without a service parameter, WhoIs returns only host-level caps. Updates tailscale/corp#41632 Signed-off-by: Adriano Sela Aviles <adriano@tailscale.com>
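The query-parameter rules above (svc_name wins over svc_addr, invalid values yield 400, no service parameter means host-level caps) can be sketched as a small parser. Everything here is hypothetical: `whoisScope` and `parseWhoisScope` are illustration names, the "svc:" prefix check is an assumed validity rule, and validating both parameters before applying priority is one reading of "invalid values for either return 400".

```go
package main

import (
	"fmt"
	"net/http"
	"net/netip"
	"net/url"
	"strings"
)

// whoisScope says which kind of capability lookup a request asks for.
type whoisScope struct {
	kind  string // "host", "service" or "ip"
	value string
}

// parseWhoisScope inspects the raw query string of a whois request and
// returns the requested scope plus an HTTP status code.
func parseWhoisScope(rawQuery string) (whoisScope, int) {
	q, err := url.ParseQuery(rawQuery)
	if err != nil {
		return whoisScope{}, http.StatusBadRequest
	}
	var svcAddr netip.Addr
	if q.Has("svc_addr") {
		svcAddr, err = netip.ParseAddr(q.Get("svc_addr"))
		if err != nil {
			return whoisScope{}, http.StatusBadRequest
		}
	}
	if q.Has("svc_name") {
		name := q.Get("svc_name")
		// Assumed validity rule: "svc:" followed by a non-empty label.
		if !strings.HasPrefix(name, "svc:") || len(name) == len("svc:") {
			return whoisScope{}, http.StatusBadRequest
		}
		// svc_name takes priority over svc_addr when both are present.
		return whoisScope{kind: "service", value: name}, http.StatusOK
	}
	if q.Has("svc_addr") {
		return whoisScope{kind: "ip", value: svcAddr.String()}, http.StatusOK
	}
	// No service parameter: host-level caps only.
	return whoisScope{kind: "host"}, http.StatusOK
}

func main() {
	fmt.Println(parseWhoisScope("addr=100.64.0.1&svc_name=svc:web&svc_addr=100.100.0.9"))
	fmt.Println(parseWhoisScope("addr=100.64.0.1&svc_addr=bogus"))
}
```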

Replace the process-global Server.mu lookup in the packet send hot path with a global hashtriemap mirror of local clientSet entries. The authoritative clients map remains guarded by Server.mu; clientsAtomic is only a lock-free fast path for active local clients. Misses, stale inactive client sets, duplicate accounting, and mesh forwarding still fall back to lookupDestUncached. This avoids taking Server.mu for the common local active-client send path, at the cost of adding one global concurrent map that mirrors Server.clients for local peers. The benchmark uses four destination peers. The before run sets TS_DEBUG_DERP_DISABLE_PEER_HASHTRIE=true to force the old mutex lookup path; the after run uses the hashtrie fast path.
goos: linux
goarch: amd64
pkg: tailscale.com/derp/derpserver
cpu: Intel(R) Xeon(R) 6975P-C
                      │    before     │                after                │
                      │    sec/op     │    sec/op     vs base               │
LookupDestHashTrie-16   176.050n ± 1%   1.904n ± 6%   -98.92% (p=0.000 n=10)
                      │    before     │                after                │
                      │     B/op      │     B/op      vs base               │
LookupDestHashTrie-16   0.000 ± 0%      0.000 ± 0%    ~ (p=1.000 n=10) ¹
                      │    before     │                after                │
                      │   allocs/op   │   allocs/op   vs base               │
LookupDestHashTrie-16   0.000 ± 0%      0.000 ± 0%    ~ (p=1.000 n=10) ¹
¹ all samples are equal
Updates #3560 (very indirectly, historically) Updates #19713 (as an alternative to that PR) Change-Id: Ifb72e5c9854ad00e938cd24c6ab9c27312f297e8 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
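The fast-path/fallback shape of this change can be sketched with the standard library. Note the substitution: `sync.Map` stands in for the hashtriemap here purely so the example is self-contained, and the `server`, `clientSet`, and `register` definitions are simplified hypothetical versions of the real types.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// clientSet is a simplified stand-in for the DERP server's type.
type clientSet struct {
	name   string
	active atomic.Bool
}

type server struct {
	mu            sync.Mutex
	clients       map[string]*clientSet // authoritative, guarded by mu
	clientsAtomic sync.Map              // lock-free mirror: key -> *clientSet
	slowLookups   atomic.Int64          // counts fallbacks, for the demo
}

// register adds a client to both maps while holding mu.
func (s *server) register(key string) *clientSet {
	cs := &clientSet{name: key}
	cs.active.Store(true)
	s.mu.Lock()
	s.clients[key] = cs
	s.clientsAtomic.Store(key, cs)
	s.mu.Unlock()
	return cs
}

// lookupDest tries the lock-free mirror first and falls back to the
// mutex-guarded map on a miss or an inactive client set.
func (s *server) lookupDest(key string) *clientSet {
	if v, ok := s.clientsAtomic.Load(key); ok {
		if cs := v.(*clientSet); cs.active.Load() {
			return cs
		}
	}
	return s.lookupDestUncached(key)
}

// lookupDestUncached is the slow path that takes the server mutex.
func (s *server) lookupDestUncached(key string) *clientSet {
	s.slowLookups.Add(1)
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.clients[key]
}

func main() {
	s := &server{clients: map[string]*clientSet{}}
	s.register("peerA")
	fmt.Println(s.lookupDest("peerA").name, "slow lookups:", s.slowLookups.Load())
	fmt.Println(s.lookupDest("missing") == nil, "slow lookups:", s.slowLookups.Load())
}
```

The authoritative map stays behind the mutex, so any case the mirror cannot answer confidently simply takes the old path.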

This fixes a log message where ipn/ipnlocal.shouldUseOneCGNATRoute would claim that an Android machine was actually macOS. Updates #cleanup Updates #19652 Signed-off-by: Simon Law <sfllaw@tailscale.com>

…19721) This patch fixes a data race in wgengine/netstack that surfaced while running both TestTCPForwardLimits and TestTCPForwardLimits_PerClient. Because these two tests both set up the TS_DEBUG_NETSTACK envknob, a race happens because netstack.Impl.Close leaked its inject goroutine. The inject goroutine also reads the TS_DEBUG_NETSTACK envknob, so if it is still running when the next test starts, then it will break. This patch also cleans up the tests a bit, ensuring that neither of them runs in T.Parallel. It also adds a T.Cleanup call to clear the envknob. Fixes #19720 Signed-off-by: Simon Law <sfllaw@tailscale.com>

Fixes tailscale/corp#40250 Signed-off-by: Fran Bull <fran@tailscale.com>

) Instead of having two entry points for running natlab tests, start converting the connectivity tests to use the vmtest framework. Grid and pair tests have yet to be moved over. Updates #13038 Signed-off-by: Claus Lensbøl <claus@tailscale.com>

A missing hosts file is not a fatal error. We should log it, but still proceed and create a new one instead of failing the DNS reconfiguration completely. Fixes #19733 Signed-off-by: Nick Khyl <nickk@tailscale.com>
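In Go, treating "file does not exist" as non-fatal usually comes down to an `errors.Is` check against `fs.ErrNotExist`. The sketch below shows that idea only; `readHostsOrEmpty` is a hypothetical helper, not the function changed in this commit.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"log"
	"os"
)

// readHostsOrEmpty returns the contents of the hosts file. A missing
// file is logged and treated as empty so the caller can create a new
// one; any other error is still returned.
func readHostsOrEmpty(path string) ([]byte, error) {
	b, err := os.ReadFile(path)
	if errors.Is(err, fs.ErrNotExist) {
		log.Printf("hosts file %q not found; starting from an empty one", path)
		return nil, nil
	}
	return b, err
}

func main() {
	b, err := readHostsOrEmpty("/nonexistent-for-sketch/hosts")
	fmt.Println(len(b), err)
}
```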

Adds a new NoiseRoundTripper field to tsd.Sys to expose an http.RoundTripper to make requests over the control plane Noise connection. This will be used in PAM use cases soon. Updates tailscale/corp#41800 Signed-off-by: Adriano Sela Aviles <adriano@tailscale.com>

…ns unchanged Warnables with a non-zero TimeToVisible are only published on the eventbus when they remain unhealthy long enough to become visible. However, we still publish a health.Change when a warning that was never visible (and was never published to the eventbus) becomes healthy. This PR fixes that and reduces churn when there is no actual state change. In particular, it avoids unnecessary IPN bus notifications sent to GUI/CLI clients, captive portal detection, etc. Updates tailscale/corp#39759 (noticed while working on it) Signed-off-by: Nick Khyl <nickk@tailscale.com>

Server.clientsAtomic was introduced in 6b72979 as a lock-free mirror of Server.clients to skip Server.mu on the packet send hot path. This drops the non-concurrent map and makes all the existing callers of the old plain map just use the concurrent map, but still holding Server.mu. BenchmarkLookupDestHashTrie is unchanged at ~2ns/op. Fixes #19726 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com> Change-Id: I0894e4d86914d152b9b5fef969a3184bcb96f678

…etry Brings Subscriber[T] in line with the same non-generic-core pattern already applied to SubscriberFunc[T] and Publisher[T]: - Renames subscriberFuncCore to subscriberCore and shares it between Subscriber[T] and SubscriberFunc[T]. Both typed facades hold a *subscriberCore plus their respective per-T delivery state (Subscriber: chan T; SubscriberFunc: nothing, the user callback is captured in the dispatch closure). - The bus's outputs map and subscriber-interface itab key on *subscriberCore for both subscriber kinds, so adding a new Subscribe[T] call site no longer pays a per-T itab, dictionary, or equality function for the subscriber-interface side. - Subscribe[T] now hoists the non-generic constructor portion into newSubscriberCore (timer setup, core allocation, cached type/typeName, unregister method-value), matching SubscribeFunc. The dispatch loop is intentionally NOT extracted to a non-generic helper for Subscriber[T], unlike SubscriberFunc[T]. The reason is the typed channel send 'case s.read <- t:' must appear lexically inside the select; the only way to lift it into a non-generic loop is to bridge typed and untyped via a per-event goroutine, which costs ~2.7x throughput on BenchmarkBasicThroughput. We keep dispatchTyped on the generic facade and accept the per-shape stencil cost as the cheaper alternative. Symbol-level effect on tailscaled (linux/amd64, measured via `go tool nm -size`): Before: (*Subscriber[T]).dispatch 2 shape stencils: 1,682 + 1,549 = 3,231 B 3 thin per-T wrappers: 124 B each = 372 B 2 deferwrap1 helpers: 62 B each = 124 B total: 3,727 B After: (*Subscriber[T]).dispatchTyped 2 shape stencils: 1,678 + 1,582 = 3,260 B 0 per-T wrappers (replaced by closure stored on core) 2 deferwrap1 helpers: 62 B each = 124 B total: 3,384 B dispatch path .text delta: -343 B (-9.2%) Per-shape stencils are ~1,600 B (.text body) + ~1,100 B (pclntab) = ~2,700 B each on production tailscaled. 
The shape count matches before/after (two distinct GC shapes for the Subscriber[T] event types in this binary). What changes is that the per-T thin wrappers are eliminated because Subscriber[T] no longer implements the subscriber interface directly. Whole-binary section deltas: .text: -2,304 B (includes the dispatch savings plus other small downstream effects) .rodata: +512 B (additional closure-type metadata) .gopclntab: -2,981 B (fewer per-T compiled functions => less metadata) Stripped tailscaled (linux/amd64): no change at the file level (the savings fall below the linker's section-alignment boundary). Unstripped builds shrink by ~2,900 B. Behavior is unchanged: BenchmarkBasicThroughput: 2,161 ns/op, 0 B/op, 0 allocs/op BenchmarkBasicFuncThroughput: 2,493 ns/op, 144 B/op, 2 allocs/op BenchmarkSubsThroughput: 3,727 ns/op, 0 B/op, 0 allocs/op Updates #12614 Change-Id: I97918ec68bd2cdb15958bbfd7687592b39663efe Signed-off-by: James Tucker <james@tailscale.com>

…eck (#19725) Fix the following issues: 1. Endianness Bug: The nftables runner used hardcoded big-endian byte arrays for firewall mark values (0xff0000, etc.), breaking bitwise operations on little-endian systems (all x86/x64, ARM). This caused connmark save/restore rules to silently fail. Fixed by using binary.NativeEndian to generate correct byte order for the host system. 2. Connmark Restore Conditional Check: The connmark restore mechanism unconditionally overwrote packet marks, even when Tailscale hadn't set any mark bits in conntrack. This destroyed mark bits set by other systems (VPNs, policy routing, vendor flags), breaking coexistence. Fixed by adding a conditional check to only restore when (ct mark & 0xff0000) != 0, preventing the worst case of wiping all marks to zero. Changes: - util/linuxfw/linuxfw.go: Added nativeEndianUint32() helper and updated all mask functions to use native byte order instead of hardcoded bytes - util/linuxfw/nftables_runner.go: Added conditional check in makeConnmarkRestoreExprs() to only restore when ct mark has Tailscale bits set; added detailed comment about bit preservation limitations - util/linuxfw/iptables_runner.go: Added conditional check using -m connmark ! --mark to match nftables behavior - Tests updated: Fixed byte-level regression tests to expect little-endian byte sequences and verify the new conditional check Note: Perfect bit preservation in nftables remains challenging due to nftables expression VM limitations. The current implementation prevents the critical case of wiping marks with zero. Updates #3310 Fixes #11803 Related to #8555 Signed-off-by: Mike O'Driscoll <mikeo@tailscale.com>
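Both fixes can be illustrated in a few lines of Go. This is a sketch, not the linuxfw code: `nativeEndianUint32` matches the helper name in the commit message but its body here is a guess, `decodeNative` and `shouldRestoreMark` are hypothetical, and `binary.NativeEndian` requires Go 1.21 or newer.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// tailscaleFwmarkMask is the mask value quoted in the commit message.
const tailscaleFwmarkMask uint32 = 0xff0000

// nativeEndianUint32 encodes v in the host's byte order instead of a
// hardcoded big-endian byte array.
func nativeEndianUint32(v uint32) []byte {
	b := make([]byte, 4)
	binary.NativeEndian.PutUint32(b, v)
	return b
}

// decodeNative is the inverse, used here to show the round trip.
func decodeNative(b []byte) uint32 {
	return binary.NativeEndian.Uint32(b)
}

// shouldRestoreMark mirrors the conditional check: only restore the
// packet mark when conntrack carries Tailscale's mark bits.
func shouldRestoreMark(ctMark uint32) bool {
	return ctMark&tailscaleFwmarkMask != 0
}

func main() {
	fmt.Printf("mask bytes on this host: % x\n", nativeEndianUint32(tailscaleFwmarkMask))
	fmt.Println(shouldRestoreMark(0x040000), shouldRestoreMark(0x000001))
}
```

On a little-endian host the mask prints as `00 00 ff 00`, while the hardcoded big-endian form would be `00 ff 00 00`, which is why bitwise comparisons against it failed.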

The codegen path for map-of-slice-of-pointer fields skipped nil-valued entries, which dropped the key from the map. This broke how dns.Config.Routes uses nil values as sentinels. Fixes #19730 Fixes #19732 Fixes #19746 Fixes #19744 Change-Id: Ic6400227f4ab21b3ca0e8c0eeecf9b83d145a9ab Signed-off-by: Fernando Serboncini <fserb@tailscale.com>
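A hand-written clone that preserves nil-valued entries looks roughly like this. The `resolver` type and `cloneRoutes` function are hypothetical stand-ins for the generated code; the point is only that a nil slice value must still produce a key in the copy.

```go
package main

import "fmt"

// resolver is a hypothetical element type for this sketch.
type resolver struct {
	Addr string
}

// cloneRoutes deep-copies a map of slices of pointers. A nil slice value
// is copied as nil rather than skipped, so the key survives the clone.
func cloneRoutes(src map[string][]*resolver) map[string][]*resolver {
	if src == nil {
		return nil
	}
	dst := make(map[string][]*resolver, len(src))
	for k, v := range src {
		if v == nil {
			dst[k] = nil // keep the key: nil is a meaningful sentinel
			continue
		}
		cp := make([]*resolver, len(v))
		for i, p := range v {
			if p != nil {
				c := *p
				cp[i] = &c
			}
		}
		dst[k] = cp
	}
	return dst
}

func main() {
	src := map[string][]*resolver{
		"corp.example":    {&resolver{Addr: "10.0.0.53"}},
		"blocked.example": nil,
	}
	dst := cloneRoutes(src)
	_, ok := dst["blocked.example"]
	fmt.Println(len(dst), ok)
}
```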

The label "natlab" is a bit confusing and also used for other things. Instead, change the trigger label to "run-natlab-tests". Updates #13038 Signed-off-by: Claus Lensbøl <claus@tailscale.com>

In a lot of places, we construct an error to End a step, then immediately log it to the governing test as test fatal. Save ourselves a bit of boilerplate by putting methods on Step for that. There are a couple cases this doesn't cover, e.g., where we construct the Step outside a subtest that wants to fail individually, but it helps enough to pay for its lines. Updates #13038 Change-Id: I71f9900942962de16609b6b198d3ba13d6958a5f Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>

…#19758) Their version scheme is different, even though the OS is based on Ubuntu. We need to check Zorin's version numbers to pick the right APT_KEY_TYPE. Updates #18925 Signed-off-by: Andrew Lytvynov <awly@tailscale.com>

Add a VM-based natlab test that exercises the peer-relay feature (feature/relayserver) end-to-end across three Tailscale nodes whose network topology makes a direct A<->B UDP path impossible: both peers are behind HardNAT (FreeBSD/pfSense-style endpoint-dependent NAT) with no port-mapping services, while the relay node is behind One2OneNAT so its STUN-discovered WAN endpoint is reachable from both peers. The test enables the relay server via EditPrefs, then waits for an a->b PingDisco whose PingResult.PeerRelay is set (proving magicsock chose the peer-relay path, not DERP), and finally asserts that the relay's DebugPeerRelaySessions LocalAPI reports the session. The existing TestPeerRelayPing in tstest/integration runs three tailscaled processes on the loopback interface with no NATs; this new vmtest covers peer relay through real per-VM kernels and NATs. To wire control-server capabilities into vmtest, also add a PeerRelayGrants() EnvOption (sibling of AllOnline, SameTailnetUser) that flips testcontrol.Server.PeerRelayGrants so the wildcard packet filter grants tailcfg.PeerCapabilityRelay and PeerCapabilityRelayTarget; without those caps magicsock won't consider any peer a candidate relay. Updates #13038 Change-Id: Ib3440b83ec442da0d3b89ffa48ceea9398ea9062 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>

When recommending an exit node, suggestExitNodeLocked ranks candidates by the latency to their home DERP region, taken from the most recent netcheck report. But netcheck alternates between full reports, which probe every region, and incremental reports, which only re-probe the home region and a handful of the fastest regions. When the most recent report is incremental, the suggestion fell back to a random choice for exit nodes that are far away. Now we rank candidates against the best recent latency, tracked by the `netcheck.Client` - the same data that is used to pick the preferred DERP. It uses a history of measurements which includes a full netcheck report, so it should cover all DERP regions. Updates tailscale/corp#17516 Signed-off-by: Anton Tolchanov <anton@tailscale.com>
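The idea of ranking against the best recent latency per region, rather than the latest report alone, can be sketched as follows. The types and function names are hypothetical, latencies are plain millisecond integers for brevity, and the real netcheck.Client logic is not reproduced here.

```go
package main

import "fmt"

// report is a simplified netcheck report: DERP region ID -> latency (ms).
// Incremental reports contain only a few regions.
type report struct {
	regionLatencyMs map[int]int
}

// candidate is a hypothetical exit node with its home DERP region.
type candidate struct {
	name       string
	homeRegion int
}

// recentRegionLatency merges a history of reports, keeping the lowest
// latency seen for each region.
func recentRegionLatency(history []report) map[int]int {
	best := make(map[int]int)
	for _, r := range history {
		for region, ms := range r.regionLatencyMs {
			if cur, ok := best[region]; !ok || ms < cur {
				best[region] = ms
			}
		}
	}
	return best
}

// pickExitNode returns the candidate whose home region has the lowest
// known latency. Candidates with no measurement are skipped.
func pickExitNode(cands []candidate, latency map[int]int) (candidate, bool) {
	var best candidate
	bestMs, found := 0, false
	for _, c := range cands {
		ms, ok := latency[c.homeRegion]
		if !ok {
			continue
		}
		if !found || ms < bestMs {
			best, bestMs, found = c, ms, true
		}
	}
	return best, found
}

func main() {
	history := []report{
		{regionLatencyMs: map[int]int{1: 20, 2: 150, 3: 90}}, // full report
		{regionLatencyMs: map[int]int{1: 25}},                // incremental
	}
	lat := recentRegionLatency(history)
	fmt.Println(pickExitNode([]candidate{{"tokyo", 2}, {"sydney", 3}}, lat))
}
```

With only the incremental report, regions 2 and 3 would have no latency at all, which is the situation that previously led to a random choice between far-away candidates.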

suggestExitNodeLocked now ranks exit node candidates using the per-region latency tracked by the netcheck Client (RecentRegionLatency), which merges the reports retained in c.prev. That history is only useful for far-away regions if it contains a full netcheck report, since incremental reports only re-probe the home region and a handful of the fastest ones. The full-report cadence in GetReport and the c.prev retention window were two independent 5-min constants - the way we schedule netchecks ensured that the history always contained a full report, but it was not a strong contract and we did not have any checks around this. Now the full report interval and retention window are driven by the same var, and a test confirms that the history contains a full report. Updates tailscale/corp#17516 Signed-off-by: Anton Tolchanov <anton@tailscale.com>

Fix leaking peers that failed to complete the handshake. Updates #20183 Change-Id: I84f7ea0484f05b090d963a7d12c135a66a6a6964 Signed-off-by: Alex Valiushko <alexvaliushko@tailscale.com>

Outbound packets produced by netstack (used by tailscaled with --tun userspace-networking, by tsnet, and by the SOCKS5/HTTP proxies) enter the wrapper via InjectOutbound{,PacketBuffer} and take the injectedRead path, which bypasses Filter.RunOut. RunOut's side effect for UDP/SCTP is to insert the reverse-flow tuple into the connection-tracking LRU so that Filter.RunIn admits inbound replies that no explicit ACL rule covers. Skipping it on the injected path meant a netstack-side dial of UDP would send fine but the reply would be dropped as "no matching rule". The kernel-TUN path was already fine because it goes through RunOut. Fixes #14229 Fixes #20064 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com> Change-Id: I816ef55c493a12ff4f561cd89c095559b5c2743b

Both tests started flaking after my 9107354 ("tstest/natlab/vnet: send unsolicited IPv6 Router Advertisements") added background RA traffic on v6-enabled networks. TestPacketSideEffects races the periodic unsolicited-RA goroutine against its synchronous packet-count assertions: when the multicast RA fires after the test has registered its sinks, both sinks receive it and "got 1 packet, want N" becomes "got N+2". TestProtocolQEMU's reader was doing raw Read on the SOCK_STREAM unix socket and comparing the whole result to the expected length-prefixed packet. The kernel is free to coalesce the on-register RA frame and the test packet into one Read, in which case bytes.Equal fails and the entire chunk (including the test packet's bytes) gets discarded as "unexpected", leading to a 5s i/o timeout. Parse the QEMU uint32 length-prefix framing with io.ReadFull instead so we read exactly one frame per iteration regardless of how the kernel buffers them. The SOCK_DGRAM path (TestProtocolUnixDgram) keeps the original raw Read since datagram boundaries are preserved. These were the top two flakes in oss on the flakes dashboards. Updates #13038 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com> Change-Id: I32983656b692921a0f43a4a5e9a8a6ab2555ee49
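Reading exactly one length-prefixed frame per iteration with io.ReadFull can be sketched like this. The big-endian byte order of the uint32 prefix and the `maxFrameSize` cap are assumptions of the sketch, and `encodeFrame`, `readFrame`, and `readAllFrames` are hypothetical helper names.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// maxFrameSize is an arbitrary sanity cap for this sketch.
const maxFrameSize = 1 << 20

// encodeFrame prepends a uint32 length (assumed big-endian) to payload.
func encodeFrame(payload []byte) []byte {
	out := make([]byte, 4+len(payload))
	binary.BigEndian.PutUint32(out, uint32(len(payload)))
	copy(out[4:], payload)
	return out
}

// readFrame reads exactly one frame from a stream, however the kernel
// happened to split or coalesce the underlying reads.
func readFrame(r io.Reader) ([]byte, error) {
	var hdr [4]byte
	if _, err := io.ReadFull(r, hdr[:]); err != nil {
		return nil, err // io.EOF here means a clean end of stream
	}
	n := binary.BigEndian.Uint32(hdr[:])
	if n > maxFrameSize {
		return nil, fmt.Errorf("frame of %d bytes exceeds limit", n)
	}
	payload := make([]byte, n)
	if _, err := io.ReadFull(r, payload); err != nil {
		if err == io.EOF {
			err = io.ErrUnexpectedEOF // header without its payload
		}
		return nil, err
	}
	return payload, nil
}

// readAllFrames splits an in-memory stream into frames until EOF.
func readAllFrames(stream []byte) ([][]byte, error) {
	r := bytes.NewReader(stream)
	var frames [][]byte
	for {
		f, err := readFrame(r)
		if err == io.EOF {
			return frames, nil
		}
		if err != nil {
			return frames, err
		}
		frames = append(frames, f)
	}
}

func main() {
	// Two frames arriving back to back, as if coalesced into one Read.
	stream := append(encodeFrame([]byte("router-advert")), encodeFrame([]byte("test-packet"))...)
	frames, err := readAllFrames(stream)
	fmt.Println(len(frames), string(frames[1]), err)
}
```

Because each frame is consumed by its declared length, an unrelated frame that arrives first no longer causes the following test packet to be discarded.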

The ProxyGroup HA Service reconciler's validateService scanned every Service in the cluster with shouldExpose=true for duplicate hostnames. With multi-tailnet (Tailnet CRD) support, that scan reaches across tailnet boundaries: * A Service exposed via the single-proxy path (tailscale.com/expose) on the primary tailnet would block a ProxyGroup ingress Service for the same hostname on a secondary tailnet, even though the two live in different reconcilers and different tailnet DNS namespaces. * Two ProxyGroups joined to different tailnets via spec.tailnet would also block one another for shared hostnames, again despite living in separate DNS namespaces. In both cases the ProxyGroup ingress Service was silently dropped (IngressSvcInvalid event raised, queue cleared, ConfigMap never written, ProxyGroup never serves the backend). This change tightens the check in two ways: * Skip Services that aren't themselves managed by the ProxyGroup reconciler (use isTailscaleService instead of shouldExpose). * For ProxyGroup-managed Services attached to a different ProxyGroup, look up that ProxyGroup and skip the duplicate report when spec.Tailnet differs from the current one. Fall through and flag the collision on lookup failure so genuine duplicates are not silently allowed. Adds regression tests covering both the single-proxy and the different-tailnet cases. Updates the existing TestValidateService expected error to reflect the rephrased message. Updates #20069 Signed-off-by: tsushanth <78000697+tsushanth@users.noreply.github.com>

aa5da2e (in the 1.99.x dev series, unstable) introduced some bugs, only some of which were later fixed. This fixes another. As of that change, tkaFilterNetmapLocked ran only on full netmaps through LocalBackend.setClientStatusLocked and not peer upserts via new or changed peers. The later ae74364 fixed a regression in the Engine layer but didn't make the tkaFilter code re-run on upserts. This adds a tkaFilterDeltaMutsLocked pass before nodeBackend.UpdateNetmapDelta. For each NodeMutationUpsert whose peer fails the same signature check tkaFilterNetmapLocked applies, rewrite the upsert in place into a NodeMutationRemove targeting the same node ID, so magicsock's per-mutation dispatch and nodeBackend.peers both drop the peer, matching the prior full-netmap semantics. New tsnet tests added: - TestTailnetLockFiltersUnsignedDeltaPeer covers the new-peer case. - TestTailnetLockFiltersUnsignedDeltaPeerReplacement covers the existing-peer-replacement case, to an empty signature. - TestTailnetLockFiltersDeltaPeerWithInvalidSignature is like the above but with a bogus signature. Updates #12542 Updates tailscale/corp#43767 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com> Change-Id: Ib35d0391541fee654867c26489847dbc5b7e2ae8
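The in-place rewrite from upsert to remove can be sketched with toy types. None of the names below are Tailscale's: `mutation`, `mutationKind`, and `filterDeltaMuts` are hypothetical, and the signature check is reduced to a caller-supplied predicate.

```go
package main

import "fmt"

// mutationKind distinguishes the two mutation types in this sketch.
type mutationKind int

const (
	upsert mutationKind = iota
	remove
)

// mutation is a toy stand-in for a netmap delta mutation.
type mutation struct {
	kind      mutationKind
	nodeID    int
	signature string // only meaningful for upserts
}

// filterDeltaMuts rewrites, in place, every upsert whose peer fails the
// signature check into a remove for the same node ID. It returns the
// number of mutations rewritten.
func filterDeltaMuts(muts []mutation, signatureOK func(sig string) bool) int {
	rewritten := 0
	for i := range muts {
		m := &muts[i]
		if m.kind == upsert && !signatureOK(m.signature) {
			*m = mutation{kind: remove, nodeID: m.nodeID}
			rewritten++
		}
	}
	return rewritten
}

func main() {
	muts := []mutation{
		{kind: upsert, nodeID: 1, signature: "good"},
		{kind: upsert, nodeID: 2, signature: ""},
	}
	n := filterDeltaMuts(muts, func(sig string) bool { return sig == "good" })
	fmt.Println(n, muts[1].kind == remove)
}
```

Rewriting rather than dropping the mutation means downstream consumers that already know the peer are told to forget it, which corresponds to the existing-peer-replacement case the new tests cover.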

The test transferred only 64 KiB over loopback, which can complete within a single clock tick on fast CI machines, causing time.Since(start).Seconds() to return 0 and the "transfer_time_seconds_total > 0" assertion to fail. Increase the payload to 1 MiB so zero is genuinely implausible, and retry up to 3 additional times. If the metric is still zero after 4 total attempts, fail hard — at that size it means the timing logic is actually broken. Fixes #20213 Change-Id: I3fab510ce8c567506fea5ad803d35acf40d65700 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>

This applies the same treatment from 8f21045 (netlog) to wglog, ending use of netmap.NetworkMap and instead getting the canonical data from LocalBackend/nodeBackend. This is a dependency to removing the netmap.NetworkMap from upstream callers, like wgengine.Engine in general. Updates #12542 Change-Id: Icb5af0799322def048a6f594b49f7d11273f025d Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>

This applies the same treatment from PR #20162 (netlog) and PR #20171 (wglog) to the local Taildrive filesystem wiring, ending the per-netmap-update O(n) rebuild of the drive remotes list. This moves the O(n peers) taildrive-remote list rebuild from every peer change (which previously happened regardless of whether you were even using taildrive) to instead happen only as needed. That rebuild ran on every netmap update and was a contributor to the broader quadratic behavior we want to eliminate when a single peer is added or removed. Instead, this introduces drive.RemoteSource, a small interface the Taildrive filesystem pulls from lazily on incoming WebDAV requests, and caches by a generation counter. ipn/ipnlocal installs a driveRemoteSource once at NewLocalBackend time and bumps LocalBackend.driveGen on the three events that can actually flip the drive-capable peer set: full netmap installs (domain + self caps), UpdateNetmapDelta (peer add/remove or per-peer address changes), and updatePacketFilter (since PeerCapability values are derived from the packet filter rules, not from peer.CapMap). The hook itself is kept but narrowed: it no longer takes a *netmap.NetworkMap and its only remaining job is to re-notify IPN bus listeners of the current local shares list on full installs. This is a dependency to removing the netmap.NetworkMap type from upstream callers, like wgengine.Engine in general. (Also add a bunch more tests) Updates #12542 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com> Change-Id: I7e3d2f5b4a9c8e1d6f0a3b7c9e2d4f8a1b6c5e9d
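Caching by a generation counter, with the rebuild deferred until someone actually asks, can be sketched as below. `remoteCache`, `bump`, and `remotes` are hypothetical names; the real drive.RemoteSource interface and LocalBackend.driveGen wiring are not shown in the commit message and are not reproduced here.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// remoteCache lazily rebuilds a list when the generation has changed
// since the cached copy was built.
type remoteCache struct {
	gen   atomic.Uint64   // bumped by the owner on relevant events
	build func() []string // the O(n peers) rebuild

	mu        sync.Mutex
	valid     bool
	cachedGen uint64
	cached    []string
}

// bump records that the underlying peer set may have changed. It is
// cheap, so it can be called on every relevant event.
func (c *remoteCache) bump() { c.gen.Add(1) }

// remotes returns the cached list, rebuilding it only when the
// generation moved since the last build.
func (c *remoteCache) remotes() []string {
	c.mu.Lock()
	defer c.mu.Unlock()
	g := c.gen.Load() // read before building so a concurrent bump forces a later rebuild
	if !c.valid || g != c.cachedGen {
		c.cached = c.build()
		c.cachedGen = g
		c.valid = true
	}
	return c.cached
}

func main() {
	builds := 0
	c := &remoteCache{build: func() []string {
		builds++
		return []string{"peer-a", "peer-b"}
	}}
	c.remotes()
	c.remotes() // served from cache
	c.bump()    // e.g. a peer was added
	c.remotes() // rebuilt once
	fmt.Println("builds:", builds)
}
```

The expensive work moves from the event side (every netmap change) to the request side (only when a request arrives), so users who never touch the feature never pay for the rebuild.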

Detect Hetzner via /sys/class/dmi/id/sys_vendor == "Hetzner" and wire up Hetzner's public recursive DNS resolvers (185.12.64.1, 185.12.64.2) for use as a cloud host resolver. Fixes #20217 Change-Id: I24a4c51956adfdd5731f62c937e3c7a4a733ffc7 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
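
A minimal sketch of the detection described above, assuming the DMI file holds the vendor string followed by a trailing newline. The function names are hypothetical and this is not the code from the commit.

```go
package main

import (
	"fmt"
	"net/netip"
	"os"
	"strings"
)

// isHetznerVendor reports whether the contents of a DMI sys_vendor file
// identify the host as Hetzner. Surrounding whitespace is ignored.
func isHetznerVendor(sysVendor string) bool {
	return strings.TrimSpace(sysVendor) == "Hetzner"
}

// hetznerResolvers returns the two resolver addresses named in the commit
// message.
func hetznerResolvers() []netip.Addr {
	return []netip.Addr{
		netip.MustParseAddr("185.12.64.1"),
		netip.MustParseAddr("185.12.64.2"),
	}
}

func main() {
	b, err := os.ReadFile("/sys/class/dmi/id/sys_vendor")
	if err != nil {
		fmt.Println("no DMI vendor available:", err)
		return
	}
	if isHetznerVendor(string(b)) {
		fmt.Println("Hetzner host; resolvers:", hetznerResolvers())
	} else {
		fmt.Println("not a Hetzner host")
	}
}
```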

The watchdog (ipn/ipnlocal/watchdog.go) was abusing PeerForIP with an invalid netip.Addr as a way to acquire and release the engine's internal locks for deadlock detection. This does the TODO to break it out into its own method like all the other similarly named methods. Splitting this out as a prerequisite for a follow-up rewrite of PeerForIP itself; not having to preserve the lock-probe overload in the new implementation keeps that follow-up smaller. Updates #12542 Updates #cleanup Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com> Change-Id: I25cbffd11aeb65600d9128845404c4918ef88ead

Another baby step toward removing slices of peers from the engine. getStatus iterated peerSequence (a key snapshot built in Reconfig from cfg.Peers) and then asked wgdev for each peer's stats; peers that weren't active in wgdev silently fell out. Iterate active wgdev peers directly via RemoveMatchingPeers(returnFalse) instead. Updates #12542 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com> Change-Id: I3abd348abc30db706db29b3a785179259e48abda

userspaceEngine.PeerForIP read from e.netMap.Peers and e.lastCfgFull.Peers, both of which go stale when peers arrive via netmap deltas (which skip Engine.SetNetworkMap and Engine.Reconfig). Every PeerForIP caller (Engine.Ping, the TSMP disco-key handler, pendopen diagnostics, tsdial.Dialer.UseNetstackForIP, and LocalBackend.GetPeerEndpointChanges) would report "no matching peer" for freshly-added peers. Fix it the same way SetPeerByIPPacketFunc fixed the outbound packet hot path: have LocalBackend install a callback that reads the live nodeBackend. nb.NodeByAddr is built from both SelfNode and Peers (updateNodeByAddrLocked), so a single lookup covers the common case with IsSelf set when the matched node ID is SelfNode's. The subnet-route / exit-node-default-route slow path goes through a new Engine.PeerKeyForIP that exposes the engine's AllowedIPs BART table (the same table the outbound packet hot path already consults, with exit-node selection honored), and resolves the matched key back to a NodeView via the live nodeBackend. Updates #12542 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com> Change-Id: I0d4b0d8997c8e796b7367c46b49b61d4fdc717b0

The logging added in 12188c0 was generating excessive spam in backend logs. This may have been exacerbated by tailscale GUI<->backend architecture on certain platforms like Windows, where the GUI polls for exit node suggestions rather than listening on the IPN bus. Change this to log on error or if the current suggestion differs from the previous suggestion. Updates tailscale/corp#43691 Updates #20194 Signed-off-by: Amal Bansode <amal@tailscale.com>
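
The "log on error or on change" rule can be sketched as a small piece of state that remembers the previous suggestion. The type and names below are hypothetical, chosen for illustration rather than taken from the commit.

```go
package main

import (
	"errors"
	"fmt"
)

// errLookup is a placeholder error used only by this example.
var errLookup = errors.New("lookup failed")

// suggestionLog remembers the last successful suggestion.
type suggestionLog struct {
	have bool
	last string
}

// shouldLog reports whether this result is worth logging: always on error,
// otherwise only when the suggestion differs from the previous one.
func (s *suggestionLog) shouldLog(suggestion string, err error) bool {
	if err != nil {
		return true
	}
	changed := !s.have || suggestion != s.last
	s.have, s.last = true, suggestion
	return changed
}

func main() {
	var s suggestionLog
	// A polling client asking repeatedly produces one line per change.
	for _, name := range []string{"node-a", "node-a", "node-a", "node-b"} {
		if s.shouldLog(name, nil) {
			fmt.Println("suggested exit node:", name)
		}
	}
}
```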

Add a retry loop with BatchMode=yes to absorb the race window between Env.Start() returning (when tta reports the tailscale backend as Running) and cloud-init finishing the user/SSH-key setup. In CI, the second VM's tta agent has been observed connecting only a few hundred milliseconds before the test SSHes in, which is inside the window where /root/.ssh/authorized_keys hasn't fully landed yet. SSH key auth then fails and ssh(1) falls back to interactive password prompts (3x), wasting time and producing a confusing "Permission denied (publickey,password)" error. BatchMode=yes makes the client fail fast on auth failure instead of prompting, and the retry loop handles SSH transport-level errors (exit code 255) for up to 30 seconds with 500ms backoff. Remote command non-zero exits still pass through unchanged. Fixes #20228 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com> Change-Id: I17f7422e9e27bf7b995f505c0184cbb2b230ed81
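
As a rough sketch of the retry behaviour described above: retry only while the command reports ssh's own failure status (255), and pass any other exit status straight through. The `run` callback is a hypothetical hook; in a real test it would execute something like `ssh -o BatchMode=yes user@host cmd` and return the process exit code.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// sshTransportExit is the status ssh(1) returns when ssh itself fails
// (connection or authentication), as opposed to the remote command's status.
const sshTransportExit = 255

// retryTransport calls run until it returns something other than a
// transport-level failure, or until timeout has elapsed.
func retryTransport(run func() (int, error), timeout, backoff time.Duration) (int, error) {
	start := time.Now()
	for {
		code, err := run()
		if err != nil {
			return code, err // the command could not be run at all
		}
		if code != sshTransportExit {
			return code, nil // remote exit status passes through unchanged
		}
		if time.Since(start) >= timeout {
			return code, errors.New("ssh transport still failing after timeout")
		}
		time.Sleep(backoff)
	}
}

func main() {
	attempts := 0
	// Hypothetical runner: two transport failures, then success.
	fake := func() (int, error) {
		attempts++
		if attempts < 3 {
			return sshTransportExit, nil
		}
		return 0, nil
	}
	code, err := retryTransport(fake, 30*time.Second, 10*time.Millisecond)
	fmt.Println(code, err, attempts)
}
```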

Env.Start boots all VM nodes in parallel; each calls createCloudInitISO -> ensureDebugSSHKey concurrently. When /tmp/vmtest_key doesn't yet exist, the first goroutine creates it with os.WriteFile, which opens with O_CREATE|O_TRUNC and briefly leaves the file existing-but-empty between the open and the subsequent write. A concurrent goroutine that hits that window sees ReadFile succeed with zero bytes, then fails ssh.ParsePrivateKey with "ssh: no key found", causing boot to fail with: boot: creating cloud-init ISO: parse /tmp/vmtest_key: ssh: no key found Observed in CI on TestSiteToSite (3 nodes). Wrap the function in a package-level Mutex so the first caller fully writes the key before any other caller reads it. Updates #20228 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com> Change-Id: Ie6399dcba0c397bb8041931d3de1c6063a11c568
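
A minimal sketch of the fix's shape, assuming a read-or-create helper guarded by one package-level mutex. `ensureKey`, `concurrentDemo`, and the placeholder key bytes are hypothetical; the real function also parses the key, which is omitted here.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"path/filepath"
	"sync"
)

// keyMu serializes ensureKey so no caller can observe the window in which
// the file exists but has not been written yet.
var keyMu sync.Mutex

// ensureKey returns the contents of path, creating the file with generate()
// if it is missing or empty.
func ensureKey(path string, generate func() []byte) ([]byte, error) {
	keyMu.Lock()
	defer keyMu.Unlock()
	if b, err := os.ReadFile(path); err == nil && len(b) > 0 {
		return b, nil
	}
	b := generate()
	if err := os.WriteFile(path, b, 0o600); err != nil {
		return nil, err
	}
	return b, nil
}

// concurrentDemo starts n goroutines that all call ensureKey on a fresh
// temporary path and reports whether every caller saw the same non-empty key.
func concurrentDemo(n int) (bool, error) {
	dir, err := os.MkdirTemp("", "keydemo")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "demo_key")

	results := make([][]byte, n)
	errs := make([]error, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i], errs[i] = ensureKey(path, func() []byte {
				return []byte("placeholder key bytes")
			})
		}(i)
	}
	wg.Wait()
	for i := range results {
		if errs[i] != nil {
			return false, errs[i]
		}
		if len(results[i]) == 0 || !bytes.Equal(results[i], results[0]) {
			return false, nil
		}
	}
	return true, nil
}

func main() {
	fmt.Println(concurrentDemo(8))
}
```

A mutex only protects callers inside one process. Separate processes sharing the same path would need a different approach, such as writing to a temporary file and renaming it into place.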

tsdial.Dialer.SetNetMap rebuilt an O(n peers) map of MagicDNS names on every netmap change. As we move toward per-peer incremental deltas, this becomes quadratic. This removes it and replaces it with SetResolveMagicDNS, a callback into LocalBackend that looks up hostnames from nodeBackend's new nodeByName index (populated alongside nodeByAddr/nodeByKey on both full and delta paths). The index stores both FQDNs and short names as keys. This is the same treatment applied to netlog (8f21045), wglog (988b090), and drive (1d69894): stop pushing *netmap.NetworkMap into subsystems and instead have them pull from LocalBackend's live data via callbacks. Updates #12542 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com> Change-Id: I24557ab0c8a27636e08e4779bcfd3ec633db0a78
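
To illustrate why an index keyed by both FQDN and short name avoids the per-update rebuild, here is a small sketch in which a single peer can be added or removed without touching the other entries. The `nameIndex` type is hypothetical, addresses are plain strings for brevity, and short-name collisions between peers are ignored.

```go
package main

import (
	"fmt"
	"strings"
)

// nameIndex maps lower-cased names (FQDN and short form) to an address.
type nameIndex map[string]string

// normalize lower-cases a name and drops a trailing dot.
func normalize(name string) string {
	return strings.ToLower(strings.TrimSuffix(name, "."))
}

// add indexes one peer under its FQDN and its first label.
func (idx nameIndex) add(fqdn, addr string) {
	fqdn = normalize(fqdn)
	idx[fqdn] = addr
	if short, _, ok := strings.Cut(fqdn, "."); ok {
		idx[short] = addr
	}
}

// remove drops both keys for one peer; cost is independent of peer count.
func (idx nameIndex) remove(fqdn string) {
	fqdn = normalize(fqdn)
	delete(idx, fqdn)
	if short, _, ok := strings.Cut(fqdn, "."); ok {
		delete(idx, short)
	}
}

// resolve is what a dialer-side callback would consult on each lookup.
func (idx nameIndex) resolve(host string) (string, bool) {
	addr, ok := idx[normalize(host)]
	return addr, ok
}

func main() {
	idx := nameIndex{}
	idx.add("laptop.example.ts.net.", "100.64.0.1")
	idx.add("server.example.ts.net.", "100.64.0.2")
	fmt.Println(idx.resolve("laptop"))
	fmt.Println(idx.resolve("Server.example.ts.net"))
	idx.remove("laptop.example.ts.net.")
	fmt.Println(idx.resolve("laptop"))
}
```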

…20199) Router.Set reconciled tailscale0's addresses only against the in-memory r.addrs map, which starts empty each run. After a restart the kernel can still hold the addresses a previous profile put on tailscale0. With no record of them, Set never removed them, leaving two tailnets' CGNAT addresses on the interface. That broke connectivity, because the kernel could source traffic from the wrong IP. Fix this by scanning the addresses actually on the interface and, after reconciling the desired set, removing any in Tailscale's CGNAT/ULA ranges that aren't in the config. Non-Tailscale addresses are never touched, and IPv6 addresses are skipped when IPv6 is unavailable, since delAddress no-ops there. To avoid a netlink dump on every Set, the scan runs only on the first Set and when the desired address set changes. This also needs the iptables DelLoopbackRule to tolerate a missing rule: an orphan left by a previous instance never went through AddLoopbackRule here, and iptables (unlike nftables) errors when deleting an absent rule, which would otherwise block the address delete. Fixes #19974 Signed-off-by: Brendan Creane <bcreane@gmail.com>
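
The selection rule in the fix (remove only addresses that fall in Tailscale's ranges and are not in the desired config) can be sketched as a pure function. This is an illustration under assumptions: the two prefixes below are the commonly documented CGNAT and Tailscale ULA ranges, and `staleAddrs` and `mustPrefixes` are hypothetical helpers, not the router's code.

```go
package main

import (
	"fmt"
	"net/netip"
)

var (
	// Assumed ranges for this sketch.
	cgnatRange   = netip.MustParsePrefix("100.64.0.0/10")
	tailscaleULA = netip.MustParsePrefix("fd7a:115c:a1e0::/48")
)

// staleAddrs returns the prefixes found on the interface that lie in the
// Tailscale ranges but are absent from the desired set. Addresses outside
// those ranges are never returned.
func staleAddrs(onInterface, desired []netip.Prefix) []netip.Prefix {
	want := make(map[netip.Prefix]bool, len(desired))
	for _, p := range desired {
		want[p] = true
	}
	var stale []netip.Prefix
	for _, p := range onInterface {
		inRange := cgnatRange.Contains(p.Addr()) || tailscaleULA.Contains(p.Addr())
		if inRange && !want[p] {
			stale = append(stale, p)
		}
	}
	return stale
}

// mustPrefixes parses prefixes for the example and panics on bad input.
func mustPrefixes(ss ...string) []netip.Prefix {
	out := make([]netip.Prefix, 0, len(ss))
	for _, s := range ss {
		out = append(out, netip.MustParsePrefix(s))
	}
	return out
}

func main() {
	onIface := mustPrefixes("100.64.0.5/32", "100.100.1.2/32", "192.168.1.10/24")
	desired := mustPrefixes("100.100.1.2/32")
	fmt.Println(staleAddrs(onIface, desired))
}
```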

The primary purpose is that return packets from the target app get properly SNATed on connectors with --tun=userspace-networking, matching the NAT behavior in the kernel tun path. This is also necessary but not sufficient for clients of connectors in userspace networking mode. The hook will DNAT MagicIPs, but won't actually be sent MagicIPs until conn25 app connector DNS works with userspace networking. Fixes tailscale/corp#43201 Signed-off-by: Michael Ben-Ami <mzb@tailscale.com>

The engine only used the netmap to look up self addresses and the self node's primary routes, so pass it the self node directly rather than the whole netmap. Updates #12542 Change-Id: I13c0028eed65d2177baf4cf6c449f5e441845a18 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>

setWebClientAtomicBoolLocked and setDebugLogsByCapabilityLocked each only need the node capabilities to decide what to do, so take a set.Set[tailcfg.NodeCapability] directly as part of getting rid of netmap.NetworkMap. Updates #12542 Change-Id: If7c30b6354fd42dfe82ed6d2e2fe3439de401315 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>

364b952 switched containerboot to partial netmap fetching, but stopped refreshing `DNS.ExtraRecords`, so Tailscale Services created after pod boot were invisible to resolveTailnetFQDN. To fix we watch for SelfChange ipn bus notifies, and refetch dns-config via LocalAPI to get a fresh set of `DNS.ExtraRecords`. Fixes #20233 Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk>

… receive extensions" (#20257) * Revert "control/controlclient: continue map poll during key expiry to receive extensions" This reverts commit 6a822dc. This commit has caused test failures in the corp repo by unexpectedly changing the login behaviour when nodes have a valid node key. Updates tailscale/corp#43705 Updates #19326 Signed-off-by: Alex Chan <alexc@tailscale.com> * Revert "tsnet: test key extension after server restart" This reverts commit 3172013. This test relies on changes in 3172013, which is also being reverted because it causes test failures in corp. Updates tailscale/corp#43705 Updates #19326 Signed-off-by: Alex Chan <alexc@tailscale.com> --------- Signed-off-by: Alex Chan <alexc@tailscale.com>

franbull and others added 30 commits May 8, 2026 08:12

feature/conn25: move addrAssignments to their own file

82346f3

Updates tailscale/corp#39975 Signed-off-by: Fran Bull <fran@tailscale.com>

cmd/pgproxy: fix client TLS handshake timeout

ead5ce6

There is a 30-second timeout set on client TLS connections but the handshake was called on the wrong connection and so the timeout was never used in practice. Signed-off-by: Francois Marier <francois@fmarier.org>

cmd/tailscale/cli: add RunWithContext

ad8ead9

Fixes #12778 Change-Id: If9f8b299cef0cb68f93b344845b5c6a5b7554d2c Signed-off-by: DeedleFake <deedlefake@users.noreply.github.com>

ipn/ipnlocal: fix minor typo in shouldUseOneCGNATRoute (#19719)

6467f0d

This fixes a log message where ipn/ipnlocal.shouldUseOneCGNATRoute would claim that an Android machine was actually macOS. Updates #cleanup Updates #19652 Signed-off-by: Simon Law <sfllaw@tailscale.com>

feature/conn25: keep addrAssignments through pool reconfig

3a6261b

Fixes tailscale/corp#40250 Signed-off-by: Fran Bull <fran@tailscale.com>

net/dns: create a new hosts file if it doesn't exist on Windows

32f984f

A missing hosts file is not a fatal error. We should log it, but still proceed and create a new one instead of failing the DNS reconfiguration completely. Fixes #19733 Signed-off-by: Nick Khyl <nickk@tailscale.com>

.github/workflows: change natlab test trigger label (#19750)

8203edc

The label "natlab" is a bit confusing and also used for other things. Instead, change the trigger label to "run-natlab-tests". Updates #13038 Signed-off-by: Claus Lensbøl <claus@tailscale.com>

knyar and others added 30 commits June 22, 2026 12:28

go.mod: bump wireguard-go (#20203)

568c0bd

Fix leaking peers that failed to complete the handshake. Updates #20183 Change-Id: I84f7ea0484f05b090d963a7d12c135a66a6a6964 Signed-off-by: Alex Valiushko <alexvaliushko@tailscale.com>

.github: pin govulncheck@1.3.0 (#20219)

72876a9

Pin govulncheck to resolve panics in the most recent version. Updates #cleanup Signed-off-by: Patrick O'Doherty <patrick@tailscale.com>

.gitattributes: explicitly mark text files as such with eol

b7422fa

I'm not keen on us having to deal with the bad side effects of the autocrlf default, but alas, if it makes things easier. Fixes #16175 Closes #16176 Signed-off-by: James Tucker <james@tailscale.com>

wgengine/magicsock: consider VNI as part of peer relay handshake supp…

badd0c4

…ression Otherwise we may never handshake a new peer relay server endpoint around remote client restarts and/or disco key rotation. Updates #20215 Signed-off-by: Jordan Whited <jordan@tailscale.com>

cmd/tailscale/cli: fix capitalisation of flags

281404e

Most of our flag descriptions start with a lowercase word (except proper nouns); fix the handful which do not. Fixes #20230 Change-Id: I00aaac171254c050ad0b75c2cf8746590c8c4d8f Signed-off-by: Alex Chan <alexc@tailscale.com>

.github: add zizmor GitHub Actions linting (#20243)

453c078

Add zizmor GitHub Actions linting on changes to .github/workflows. Updates tailscale/corp#28760 Signed-off-by: Patrick O'Doherty <patrick@tailscale.com>

util/cmpver: add a test for comparing three-digit versions

9f92a47

No code changes needed; this is to rule out cmpver as the source of any version-comparison issues. Updates #20238 Change-Id: Ib8765dd042e994549d9e2c03859a5f769a856704 Signed-off-by: Alex Chan <alexc@tailscale.com>

Fork Sync: Update from parent repository #36
github-actions[bot] wants to merge 1667 commits into MultiMx:main from tailscale:main

github-actions Bot commented Jun 29, 2025

20 participants
