waddrmgr: Put client request first in recovery. #979

Merged
guggero merged 1 commit into btcsuite:master from JoeGruffins:dontfetchbeforebday
Mar 10, 2025
Conversation

@JoeGruffins
Contributor

@JoeGruffins JoeGruffins commented Feb 25, 2025

closes #978

I'm unsure if this is the best solution but it gets me past the block in my issue. It looks to me like this should always hit when restoring with a birthday, but it does not. Does btcwallet not download all headers anyway?

@JoeGruffins JoeGruffins marked this pull request as draft February 25, 2025 08:20
@JoeGruffins
Contributor Author

Oh wait, got stuck a second time at another block:

2025-02-25 17:19:30.554 [ERR] BTCW: Unable to synchronize wallet to chain, trying again in 5s: unable to perform wallet recovery: failed to store sync information 00000000000000000000b9e6cc0b5be06c02f56506fee5c7897edf6cafc4e7cc: failed to fetch block hash for height 870852: block not found

@JoeGruffins
Contributor Author

If there is an error connecting to a client inside an update during recovery, the tx is rolled back but the in-memory sync point is not. This fixes that: https://github.com/btcsuite/btcwallet/compare/40d5f25cd330609e350818655632b18fb1a360c8..6935a129d7e68f106895c7dd0c761d8e09b1fdd9
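
For readers following along, here is a minimal sketch of the failure mode described above. It is not btcwallet code: `store`, `manager`, `setSyncedTo` and `recoverBlock` are hypothetical stand-ins, and the "transaction" is simulated by copying a struct. It only illustrates how a rolled-back update can leave an in-memory sync point ahead of the persisted one, and how restoring the previous value in memory avoids that.

```go
package main

import (
	"errors"
	"fmt"
)

// store is a hypothetical stand-in for the wallet database. Only committed
// writes survive; writes made inside a failed update are discarded.
type store struct {
	syncedHeight int32
}

// update runs fn against a copy of the store and commits the copy only if fn
// returns nil, mimicking an all-or-nothing database transaction.
func (s *store) update(fn func(tx *store) error) error {
	tx := *s
	if err := fn(&tx); err != nil {
		return err // rolled back: s is untouched
	}
	*s = tx
	return nil
}

// manager is a hypothetical stand-in for an address manager that keeps an
// in-memory copy of the sync point next to the persisted one.
type manager struct {
	memHeight int32
}

// setSyncedTo writes the sync point to the transaction and to memory.
func (m *manager) setSyncedTo(tx *store, height int32) {
	tx.syncedHeight = height
	m.memHeight = height
}

// errClient simulates a failed request to the chain client.
var errClient = errors.New("client request failed")

// recoverBlock advances the sync point and then returns clientErr from inside
// the update. If restore is true, the in-memory sync point is reset to its
// previous value when the update fails.
func recoverBlock(db *store, m *manager, height int32, clientErr error, restore bool) error {
	prev := m.memHeight
	err := db.update(func(tx *store) error {
		m.setSyncedTo(tx, height)
		return clientErr
	})
	if err != nil && restore {
		m.memHeight = prev
	}
	return err
}

func main() {
	db, m := &store{syncedHeight: 100}, &manager{memHeight: 100}
	_ = recoverBlock(db, m, 101, errClient, false)
	fmt.Println("without restore: db =", db.syncedHeight, "mem =", m.memHeight)

	db, m = &store{syncedHeight: 100}, &manager{memHeight: 100}
	_ = recoverBlock(db, m, 101, errClient, true)
	fmt.Println("with restore:    db =", db.syncedHeight, "mem =", m.memHeight)
}
```

In the first run the persisted height stays at 100 while the in-memory height moves to 101; in the second run both stay at 100.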

@JoeGruffins JoeGruffins marked this pull request as ready for review February 26, 2025 07:08
@JoeGruffins JoeGruffins changed the title from "waddrmgr: Don't get block hash before birthday." to "waddrmgr: Roll back in memory sync point on error." Feb 26, 2025
@JoeGruffins
Contributor Author

The birthday thing was apparently a coincidence.

@dev-warrior777 dev-warrior777 left a comment

Nice one

wallet/wallet.go Outdated
)
})
if err != nil {
w.Manager.SetSyncedToMemory(&rollbackStamp)
Collaborator

Hmm, so what was the initial error in the first place that got this out of sync? Sounds like all the "failed to fetch block hash for height 870852: block not found" errors are follow-up errors.
Would be nice to know that. But this fix definitely makes sense.

Contributor Author

I have the original error for one instance in the issue.

Collaborator

Ah, I see, fetching a block failed in a Neutrino setup.

Great catch, I'm quite certain this is indeed the cause for the errors you saw.

I wonder how we should deal with all the other cases where SetSyncedTo() is called within a database transaction that might be rolled back and cause the DB and in-memory state to become de-synced...

Collaborator

I think the out-of-synced state will make things worse. It looks like recoverScopedAddresses should be performed before calling SetSyncedTo. It also needs some refactoring to move the non-db operations out so we don't need to wrap them in one giant walletdb.Update tx.

Collaborator

@yyforyongyu yyforyongyu left a comment

I don't think this approach works - it's hacky and creates inconsistent states.

@JoeGruffins
Contributor Author

JoeGruffins commented Feb 27, 2025

I think the out-of-synced state will make things worse.

I'll look again, but I think none of the db functions take effect, and this was the only thing not rolled back. I'll do whatever is needed though, because this bug is affecting a few of our projects.

@JoeGruffins
Contributor Author

JoeGruffins commented Feb 27, 2025

@yyforyongyu I've looked through, and there is a lot, but I can't find any values that are not rolled back. Could you point them out?

Also, with no fix, if the bug is hit, then the db would be borked anyway if what you say is correct. Even if a huge refactor is needed, a fix for the moment would be good to have. I can try to write a test I guess?

@JoeGruffins JoeGruffins changed the title from "waddrmgr: Roll back in memory sync point on error." to "waddrmgr: Put client request first in recovery." Feb 27, 2025
@JoeGruffins
Contributor Author

JoeGruffins commented Feb 27, 2025

@yyforyongyu Is this current version ok for a patch? It only swaps the order as you suggested. Because the client request for headers happens in the first part, and this is where a connection error can happen, it won't continue to update the internal sync point. This also fixes the immediate problem for us. https://github.com/btcsuite/btcwallet/compare/469713a564568f734e51e8bd87e7a2ac2885b754..06463724d4c69dc3e09f2191de442f5635e9c606

Collaborator

@yyforyongyu yyforyongyu left a comment

Thanks for the update! LGTM 🙏

@JoeGruffins JoeGruffins force-pushed the dontfetchbeforebday branch from 7509f6b to fe45f81 Compare March 6, 2025 06:32
@guggero guggero self-requested a review March 6, 2025 08:35
@guggero
Collaborator

guggero commented Mar 7, 2025

I think this change should be safe, thanks a lot for the fix.
Would be great if you could open an lnd PR that references this PR in the go.mod (with a replace directive to point to your fork/commit) to see that all integration tests still pass.

@JoeGruffins
Contributor Author

JoeGruffins commented Mar 7, 2025

Would be great if you could open an lnd PR that references this PR in the go.mod (with a replace directive to point to your fork/commit) to see that all integration tests still pass.

Attempting to do this but having a hell of a time with module tags. Errors like: server response: not found: github.com/joegruffins/btcwallet@v0.16.10-0.20241127094224-93c858b2ad63: invalid pseudo-version: preceding tag (v0.16.9) not found

So I tried just setting a new tag on my master with that commit, and then it's 404 Not Found.

Maybe it will be found after a while...

@JoeGruffins
Contributor Author

JoeGruffins commented Mar 8, 2025

Most tests passing locally: JoeGruffins/lnd#1

test results
$ go test ./...
ok  	github.com/lightningnetwork/lnd	(cached)
ok  	github.com/lightningnetwork/lnd/aezeed	(cached)
ok  	github.com/lightningnetwork/lnd/aliasmgr	(cached)
ok  	github.com/lightningnetwork/lnd/amp	(cached)
ok  	github.com/lightningnetwork/lnd/autopilot	(cached)
ok  	github.com/lightningnetwork/lnd/batch	(cached)
ok  	github.com/lightningnetwork/lnd/blockcache	(cached)
ok  	github.com/lightningnetwork/lnd/brontide	(cached)
ok  	github.com/lightningnetwork/lnd/buffer	(cached)
ok  	github.com/lightningnetwork/lnd/build	(cached)
ok  	github.com/lightningnetwork/lnd/chainio	(cached)
ok  	github.com/lightningnetwork/lnd/chainntnfs	(cached)
?   	github.com/lightningnetwork/lnd/chainntnfs/bitcoindnotify	[no test files]
?   	github.com/lightningnetwork/lnd/chainntnfs/btcdnotify	[no test files]
?   	github.com/lightningnetwork/lnd/chainntnfs/neutrinonotify	[no test files]
?   	github.com/lightningnetwork/lnd/chainreg	[no test files]
ok  	github.com/lightningnetwork/lnd/chanacceptor	(cached)
ok  	github.com/lightningnetwork/lnd/chanbackup	(cached)
ok  	github.com/lightningnetwork/lnd/chanfitness	(cached)
ok  	github.com/lightningnetwork/lnd/channeldb	(cached)
ok  	github.com/lightningnetwork/lnd/channeldb/migration	(cached)
?   	github.com/lightningnetwork/lnd/channeldb/migration/lnwire21	[no test files]
ok  	github.com/lightningnetwork/lnd/channeldb/migration12	(cached)
ok  	github.com/lightningnetwork/lnd/channeldb/migration13	(cached)
ok  	github.com/lightningnetwork/lnd/channeldb/migration16	(cached)
ok  	github.com/lightningnetwork/lnd/channeldb/migration20	(cached)
ok  	github.com/lightningnetwork/lnd/channeldb/migration21	(cached)
?   	github.com/lightningnetwork/lnd/channeldb/migration21/common	[no test files]
?   	github.com/lightningnetwork/lnd/channeldb/migration21/current	[no test files]
?   	github.com/lightningnetwork/lnd/channeldb/migration21/legacy	[no test files]
ok  	github.com/lightningnetwork/lnd/channeldb/migration23	(cached)
ok  	github.com/lightningnetwork/lnd/channeldb/migration24	(cached)
ok  	github.com/lightningnetwork/lnd/channeldb/migration25	(cached)
ok  	github.com/lightningnetwork/lnd/channeldb/migration26	(cached)
ok  	github.com/lightningnetwork/lnd/channeldb/migration27	(cached)
ok  	github.com/lightningnetwork/lnd/channeldb/migration29	(cached)
ok  	github.com/lightningnetwork/lnd/channeldb/migration30	(cached)
ok  	github.com/lightningnetwork/lnd/channeldb/migration31	(cached)
ok  	github.com/lightningnetwork/lnd/channeldb/migration32	(cached)
ok  	github.com/lightningnetwork/lnd/channeldb/migration33	(cached)
ok  	github.com/lightningnetwork/lnd/channeldb/migration_01_to_11	(cached)
?   	github.com/lightningnetwork/lnd/channeldb/migration_01_to_11/zpay32	[no test files]
?   	github.com/lightningnetwork/lnd/channeldb/migtest	[no test files]
?   	github.com/lightningnetwork/lnd/channelnotifier	[no test files]
?   	github.com/lightningnetwork/lnd/cluster	[no test files]
ok  	github.com/lightningnetwork/lnd/cmd/commands	(cached)
?   	github.com/lightningnetwork/lnd/cmd/lncli	[no test files]
?   	github.com/lightningnetwork/lnd/cmd/lnd	[no test files]
ok  	github.com/lightningnetwork/lnd/contractcourt	(cached)
ok  	github.com/lightningnetwork/lnd/discovery	(cached)
ok  	github.com/lightningnetwork/lnd/feature	(cached)
ok  	github.com/lightningnetwork/lnd/funding	(cached)
ok  	github.com/lightningnetwork/lnd/graph	(cached)
ok  	github.com/lightningnetwork/lnd/graph/db	(cached)
ok  	github.com/lightningnetwork/lnd/graph/db/models	(cached)
--- FAIL: TestChannelLinkBandwidthConsistency (0.00s)
    link_test.go:2455: htlcswitch tests must be run with '-tags dev
--- FAIL: TestChannelLinkTrimCircuitsNoCommit (0.00s)
    link_test.go:3241: htlcswitch tests must be run with '-tags dev
--- FAIL: TestChannelLinkCleanupSpuriousResponses (1.01s)
    link_test.go:5559: alice fwdpkg index 0 should not have ack
--- FAIL: TestPipelineSettle (15.02s)
    link_test.go:7206: did not receive message
--- FAIL: TestSwitchDustForwarding (15.28s)
    switch_test.go:4329: 
        	Error Trace:	/home/joe/git/lnd/htlcswitch/switch_test.go:4329
        	            				/home/joe/git/lnd/htlcswitch/switch_test.go:4337
        	Error:      	Received unexpected error:
        	            	got totalDust=0 mSAT, expectedDust=70000000 mSAT
        	Test:       	TestSwitchDustForwarding
        	Messages:   	timeout checking dust
FAIL
FAIL	github.com/lightningnetwork/lnd/htlcswitch	27.927s
--- FAIL: TestMask (0.00s)
    mask_test.go:92: htlcswitch tests must be run with '-tags=dev'
FAIL
FAIL	github.com/lightningnetwork/lnd/htlcswitch/hodl	0.001s
ok  	github.com/lightningnetwork/lnd/htlcswitch/hop	(cached)
ok  	github.com/lightningnetwork/lnd/input	(cached)
ok  	github.com/lightningnetwork/lnd/internal/musig2v040	(cached)
--- FAIL: TestMigrateSingleInvoiceRapid (0.00s)
    postgres_fixture.go:70: 
        	Error Trace:	/home/joe/go/pkg/mod/github.com/lightningnetwork/lnd/sqldb@v1.0.7/postgres_fixture.go:70
        	            				/home/joe/git/lnd/invoices/sql_migration_test.go:264
        	Error:      	Received unexpected error:
        	            	dial unix /var/run/docker.sock: connect: no such file or directory
        	Test:       	TestMigrateSingleInvoiceRapid
        	Messages:   	Could not start resource
--- FAIL: TestInvoiceRegistry (0.00s)
    postgres_fixture.go:70: 
        	Error Trace:	/home/joe/go/pkg/mod/github.com/lightningnetwork/lnd/sqldb@v1.0.7/postgres_fixture.go:70
        	            				/home/joe/git/lnd/invoices/invoiceregistry_test.go:140
        	Error:      	Received unexpected error:
        	            	dial unix /var/run/docker.sock: connect: no such file or directory
        	Test:       	TestInvoiceRegistry
        	Messages:   	Could not start resource
--- FAIL: TestInvoices (0.00s)
    postgres_fixture.go:70: 
        	Error Trace:	/home/joe/go/pkg/mod/github.com/lightningnetwork/lnd/sqldb@v1.0.7/postgres_fixture.go:70
        	            				/home/joe/git/lnd/invoices/invoices_test.go:227
        	Error:      	Received unexpected error:
        	            	dial unix /var/run/docker.sock: connect: no such file or directory
        	Test:       	TestInvoices
        	Messages:   	Could not start resource
--- FAIL: TestMigrationWithChannelDB (0.00s)
    postgres_fixture.go:70: 
        	Error Trace:	/home/joe/go/pkg/mod/github.com/lightningnetwork/lnd/sqldb@v1.0.7/postgres_fixture.go:70
        	            				/home/joe/git/lnd/invoices/kv_sql_migration_test.go:31
        	Error:      	Received unexpected error:
        	            	dial unix /var/run/docker.sock: connect: no such file or directory
        	Test:       	TestMigrationWithChannelDB
        	Messages:   	Could not start resource
FAIL
FAIL	github.com/lightningnetwork/lnd/invoices	0.077s
ok  	github.com/lightningnetwork/lnd/itest	(cached)
ok  	github.com/lightningnetwork/lnd/keychain	(cached)
?   	github.com/lightningnetwork/lnd/labels	[no test files]
ok  	github.com/lightningnetwork/lnd/lncfg	(cached)
ok  	github.com/lightningnetwork/lnd/lnencrypt	(cached)
?   	github.com/lightningnetwork/lnd/lnmock	[no test files]
?   	github.com/lightningnetwork/lnd/lnpeer	[no test files]
?   	github.com/lightningnetwork/lnd/lnrpc	[no test files]
?   	github.com/lightningnetwork/lnd/lnrpc/autopilotrpc	[no test files]
?   	github.com/lightningnetwork/lnd/lnrpc/chainrpc	[no test files]
?   	github.com/lightningnetwork/lnd/lnrpc/devrpc	[no test files]
ok  	github.com/lightningnetwork/lnd/lnrpc/invoicesrpc	(cached)
?   	github.com/lightningnetwork/lnd/lnrpc/lnclipb	[no test files]
?   	github.com/lightningnetwork/lnd/lnrpc/neutrinorpc	[no test files]
?   	github.com/lightningnetwork/lnd/lnrpc/peersrpc	[no test files]
ok  	github.com/lightningnetwork/lnd/lnrpc/routerrpc	(cached)
?   	github.com/lightningnetwork/lnd/lnrpc/signrpc	[no test files]
?   	github.com/lightningnetwork/lnd/lnrpc/verrpc	[no test files]
ok  	github.com/lightningnetwork/lnd/lnrpc/walletrpc	(cached)
?   	github.com/lightningnetwork/lnd/lnrpc/watchtowerrpc	[no test files]
?   	github.com/lightningnetwork/lnd/lnrpc/wtclientrpc	[no test files]
?   	github.com/lightningnetwork/lnd/lntest	[no test files]
?   	github.com/lightningnetwork/lnd/lntest/channels	[no test files]
?   	github.com/lightningnetwork/lnd/lntest/miner	[no test files]
?   	github.com/lightningnetwork/lnd/lntest/mock	[no test files]
?   	github.com/lightningnetwork/lnd/lntest/node	[no test files]
?   	github.com/lightningnetwork/lnd/lntest/port	[no test files]
?   	github.com/lightningnetwork/lnd/lntest/rpc	[no test files]
?   	github.com/lightningnetwork/lnd/lntest/unittest	[no test files]
?   	github.com/lightningnetwork/lnd/lntest/wait	[no test files]
ok  	github.com/lightningnetwork/lnd/lntypes	(cached)
ok  	github.com/lightningnetwork/lnd/lnutils	(cached)
ok  	github.com/lightningnetwork/lnd/lnwallet	(cached)
ok  	github.com/lightningnetwork/lnd/lnwallet/btcwallet	(cached)
ok  	github.com/lightningnetwork/lnd/lnwallet/chainfee	(cached)
ok  	github.com/lightningnetwork/lnd/lnwallet/chancloser	0.285s
ok  	github.com/lightningnetwork/lnd/lnwallet/chanfunding	(cached)
ok  	github.com/lightningnetwork/lnd/lnwallet/chanvalidate	(cached)
?   	github.com/lightningnetwork/lnd/lnwallet/rpcwallet	[no test files]
?   	github.com/lightningnetwork/lnd/lnwallet/test	[no test files]
ok  	github.com/lightningnetwork/lnd/lnwallet/test/bitcoind	(cached)
ok  	github.com/lightningnetwork/lnd/lnwallet/test/btcd	(cached)
ok  	github.com/lightningnetwork/lnd/lnwallet/test/neutrino	(cached)
ok  	github.com/lightningnetwork/lnd/lnwire	(cached)
ok  	github.com/lightningnetwork/lnd/macaroons	(cached)
?   	github.com/lightningnetwork/lnd/monitoring	[no test files]
ok  	github.com/lightningnetwork/lnd/msgmux	(cached)
?   	github.com/lightningnetwork/lnd/multimutex	[no test files]
?   	github.com/lightningnetwork/lnd/nat	[no test files]
ok  	github.com/lightningnetwork/lnd/netann	(cached)
ok  	github.com/lightningnetwork/lnd/peer	(cached)
?   	github.com/lightningnetwork/lnd/peernotifier	[no test files]
ok  	github.com/lightningnetwork/lnd/pool	(cached)
ok  	github.com/lightningnetwork/lnd/protofsm	(cached)
ok  	github.com/lightningnetwork/lnd/record	(cached)
ok  	github.com/lightningnetwork/lnd/routing	(cached)
ok  	github.com/lightningnetwork/lnd/routing/blindedpath	(cached)
ok  	github.com/lightningnetwork/lnd/routing/chainview	(cached)
ok  	github.com/lightningnetwork/lnd/routing/localchans	(cached)
ok  	github.com/lightningnetwork/lnd/routing/route	(cached)
ok  	github.com/lightningnetwork/lnd/routing/shards	(cached)
ok  	github.com/lightningnetwork/lnd/rpcperms	(cached)
ok  	github.com/lightningnetwork/lnd/shachain	(cached)
?   	github.com/lightningnetwork/lnd/signal	[no test files]
ok  	github.com/lightningnetwork/lnd/subscribe	(cached)
ok  	github.com/lightningnetwork/lnd/sweep	(cached)
ok  	github.com/lightningnetwork/lnd/walletunlocker	(cached)
ok  	github.com/lightningnetwork/lnd/watchtower	(cached) [no tests to run]
ok  	github.com/lightningnetwork/lnd/watchtower/blob	(cached)
ok  	github.com/lightningnetwork/lnd/watchtower/lookout	(cached)
ok  	github.com/lightningnetwork/lnd/watchtower/wtclient	(cached)
ok  	github.com/lightningnetwork/lnd/watchtower/wtdb	(cached)
ok  	github.com/lightningnetwork/lnd/watchtower/wtdb/migration1	(cached)
ok  	github.com/lightningnetwork/lnd/watchtower/wtdb/migration2	(cached)
ok  	github.com/lightningnetwork/lnd/watchtower/wtdb/migration3	(cached)
ok  	github.com/lightningnetwork/lnd/watchtower/wtdb/migration4	(cached)
ok  	github.com/lightningnetwork/lnd/watchtower/wtdb/migration5	(cached)
ok  	github.com/lightningnetwork/lnd/watchtower/wtdb/migration6	(cached)
ok  	github.com/lightningnetwork/lnd/watchtower/wtdb/migration7	(cached)
ok  	github.com/lightningnetwork/lnd/watchtower/wtdb/migration8	(cached)
?   	github.com/lightningnetwork/lnd/watchtower/wtmock	[no test files]
ok  	github.com/lightningnetwork/lnd/watchtower/wtpolicy	(cached)
ok  	github.com/lightningnetwork/lnd/watchtower/wtserver	(cached)
ok  	github.com/lightningnetwork/lnd/watchtower/wtwire	(cached)
ok  	github.com/lightningnetwork/lnd/zpay32	(cached)
FAIL

@guggero
Collaborator

guggero commented Mar 8, 2025

You just add replace github.com/btcsuite/btcwallet => github.com/joegruffins/btcwallet fe45f818f77fd014ff11971e8689cfd274a2df17 into the go.mod file, then run go mod tidy for it to resolve the pseudo tag.

@JoeGruffins
Contributor Author

You just add replace github.com/btcsuite/btcwallet => github.com/joegruffins/btcwallet fe45f818f77fd014ff11971e8689cfd274a2df17 into the go.mod file, then run go mod tidy for it to resolve the pseudo tag.

THX! That was easy. It's updated now. Same tests are passing.

@yyforyongyu
Collaborator

@JoeGruffins Thanks for putting up the tests! Could you open a PR in lnd so other devs can check the CI results there?

@JoeGruffins
Contributor Author

ok lightningnetwork/lnd#9596

@yyforyongyu
Collaborator

Confirmed the CI passed in lnd, the failures are known flakes.

Collaborator

@guggero guggero left a comment

Thanks for the fix, LGTM 🎉

@guggero guggero merged commit 7be2dd1 into btcsuite:master Mar 10, 2025
3 checks passed
@guggero
Collaborator

guggero commented Mar 12, 2025

@yyforyongyu I think we might need to roll this change back and approach it in a different way.
I just tried to recover a testnet (testnet4 to verify the changes in btcd) node and noticed that if there's any error during the wallet recovery, it will start over completely at block 1.
So on Neutrino, if there's a timeout fetching a block it will roll back and start from scratch, which can take forever...

Development

Successfully merging this pull request may close these issues.

sync: failed to fetch block hash for height ####: block not found

4 participants