DataStax Bulk Loader References
-
DataStax Blogs,
-
DataStax Academy Developer Blogs,
-
Distributed Data Show
-
DataStax Documentation
-
Other Titbits
- Get DSBulk unload/load commands for a given keyspace (& it’s tables) to copy over data from source and destination. Assumptions: There is no requirement to maintain
TTL&WRITETIME(life span) and we understand that the source cluster is taking live traffic and there could be data written post the copy over operation in DSBulk,- (NOTE: I used
localhostfor the source and127.0.0.1for the destination, but one can tweak those to their cluster configuration and also add username and password to access the same, as required.)
./dsbulk unload -header false -verbosity 0 -h localhost -query "SELECT keyspace_name, table_name FROM system_schema.tables WHERE keyspace_name = 'my_example_keyspace'" | awk -F, '{printf("\ndsbulk unload -h localhost -k %s -t %s | dsbulk load -h 127.0.0.1 -k %s -t %s\n", $1, $2, $1, $2);}' - (NOTE: I used
- Let’s say if we’re performing a simple & straightforward DSBulk unload/load operation and notice the below scanned tombstone exception (click expand to view the stacktrace),
dsbulk unload -h IP -k ks_source -t tbl_name | dsbulk load -h IP -k ks_dest -t tbl_name - When dealing with data that has too many tombstones, we could turn off executor continuous paging. When Continuous Paging is on,
schema.splitscan help (should be set to high values), when Continuous Paging is off, page sizes should help (should be set to smaller values).we might see tombstone errors like these,dsbulk unload -h IP -k ks_source -t tbl_name --executor.continuousPaging.enabled false| dsbulk load -h IP -k ks_dest -t tbl_name --executor.continuousPaging.enabled falsecom.datastax.dsbulk.executor.api.exception.BulkExecutionException: Statement execution failed: TokenRangeReadStatement{table=digsupchain_qa_stage.discp_supply_commits,range=(457426640312032821,1353117933348808081]} (An unexpected error occurred server side on /10.175.181.34:9042: org.apache.cassandra.db.filter.TombstoneOverwhelmingException: Scanned over 200001 tombstones during query ‘SELECT... ... WHERE token(facility_location) > 457426640312032821 AND token(facility_location) <= 1353117933348808081 ‘ (last scanned row partion key was ((NFC_GENCO), MM81X, DAO, 9a1a5c0e-cd55-3f9d-8eec-64c18aa3a73d)); query aborted) at com.datastax.dsbulk.executor.api.internal.subscription.ResultSubscription.toErrorPage(ResultSubscription.java:509) at com.datastax.dsbulk.executor.api.internal.subscription.ResultSubscription.lambda$fetchNextPage$5(ResultSubscription.java:360) at com.datastax.dsbulk.executor.api.internal.subscription.ResultSubscription$1.onFailure(ResultSubscription.java:579) [4 skipped] at com.datastax.driver.core.ContinuousPagingQueue.complete(ContinuousPagingQueue.java:311) [6 skipped] at com.datastax.driver.core.ContinuousPagingQueue.enqueueOrCompletePending(ContinuousPagingQueue.java:196) Caused by: com.datastax.driver.core.exceptions.ServerError: An unexpected error occurred server side on /10.175.181.34:9042: org.apache.cassandra.db.filter.TombstoneOverwhelmingException: Scanned over 200001 tombstones during query ‘SELECT ... at com.datastax.driver.core.Responses$Error.asException(Responses.java:113) at com.datastax.driver.core.ContinuousPagingQueue.onResponse(ContinuousPagingQueue.java:138) at com.datastax.driver.core.MultiResponseRequestHandler.setResult(MultiResponseRequestHandler.java:740) at com.datastax.driver.core.MultiResponseRequestHandler.onSet(MultiResponseRequestHandler.java:499) at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:1091) Suppressed: java.lang.Exception: #block terminated with an error at com.datastax.dsbulk.engine.UnloadWorkflow.execute(UnloadWorkflow.java:145) [2 skipped] at com.datastax.dsbulk.engine.DataStaxBulkLoader$WorkflowThread.run(DataStaxBulkLoader.java:168) - Starting with DSBulk version
1.5.0, we can load DSE Graph data swiftly into DataStax Enterprise (DSE).
- Get DSBulk unload/load commands for a given keyspace (& it’s tables) to copy over data from source and destination. Assumptions: There is no requirement to maintain