Description
What would you like to ask or discuss?
HDFS is functioning normally, and all DataNodes are online, but the task application is still reporting errors:
2025-02-12 17:33:53,953 ERROR 6633 [delayed-queue-executor-3] [] : [c.o.c.a.service.impl.LogParserServiceImpl:212] parseError: java.lang.Exception: failed to read file: hdfs://nameservice1/flume/dolphinscheduler/2025-02-12/20250212/16628488504192_1-32-41.log, err: Could not obtain block: BP-1830315256-192.168.100.219-1733820781848:blk_1073745518_4694 file=/flume/dolphinscheduler/2025-02-12/20250212/16628488504192_1-32-41.log
No live nodes contain current block
Block locations:
  DatanodeInfoWithStorage[192.168.100.220:1026,DS-282152c0-871e-4254-b69c-730c5f1761ec,DISK]
  DatanodeInfoWithStorage[192.168.100.221:1026,DS-063987aa-021a-4f31-9be2-a6a90191ea2e,DISK]
  DatanodeInfoWithStorage[192.168.100.219:1026,DS-0d5dc533-d6cc-4ede-b70e-dbfc7fb76771,DISK]
Dead nodes:
  DatanodeInfoWithStorage[192.168.100.221:1026,DS-063987aa-021a-4f31-9be2-a6a90191ea2e,DISK]
  DatanodeInfoWithStorage[192.168.100.220:1026,DS-282152c0-871e-4254-b69c-730c5f1761ec,DISK]
  DatanodeInfoWithStorage[192.168.100.219:1026,DS-0d5dc533-d6cc-4ede-b70e-dbfc7fb76771,DISK]
    at com.oppo.cloud.application.util.HDFSUtil.readLines(HDFSUtil.java:135)
    at com.oppo.cloud.application.service.impl.LogParserServiceImpl$LogParser.extract(LogParserServiceImpl.java:384)
    at com.oppo.cloud.application.service.impl.LogParserServiceImpl.handle(LogParserServiceImpl.java:203)
    at com.oppo.cloud.application.task.DelayedTask.handleDelayTask(DelayedTask.java:120)
    at com.oppo.cloud.application.task.DelayedTask.lambda$run$1(DelayedTask.java:103)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
I checked the HDFS DataNode logs and found:
2025-02-12 17:33:53,901 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Failed to read expected SASL data transfer protection handshake from client at /192.168.100.203:56864. Perhaps the client is running an older version of Hadoop which does not support SASL data transfer protection
org.apache.hadoop.hdfs.protocol.datatransfer.sasl.InvalidMagicNumberException: Received 1c5182 instead of deadbeef from client.
    at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.doSaslHandshake(SaslDataTransferServer.java:374)
    at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.getSaslStreams(SaslDataTransferServer.java:308)
    at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.receive(SaslDataTransferServer.java:135)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
    at java.lang.Thread.run(Thread.java:750)
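If I understand the DataNode message correctly, once dfs.data.transfer.protection is enabled it expects every data-transfer connection to start with the SASL magic number 0xdeadbeef, and 0x1c5182 looks like the beginning of an unprotected read request, i.e. the client never attempted the SASL handshake. For reference, this is roughly how the client side would have to be set up if the Configuration is built programmatically instead of being loaded from the cluster's hdfs-site.xml (only a sketch of my understanding, not the real HDFSUtil code; the class name is made up, and the nameservice1 HA settings would still have to come from the classpath or be set explicitly):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class SecureReadSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("hadoop.security.authentication", "kerberos");
        // These two also have to be present on the client side; if they are
        // missing there, the DFSClient typically opens a plain data-transfer
        // connection, which a protected DataNode rejects with
        // InvalidMagicNumberException like the one above.
        conf.set("hadoop.rpc.protection", "authentication");
        conf.set("dfs.data.transfer.protection", "authentication");

        // Same principal/keytab as in application-hadoop.yml below.
        UserGroupInformation.setConfiguration(conf);
        UserGroupInformation.loginUserFromKeytab(
                "nn/ddp1@HADOOP.COM",
                "/data/module/compass-v1.1.2/task-application/conf/nn.service.keytab");

        Path path = new Path(
                "hdfs://nameservice1/flume/dolphinscheduler/2025-02-12/20250212/16628488504192_1-32-41.log");
        try (FileSystem fs = FileSystem.get(path.toUri(), conf);
             BufferedReader reader = new BufferedReader(
                     new InputStreamReader(fs.open(path), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}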
HDFS is configured as follows:
dfs.data.transfer.protection = authentication
hadoop.rpc.protection = authentication
The Hadoop client version used by the task application matches the server version.
Here is the application-hadoop.yml configuration:
hadoop:
  namenodes:
    - nameservices: nameservice1
      namenodesAddr: [ "ddp1", "ddp2" ]
      namenodes: [ "nn1", "nn2" ]
      user: nn
      password:
      port: 8020
      # scheduler platform hdfs log path keyword identification, used by task-application
      matchPathKeys: [ "flume" ]
      # kerberos
      enableKerberos: true
      # /etc/krb5.conf
      krb5Conf: "/data/module/compass-v1.1.2/task-application/conf/krb5.conf"
      # hdfs/@EXAMPLE.COM
      principalPattern: "nn/@HADOOP.COM"
      # admin
      loginUser: "nn/ddp1@HADOOP.COM"
      # /var/kerberos/krb5kdc/admin.keytab
      keytabPath: "/data/module/compass-v1.1.2/task-application/conf/nn.service.keytab"
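One thing I am not sure about is whether the Configuration that task-application builds from this YAML actually ends up carrying dfs.data.transfer.protection, or whether it only sees whatever hdfs-site.xml happens to be on its classpath. A throwaway diagnostic along these lines (class name invented, just to illustrate the check) would show what the client actually resolves:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.HdfsConfiguration;

public class PrintEffectiveDfsConf {
    public static void main(String[] args) {
        // HdfsConfiguration also pulls in hdfs-default.xml / hdfs-site.xml
        // from the task-application classpath, if they are present at all.
        Configuration conf = new HdfsConfiguration();
        for (String key : new String[] {
                "hadoop.security.authentication",
                "hadoop.rpc.protection",
                "dfs.data.transfer.protection"}) {
            System.out.println(key + " = " + conf.get(key, "<unset>"));
        }
    }
}

If dfs.data.transfer.protection comes back unset on the task-application host while the DataNodes require it, that would at least be consistent with the handshake error above.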
Please help identify the cause of the issue.