[BUG] TypeError: 'JavaPackage' object is not callable after calling feathr_init_script.py #1217
Status: Open
Labels: bug (Something isn't working)
Description
Willingness to contribute
Yes. I can contribute a fix for this bug independently.
Feathr version
1.0.0
System information
- OS Platform and Distribution: Linux Ubuntu 22.04 (Jammy), container, tag: jupyter/pyspark-notebook:python-3.9.13
- Python version: 3.9.13
- Spark version, if reporting runtime issue: 3.3.3
Describe the problem
Hi, thanks for your work! I really enjoyed studying your project. I deployed it as a group of services and ran into an error with the feathr_init_script.py script:
Error details:
23/08/31 16:03:04 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://spark-master:7077...
23/08/31 16:03:04 INFO TransportClientFactory: Successfully created connection to spark-master/172.20.0.5:7077 after 16 ms (0 ms spent in bootstraps)
23/08/31 16:03:04 INFO StandaloneSchedulerBackend: Connected to Spark cluster with app ID app-20230831160304-0001
23/08/31 16:03:04 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20230831160304-0001/0 on worker-20230831151258-172.20.0.7-39611 (172.20.0.7:39611) with 12 core(s)
23/08/31 16:03:04 INFO StandaloneSchedulerBackend: Granted executor ID app-20230831160304-0001/0 on hostPort 172.20.0.7:39611 with 12 core(s), 1024.0 MiB RAM
23/08/31 16:03:04 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 37479.
23/08/31 16:03:04 INFO NettyBlockTransferService: Server created on localhost:37479
23/08/31 16:03:04 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
23/08/31 16:03:04 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, localhost, 37479, None)
23/08/31 16:03:04 INFO BlockManagerMasterEndpoint: Registering block manager localhost:37479 with 434.4 MiB RAM, BlockManagerId(driver, localhost, 37479, None)
23/08/31 16:03:04 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, localhost, 37479, None)
23/08/31 16:03:04 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, localhost, 37479, None)
23/08/31 16:03:04 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20230831160304-0001/0 is now RUNNING
23/08/31 16:03:04 INFO StandaloneSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
pyspark_client.py: Preprocessing via UDFs and submit Spark job.
FeatureJoinConfig is provided. Executing FeatureJoinJob.
submit_spark_job: feature_names_funcs:
{'f_location_avg_fare,f_location_max_fare': <function preprocessing at 0x7f191a3ee9d0>}
set(feature_names_funcs.keys()):
{'f_location_avg_fare,f_location_max_fare'}
submit_spark_job: Load DataFrame from Scala engine.
Traceback (most recent call last):
File "/tmp/tmpv8dcsccs/feathr_pyspark_driver.py", line 107, in <module>
submit_spark_job(feature_names_funcs)
File "/tmp/tmpv8dcsccs/feathr_pyspark_driver.py", line 64, in submit_spark_job
dataframeFromSpark = py4j_feature_job.loadSourceDataframe(
TypeError: 'JavaPackage' object is not callable
23/08/31 16:03:05 INFO SparkContext: Invoking stop() from shutdown hook
23/08/31 16:03:05 INFO SparkUI: Stopped Spark web UI at http://localhost:4040
23/08/31 16:03:05 INFO StandaloneSchedulerBackend: Shutting down all executors
23/08/31 16:03:05 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Asking each executor to shut down
23/08/31 16:03:05 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
23/08/31 16:03:05 INFO MemoryStore: MemoryStore cleared
23/08/31 16:03:05 INFO BlockManager: BlockManager stopped
23/08/31 16:03:05 INFO BlockManagerMaster: BlockManagerMaster stopped
23/08/31 16:03:05 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
23/08/31 16:03:05 INFO SparkContext: Successfully stopped SparkContext
23/08/31 16:03:05 INFO ShutdownHookManager: Shutdown hook called
23/08/31 16:03:05 INFO ShutdownHookManager: Deleting directory /tmp/spark-2a93ad8c-6d18-4542-b2ab-6c735cb953ba
23/08/31 16:03:05 INFO ShutdownHookManager: Deleting directory /tmp/spark-479df04e-793a-4c82-a245-660169eab9fa/pyspark-d8b933f6-acf7-4445-b0fe-89ad2537e846
23/08/31 16:03:05 INFO ShutdownHookManager: Deleting directory /tmp/spark-479df04e-793a-4c82-a245-660169eab9fa
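For context, this TypeError is py4j's generic symptom for an unresolved JVM name: when the gateway cannot find a class on the JVM classpath, attribute lookups return a JavaPackage placeholder instead of a JavaClass, and calling a method on the placeholder fails. A minimal pure-Python stand-in (a simplified model for illustration, not the real py4j class) reproduces the exact message from the traceback:

```python
class JavaPackage:
    """Simplified stand-in for py4j's placeholder for unresolved JVM names."""

    def __getattr__(self, name):
        # Every attribute of an unresolved package is itself unresolved.
        return JavaPackage()


pkg = JavaPackage()  # what py4j_feature_job is when the jar is missing
try:
    pkg.loadSourceDataframe()  # the call that fails in feathr_pyspark_driver.py
except TypeError as exc:
    print(exc)  # 'JavaPackage' object is not callable
```

So the error is raised before any Scala code runs at all; the JVM simply never loaded the class the driver script tries to call.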
My docker-compose file: Link. Short manual for my example: Link.
Since I'm not very good at reading Java logs, I'm asking for help debugging this error. Once it is solved, I will be glad to contribute a fix.
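A common root cause of this particular error (hedged: I cannot confirm it from the logs alone) is that the Feathr assembly jar never makes it onto the Spark job's classpath, so py4j resolves the entry-point name to a bare package. A minimal sketch of one way to attach the jar when the session is created; the path and jar name below are assumptions for illustration, so point them at the actual Feathr runtime jar in your deployment, or set the equivalent `spark.jars` / `spark.jars.packages` option wherever the session is built:

```python
from pyspark.sql import SparkSession

# The jar location below is an assumption for illustration; substitute
# the real Feathr assembly jar so the driver and executors can resolve
# the Scala classes that feathr_pyspark_driver.py calls.
spark = (
    SparkSession.builder
    .master("spark://spark-master:7077")
    .config("spark.jars", "/opt/feathr/feathr-assembly.jar")
    .getOrCreate()
)
```

If the Feathr client submits the job for you, it may also be worth checking that its runtime-jar setting in the client config points at a jar that is actually reachable from inside the containers.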
Code to reproduce bug
No response
What component(s) does this bug affect?
- Python Client: This is the client users use to interact with most of our API. Mostly written in Python.
- Computation Engine: The computation engine that executes the actual feature join and generation work. Mostly in Scala and Spark.
- Feature Registry API: The frontend API layer supports SQL and Purview (Atlas) as storage. The API layer is in Python (FastAPI).
- Feature Registry Web UI: The Web UI for the feature registry. Written in React.