Skip to content

compatlib_function - ConnectBy supports VARCHAR inputs#70

Open
h-serizawa wants to merge 4 commits intovertica:masterfrom
h-serizawa:master
Open

compatlib_function - ConnectBy supports VARCHAR inputs#70
h-serizawa wants to merge 4 commits intovertica:masterfrom
h-serizawa:master

Conversation

@h-serizawa
Copy link
Contributor

Currently, ConnectBy supports INTEGER inputs as parent ID and child ID. Now, it supports VARCHAR inputs as well. Close #69.

@h-serizawa
Copy link
Contributor Author

[Test Data]

=> CREATE TABLE company (id INT, supervisor_id INT, name VARCHAR(20));
=> INSERT INTO company VALUES (1, null, 'Patrick');
=> INSERT INTO company VALUES (2, 1, 'Jim');
=> INSERT INTO company VALUES (3, 1, 'Sandy');
=> INSERT INTO company VALUES (4, 3, 'Brian');
=> INSERT INTO company VALUES (4, 3, 'Otto');
=> COMMIT;

[Test Result with INTEGER inputs]

=> SELECT connect_by_path(supervisor_id, id, name, ' >> ') OVER () FROM company;

 identifier | depth |           path
------------+-------+--------------------------
          1 |     0 | Patrick
          2 |     1 | Patrick >> Jim
          3 |     1 | Patrick >> Sandy
          4 |     2 | Patrick >> Sandy >> Otto
(4 rows)

[Test Result with VARCHAR inputs]

=> SELECT connect_by_path(supervisor_id::VARCHAR, id::VARCHAR, name, ' >> ') OVER () FROM company;

 identifier | depth |           path
------------+-------+--------------------------
 1          |     0 | Patrick
 2          |     1 | Patrick >> Jim
 3          |     1 | Patrick >> Sandy
 4          |     2 | Patrick >> Sandy >> Otto
(4 rows)

@dmankins
Copy link
Collaborator

dmankins commented Apr 1, 2022

Thank you!

I have one reservation.

You give two factories: ConnectByIntFactory and ConnectByVarcharFactory to provide two interfaces to the same function (by reading ints as strings). This means that someone using ints will pay a conversion price to strings. In addition, all the map operations will be done on string keys and not int keys, which I suspect is less efficient.

If one's tables are small, this is probably not such a big concern, but Vertica is used with tables that have billions of rows.

Given that many (possibly most) uses of this function will be using integer keys from a dimension table, I think it might be better to have separate ConnectByInt and ConnectByVarchar implementations which only do conversions when needed.

I wonder if you could implement a template that does the algorithmic work in processPartition to avoid duplication?

Does that seem reasonable to you?

@h-serizawa
Copy link
Contributor Author

@dmankins Thank you for your valuable feedback! Yes, the data conversion from int to string will be a performance concern. I removed the conversion using the template. Can you review this request again?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

compatlib_function - ConnectBy supports only an integer input. It does not work when the parent-child relationship is established as a STRING / VARCHAR.

2 participants