Parametrized queries much slower when querying parquet files

### What happens?

When querying against a parquet file located on s3 we are noticing that the query plan for when query parameters are present and when they are not is different. Different enough that a query can end up being 10x or more slower. 

I have seen this replicated on both on Duckdb 1.3.2 and 1.4.1. Python and CLI both have the same behavior. 

I can not share the parquet file directly due to its size and data but the file is sorted by id which should make the query very fast. My hunch is that something is potentially skipping row group filters under this condition and instead doing a full table scan. 

### To Reproduce

Python script that I was using for performance testing. Can also be replicated running the same or similar commands within CLI. 

```python
import duckdb
import time


def create_connection():
    conn = duckdb.connect()

    conn.execute("""SET TimeZone = 'UTC';""")
    conn.execute("SET autoinstall_known_extensions = false;")
    conn.execute("INSTALL spatial; LOAD spatial;")
    conn.execute("INSTALL parquet; LOAD parquet;")
    conn.execute("INSTALL httpfs")
    conn.execute("INSTALL aws;")
    conn.execute("LOAD httpfs;")
    conn.execute("LOAD aws;")

    conn.execute("""
    CREATE SECRET aws_secret (
        TYPE S3,
        PROVIDER CREDENTIAL_CHAIN,
        CHAIN 'config'
    );
    """)
    return conn


def fast():
    # File is a parquet file ordered by id
    file_path = 's3://some_bucket/file_sorted_by_id.parquet'

    sql = f"""
          SELECT id
          FROM read_parquet('{file_path}')
          WHERE id in ('0346EADE0DE2CFFC', '13BCADA2FAB1FD66', '1CBE1BD7B8D59C98', '1D91EB13DA3E2472', '1EDACF5BEDAB3C10',
                    '1FCA7BEDE0FD9A3B', '23A306DAD89EAF0F', '28E205CC6CEFFBB8', '2CDA4C50BA2DEC8A', '30B464675F23E14B',
                    '37CE5E04CDA59E0F', '397CB382EDE1AC16', '3BBAB85A2228ADF8', '4A0A4A4BC5EB5099', '4ABAF8EEBD9DFE32',
                    '4CD67B12AC7B3D9D', '52F2CBEBD25C545D', '68782CDCEB6CDDFF', '6ABC479EF31B4AD7', '6CC7B512B7AFCE14',
                    '6CDC0CC540AEF13E', '75FB83ABE2AE2C8C', '7C82C5CCB80EDFBA', '7CFF8E5778F0F71B', '7EAE4FF84AB243AC',
                    '82B34DB28DF1AD8B', '901A27257C3FD449', '9CAE544FACCDD41D', '9D4D30EA3FE9D3E5', '9DE0191AEAFB0B7B',
                    'A0AE0D1FE22AE828', 'A6CFD40963F5BD95', 'A78ED4C6608CFEE5', 'ADE7F802A7D6CEE5', 'B1A39BB3AD58CB9A',
                    'B2D9CFEEAAF5A5C4', 'B4B150599EAEBD65', 'B4CCFCE24ABFCBA1', 'BBBAC104BD067AC7', 'BFD7028C65EACA6D',
                    'C3747454BECF20A0', 'C763573A73A3DA79', 'C78B6D74CD14EDF9', 'CA541676C357F5FF', 'CAE9ACDEC7539DE0',
                    'CC65DCBFA10B3EEE', 'CE343BDF60B26BE5', 'CF0EE41CC3CE418B', 'D3A69E024760745C', 'D7FEAA2EA8D2FECC',
                    'D8FDBADEBAC4A739', 'D9DDEBD64CCDE7EF', 'DCB539EEA25CFEF3', 'DFEACCB88AEF0EFF', 'E6C8B41EBECA1CEF',
                    'E99E67CFF37F4D44', 'EBFAFE2E18EAC99B', 'EF5BC93B43FBF9EE', 'F41BAC78137112BE', 'FAE3DFAAC7DD5B2D',
                    'FBA9DADFFE2B3A5F', 'FBEFFEBA7AEEE43A', 'FFDA9F7FDC824D05', 'FFF6DD3CD3B2D90B');
          """
    conn = create_connection()
    s = time.monotonic()
    print(conn.execute(sql).fetchall())
    e = time.monotonic()
    print(e - s)
    conn.close()


def slow():
    file_path = 's3://some_bucket/file_sorted_by_id.parquet'
    # This is much slower. Query plan is different. Sometimes up to 10 to 20x for larger files
    sql = f"""
          SELECT id
          FROM read_parquet('{file_path}')
          WHERE id in $1;
          """
    query_values = ['0346EADE0DE2CFFC', '13BCADA2FAB1FD66', '1CBE1BD7B8D59C98', '1D91EB13DA3E2472', '1EDACF5BEDAB3C10',
                    '1FCA7BEDE0FD9A3B', '23A306DAD89EAF0F', '28E205CC6CEFFBB8', '2CDA4C50BA2DEC8A', '30B464675F23E14B',
                    '37CE5E04CDA59E0F', '397CB382EDE1AC16', '3BBAB85A2228ADF8', '4A0A4A4BC5EB5099', '4ABAF8EEBD9DFE32',
                    '4CD67B12AC7B3D9D', '52F2CBEBD25C545D', '68782CDCEB6CDDFF', '6ABC479EF31B4AD7', '6CC7B512B7AFCE14',
                    '6CDC0CC540AEF13E', '75FB83ABE2AE2C8C', '7C82C5CCB80EDFBA', '7CFF8E5778F0F71B', '7EAE4FF84AB243AC',
                    '82B34DB28DF1AD8B', '901A27257C3FD449', '9CAE544FACCDD41D', '9D4D30EA3FE9D3E5', '9DE0191AEAFB0B7B',
                    'A0AE0D1FE22AE828', 'A6CFD40963F5BD95', 'A78ED4C6608CFEE5', 'ADE7F802A7D6CEE5', 'B1A39BB3AD58CB9A',
                    'B2D9CFEEAAF5A5C4', 'B4B150599EAEBD65', 'B4CCFCE24ABFCBA1', 'BBBAC104BD067AC7', 'BFD7028C65EACA6D',
                    'C3747454BECF20A0', 'C763573A73A3DA79', 'C78B6D74CD14EDF9', 'CA541676C357F5FF', 'CAE9ACDEC7539DE0',
                    'CC65DCBFA10B3EEE', 'CE343BDF60B26BE5', 'CF0EE41CC3CE418B', 'D3A69E024760745C', 'D7FEAA2EA8D2FECC',
                    'D8FDBADEBAC4A739', 'D9DDEBD64CCDE7EF', 'DCB539EEA25CFEF3', 'DFEACCB88AEF0EFF', 'E6C8B41EBECA1CEF',
                    'E99E67CFF37F4D44', 'EBFAFE2E18EAC99B', 'EF5BC93B43FBF9EE', 'F41BAC78137112BE', 'FAE3DFAAC7DD5B2D',
                    'FBA9DADFFE2B3A5F', 'FBEFFEBA7AEEE43A', 'FFDA9F7FDC824D05', 'FFF6DD3CD3B2D90B']
    conn = create_connection()
    s = time.monotonic()
    print(conn.execute(sql, parameters=[query_values]).fetchall())
    e = time.monotonic()
    print(e - s)
    conn.close()


if __name__ == '__main__':
    fast()
    slow()
```

### OS:

MacOS Sequoia 15.5

### DuckDB Version:

1.4.1 & 1.3.2

### DuckDB Client:

Python & CLI

### Hardware:

Apple M4 Pro 48GB

### Full Name:

Gregory Hartranft

### Affiliation:

Prefer not to say

### Did you include all relevant configuration (e.g., CPU architecture, Linux distribution) to reproduce the issue?

- [x] Yes, I have

### Did you include all code required to reproduce the issue?

- [x] Yes, I have

### Did you include all relevant data sets for reproducing the issue?

No - I cannot share the data sets because they are confidential

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Parametrized queries much slower when querying parquet files #169

What happens?

To Reproduce

OS:

DuckDB Version:

DuckDB Client:

Hardware:

Full Name:

Affiliation:

Did you include all relevant configuration (e.g., CPU architecture, Linux distribution) to reproduce the issue?

Did you include all code required to reproduce the issue?

Did you include all relevant data sets for reproducing the issue?

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Parametrized queries much slower when querying parquet files #169

Description

What happens?

To Reproduce

OS:

DuckDB Version:

DuckDB Client:

Hardware:

Full Name:

Affiliation:

Did you include all relevant configuration (e.g., CPU architecture, Linux distribution) to reproduce the issue?

Did you include all code required to reproduce the issue?

Did you include all relevant data sets for reproducing the issue?

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions