Hi - human here writing this, thanks for the excellent open source data work - it's much appreciated. Came across a possible DQ issue and thought it may be helpful to flag. Disclaimer - the report below could be making some assumptions about your data model that are wrong. It's also (obviously) quite heavily AI assisted. Hope it helps.
-Adam
[END HUMAN]
[START AI SLOP]
Thanks & Context
First, thank you for maintaining FinanceDatabase - it's an excellent resource! We're using the Equities dataset to look up GICS sector classifications for bond issuers in our fixed income analytics platform, and it's been tremendously helpful.
During our integration work, we noticed some data quality issues with CUSIP identifiers that we wanted to flag.
Issue
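The same CUSIP identifier is frequently attached to records for completely different companies: 1,660 of the 2,459 unique CUSIPs (67.5%) map to more than one company name, and 597 (24.3%) map to more than one sector. This makes CUSIP-based lookups unreliable.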
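As a consumer-side guard, a lookup can refuse to answer when a CUSIP is ambiguous. The following is a minimal sketch, assuming the data is a pandas DataFrame with the `cusip`, `name`, and `sector` columns used in the Statistics snippet; `sector_for_cusip` is a hypothetical helper, not part of financedatabase, and the third sample row is made up for illustration.

```python
from typing import Optional

import pandas as pd


def sector_for_cusip(df: pd.DataFrame, cusip: str) -> Optional[str]:
    """Return the sector for a CUSIP only when it is unambiguous.

    Hypothetical helper, not part of financedatabase.
    """
    sectors = df.loc[df["cusip"] == cusip, "sector"].dropna().unique()
    # Refuse to answer when the CUSIP is unknown or maps to several sectors.
    return sectors[0] if len(sectors) == 1 else None


# Two rows from the Examples table plus one made-up, non-colliding row.
sample = pd.DataFrame(
    {
        "cusip": ["00090Q103", "00090Q103", "123456789"],
        "name": ["Abundance International", "ADT Inc.", "Example Corp"],
        "sector": ["Materials", "Industrials", "Utilities"],
    }
)

print(sector_for_cusip(sample, "00090Q103"))  # ambiguous CUSIP
print(sector_for_cusip(sample, "123456789"))  # single-sector CUSIP
```

Returning `None` for ambiguous identifiers pushes the decision to the caller, who can then fall back to another key such as FIGI or ISIN.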
Statistics
```python
import financedatabase as fd

equities = fd.Equities()
df = equities.select()

# Restrict to records that carry a CUSIP.
cusip_df = df[df['cusip'].notna()]
print(f"Total records with CUSIP: {len(cusip_df):,}")  # 13,990
print(f"Unique CUSIPs: {cusip_df['cusip'].nunique():,}")  # 2,459

# CUSIPs attached to more than one distinct company name.
cusip_to_names = cusip_df.groupby('cusip')['name'].nunique()
multi_name = cusip_to_names[cusip_to_names > 1]
print(f"CUSIPs with >1 company: {len(multi_name):,}")  # 1,660 (67.5%)

# CUSIPs attached to more than one distinct sector.
cusip_to_sectors = cusip_df.groupby('cusip')['sector'].nunique()
multi_sector = cusip_to_sectors[cusip_to_sectors > 1]
print(f"CUSIPs with >1 sector: {len(multi_sector):,}")  # 597 (24.3%)
```
Examples
These CUSIPs map to clearly different companies in different sectors:
| CUSIP | Company 1 | Sector 1 | Company 2 | Sector 2 |
|---|---|---|---|---|
| 00089H106 | ACS, Actividades de Construccion | Industrials | Oakley Capital Investments | Financials |
| 00090Q103 | Abundance International | Materials | ADT Inc. | Industrials |
| 00181T107 | A-Mark Precious Metals | Financials | Amir Marketing and Investments | Materials |
| 00182C103 | ANI Pharmaceuticals | Health Care | BSF Enterprise Plc | Financials |
| 00211Y506 | ARCA biopharma | Health Care | Albioma | Utilities |
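To pull out every conflicting record for review, something like the sketch below could be used. It runs on a small hand-built frame (rows copied from the table above, plus one made-up control row) so it works without downloading the dataset; against the real data, `records` would presumably be the `cusip_df` frame from the Statistics snippet.

```python
import pandas as pd

# Rows copied from the Examples table; column names follow the Statistics snippet.
records = pd.DataFrame(
    [
        ("00089H106", "ACS, Actividades de Construccion", "Industrials"),
        ("00089H106", "Oakley Capital Investments", "Financials"),
        ("00182C103", "ANI Pharmaceuticals", "Health Care"),
        ("00182C103", "BSF Enterprise Plc", "Financials"),
        ("000000000", "Single Listing Co", "Utilities"),  # made-up control row
    ],
    columns=["cusip", "name", "sector"],
)

# Keep only CUSIPs whose records disagree on sector.
sector_counts = records.groupby("cusip")["sector"].transform("nunique")
conflicts = records[sector_counts > 1]

# One row per CUSIP, listing every distinct name and sector attached to it.
summary = conflicts.groupby("cusip").agg(
    names=("name", lambda s: " / ".join(sorted(s.unique()))),
    sectors=("sector", lambda s: " / ".join(sorted(s.unique()))),
)
print(summary)
```

Joining the distinct values into one string keeps the output to a single row per CUSIP, which is easier to paste into an issue or spreadsheet.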
Comparison
For context, the FIGI identifiers in the same dataset have an effectively zero collision rate (1 out of 25,103 unique FIGIs, roughly 0.004%), suggesting this may be specific to how CUSIP data was sourced or aggregated.
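The comparison can be made repeatable with one function applied to each identifier column. This is a rough sketch: it assumes a "collision" means an identifier attached to more than one distinct `name`, as in the Statistics snippet, and that the FIGI column is called `figi`. The toy identifiers below are made up, and `collision_rate` is a hypothetical helper.

```python
import pandas as pd


def collision_rate(df: pd.DataFrame, id_col: str, entity_col: str = "name") -> float:
    """Share of distinct identifiers attached to more than one entity value."""
    known = df[df[id_col].notna()]
    per_id = known.groupby(id_col)[entity_col].nunique()
    return float((per_id > 1).mean()) if len(per_id) else 0.0


# Made-up toy data: one CUSIP is shared by two names, every FIGI is unique.
toy = pd.DataFrame(
    {
        "name": ["A Corp", "B Corp", "C Corp", "D Corp"],
        "cusip": ["111111111", "111111111", "222222222", None],
        "figi": ["BBG000000001", "BBG000000002", "BBG000000003", "BBG000000004"],
    }
)

for col in ("cusip", "figi"):
    print(f"{col}: {collision_rate(toy, col):.1%}")
```

Note that a name-based definition also counts spelling variants of the same company as collisions, so the sector-based count is probably the more conservative signal.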
Environment
- financedatabase: latest (tested Jan 2026)
- Python: 3.11
Thanks again for the great work on this project!