[IMPROVE] Data quality: CUSIP identifier collisions across different companies

Hi - human here writing this, thanks for the excellent open source data work - it's much appreciated. Came across a possible DQ issue and thought it may be helpful to flag. Disclaimer - the report below could be making some assumptions about your data model that are wrong. It's also (obviously) quite heavily AI assisted in the writing. Hope it helps.

-Adam

[END HUMAN]

[START AI SLOP]

## Thanks & Context

First, thank you for maintaining FinanceDatabase - it's an excellent resource! We're using the Equities dataset to look up GICS sector classifications for bond issuers in our fixed income analytics platform, and it's been tremendously helpful.

During our integration work, we noticed some data quality issues with CUSIP identifiers that we wanted to flag.

## Issue

The same CUSIP identifier is often assigned to completely different companies: in the statistics below, 1,660 of 2,459 unique CUSIPs (67.5%) map to more than one company name. This makes CUSIP-based lookups unreliable.

## Statistics

```python
import financedatabase as fd

equities = fd.Equities()
df = equities.select()

cusip_df = df[df['cusip'].notna()]
print(f"Total records with CUSIP: {len(cusip_df):,}")        # 13,990
print(f"Unique CUSIPs: {cusip_df['cusip'].nunique():,}")     # 2,459

cusip_to_names = cusip_df.groupby('cusip')['name'].nunique()
multi_name = cusip_to_names[cusip_to_names > 1]
print(f"CUSIPs with >1 company: {len(multi_name):,}")        # 1,660 (67.5%)

cusip_to_sectors = cusip_df.groupby('cusip')['sector'].nunique()
multi_sector = cusip_to_sectors[cusip_to_sectors > 1]
print(f"CUSIPs with >1 sector: {len(multi_sector):,}")       # 597 (24.3%)
```

## Examples

These CUSIPs map to clearly different companies in different sectors:

| CUSIP | Company 1 | Sector | Company 2 | Sector |
|-------|-----------|--------|-----------|--------|
| 00089H106 | ACS, Actividades de Construccion | Industrials | Oakley Capital Investments | Financials |
| 00090Q103 | Abundance International | Materials | ADT Inc. | Industrials |
| 00181T107 | A-Mark Precious Metals | Financials | Amir Marketing and Investments | Materials |
| 00182C103 | ANI Pharmaceuticals | Health Care | BSF Enterprise Plc | Financials |
| 00211Y506 | ARCA biopharma | Health Care | Albioma | Utilities |
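
As a consumer-side stopgap, a lookup can refuse to answer when a CUSIP is ambiguous. The sketch below is only an illustration, not part of FinanceDatabase: `sector_for_cusip` is a hypothetical helper, the first four sample rows are copied from the table above, and the `EXAMPLE01` row is a made-up placeholder added to show the unambiguous case.

```python
import pandas as pd


def sector_for_cusip(df: pd.DataFrame, cusip: str):
    """Return the sector for a CUSIP only if every matching row agrees, else None."""
    sectors = df.loc[df["cusip"] == cusip, "sector"].dropna().unique()
    return sectors[0] if len(sectors) == 1 else None


# First four rows come from the examples table; the last row is a made-up placeholder.
sample = pd.DataFrame(
    {
        "cusip": ["00089H106", "00089H106", "00090Q103", "00090Q103", "EXAMPLE01"],
        "name": [
            "ACS, Actividades de Construccion",
            "Oakley Capital Investments",
            "Abundance International",
            "ADT Inc.",
            "Example Co (hypothetical)",
        ],
        "sector": ["Industrials", "Financials", "Materials", "Industrials", "Utilities"],
    }
)

print(sector_for_cusip(sample, "00089H106"))  # ambiguous: two different sectors
print(sector_for_cusip(sample, "EXAMPLE01"))  # unambiguous: a single sector
```

Returning `None` for ambiguous or unknown CUSIPs pushes the decision back to the caller instead of silently picking whichever row happens to come first.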

## Comparison

For context, the FIGI identifiers in the same dataset show almost no collisions (1 of 25,103 unique FIGIs, roughly 0.004%), suggesting this may be specific to how CUSIP data was sourced or aggregated.
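
The comparison can be reproduced with one helper applied to both identifier columns. This is a minimal sketch: `collision_rate` is a hypothetical function, the toy frame uses made-up values, and it assumes the FIGI column is named `figi` in the same way the CUSIP column is named `cusip`.

```python
import pandas as pd


def collision_rate(df: pd.DataFrame, id_col: str, value_col: str = "name"):
    """Count identifiers in id_col that map to more than one distinct value_col."""
    with_id = df[df[id_col].notna()]
    per_id = with_id.groupby(id_col)[value_col].nunique()
    n_unique = len(per_id)
    n_colliding = int((per_id > 1).sum())
    rate = n_colliding / n_unique if n_unique else 0.0
    return n_colliding, n_unique, rate


# Toy data with made-up identifiers, only to show the shape of the result.
toy = pd.DataFrame(
    {
        "cusip": ["A", "A", "B", None],
        "figi": ["F1", "F2", "F3", "F4"],
        "name": ["X", "Y", "Z", "W"],
    }
)

for col in ("cusip", "figi"):
    colliding, unique, rate = collision_rate(toy, col)
    print(f"{col}: {colliding} of {unique} identifiers collide ({rate:.1%})")
```

On the real data, passing the frame returned by `equities.select()` in place of `toy` should give the CUSIP and FIGI figures side by side, provided the column names match.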

## Environment

- financedatabase: latest (tested Jan 2026)
- Python: 3.11

Thanks again for the great work on this project!

