PyCacheGen

PyCacheGen is a python package for generating data caches written in the Amaranth HDL. The generated caches can be converted into synthesizable Verilog code.

Citing

If you are using PyCacheGen for an academic publication, please cite the following paper:

Link to Paper

@InProceedings{pycachegen2025,
    author = {M{\"u}ller, Richard and L{\"u}beck, Konstantin and Kuhn, Michael and Bernardo, Paul Palomero and Bringmann, Oliver},
    title = {PyCacheGen: A Highly Configurable Open-Source Generator for Synthesizable Caches},
    booktitle = {Proceedings of the CPS Summer School PhD Workshop "Designing Cyber-Physical Systems – From Concepts to Implementation"},
    volume = {4106},
    year = {2025},
    publisher = {CEUR},
    address = {Alghero, Sardinia, Italy}
}

Feature overview

Configurable number of sets, block size, associativity, word width, address width, byte size
Supports write back and write through which can both be combined with write allocate and write no-allocate
Supports the FIFO, LRU, tree-based PLRU and MRU based PLRU replacement policies
Read and write hits can be answered within one cycle
Uses critical word first for read misses
Caches with a block size greater than 1 can take advantage of bigger backend data widths
Offers priority and round robin arbiters for generating multiple memory ports

Installation

The package can simply be installed using

python -m pip install git+https://github.com/ekut-es/pycachegen.git

To install it for development, run the following from within this repository:

python -m pip install -e .\[dev\]

Example Usage

The CacheWrapper class can generate a top-level module containing an arbitrary number of caches in a linear hierarchy. It can also generate an arbiter for creating multiple request ports and a simple data memory. All modules in the cache wrapper are optional, meaning that you can also use it for just creating a single cache, which is the recommended way for doing so. See the internal documentation of the CacheWrapper, CacheConfig and MemoryConfig classes for more information.

The cache wrapper has one memory interface (see section Memory Interface) per port. Each signal is prefixed with fe_x__, x being the respective port index. It also has one signal hit_o_x per port, indicating whether the request resulted in a hit or a miss.

Note that multiple ports get generated using an arbiter. This arbiter includes input buffers, meaning that requests can be issued even while the cache is busy from a request of another port. The arbiter also includes output buffers, meaning that requests from other ports also won't affect outgoing signals of the other ports.

example_usage.py shows an example on how to configure a CacheWrapper. It also shows how to generate the Verilog code for a module (this can be done similarly for other modules like the Cache, should you want to do so). Note that in the exported Verilog code, Amaranth (or rather Yosys) will initialize every register individually, which means also means that you will get one line per data word in both the cache store and the main memory. If you want to create large caches or a large main memory, generation would thus take long and create a very big file. Instead you should thus use your own memory modules for them. You can specify that when creating the CacheWrapper and CacheConfig objects.

Memory Interface

All modules share a common memory interface composed of the following signals:

address (In): The request address. Note that this address uses word addressing and so it does not contain byte offset bits.
write_data (In): The data to be written.
write_strobe (In): If this signal is 0 it signals a read request, else it signals a write request. It contains one bit per byte of the write data so that each byte can be selected individually.
request_valid (In): Specifies whether the request is valid.
flush (In): Used to request a write back of all data in the cache that is both valid and dirty. The flush action does not require the request_valid signal to be set. The cache will request a flush of the next level cache after it is done to ensure that the whole cache hierarchy gets flushed.
read_data (Out): The requested data. Note that this signal may not change until the next request gets sent.
read_data_valid (Out): Whether the read_data Signal holds valid data. Note that this signal may not change until the next request gets sent.
port_ready (Out): Whether the port is ready to accept a new request. The requester assumes that his request was accepted if he set request_valid (or flush) and port_ready is also set. If a target is not ready to accept a new request, it must either set this signal to 0 or it must buffer the accepted request. This port is also allowed to be used like a grant signal which only gets set to 1 if there is an incoming request.

pulp_soc integration

We've successfully integrated PyCacheGen Caches into the Pulpissimo SoC as a pure data cache. pulpissimo_cache_subsystem_wrapper.sv provides a SystemVerilog module that can be used for integrating the cache into the Pulpissimo or other SoCs based on the PULP SoC. The pulpissimo_cache_subsystem_wrapper itself uses the PulpissimoCacheSubsytem, which will itself create a CacheWrapper. You will thus need to configure the caches in the PulpissimoCacheSubsystem and then generate the Verilog code from it.

This module is designed to go into the soc_interconnect_wrap. There it will take all tcdm master ports which are part of the master_ports array (tcdm_fc_data_addr_remapped, tcdm_fc_instr, tcdm_udma_tx, tcdm_udma_rx, tcdm_debug and all axi_bridge_2_interconnect ports) as input to tcdm_in. Requests that are addressed for private bank 0 (which is where all data and nothing else should be) are then arbitrated and sent to the cache. If your linker script works differently, you might get poor performance. Requests that are not destined for private bank 0 will instead be output on the tcdm_out ports. These then need to be connected to the master_ports of the soc_interconnect (replacing the original master_ports array since that is now used as input for this module), where they'll go through a demux and the crossbars. The cache's own request interface is output as tcdm_priv0. Since all requests for private bank 0 will come from it, the contiguous crossbar should not get a rule for this address range and it also needs one less output port. This output port then has to be replaced by this module's tcdm_priv0. This is also done in the soc_interconnect_wrap: The l2_private_slaves array will still get one of its ports from the contiguous_slaves array output from the crossbar in the soc_interconnect, but the other port will be tcdm_priv0. The other contiguous_slave still has to be connected to the bootrom. Make sure you connect the ports in the correct order based on your contiguous crossbar rules and the order of the ports/memory banks in the l2 ram.

Name		Name	Last commit message	Last commit date
Latest commit History 582 Commits
.vscode		.vscode
src/pycachegen		src/pycachegen
tests		tests
.flake8		.flake8
.gitignore		.gitignore
.gitlab-ci.yml		.gitlab-ci.yml
.pre-commit-config.yaml		.pre-commit-config.yaml
.python-version		.python-version
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
example_usage.py		example_usage.py
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

PyCacheGen

Citing

Feature overview

Installation

Example Usage

Memory Interface

pulp_soc integration

About

Uh oh!

Releases

Packages

Contributors 3

Uh oh!

Languages

License

ekut-es/pycachegen

Folders and files

Latest commit

History

Repository files navigation

PyCacheGen

Citing

Feature overview

Installation

Example Usage

Memory Interface

pulp_soc integration

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Uh oh!

Languages

Packages