timurkelin/simsimd

simSIMD

Development and Simulation Framework for Application Specific Vector Processor.


Abstract: ???

Keywords: Vector Processor Architecture, SIMD, Single Instruction Multiple Data, ASIP, Application Specific Instruction Processor, NoC, Network on Chip, Data Driven Design, Cycle Accurate Simulation, System Simulation, Model Based Development and Optimization, SystemC

Overview

  • A framework for application-specific vector processors with exemplar building blocks

  • Simulation software with exemplar building blocks and application examples

    • This simplifies the architecture development of the hardware which performs operations on the data structures of finite length (vectors)
  • Typical applications and their data vectors

    • Signal processing: OFDM symbols and/or code blocks
    • Image and Video processing
    • Cryptography: cipher blocks
    • Networking: data packets
  • The framework also provides a unified approach to developing the functional components, Execution Units (EU) and Data Memories (DM), as well as their interconnect and interoperation

    • Specific set and configuration of the functional units are identified by the target application
  • Development and simulation methodology of the hardware architecture from system specifications (algorithms) and throughput requirements

  • Follows a top-down development and optimization strategy

    • Area: improvement of the utilization rate of the processing blocks and storage elements (RAMs)
    • Throughput: datapath is elaborated at the architecture development stage
    • Dynamic Power
    • Minimization of non-recurring engineering (NRE) resources and risks
  • Benefits of using SystemC

    • Short update-simulation-analysis cycle which allows for simulation driven development and optimization of the architecture
    • Cycle accurate simulation for the vector core to obtain realistic timing and throughput estimates at the earlier stages
    • High level of abstraction for Vector Core preferences, runtime configuration and status to stay focused on the architectural tasks
    • Large ecosystem of C/C++ libraries for data manipulation and processing allows for a top-down development approach: from high-level processing functions down to elementary arithmetic operations
    • Integration into the existing simulation workflows
    • Open source

Block Diagram of the Simulation Platform

block_diagram

Components in Brief

  • 2 major parts: Vector Core and Scalar or Control Core

  • Vector Core functions

    • Perform vector computations in accordance with the configuration supplied by the Scalar Core
    • Generate events at the different stages of the execution of the vector operations
      • Events are delivered to the Scalar Core
      • Allow for synchronization of the control threads inside Scalar Core with the Vector Core
      • Update of the statuses of the vector operations
      • Allow for data dependent processing
    • Sequencing of the vector operations
  • Vector Core components

    • Data Memories (DM): temporary storage of the vectors.
      • A set of Address Generators (AG) associated with each DM allows for flexible addressing and fetching the elements of the vector.
    • Execution Units (EU): perform successive processing of the elements of the vectors.
      • These parts of the Vector Core are specific to the target application
    • Streaming Devices: interfacing Vector Core with the external devices
      • ADC (source) or DAC (sink)
      • Preceding or subsequent processing blocks
      • Interface to the external storage (DMA)
      • In simulation, Streaming I/O Devices address a common Data and Memory Pool
    • XBAR: Network-On-Chip (NOC) for routing the vector streams between Execution Units (EU), Data Memories (DM) and/or Streaming Devices
      • DMs, EUs and Streaming Devices have a unified interface for the connection to XBAR
  • Functions of the Scalar Infrastructure

    • Respond to the events and status data from the Vector Core components
    • Control the execution flow in the Vector Core
    • Synchronize vector processing threads
    • Generate and deliver configuration data to the Vector Core components
  • Scalar Infrastructure components

    • Scalar Core: processes the events and statuses from Vector Core, and generates configurations for the components of the Vector Core.
      • It can be implemented as a programmable general purpose CPU subsystem or an FSM depending on the complexity of the control procedures
    • Event MUX: delivers events which were generated by the Vector Core components to the Scalar Core
    • Config De-multiplexer: delivers commands and data supplied by the Scalar Core to the Vector Core components.
      • Broadcasting commands are supported for the execution control
    • Status MUX: delivers the status data, which was sent by the Vector Core components in response to the request commands from the Scalar Core, to the Scalar Core
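The role of an Address Generator attached to a Data Memory can be illustrated with a minimal sketch in plain C++ (no SystemC). The names `AddressGenerator` and `fetch`, the base/stride/length parameters, and the wrap-around addressing are assumptions made for illustration; they are not taken from the simSIMD sources.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical address generator (illustration only, not the simSIMD API).
struct AddressGenerator {
    std::size_t base;    // address of the first vector element in the DM
    std::size_t stride;  // distance between consecutive elements
    std::size_t length;  // number of elements in the vector

    // Address of the i-th element; wraps around the end of the memory.
    std::size_t address(std::size_t i, std::size_t mem_size) const {
        assert(i < length && mem_size > 0);
        return (base + i * stride) % mem_size;
    }
};

// Fetch one vector from a data memory through an address generator.
std::vector<int> fetch(const std::vector<int>& dm, const AddressGenerator& ag) {
    std::vector<int> out;
    out.reserve(ag.length);
    for (std::size_t i = 0; i < ag.length; ++i) {
        out.push_back(dm[ag.address(i, dm.size())]);
    }
    return out;
}
```

With several such generators per DM, the same storage could be read as a contiguous block, a strided sequence, or a circular buffer without moving the data.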

Major development steps

  • Decompose the processing algorithm down to the level of vector transactions between functional blocks and storage elements
  • Develop functional blocks with the specified interface to XBAR and the Scalar Infrastructure
  • Assemble the functional blocks and storage elements with the Core framework
  • Develop Scalar CPU SW and/or Control FSM
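As a rough illustration of the first step, the computation E = (A + B) + D can be decomposed into two vector transactions, each handled by a two-input adder EU. The sketch below is plain C++ under stated assumptions: `Vec`, `eu_add` and `adder_chain` are hypothetical names, and the routing through DMs and XBAR is only indicated in comments.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<int>;

// Hypothetical execution unit: element-wise adder with two vector inputs.
Vec eu_add(const Vec& a, const Vec& b) {
    assert(a.size() == b.size());
    Vec out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i] = a[i] + b[i];
    }
    return out;
}

// E = (A + B) + D expressed as two chained vector transactions.
Vec adder_chain(const Vec& a, const Vec& b, const Vec& d) {
    Vec c = eu_add(a, b);  // transaction 1: A and B -> first adder -> C
    return eu_add(c, d);   // transaction 2: C and D -> second adder -> E
}
```

Once the algorithm is written in this form, each function call maps to a candidate EU and each intermediate vector to a stream or a DM buffer.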

For more details and application examples please refer to doc/simsimd.pptx

Directories:

doc          	- documentation
examples     	- application examples
	 - basic		- basic operation
		 - dm2dm	- data memory to data memory transaction (DM test)
		 - dm_init	- dm block initialization from file 
				  (DM initialization test. matio integration and basic operation)
		 - vri_test	- test for the valid-ready interface (integrity and connectivity test)
		 - transp       - transparent EU test. Examines different approaches to the EU implementation
		 - adder  	- chain of 2 vector adders A+B=C, C+D=E (EU with multiple inputs, EU chaining)
		 - adder_stream	- vector adder with stream I/O (test for the streaming interface)
		 - adder_cache  - vector adder with cache-like operation (example of the basic system design)
	 - comp			- computational examples
		 - fft-r4	- R4/2R2 fft
		 - sort		- Bitonic sorter
	 - system		- system design examples		
simd_sys_crm    - System component: clock and reset source for the execution core
simd_sys_core	- System component: simd execution core (interconnect and signals)
simd_sys_dm     - System component: data memory units  
simd_sys_eu     - System component: execution units  
simd_sys_pool   - System component: segmented memory pool  
simd_sys_scalar - System component: control unit (FSM or scalar processor)  
simd_sys_stream - System component: Streaming interfaces (timed or on-demand)   
simd_sys_xbar   - System component: cross-bar switch / NoC

simd_common  	- top-level functions  
simd_pref    	- simulation and system preferences  
simd_dump    	- data dump   
simd_report  	- logging and reporting functions  
simd_systemc 	- Files taken from SystemC sources  
simd_trace   	- waveform trace  
simd_time    	- simulation time and resolution  
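
The valid-ready interface exercised by examples/basic/vri_test follows the usual handshake rule: an element moves only in a clock cycle where the source asserts valid and the sink asserts ready. The sketch below is a simplified cycle-level model in plain C++, not the simSIMD implementation; `HandshakeResult`, `transfer` and the repeating `ready_pattern` are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct HandshakeResult {
    std::vector<int> received;  // elements accepted by the sink, in order
    std::size_t cycles = 0;     // clock cycles needed to move the vector
};

// Move `src` to a sink whose readiness repeats `ready_pattern` cycle by cycle.
HandshakeResult transfer(const std::vector<int>& src,
                         const std::vector<bool>& ready_pattern) {
    // The sink must be ready at least once, otherwise the loop never ends.
    assert(std::find(ready_pattern.begin(), ready_pattern.end(), true) !=
           ready_pattern.end());
    HandshakeResult res;
    std::size_t next = 0;  // index of the element the source is presenting
    while (next < src.size()) {
        const bool valid = true;  // source always has data until the end
        const bool ready = ready_pattern[res.cycles % ready_pattern.size()];
        if (valid && ready) {     // handshake: the element is transferred
            res.received.push_back(src[next]);
            ++next;
        }
        ++res.cycles;             // otherwise the source holds its data
    }
    return res;
}
```

A sink that is ready every other cycle takes five cycles to accept three elements in this model, which is the kind of back-pressure effect a cycle-accurate simulation makes visible.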

Prerequisites:
GCC (4.8.5)
cmake (3.16)
make (3.82)
SystemC (2.3.3)
Boost (1.68.0)
matIO (1.5.16)
gtkwave (3.3.95) or other VCD viewer

Environment:

export  CC=$(command -v gcc)
export GCC=$(command -v gcc)
export CXX=$(command -v g++)
$BOOST_HOME 	contains Boost   installation path
$MATIO_HOME 	contains matIO   installation path
$SYSTEMC_HOME	contains SystemC installation path

Quick start:

Skim through the slides:
doc/simsimd.pptx

Build:
$ mkdir build
$ cd build
$ cmake -DCMAKE_BUILD_TYPE=Release -DEXAMPLE=basic/vri_test ..
$ make all

Run:
$ cd ..
$ ./build/simsimd

Inspect the result:
In gtkwave File->Open New Window->trace.vcd