turbo9team/turbo9

Turbo9 - A Compact & Efficient Pipelined 6809 Microprocessor IP

Soft release v0.9 - This repository is still under construction!


Overview

The Basics


What is the Turbo9?

The Turbo9 is a pipelined microprocessor IP written in Verilog that executes a superset of the Motorola 6809 instruction set. It is a new, modern microarchitecture with 16-bit internal datapaths that balances high performance against small area and low power. The Turbo9R with a 16-bit memory interface achieves 0.69 DMIPS/MHz, which is 3.8 times faster than Motorola's original 8-bit MC6809 implementation. It is an active graduate research project at the Department of Electrical & Computer Engineering at the University of Florida.
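To make the performance figures concrete, here is a minimal sketch of the arithmetic behind them. It assumes the usual Dhrystone convention (1 DMIPS = 1757 Dhrystones per second, the VAX 11/780 score) and reads "3.8 times faster" as a plain ratio. The function names are hypothetical, and any clock frequency passed in is the caller's assumption, not a Turbo9 specification.

```c
#include <assert.h>

/* Dhrystones per second scored by the VAX 11/780, the 1 DMIPS reference. */
#define VAX_DHRYSTONES_PER_SEC 1757.0

/* Convert a raw Dhrystone score into DMIPS/MHz. */
double dmips_per_mhz(double dhrystones_per_sec, double clock_mhz)
{
    assert(clock_mhz > 0.0);
    return dhrystones_per_sec / VAX_DHRYSTONES_PER_SEC / clock_mhz;
}

/* Baseline implied by a quoted score and speedup, e.g. 0.69 / 3.8. */
double implied_baseline_dmips_per_mhz(double score, double speedup)
{
    assert(speedup > 0.0);
    return score / speedup;
}

/* Absolute DMIPS at a given clock (the clock is an assumed input). */
double dmips_at_clock(double score_dmips_per_mhz, double clock_mhz)
{
    assert(clock_mhz > 0.0);
    return score_dmips_per_mhz * clock_mhz;
}
```

With the figures quoted above, `implied_baseline_dmips_per_mhz(0.69, 3.8)` gives roughly 0.18 DMIPS/MHz for the original MC6809.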

Turbo9 Microarchitecture


What are the target applications?

The target applications are SoC sub-blocks or small mixed-signal ASICs that require a compact and efficient microprocessor for programmable high-level control. Many 32-bit or 64-bit RISC-V and ARM cores try to fill this niche, but they prove to be inefficient solutions because many of these applications only require 16-bit precision.

Turbo9 Target Examples


Why use the 6809 instruction set? Why not RISC?

Current industry trends are to adapt 32-bit RISC IP for microcontroller use; however, their large 32x32 register files and loosely encoded instructions limit their absolute minimum footprint. So, with the goal of creating a high-performance and compact microprocessor IP, we need a 16-bit instruction set architecture (ISA). We also want an architecture that is capable of running C code effectively. Given these requirements, the Motorola 6809 ISA stands out with its minimal number of registers (shown below), orthogonal instruction set, and powerful indexed and indirect addressing modes that map well to C concepts, such as arrays and pointers.
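As an illustration of how C concepts line up with 6809 addressing modes, consider the small sketch below. The function names are hypothetical, and the instructions named in the comments are examples of modes the 6809 ISA provides; the code a particular compiler actually emits may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Walk a byte buffer with a post-incremented pointer.
 * "*p++" corresponds to auto-increment indexed addressing,
 * e.g. LDB ,X+ (load through X, then add 1 to X). */
uint16_t sum_bytes(const uint8_t *p, size_t n)
{
    uint16_t sum = 0;
    while (n--) {
        sum += *p++;
    }
    return sum;
}

/* Index into an array of 16-bit values.
 * "table[i]" is a base pointer plus an offset; the 6809 can add an
 * accumulator offset to an index register, e.g. LDD D,X. */
int16_t lookup(const int16_t *table, uint8_t i)
{
    assert(table != NULL);
    return table[i];
}

/* Follow a pointer to a pointer.
 * Indirect indexed addressing, e.g. LDD [,X], fetches the effective
 * address from memory before loading the operand. */
int16_t load_indirect(int16_t *const *pp)
{
    assert(pp != NULL && *pp != NULL);
    return **pp;
}
```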

Turbo9 Programming Model


What is uRTL micro-op translation?

The 6809's elegant accumulator-style instruction set is simpler than many RISC ISAs. However, it is retroactively classified as a CISC architecture because it violates the RISC "load-store" principle by using memory as an operand in many instructions. Consequently, load and store operations using the 6809's advanced addressing modes are often required as steps prior to an ALU operation. This is not an issue for a multi-cycle implementation like the original MC6809, but it poses challenges for creating an efficient, high-performance pipelined microarchitecture.

We addressed this challenge by developing uRTL, a novel toolset for systematically designing microarchitectures with hardwired micro-op translation. The uRTL methodology emphasizes direct opcode decoding from multiple synthesized Verilog blocks, in contrast to traditional microprogramming that relies on sequential decoding from a ROM. While this micro-op translation technique is common in large modern superscalar microprocessors, we have applied it to design a smaller and more efficient embedded microprocessor.
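The idea of micro-op translation can be sketched with a toy software model: an instruction with a memory operand expands into a load micro-op followed by an ALU micro-op, while a register or immediate form needs only one. The opcode values below are the standard 6809 encodings, but the micro-op structure, register numbering, and function name are hypothetical and do not reflect the actual uRTL output or the Turbo9's internal encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical micro-op kinds for this sketch. */
typedef enum { UOP_LOAD, UOP_ALU_ADD, UOP_STORE } uop_kind;

typedef struct {
    uop_kind kind;
    uint8_t  dest;  /* hypothetical register pointer */
    uint8_t  src;   /* hypothetical register pointer */
} uop;

/* Hypothetical register numbering. */
enum { REG_A = 0, REG_X = 1, REG_TMP = 2, REG_IMM = 3 };

/* Standard 6809 page-1 opcodes used as examples. */
enum { OP_ADDA_IMM = 0x8B, OP_ADDA_IDX = 0xAB, OP_STA_IDX = 0xA7 };

/* Translate one opcode into micro-ops; returns how many were written.
 * The switch stands in for hardwired decode logic: the result is a
 * pure function of the opcode, with no ROM lookup sequence. */
size_t translate(uint8_t opcode, uop out[2])
{
    assert(out != NULL);
    switch (opcode) {
    case OP_ADDA_IMM:   /* operand comes with the instruction */
        out[0] = (uop){ UOP_ALU_ADD, REG_A, REG_IMM };
        return 1;
    case OP_ADDA_IDX:   /* memory operand: load first, then add */
        out[0] = (uop){ UOP_LOAD,    REG_TMP, REG_X };
        out[1] = (uop){ UOP_ALU_ADD, REG_A,   REG_TMP };
        return 2;
    case OP_STA_IDX:    /* store A through the index register */
        out[0] = (uop){ UOP_STORE, REG_X, REG_A };
        return 1;
    default:
        return 0;       /* not modeled in this sketch */
    }
}
```

In this model, `ADDA ,X` (0xAB) yields two micro-ops and `ADDA #imm` (0x8B) yields one, which is the kind of split a load-store style pipeline needs.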

uRTL Design Flow

uRTL Inputs

  • rtl/urtl/turbo9_urtl.asm: uRTL Microcode Assembly
  • rtl/urtl/turbo9_urtl.mac: uRTL Macro Definitions

uRTL Outputs

  • rtl/turbo9_urtl_microcode.v: uRTL Sequential Decode Verilog Output
  • rtl/turbo9_urtl_decode_pg1_AR.v: uRTL Direct Decode Verilog Output (Page1 Address Register Pointer)
  • rtl/turbo9_urtl_decode_pg1_JTA.v: uRTL Direct Decode Verilog Output (Page1 Jump Table A)
  • rtl/turbo9_urtl_decode_pg1_JTB.v: uRTL Direct Decode Verilog Output (Page1 Jump Table B)
  • rtl/turbo9_urtl_decode_pg1_R1.v: uRTL Direct Decode Verilog Output (Page1 Register 1 Pointer)
  • rtl/turbo9_urtl_decode_pg1_R2.v: uRTL Direct Decode Verilog Output (Page1 Register 2 Pointer)
  • rtl/turbo9_urtl_decode_pg2_AR.v: uRTL Direct Decode Verilog Output (Page2 Address Register Pointer)
  • rtl/turbo9_urtl_decode_pg2_JTA.v: uRTL Direct Decode Verilog Output (Page2 Jump Table A)
  • rtl/turbo9_urtl_decode_pg2_JTB.v: uRTL Direct Decode Verilog Output (Page2 Jump Table B)
  • rtl/turbo9_urtl_decode_pg2_R1.v: uRTL Direct Decode Verilog Output (Page2 Register 1 Pointer)
  • rtl/turbo9_urtl_decode_pg2_R2.v: uRTL Direct Decode Verilog Output (Page2 Register 2 Pointer)
  • rtl/turbo9_urtl_decode_pg3_AR.v: uRTL Direct Decode Verilog Output (Page3 Address Register Pointer)
  • rtl/turbo9_urtl_decode_pg3_JTA.v: uRTL Direct Decode Verilog Output (Page3 Jump Table A)
  • rtl/turbo9_urtl_decode_pg3_JTB.v: uRTL Direct Decode Verilog Output (Page3 Jump Table B)
  • rtl/turbo9_urtl_decode_pg3_R1.v: uRTL Direct Decode Verilog Output (Page3 Register 1 Pointer)
  • rtl/turbo9_urtl_decode_pg3_R2.v: uRTL Direct Decode Verilog Output (Page3 Register 2 Pointer)
  • rtl/urtl/urtl_statistics.log: uRTL Control Signal Statistics Report

uRTL Assembler Source Code

  • urtl_asm_src/: uRTL Microcode Assembler Source Code

Key Features

  • Professional Level IP

    • Modern RTL design techniques & "good practice"
      • Fully synchronous with single clock
      • Well defined separation of control and datapath
      • Hierarchy separated into smaller, easier-to-maintain modules
      • Design for efficient synthesis into ASIC standard cell libraries & FPGAs
      • Written in Verilog 2001 for EDA tool compatibility
    • Optimized for speed, power and area
      • Design for performance, but not at the expense of power and area
      • Minimize timing paths for max clock rate
      • Use multi-cycle operations to reduce area / power
  • Executes a Superset of the Motorola 6809 Instruction Set

    • Compatible with existing 6809 compilers, assemblers and code base
    • 16/32-bit multiply & divide instruction extensions
  • Modern pipelined 16-bit micro-architecture

    • Instruction prefetch stage
    • Advanced decode stage (CISC to RISC micro-op translation)
    • Single/Multi-cycle execute stage
    • Turbo9R with a 16-bit memory interface achieves 0.69 DMIPS/MHz
      • ~3.8 times faster than original 8-bit MC6809 implementation
  • Pipelined Wishbone bus

    • Public domain industry standard
    • Internal separate Program Bus & Data Bus
    • External shared Program/Data Bus
    • Adjustable pipeline stages w/ automatic latency adjustment
    • Different bus configurations available:
      • Turbo9: 8-bit shared data/program bus
      • Turbo9S: 16-bit aligned shared data/program bus
      • Turbo9R: 16-bit non-aligned shared data/program bus
      • Turbo9GTR: 16-bit non-aligned dual data & program bus
  • Custom uRTL microcode assembler

    • Written in C
    • Macro-based assembler
    • Verilog output for efficient synthesis into gates, no ROMs
    • Statistics output
    • Unlike traditional sequential microcode, it is also capable of direct parallel decoding
  • Professional Verification Testbench

    • Full self-checking Verilog testbench to verify instruction set
    • Capable of fully randomized regression
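The bus configurations listed above differ in how many bus transfers a memory access needs. The sketch below is one way to picture the difference. It rests on an assumption not stated in this README: that an "aligned" 16-bit bus splits a 16-bit access at an odd address into two byte transfers, while a "non-aligned" bus completes it in one. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    BUS_TURBO9,     /* 8-bit shared data/program bus */
    BUS_TURBO9S,    /* 16-bit aligned shared data/program bus */
    BUS_TURBO9R,    /* 16-bit non-aligned shared data/program bus */
    BUS_TURBO9GTR   /* 16-bit non-aligned dual data & program bus */
} bus_config;

/* Bus transfers needed to move size_bytes (1 or 2) at addr.
 * ASSUMPTION: an aligned bus splits odd-address 16-bit accesses in two;
 * a non-aligned bus handles them in a single transfer. */
unsigned transfers_needed(bus_config cfg, uint16_t addr, unsigned size_bytes)
{
    assert(size_bytes == 1 || size_bytes == 2);
    if (size_bytes == 1) {
        return 1;                       /* a byte always fits */
    }
    switch (cfg) {
    case BUS_TURBO9:
        return 2;                       /* two byte-wide transfers */
    case BUS_TURBO9S:
        return (addr & 1u) ? 2 : 1;     /* odd address is split */
    case BUS_TURBO9R:
    case BUS_TURBO9GTR:
        return 1;                       /* any address in one transfer */
    }
    return 0;                           /* unreachable for valid cfg */
}
```

Under this assumption, a 16-bit read at an odd address such as 0x1001 costs two transfers on the Turbo9 and Turbo9S but one on the Turbo9R and Turbo9GTR.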

Presentations

YouTube


Publications


Directory Structure

asm/ Assembly code for the Turbo9
docs/ Documents
images/ Images
c_code/ C code for the Turbo9
build_gcc/ build directory for GCC
build_vbcc/ build directory for VBCC Turbo9
build_vbcc_6809/ build directory for VBCC 6809
byte_sieve_src/ BYTE Sieve source
dhrystone_src/ Dhrystone source
hello_world_src/ Hello World source
lib_gcc/ Library for GCC
lib_vbcc/ Library for VBCC
fpga/ FPGA project directory
bit_files/ .bit files for Arty A7-100T
regress/ Nightly regression run directory
rtl/ Verilog RTL for micro-architecture
urtl/ uRTL microcode for micro-architecture
sim/ Simulation run directory
tb/ Testbench & Testcases
urtl_asm_src/ uRTL microcode assembler source code

Third-Party Tools


Current Status

The current version of the Turbo9 is thoroughly verified and is capable of running C code.

Version 1.0 is in development and testing. It completes the interrupt system (SYNC and CWAI) and the Turbo9 instruction extensions (EDIVS & EMULS). Performance also increases to 0.75 DMIPS/MHz! We will release it once verification is complete.


Team Members

Kevin Phillipson

  • Project Leader
  • Responsibilities
    • Microarchitecture design
    • RTL & Microcode development
  • 15 years of industry experience in ASIC design
  • Bachelor's Degree in Electrical Engineering from the University of Florida, 2008
  • Master's Degree in Electrical Engineering from the University of Florida, 2022
  • Currently pursuing a PhD at the University of Florida
  • Master's Thesis: A Compact & Efficient Microprocessor IP for SoC Sub-Blocks and Mixed-Signal ASICs

Michael Rywalt

  • Principal Contributor
  • Responsibilities
    • Custom uRTL microcode assembler
    • Verification & Tools
  • 15 years of industry experience in ASIC design
  • Bachelor of Science in Computer Science and Software Engineering from Florida Institute of Technology, 2008
  • Currently pursuing a Master's Degree in Electrical Engineering at the University of Florida
  • Master's Thesis: Verification of a compact & efficient microprocessor IP

Faculty

Dr. Greg Stitt

  • Associate Professor
  • NSF Center for Space, High-Performance, and Resilient Computing (SHREC)
  • Research interests: Embedded systems with an emphasis in synthesis, compilers, reconfigurable computing, hardware/software co-design
  • Website: www.gstitt.ece.ufl.edu

Dr. Eric M. Schwartz

  • Instructional Professor
  • Machine Intelligence Laboratory Director
  • Research interests: Robotics, embedded systems, controls, autonomous mobile agents
  • Website: mil.ufl.edu/ems

Dr. Martin Margala

  • Director of the School of Computing and Informatics, University of Louisiana at Lafayette
  • Academia: Former Professor and Chair of the Electrical and Computer Engineering Department at the University of Massachusetts Lowell
  • Website: people.cmix.louisiana.edu/margala/

Contact

You may contact us at team[at]turbo9[dot]org. Thank you!