Lexical Analyzer in C

Overview

This project is a lexical analyzer (scanner) implemented in C, designed to tokenize a simple programming language. It takes a source code string as input and breaks it down into tokens such as keywords, identifiers, numbers, operators, and punctuation marks. It uses a deterministic finite automaton (DFA) to classify tokens.
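To illustrate the DFA idea, here is a minimal sketch that classifies a single lexeme as an identifier or a number. The state names, `step`, and `classify` are hypothetical and chosen for this example; the project's actual automaton (see automaton.png) may have more states and different transitions.

```c
#include <ctype.h>

/* Hypothetical DFA states; the project's real automaton may differ. */
typedef enum { S_START, S_IDENT, S_NUMBER, S_REJECT } State;

/* One transition: from state s, reading character c, return the next state. */
static State step(State s, char c) {
    unsigned char u = (unsigned char)c;
    switch (s) {
    case S_START:
        if (isalpha(u)) return S_IDENT;   /* a letter starts an identifier */
        if (isdigit(u)) return S_NUMBER;  /* a digit starts a number */
        return S_REJECT;
    case S_IDENT:
        return isalnum(u) ? S_IDENT : S_REJECT;
    case S_NUMBER:
        return isdigit(u) ? S_NUMBER : S_REJECT;
    default:
        return S_REJECT;
    }
}

/* Run the DFA over a whole lexeme and return the state it ends in.
 * S_IDENT and S_NUMBER are the accepting states in this sketch. */
State classify(const char *lexeme) {
    State s = S_START;
    for (const char *p = lexeme; *p != '\0' && s != S_REJECT; p++) {
        s = step(s, *p);
    }
    return s;
}
```

With these assumptions, `classify("max")` ends in `S_IDENT`, `classify("42")` ends in `S_NUMBER`, and `classify("4x")` ends in `S_REJECT`.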


Features

Token Classification: Identifies various tokens like IF, ID, NUM, OP, and more.
Attribute Extraction: Extracts attributes such as numeric values and operator types.
Error Handling: Provides detailed error messages with character positions for invalid input.
Dynamic Memory Allocation: Allocates memory at run time for identifiers and other variable-length content.
Modular Design: Includes reusable components like:
- car_suivant for character reading.
- get_lexeme for lexeme extraction.
- reculer for backtracking positions.
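The three helpers listed above could look roughly like the following. This is a sketch under assumptions: the function names come from this README, but the `Scanner` struct, the parameter lists, and the return types are hypothetical, and the real sources may keep the position in global variables instead.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical scanner state; the real project may use globals instead. */
typedef struct {
    const char *src;  /* input string being scanned */
    size_t start;     /* index where the current lexeme begins */
    size_t pos;       /* index of the next character to read */
} Scanner;

/* Return the next character and advance.
 * At end of input, return '\0' without advancing. */
char car_suivant(Scanner *s) {
    char c = s->src[s->pos];
    if (c != '\0') {
        s->pos++;
    }
    return c;
}

/* Step back n characters, never moving before the start of the lexeme. */
void reculer(Scanner *s, size_t n) {
    size_t consumed = s->pos - s->start;
    s->pos -= (n < consumed) ? n : consumed;
}

/* Copy the characters read since `start` into a new heap string.
 * The caller owns the result and must free() it. Returns NULL on failure. */
char *get_lexeme(const Scanner *s) {
    size_t len = s->pos - s->start;
    char *out = malloc(len + 1);
    if (out == NULL) {
        return NULL;
    }
    memcpy(out, s->src + s->start, len);
    out[len] = '\0';
    return out;
}
```

A typical use is to read one character past the end of a lexeme, call `reculer(&s, 1)` to give that character back, and then call `get_lexeme(&s)` to obtain the text of the token.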

Getting Started

Prerequisites

A C compiler (e.g., gcc).
Basic understanding of the C programming language and lexical analysis.

Compilation

Clone this repository:

git clone https://github.com/your-username/lexical-analyzer-c.git
cd lexical-analyzer-c

Compile the program:

gcc car_suivant.c erreur_lexicale.c get_lexeme.c reculer.c token_suivant.c main.c -o lexical_analyzer

Usage

Run the compiled program:
```
./lexical_analyzer
```
Enter a source code string when prompted. For example:
```
if (x = 42) x = y + 5 + max;
```

The output will display a sequence of tokens and their attributes, like:

<IF, >
<PARG, >
<ID, x>
<ASSIGN, >
<NUM, 42>
<PARD, >
<ID, x>
<ASSIGN, >
<ID, y>
<OP, PLUS>
<NUM, 5>
<OP, PLUS>
<ID, max>
<PV, >
<FIN, >

Project Structure

lexical-analyzer-c/

├── main.c                # Main driver program
├── include.c             # Header and utility functions
├── README.md             # Project documentation
└── Other utility functions # token_suivant, car_suivant, reculer, get_lexeme...

Functionality

Token Types

* FIN: End of program.
* PV: Semicolon ';'.
* IF: Keyword if.
* ASSIGN: Assignment operator '='.
* OP: Operators '+' and '-'.
* PARG: Opening parenthesis '('.
* PARD: Closing parenthesis ')'.
* ID: Identifier (e.g., variable names).
* NUM: Numbers (e.g., 42).
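The token types above map naturally onto a C enum. The sketch below pairs that enum with a small formatter that produces the `<TYPE, attribute>` lines shown in the Usage section. The `Token` struct, the `TOKEN_NAMES` table, and `format_token` are hypothetical names for this example, and storing the attribute as a string is an assumption; the real code may store numeric values and operator kinds differently.

```c
#include <stdio.h>

/* Token types, in the order listed in this README. */
typedef enum { FIN, PV, IF, ASSIGN, OP, PARG, PARD, ID, NUM } TokenType;

/* Printable name for each token type; must stay in enum order. */
static const char *const TOKEN_NAMES[] = {
    "FIN", "PV", "IF", "ASSIGN", "OP", "PARG", "PARD", "ID", "NUM"
};

/* Hypothetical token record: attr is NULL when the token has no attribute. */
typedef struct {
    TokenType type;
    const char *attr;
} Token;

/* Write "<TYPE, attr>" into buf; returns what snprintf returns. */
int format_token(const Token *t, char *buf, size_t size) {
    return snprintf(buf, size, "<%s, %s>",
                    TOKEN_NAMES[t->type],
                    t->attr != NULL ? t->attr : "");
}
```

For example, a token `{ OP, "PLUS" }` is formatted as `<OP, PLUS>`, and `{ IF, NULL }` as `<IF, >`.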

Error Handling

When an invalid character is encountered, the program outputs an error message indicating the position and the problematic character. For example:
```
position 4: le caractère ’*’ est illégal!
```
In English: "position 4: the character '*' is illegal!"
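A message in this format could be produced along the following lines. This is only a sketch: the name `erreur_lexicale` matches a source file in this repository, but the signature shown here, the `format_erreur_lexicale` helper, the use of plain ASCII quotes, and writing to stderr are all assumptions. Whether the real position counter starts at 0 or 1 is not stated in this README.

```c
#include <stdio.h>

/* Hypothetical helper: build the error message for character c found at
 * the given position. Returns what snprintf returns. */
int format_erreur_lexicale(char *buf, size_t size, size_t position, char c) {
    return snprintf(buf, size,
                    "position %zu: le caractère '%c' est illégal!",
                    position, c);
}

/* Assumed signature: report the error on stderr, one message per line. */
void erreur_lexicale(size_t position, char c) {
    char buf[96];
    format_erreur_lexicale(buf, sizeof buf, position, c);
    fprintf(stderr, "%s\n", buf);
}
```

Keeping the formatting separate from the printing makes the message easy to check in a test without capturing stderr.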

Contributing

Contributions are welcome! If you’d like to improve this project, feel free to fork the repository and submit a pull request.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Acknowledgments

This project serves as a learning tool for understanding lexical analysis and compiler construction. It is inspired by classic DFA-based tokenization techniques.

missipsag/LexiC-analyser
