Project Overview: Welcome to the technical assessment for the Data Engineer position. This challenge evaluates your ability to integrate, process, and store real-time data using streaming technologies and database systems. The task is to capture and process real-time traffic data from open sources such as the Directorate General of Traffic (DGT), and to store the results in both PostgreSQL and MongoDB.
Objective: Develop a scalable and efficient system that handles high-volume real-time data streams, transforms the data as necessary, and stores it in both PostgreSQL and MongoDB. The solution should be deployable on AWS, using services such as Amazon Kinesis, Amazon RDS, and Amazon DocumentDB (or MongoDB Atlas).
Suggested Technologies:
- Data Streaming: Apache Kafka, Apache Flink, Amazon Kinesis, or equivalent.
- Databases: PostgreSQL and MongoDB.
- Cloud Platform: Amazon Web Services (AWS).
Solution Development:
- Data Integration: Set up a data ingestion system to consume real-time information.
- Data Processing: Use stream processing tools to manage data flows and prepare them for storage.
- Data Storage: Configure PostgreSQL and MongoDB for data storage, ensuring efficiency and performance.
- Optional - Monitoring and Operations: Implement monitoring solutions using tools like Amazon CloudWatch or Grafana.
- Optional - Data Governance: Consider establishing data governance practices to maintain data quality and compliance.
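As an illustration only, the processing and storage steps above could be sketched as a set of pure functions: parse a raw record, normalize it, and shape it once for a relational table and once for a document collection. Everything specific here is an assumption rather than part of the assessment: the input field names (`sensor_id`, `timestamp`, `road`, `speed_kmh`), the `TrafficEvent` type, and the batch size are hypothetical, and the real DGT feed may look quite different.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Iterable, Iterator


@dataclass(frozen=True)
class TrafficEvent:
    """Normalized event. All field names are hypothetical choices."""
    sensor_id: str
    observed_at: datetime
    road: str
    speed_kmh: float


def parse_event(raw: dict[str, Any]) -> TrafficEvent:
    """Validate and normalize one raw record from the (assumed) feed.

    Raises KeyError or ValueError on malformed input, so the caller can
    route bad records to a dead-letter store instead of dropping them.
    Timestamps are assumed to carry a UTC offset; a naive timestamp would
    be interpreted as local time by astimezone().
    """
    return TrafficEvent(
        sensor_id=str(raw["sensor_id"]),
        observed_at=datetime.fromisoformat(raw["timestamp"]).astimezone(timezone.utc),
        road=str(raw["road"]).strip().upper(),
        speed_kmh=float(raw["speed_kmh"]),
    )


def to_postgres_row(event: TrafficEvent) -> tuple[str, datetime, str, float]:
    """Flat tuple in column order, suitable for a parameterized INSERT."""
    return (event.sensor_id, event.observed_at, event.road, event.speed_kmh)


def to_mongo_document(event: TrafficEvent, raw: dict[str, Any]) -> dict[str, Any]:
    """Document that also keeps the raw payload for later reprocessing.

    The deterministic _id makes a replayed event overwrite or collide with
    its earlier copy instead of creating a silent duplicate.
    """
    return {
        "_id": f"{event.sensor_id}:{event.observed_at.isoformat()}",
        "sensor_id": event.sensor_id,
        "observed_at": event.observed_at,
        "road": event.road,
        "speed_kmh": event.speed_kmh,
        "raw": raw,
    }


def micro_batches(records: Iterable[Any], size: int) -> Iterator[list[Any]]:
    """Group a stream into lists of at most `size` items for bulk writes."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: list[Any] = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly shorter, batch
        yield batch
```

In a full solution, each batch of rows would typically go to PostgreSQL through a driver's bulk insert (for example `cursor.executemany` in psycopg) and each batch of documents to MongoDB through `insert_many` in PyMongo. Keeping the transformation free of I/O, as sketched here, makes it straightforward to unit-test independently of Kafka, Kinesis, or either database.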
How to Submit Your Solution:
- Fork this repository.
- Create a new branch named after your GitHub username.
- Develop your solution, ensuring all code is well-documented and includes clear setup instructions.
- Submit your solution via a pull request to this repository by July 20.
Evaluation Criteria:
- Adherence to the functional and technical requirements.
- Clarity, efficiency, and organization of the code.
- Scalability and maintainability of the system.
- Documentation quality and presentation skills.
Additional Considerations:
- Continuous Integration/Continuous Deployment (CI/CD) practices are encouraged.
- Security and data protection measures should be addressed in your design.
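One small, concrete way to address the security point above is to keep database credentials out of the repository and read them from the environment at startup. The following is a minimal sketch under assumptions: the variable names `TRAFFIC_POSTGRES_DSN` and `TRAFFIC_MONGO_URI` and the `StoreSettings` type are hypothetical, and on AWS the values might instead come from a managed secrets service.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical variable names; pick whatever fits your deployment.
REQUIRED_VARS = ("TRAFFIC_POSTGRES_DSN", "TRAFFIC_MONGO_URI")


@dataclass(frozen=True)
class StoreSettings:
    """Connection settings for both stores, loaded once at startup."""
    postgres_dsn: str
    mongo_uri: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> StoreSettings:
    """Read connection strings from the environment and fail fast.

    Passing `env` explicitly makes the function easy to test without
    touching the real process environment.
    """
    source = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not source.get(name)]
    if missing:
        # Report only the variable names, never their values.
        raise RuntimeError("Missing required settings: " + ", ".join(missing))
    return StoreSettings(
        postgres_dsn=source["TRAFFIC_POSTGRES_DSN"],
        mongo_uri=source["TRAFFIC_MONGO_URI"],
    )
```

Failing at startup with the names of the missing variables tends to be easier to diagnose than a connection error deep inside the pipeline, and it avoids writing secrets to logs.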
Resources:
Notes: This assessment aims to test your technical capabilities and your ability to communicate complex solutions effectively. While monitoring and data governance are valued, they are not mandatory for this task. We are interested in your problem-solving approaches and technical proficiency in real-time data engineering scenarios.