Our Vice President of Technology will give a talk at the Apache Doris Meetup in Singapore! Rayner (Mingyu) Chen is one of the core engineers leading the technical innovation of Apache Doris, and he has nurtured the project's development and community growth. If you want to understand the ins and outs of Apache Doris, Rayner is the go-to person. He will talk about the use cases and current development status of Apache Doris, and the technologies behind its fast performance in online reporting, log analysis, and data lakehousing. Come join us: https://lnkd.in/g7wCwTuv #opensource #Meetup #Singapore #database #analytics #ApacheDoris
About us
VeloDB is a real-time data warehouse company founded in January 2022, whose founding team members are active core contributors to the Apache Doris open-source community. VeloDB is a major contributor to the Apache Doris community. It has also developed its own commercial version based on the Apache Doris kernel, offering out-of-the-box enterprise data warehouse services that can run on multiple clouds.
- Website
- http://www.velodb.io/
- Industry
- Technology, Information and Internet
- Company size
- 51-200 employees
- Headquarters
- Singapore
- Type
- Partnership
- Specialties
- database, open source, Apache Doris, computing, data warehouse, big data, and Cloud-Native data warehouse
Locations
- Primary
PAYA LEBAR ROAD
Singapore, SG
Updates
-
VeloDB reposted this
Announcing the Apache Doris Meetup in Singapore on October 24 🇸🇬 🕝 Thursday, October 24, 2024 (2:30 PM to 5:05 PM SGT) 🏙️ 51 Bras Basah Road, Lazada One, Singapore Topics: 1️⃣ Apache Doris: A Real-Time Data Warehouse 2️⃣ Multi-Stream Real-Time Data Analysis Solution Based on Apache Doris 3️⃣ Apache Doris in WeChain: Application and Best Practices 4️⃣ RisingWave & Apache Doris: Simplifying Real-time Data Enrichment and Analytics Speakers: 1️⃣ Mingyu Chen, Apache Doris PMC Chair, Vice President of Technology at VeloDB 2️⃣ Boyang Chen, Apache Doris Contributor, Database Development Engineer at #Douyin Group 3️⃣ Special Guest Speaker, Data Engineer at a leading #cryptocurrency exchange 4️⃣ Liu Zhi, Product Manager at RisingWave Labs Secure your spot by RSVPing, and share this link with friends in #Singapore! We're planning more meetups in more cities. We look forward to connecting face-to-face with all of you! https://lnkd.in/ga-THiNR #Meetup #opensource #database #BigData #dataengineer #DataAnalytics
-
We are expanding our team to achieve greater things! #bigdata #programming #jobs #cloudcomputing #siliconvalley #dataengineering
We're hiring! Our business team in Silicon Valley is looking for talent for 3 positions: Marketing Director, Pre-sales Solution Architect, and Database Post-Sales Engineer (SaaS). Apply now: [email protected] VeloDB is a startup founded by core members of the Apache Doris project, which supports the data processing of over 4,000 enterprises worldwide, including TikTok, Cisco, Alibaba, Tencent, and other tech giants. We are determined to build enterprise-grade data warehouse software and a future-oriented cloud-native data warehouse service, and we now have offices in Silicon Valley, Singapore, and Beijing. Apply now: [email protected]
-
Compute Clusters in VeloDB 🌟 💡 Question 4️⃣ 💡 How to implement access control and resource isolation? Within a VeloDB warehouse, compute clusters are completely isolated from each other, each with its own computing resources. How do we prevent users from mistakenly using the wrong cluster and causing interference? And how do we ensure that one cluster's access to shared storage does not disrupt other clusters? VeloDB keeps a multi-cluster architecture running in an orderly way through a comprehensive access control and resource isolation mechanism: Only users granted permissions on a specific cluster can access that cluster, which prevents misuse. For access to storage resources, VeloDB supports bandwidth and IOPS throttling based on cluster specifications. When a limit is exceeded, storage access requests are queued so that one cluster cannot interfere with the others.
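The two mechanisms in Question 4 (a per-cluster permission check, then a storage throttle that queues rather than rejects) can be sketched as below. This is a minimal illustration under stated assumptions, not VeloDB's implementation: `ClusterGate`, `PermissionDenied`, `iops_limit`, and the discrete `tick()` window are all hypothetical names chosen for the example, and a real throttle would track both bandwidth and IOPS against wall-clock time.

```python
from collections import deque
from dataclasses import dataclass, field


class PermissionDenied(Exception):
    pass


@dataclass
class ClusterGate:
    """Hypothetical per-cluster gate: checks user grants, then throttles storage I/O."""
    name: str
    iops_limit: int  # assumed: max storage requests admitted per window, set from cluster size
    allowed_users: set = field(default_factory=set)
    _admitted_this_tick: int = 0
    _queue: deque = field(default_factory=deque)

    def submit(self, user: str, request: str) -> str:
        # Access control: only users granted on this cluster may use it.
        if user not in self.allowed_users:
            raise PermissionDenied(f"{user} has no grant on cluster {self.name}")
        # Throttling: admit up to the limit, queue the rest instead of rejecting them.
        if self._admitted_this_tick < self.iops_limit:
            self._admitted_this_tick += 1
            return "admitted"
        self._queue.append(request)
        return "queued"

    def tick(self) -> list:
        """Start a new time window and drain queued requests up to the limit."""
        self._admitted_this_tick = 0
        drained = []
        while self._queue and self._admitted_this_tick < self.iops_limit:
            drained.append(self._queue.popleft())
            self._admitted_this_tick += 1
        return drained


gate = ClusterGate("etl", iops_limit=2, allowed_users={"alice"})
print([gate.submit("alice", f"read-{i}") for i in range(3)])  # ['admitted', 'admitted', 'queued']
print(gate.tick())  # ['read-2']
try:
    gate.submit("bob", "read-x")
except PermissionDenied as e:
    print(e)  # bob has no grant on cluster etl
```

Queuing instead of failing the request is what keeps an over-limit cluster from affecting its neighbours while still completing its own work later.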
-
Compute Clusters in VeloDB 🌟 💡 Question 3️⃣ 💡 How to achieve flexible caching? In a compute-storage decoupled architecture, where object storage or HDFS typically serves as the remote shared storage, initial I/O requests often suffer from slow response times. How can we ensure high performance in this situation, and in multi-cluster scenarios as well? VeloDB addresses these challenges with a well-designed cache management mechanism: For a single compute cluster, VeloDB uses an LRU caching strategy by default. When the cache is large enough to hold all hot data, it delivers the same performance as a compute-storage coupled architecture at a much lower storage cost. VeloDB also offers manual cache control, allowing users to prioritize certain tables for caching. During cluster scaling, VeloDB automatically prewarms or migrates caches based on statistics so that query service stays smooth through the change. For multiple compute clusters, VeloDB provides cross-cluster cache synchronization to accelerate queries, with control over synchronization at the partition level. The cache of each compute cluster operates independently, so users can size each cache as needed. #database #cloudcompute #dataengineer
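The single-cluster policy in Question 3 (LRU by default, with user-chosen tables given priority) can be illustrated with a small block cache. This is a sketch under assumptions, not VeloDB's cache: `BlockCache`, the block-count `capacity`, `pinned_tables`, and the `fetch_remote` callback are hypothetical, and prewarming and cross-cluster synchronization are not modelled.

```python
from collections import OrderedDict


class BlockCache:
    """Hypothetical local cache: LRU by default, with user-prioritized tables kept resident."""

    def __init__(self, capacity: int, pinned_tables=()):
        self.capacity = capacity          # max number of cached blocks (assumed unit)
        self.pinned = set(pinned_tables)  # tables the user asked to prioritize
        self.blocks = OrderedDict()       # (table, block_id) -> data, least recently used first

    def get(self, table, block_id, fetch_remote):
        key = (table, block_id)
        if key in self.blocks:
            self.blocks.move_to_end(key)  # hit: mark as most recently used
            return self.blocks[key]
        data = fetch_remote(table, block_id)  # miss: slow read from shared storage
        self.blocks[key] = data
        self._evict()
        return data

    def _evict(self):
        # Evict least recently used blocks, skipping prioritized tables where possible.
        while len(self.blocks) > self.capacity:
            victim = next((k for k in self.blocks if k[0] not in self.pinned), None)
            if victim is None:  # everything left is prioritized: fall back to plain LRU
                victim = next(iter(self.blocks))
            del self.blocks[victim]


remote_reads = []

def fetch(table, block_id):
    remote_reads.append((table, block_id))
    return f"{table}:{block_id}"

cache = BlockCache(capacity=2, pinned_tables={"dim_user"})
cache.get("dim_user", 0, fetch)
cache.get("orders", 0, fetch)
cache.get("orders", 1, fetch)  # evicts orders:0, not the prioritized dim_user:0
print(list(cache.blocks))      # [('dim_user', 0), ('orders', 1)]
```

When the hot set fits within `capacity`, every repeat read is a hit and never touches remote storage, which is the situation in which the post says performance matches a coupled architecture.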
-
Compute Clusters in VeloDB 🌟 💡 Question 2️⃣ 💡 How to allow multiple nodes to process data writes simultaneously? After years of exploration, most relational databases with shared storage have settled on a single-writer, multi-reader architecture, in which only one cluster is allowed to write data to the shared storage. VeloDB, however, allows multiple clusters to write simultaneously. It relies on a Multi-Version Concurrency Control (MVCC) mechanism and a shared metadata center for transaction coordination. Incoming data is first transformed in parallel on multiple clusters, and distributed coordination happens only in the metadata update phase: the cluster that acquires the lock first commits its write, while the other clusters retry. Because most of the write overhead lies in the transformation step, this distributed coordination and optimistic locking design enables multi-reader, multi-writer capability and uses multiple clusters to further increase concurrent write throughput.
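The commit step described in Question 2 follows the general optimistic-concurrency pattern: do the expensive work without coordination, then commit with a compare-and-swap on a shared version and retry on conflict. The sketch below shows only that pattern; `MetaService`, `try_commit`, `load`, and the per-table version counter are hypothetical stand-ins, not VeloDB's metadata API.

```python
import threading


class MetaService:
    """Hypothetical shared metadata center: one committed version per table."""

    def __init__(self):
        self._lock = threading.Lock()
        self.version = {}   # table -> committed version
        self.rowsets = {}   # table -> committed rowset ids, in commit order

    def read_version(self, table):
        with self._lock:
            return self.version.get(table, 0)

    def try_commit(self, table, expected_version, rowset_id):
        # Compare-and-swap: succeeds only if no other cluster committed in between.
        with self._lock:
            if self.version.get(table, 0) != expected_version:
                return False
            self.version[table] = expected_version + 1
            self.rowsets.setdefault(table, []).append(rowset_id)
            return True


def load(meta, table, cluster, batch_no, max_retries=100):
    # The expensive transformation runs on each cluster without coordination;
    # this id stands in for transformed data already written to shared storage.
    rowset_id = f"{cluster}-batch{batch_no}"
    for _ in range(max_retries):
        base = meta.read_version(table)
        if meta.try_commit(table, base, rowset_id):
            return rowset_id
        # Lost the race: another cluster committed first, so re-read and retry.
    raise RuntimeError("commit failed after retries")


meta = MetaService()
base_a = meta.read_version("t")  # clusters A and B both start from version 0
base_b = meta.read_version("t")
print(meta.try_commit("t", base_a, "A-1"))  # True
print(meta.try_commit("t", base_b, "B-1"))  # False: B must retry
print(load(meta, "t", "B", 1))              # B-batch1
print(meta.rowsets["t"])                    # ['A-1', 'B-batch1']
```

A retry only repeats the cheap metadata step, not the transformation, which is why this design can scale write throughput with the number of clusters.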
-
Compute Clusters in VeloDB 🌟 The multi-compute cluster architecture of VeloDB is designed to provide read-write isolation and query workload isolation. Implementing such an architecture in a cloud-native solution with decoupled compute and storage may not appear challenging. From a product perspective, however, many key aspects still require carefully crafted design. We are going to answer a series of questions about it in a few posts. 💡 Question 1️⃣ 💡 How to ensure data consistency across the compute clusters? With compute and storage decoupled, data resides in shared storage accessible to multiple compute clusters. VeloDB has undergone in-depth refactoring to achieve shared metadata. After data is written to shared storage, the shared metadata is updated first, and only then is the write result returned. Other clusters read the shared metadata center and therefore retrieve the latest data.
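The ordering in Question 1 (persist data, then update shared metadata, then acknowledge) can be shown with a toy model in which readers resolve files only through metadata. Everything here is a hypothetical illustration: `SharedStorage`, `SharedMeta`, `write`, and `read` are invented names, and real shared metadata would also carry versions and transactions.

```python
class SharedStorage:
    """Hypothetical shared storage: path -> payload."""
    def __init__(self):
        self.files = {}


class SharedMeta:
    """Hypothetical shared metadata: table -> file paths that readers may see."""
    def __init__(self):
        self.visible = {}


def write(storage, meta, table, path, payload):
    storage.files[path] = payload                    # 1. persist data in shared storage
    meta.visible.setdefault(table, []).append(path)  # 2. update shared metadata
    return "ok"                                      # 3. only then acknowledge the write


def read(storage, meta, table):
    # Any cluster resolves the file list from shared metadata, never by listing storage.
    return [storage.files[p] for p in meta.visible.get(table, [])]


storage, meta = SharedStorage(), SharedMeta()
write(storage, meta, "orders", "orders/f1", "row1")
storage.files["orders/f2"] = "row2"   # simulates a write that crashed before step 2
print(read(storage, meta, "orders"))  # ['row1']
```

Because an acknowledged write has already been published to metadata, every cluster that reads afterwards sees it, and a partially written file never becomes visible.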
-
VeloDB vs. HBase: When to use which? VeloDB is a real-time data warehouse, while HBase is a distributed key-value store. 🏠 Use case: OLAP analytical workloads combined with KV point lookups. VeloDB is a great fit, while HBase lacks the relevant analytical capabilities. 🏠 Use case: small to medium-scale KV point lookups. Both VeloDB and HBase are good options, but VeloDB has a simpler and more maintainable architecture. 🏠 Use case: PB-scale online KV point lookups only. HBase is the better solution. #database #dataengineering #HBase #BigData #analytics
-
To adapt to changing workloads, VeloDB Cloud supports elastic scaling of compute resources, allowing users to adjust compute resources based on peak and off-peak hours and job execution patterns. VeloDB Cloud provides both manual and time-based scaling, as well as automatic start/stop.
-
VeloDB's multi-compute cluster architecture grew out of two typical use cases: 1️⃣ Read-write isolation: Separate compute clusters handle reads and writes independently, so high write pressure does not affect query services. 2️⃣ Separation of online and offline business: In many data analysis scenarios, the same dataset supports various lines of business across the company. With a multi-compute cluster architecture, VeloDB can serve both online and offline business needs with isolated computing resources on a single copy of data, bringing cost savings and simpler operations. In VeloDB, a data warehouse instance can contain multiple compute clusters, similar to the idea of compute queues or compute groups in a distributed system. Data is persisted in shared storage, which all the clusters can access. Each cluster is itself a distributed system consisting of one or more BE (Backend) nodes. To accelerate data access, we have introduced caching on the local compute nodes. For example, in the architecture shown below, Warehouse 1 contains Cluster 1, Cluster 2, and Cluster 3, all of which can access the data in the shared storage.
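The warehouse layout described above (one warehouse, several clusters of BE nodes, one shared data copy, a local cache per cluster) can be modelled in a few lines, with workloads routed to dedicated clusters for read-write and online/offline isolation. This is a hypothetical sketch: `Warehouse`, `Cluster`, `add_cluster`, `run`, and the workload labels are invented for illustration, and cache invalidation on writes is deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    name: str
    be_nodes: int                                     # each cluster is a distributed system of BE nodes
    local_cache: dict = field(default_factory=dict)   # per-cluster cache of shared data


@dataclass
class Warehouse:
    name: str
    shared_storage: dict = field(default_factory=dict)  # single data copy for all clusters
    clusters: dict = field(default_factory=dict)
    routes: dict = field(default_factory=dict)          # workload type -> cluster name

    def add_cluster(self, cluster, *workloads):
        self.clusters[cluster.name] = cluster
        for w in workloads:
            self.routes[w] = cluster.name

    def run(self, workload, key, value=None):
        cluster = self.clusters[self.routes[workload]]
        if workload == "write":
            self.shared_storage[key] = value  # writes land in shared storage
            return cluster.name, "ok"
        # Reads fill only this cluster's own cache; invalidation on later writes is omitted here.
        if key not in cluster.local_cache:
            cluster.local_cache[key] = self.shared_storage[key]
        return cluster.name, cluster.local_cache[key]


wh = Warehouse("warehouse1")
wh.add_cluster(Cluster("cluster1", be_nodes=3), "write")
wh.add_cluster(Cluster("cluster2", be_nodes=4), "online")
wh.add_cluster(Cluster("cluster3", be_nodes=8), "offline")
print(wh.run("write", "orders", [1, 2, 3]))  # ('cluster1', 'ok')
print(wh.run("online", "orders"))            # ('cluster2', [1, 2, 3])
print(wh.run("offline", "orders"))           # ('cluster3', [1, 2, 3])
```

Heavy ingestion on `cluster1` and batch jobs on `cluster3` never consume `cluster2`'s resources, yet all three read the same stored data.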