About us

VeloDB is a real-time data warehouse company founded in January 2022 by active core contributors to the Apache Doris open-source community. VeloDB contributes extensively to the Apache Doris community. It has also built a commercial version on the Apache Doris kernel, offering out-of-the-box enterprise data warehouse services that run on multiple clouds.

Website
http://www.velodb.io/
Industry
Technology, Information and Internet
Company size
51-200 employees
Headquarters
Singapore
Type
Partnership
Specialties
database, open source, Apache Doris, computing, data warehouse, big data, and Cloud-Native data warehouse

Updates

  • VeloDB

    729 followers

    Our Vice President of Technology will give a talk at the Apache Doris Meetup @Singapore! Rayner (Mingyu) Chen is one of the core engineers leading the technical innovation of Apache Doris. He has nurtured the project's development and the growth of its community. If you want to understand the ins and outs of Apache Doris, Rayner is the go-to person. He will talk about the use cases and current development of Apache Doris, and about the technologies behind its fast performance in online reporting, log analysis, and data lakehousing. Come join us: https://lnkd.in/g7wCwTuv #opensource #Meetup #Singapore #database #analytics #ApacheDoris

  • VeloDB reposted this

    Apache Doris

    2,633 followers

    Announcing the Apache Doris Meetup in Singapore on October 24 🇸🇬 🕝 Thursday, October 24, 2024 (2:30 PM to 5:05 PM SGT) 🏙️ 51 Bras Basah Road, Lazada One, Singapore Topics: 1️⃣ Apache Doris: A Real-Time Data Warehouse 2️⃣ Multi-Stream Real-Time Data Analysis Solution Based on Apache Doris 3️⃣ Apache Doris in WeChain: Application and Best Practices 4️⃣ RisingWave & Apache Doris: Simplifying Real-time Data Enrichment and Analytics Speakers: 1️⃣ Mingyu Chen, Apache Doris PMC Chair, Vice President of Technology at VeloDB 2️⃣ Boyang Chen, Apache Doris Contributor, Database Development Engineer at #Douyin Group 3️⃣ Special Guest Speaker, Data Engineer at a leading #cryptocurrency exchange 4️⃣ Liu Zhi, Product Manager at RisingWave Labs Secure your spot by RSVPing, and share this link with friends in #Singapore! More meetups in more cities are coming. We look forward to connecting face-to-face with all of you! https://lnkd.in/ga-THiNR #Meetup #opensource #database #BigData #dataengineer #DataAnalytics

    Apache Doris Meetup @Singapore, Thu, Oct 24, 2024, 2:30 PM | Meetup

    meetup.com

  • VeloDB

    We are expanding our team to achieve greater things! #bigdata #programming #jobs #cloudcomputing #siliconvalley #dataengineering

    Qiyuan Wu

    Human Resource Director at VeloDB

    We're hiring! Our business team in Silicon Valley is looking for talent for 3 positions: Marketing Director, Pre-sales Solution Architect, and Database Post-Sales Engineer (SaaS). VeloDB is a startup founded by core members of the Apache Doris project, which supports the data processing of over 4,000 enterprises worldwide, including TikTok, Cisco, Alibaba, Tencent, and other tech giants. We are determined to build enterprise-grade data warehouse software and a future-oriented cloud-native data warehouse service, and we now have offices in Silicon Valley, Singapore, and Beijing. Apply now: [email protected]

  • VeloDB

    Compute Clusters in VeloDB 🌟 💡 Question 4️⃣ 💡 How to implement access control and resource isolation? Within a VeloDB warehouse, compute clusters are fully isolated from each other, each with its own computing resources. How do we prevent users from mistakenly using the wrong cluster and causing interference? And how do we ensure that one cluster's access to shared storage does not disrupt other clusters? VeloDB keeps its multi-cluster architecture running in an orderly way through access control and resource isolation: only users granted permissions on a specific cluster can access that cluster, which prevents misuse. For access to storage resources, VeloDB supports bandwidth and IOPS throttling based on cluster specifications. When a cluster exceeds its limit, its storage access requests are queued so that they do not interfere with other clusters.
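A minimal sketch of the two mechanisms described above, in Python. It assumes cluster-level grants are a simple user-to-cluster mapping and models IOPS throttling as a token bucket whose overflow waits in a FIFO queue. `ClusterAcl`, `StorageThrottle`, and their methods are hypothetical names for illustration, not VeloDB APIs. Bandwidth throttling would work the same way, with bytes instead of requests as tokens.

```python
import time
from collections import deque


class ClusterAcl:
    """Hypothetical per-warehouse grants: user -> set of cluster names."""

    def __init__(self):
        self.grants = {}

    def grant(self, user, cluster):
        self.grants.setdefault(user, set()).add(cluster)

    def check(self, user, cluster):
        # Reject any query routed to a cluster the user has no permission on.
        if cluster not in self.grants.get(user, set()):
            raise PermissionError(f"{user} is not allowed to use {cluster}")


class StorageThrottle:
    """Hypothetical per-cluster IOPS limiter: token bucket plus a FIFO wait queue."""

    def __init__(self, iops_limit, clock=time.monotonic):
        self.rate = iops_limit            # tokens added per second
        self.tokens = float(iops_limit)   # bucket starts full; capacity = 1 second of IOPS
        self.clock = clock
        self.last = clock()
        self.queue = deque()

    def _refill(self):
        now = self.clock()
        self.tokens = min(self.rate, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def submit(self, request):
        """Enqueue a storage request and return the requests issued right now."""
        self.queue.append(request)
        return self.drain()

    def drain(self):
        """Issue queued requests in FIFO order while tokens remain; the rest keep waiting."""
        self._refill()
        issued = []
        while self.queue and self.tokens >= 1:
            self.tokens -= 1
            issued.append(self.queue.popleft())
        return issued
```

Because each cluster owns its own `StorageThrottle`, a cluster that bursts past its limit only lengthens its own queue; other clusters' budgets are untouched.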

  • VeloDB

    Compute Clusters in VeloDB 🌟 💡 Question 3️⃣ 💡 How to achieve flexible caching? In a compute-storage decoupled architecture, where object storage or HDFS typically serves as remote shared storage, initial I/O requests often see slow response times. How can we ensure high performance in such cases, and further, in multi-cluster scenarios? VeloDB addresses these challenges with a well-designed cache management mechanism: For a single compute cluster, VeloDB uses an LRU caching strategy by default. When the cache is large enough to hold all hot data, it delivers the same performance as a compute-storage coupled architecture at a much lower storage cost. VeloDB also offers manual caching control, allowing users to prioritize certain tables for caching. During cluster scaling, VeloDB automatically pre-warms or migrates caches based on statistics so that query service stays smooth despite the change. For multiple compute clusters, VeloDB provides cross-cluster cache synchronization to accelerate queries, with control down to the partition level. The cache of each compute cluster operates independently, and users can size each cache as needed. #database #cloudcompute #dataengineer
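The single-cluster policy described above (LRU by default, with some tables prioritized by the user) can be sketched as follows. This is an illustrative Python model, not VeloDB's cache implementation: `BlockCache`, `pin`, and `warm_up` are hypothetical names, and `warm_up` only stands in for pre-warming or cross-cluster synchronization by replaying a list of hot block keys.

```python
from collections import OrderedDict


class BlockCache:
    """Hypothetical per-cluster cache: LRU eviction that skips blocks of pinned tables."""

    def __init__(self, capacity, fetch_remote):
        self.capacity = capacity
        self.fetch_remote = fetch_remote   # slow read from object storage / HDFS
        self.blocks = OrderedDict()        # (table, block_id) -> data, least recent first
        self.pinned_tables = set()
        self.remote_reads = 0

    def pin(self, table):
        """Manual control: keep this table's blocks out of LRU eviction."""
        self.pinned_tables.add(table)

    def get(self, table, block_id):
        key = (table, block_id)
        if key in self.blocks:
            self.blocks.move_to_end(key)   # cache hit: mark as most recently used
            return self.blocks[key]
        self.remote_reads += 1             # cache miss: pay the remote I/O once
        data = self.fetch_remote(table, block_id)
        self.blocks[key] = data
        self._evict()
        return data

    def _evict(self):
        # Drop the least recently used unpinned blocks until within capacity.
        while len(self.blocks) > self.capacity:
            victim = next((k for k in self.blocks if k[0] not in self.pinned_tables), None)
            if victim is None:
                break  # only pinned blocks remain; this sketch lets them overflow
            del self.blocks[victim]

    def warm_up(self, keys):
        """Pre-load blocks, e.g. the hot set reported by another cluster."""
        for table, block_id in keys:
            self.get(table, block_id)
```

In this sketch, a pinned dimension table survives a scan over a large fact table, which only cycles through the unpinned part of the cache.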

  • VeloDB

    Compute Clusters in VeloDB 🌟 💡 Question 2️⃣ 💡 How to allow multiple nodes to process data writes simultaneously? After years of exploration, most relational databases have settled on a single-writer, multi-reader architecture, in which only one cluster may write data into the shared storage. VeloDB, however, allows multiple clusters to write simultaneously. It combines Multi-Version Concurrency Control (MVCC) with a shared metadata center for transaction coordination. Data is first submitted to multiple clusters for transformation, and distributed coordination happens only during the metadata update phase: the cluster that obtains the lock first writes successfully, while the other clusters retry. Since most of the cost of writing data lies in the transformation stage, this distributed coordination mechanism and optimistic locking design enable multi-read, multi-write capability and let multiple clusters work together to further increase concurrent write throughput.
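Assuming the shared metadata center exposes a per-table version number and a compare-and-swap commit, the optimistic write path described above might look like this sketch. `MetadataCenter`, `VersionConflict`, `load`, and the toy transformation are hypothetical. The point is that the expensive transformation runs outside the critical section, and only the cheap metadata commit is retried on conflict.

```python
import threading


class VersionConflict(Exception):
    """Raised when another cluster committed after we read the table version."""


class MetadataCenter:
    """Hypothetical shared metadata center: one version number per table."""

    def __init__(self):
        self._lock = threading.Lock()
        self.version = {}
        self.rowsets = {}

    def read_version(self, table):
        with self._lock:
            return self.version.get(table, 0)

    def commit(self, table, expected_version, rowset):
        # Compare-and-swap: succeed only if nobody committed since expected_version.
        with self._lock:
            current = self.version.get(table, 0)
            if current != expected_version:
                raise VersionConflict(current)
            self.version[table] = current + 1
            self.rowsets.setdefault(table, []).append(rowset)
            return current + 1


def load(meta, table, raw_rows, max_retries=10):
    """Transform locally (expensive, no lock held), then commit optimistically."""
    rowset = sorted(r * 10 for r in raw_rows)  # hypothetical transformation step
    for _ in range(max_retries):
        base = meta.read_version(table)
        try:
            return meta.commit(table, base, rowset)
        except VersionConflict:
            continue  # another cluster won the race; re-read the version and retry
    raise RuntimeError("gave up after repeated commit conflicts")
```

Retrying is cheap here because only the version read and the commit are repeated; the transformed rowset is reused as-is.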

  • VeloDB

    Compute Clusters in VeloDB 🌟 The multi-compute cluster architecture of VeloDB is designed to enable read-write isolation and query workload isolation. Implementing such an architecture may not seem challenging in a cloud-native solution with decoupled compute and storage. From a product perspective, however, many key aspects still require careful design. We will answer a series of questions about it in a few posts. 💡 Question 1️⃣ 💡 How to ensure data consistency across compute clusters? With compute and storage decoupled, data resides in shared storage that multiple compute clusters can access. VeloDB has undergone in-depth refactoring to implement shared metadata. After data is written into shared storage, the shared metadata is updated first, and only then is the write result returned. Other clusters read from the shared metadata center and thus retrieve the latest data.
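The write ordering described above (persist data, update shared metadata, then acknowledge) can be modeled in a few lines. This Python sketch uses a dict in place of object storage and a locked in-memory map in place of the shared metadata center; all class and method names are hypothetical, not VeloDB internals.

```python
import threading
import uuid


class SharedMetadata:
    """Hypothetical shared metadata center: table -> committed data file paths."""

    def __init__(self):
        self._lock = threading.Lock()
        self._files = {}

    def publish(self, table, path):
        with self._lock:
            self._files.setdefault(table, []).append(path)

    def visible_files(self, table):
        with self._lock:
            return list(self._files.get(table, []))


class ComputeCluster:
    def __init__(self, name, storage, meta):
        self.name = name
        self.storage = storage  # dict standing in for shared object storage
        self.meta = meta

    def write(self, table, rows):
        path = f"{table}/{self.name}-{uuid.uuid4().hex}"
        self.storage[path] = list(rows)  # 1. persist the data file in shared storage
        self.meta.publish(table, path)   # 2. make it visible in shared metadata
        return "OK"                      # 3. only then acknowledge the write

    def scan(self, table):
        # Every cluster sees exactly the files the shared metadata lists as committed.
        rows = []
        for path in self.meta.visible_files(table):
            rows.extend(self.storage[path])
        return rows
```

A file that reaches storage but never reaches the metadata (for example, a write that crashed between steps 1 and 2) stays invisible to every cluster, so no cluster ever reads a half-finished write.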

  • VeloDB

    VeloDB vs. HBase: When to use which? VeloDB is a real-time data warehouse, while HBase is a distributed key-value store.
    🏠 Use case: OLAP analytical workloads combined with KV point lookups. VeloDB is a great fit, while HBase lacks the relevant analytical capabilities.
    🏠 Use case: small- to medium-scale KV point lookups. Both VeloDB and HBase are good options, but VeloDB has a simpler, more maintainable architecture.
    🏠 Use case: PB-scale online KV point lookups only. HBase is the better solution.
    #database #dataengineering #HBase #BigData #analytics

  • VeloDB

    To adapt to changing workloads, VeloDB Cloud supports elastic scaling of compute resources, letting users adjust compute capacity to peak/off-peak hours and job execution patterns. VeloDB Cloud provides manual and time-based scaling as well as automatic start/stop.
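A time-based scaling policy like the one described above could be expressed as a pure function from the current time to a target node count. The rule format, the precedence order (manual setting, then auto-stop on idle, then schedule, then default), and all names here are assumptions for illustration, not VeloDB Cloud configuration.

```python
from dataclasses import dataclass
from datetime import time


@dataclass
class ScalingRule:
    """Hypothetical rule: run `nodes` BE nodes between start and end on the same day."""
    start: time
    end: time
    nodes: int


def desired_nodes(now, rules, default_nodes, idle_seconds=0.0,
                  auto_stop_after=None, manual_nodes=None):
    """Return the node count a cluster should run at wall-clock time `now`."""
    # Manual scaling set by the user takes precedence over everything else.
    if manual_nodes is not None:
        return manual_nodes
    # Auto-stop: suspend the cluster (0 nodes) after a long enough idle period.
    # A new query resets idle_seconds, which in this model restarts the cluster.
    if auto_stop_after is not None and idle_seconds >= auto_stop_after:
        return 0
    # Time-based scaling: the first matching window wins (windows do not cross midnight).
    for rule in rules:
        if rule.start <= now < rule.end:
            return rule.nodes
    return default_nodes
```

For example, `desired_nodes(time(10, 30), [ScalingRule(time(9), time(18), 8)], 2)` asks for 8 nodes during business hours, and the same call at `time(20)` falls back to the default of 2.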

  • VeloDB

    VeloDB's multi-compute cluster architecture arises from two typical use cases: 1️⃣ Read-write isolation: separate compute clusters handle reads and writes independently, so heavy write pressure does not affect query services. 2️⃣ Separation of online and offline workloads: in many data analysis scenarios, the same dataset serves various parts of the business. With a multi-compute cluster architecture, VeloDB can serve both online and offline workloads with isolated computing resources on a single copy of the data, which saves cost and simplifies operations. In VeloDB, a data warehouse instance can contain multiple compute clusters, similar to compute queues or compute groups in a distributed system. Data is persisted in shared storage that all clusters can access. Each cluster is itself a distributed system consisting of one or more BE (Backend) nodes. To accelerate data access, we have introduced caching on the local compute nodes. For example, in the architecture diagram attached to the original post, Warehouse 1 contains Cluster 1, Cluster 2, and Cluster 3, all of which can access the data in the shared storage.
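The warehouse, cluster, and BE-node hierarchy described above can be sketched as a small data model. This is a hypothetical Python illustration: the `role` labels, the modulo-based node selection over integer block ids, and the routing rule are assumptions made to show read-write and online/offline isolation over one shared copy of the data, not VeloDB internals.

```python
from dataclasses import dataclass, field


@dataclass
class BackendNode:
    name: str
    local_cache: dict = field(default_factory=dict)  # block id -> data


@dataclass
class Cluster:
    name: str
    role: str    # hypothetical workload label, e.g. "write", "online", "offline"
    nodes: list  # one or more BackendNode

    def read(self, storage, block_id):
        # Pick a BE node for the (integer) block id, then read through its local cache.
        node = self.nodes[block_id % len(self.nodes)]
        if block_id not in node.local_cache:
            node.local_cache[block_id] = storage[block_id]  # miss: go to shared storage
        return node.local_cache[block_id]


@dataclass
class Warehouse:
    shared_storage: dict                          # the single copy of the data
    clusters: dict = field(default_factory=dict)  # cluster name -> Cluster

    def add_cluster(self, cluster):
        self.clusters[cluster.name] = cluster

    def route(self, workload):
        # Each workload type gets its own cluster, so they never compete for CPU or memory.
        for cluster in self.clusters.values():
            if cluster.role == workload:
                return cluster
        raise LookupError(f"no cluster serves {workload!r}")
```

Reads by the online cluster fill only that cluster's node caches, while the write cluster's nodes stay untouched, even though both clusters point at the same `shared_storage`.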

