Apache Doris

Software Development

San Francisco, California 2,633 followers

Apache Doris is an open-source real-time data warehouse based on MPP architecture.

About us

Apache Doris is an open-source, real-time data warehouse based on an MPP architecture, known for its speed and ease of use. It supports real-time data ingestion and delivers real-time query responses in both high-concurrency point-query and high-throughput analytical scenarios. With it, users can process and analyze large datasets in the blink of an eye. In June 2022, Apache Doris graduated from the Apache Incubator and became a Top-Level Project of the ASF. It has accumulated nearly 600 contributors, and more than 20,000 developers use Apache Doris today. Doris is also used in production at over 2,000 companies around the world and is trusted by business giants such as AWS, Fuse, JD.com, Lenovo, OPPO, Shopee, TikTok, Tencent, Vivo, and Xiaomi. We welcome more open-source technology enthusiasts to join the Apache Doris community and together discover infinite possibilities!

Learn more about Apache Doris on GitHub: https://github.com/apache/doris
Join the Apache Doris community on Slack: https://join.slack.com/t/apachedoriscommunity/shared_invite/zt-2gmq5o30h-455W226d79zP3L96ZhXIoQ

Website
https://doris.apache.org/
Industry
Software Development
Company size
201-500 employees
Headquarters
San Francisco, California
Type
Nonprofit
Founded
2018

Updates

  • Apache Doris

    Announcing the Apache Doris Meetup in Singapore on October 24 🇸🇬
    🕝 Thursday, October 24, 2024 (2:30 PM to 5:05 PM SGT)
    🏙️ 51 Bras Basah Road, Lazada One, Singapore
    Topics:
    1️⃣ Apache Doris: A Real-Time Data Warehouse
    2️⃣ Multi-Stream Real-Time Data Analysis Solution Based on Apache Doris
    3️⃣ Apache Doris in WeChain: Application and Best Practices
    4️⃣ RisingWave & Apache Doris: Simplifying Real-time Data Enrichment and Analytics
    Speakers:
    1️⃣ Mingyu Chen, Apache Doris PMC Chair, Vice President of Technology at VeloDB
    2️⃣ Boyang Chen, Apache Doris Contributor, Database Development Engineer at #Douyin Group
    3️⃣ Special Guest Speaker, Data Engineer at a leading #cryptocurrency exchange
    4️⃣ Liu Zhi, Product Manager at RisingWave Labs
    Secure your spot by RSVPing, and share this link with friends in #Singapore! We plan to hold more meetups in more cities, and we look forward to connecting face-to-face with all of you! https://lnkd.in/ga-THiNR
    #Meetup #opensource #database #BigData #dataengineer #DataAnalytics

    Apache Doris Meetup @Singapore, Thu, Oct 24, 2024, 2:30 PM | Meetup

    meetup.com

  • Apache Doris

    If you want to learn more about Apache Doris, you should definitely meet with the Apache Doris PMC Chair. See you in Singapore on October 24!

    VeloDB

    729 followers

    Our Vice President of Technology will give a talk at the Apache Doris Meetup @Singapore! Rayner (Mingyu) Chen is one of the core engineers who has been leading the technical innovation of Apache Doris, and he has nurtured the project's development and community growth. If you want to understand the ins and outs of Apache Doris, Rayner is the go-to person. He will talk about the use cases and current development status of Apache Doris, and the technologies behind its fast performance in online reporting, log analysis, and data lakehousing. Come join us: https://lnkd.in/g7wCwTuv
    #opensource #Meetup #Singapore #database #analytics #ApacheDoris

  • Apache Doris

    Looking forward to reading more about it!

    Kamil Borowski

    MLOps, AI Infrastructure, K8s, Microservices, Cloud, Product Design, App Development and, last but not least, GPT Integration

    🔍 Tackling Data Bottlenecks in Modern Business with Apache Doris 🔍
    In recent months, we’ve gathered extensive insights into the challenges companies face when dealing with large datasets, data tagging, and data lake integration. Apache Doris stands out as a solution by enabling:
    ✅ Faster Reporting: Real-time data analysis ensures decision-makers have the insights they need, when they need them.
    ✅ Scalable Data Tagging: Simplifies the management of vast datasets, improving efficiency.
    ✅ Seamless Data Lake Integration: Unlocks the full potential of data lakes, providing quicker and more accurate insights.
    In some cases, we’ve managed projects with monthly budgets of $150k or more. By leveraging Doris, we've significantly reduced costs compared to standard on-demand AWS pricing, making it a powerful and scalable solution while keeping expenses under control.
    The attached visual shows how Apache Doris optimizes the entire process of tag data management and improves efficiency, moving from JSON merging with Elasticsearch to Doris’ more efficient key models.
    Very soon, we’ll be sharing a detailed case study that shows how Apache Doris helped us tackle real-world data challenges, so stay tuned! :)
    #DataAnalytics #ApacheDoris #DataLake #DataReporting #Efficiency #DevOps #devopsbay

  • Apache Doris reposted this

    Javier Ariza Batalloso

    Data architect

    How can we migrate an architecture based on Kudu and Impala to an architecture based on Apache Doris? 🔥 🔥
    Apache Doris is a data warehouse that enables real-time analytics and is based on an MPP architecture. It returns query results with very low latency and can support not only high-concurrency point-query scenarios but also complex, high-throughput analytical scenarios. It can be used for reporting analysis, ad hoc queries, federated queries...
    Apache Doris architecture
    ☝ Frontend (FE): user requests, query parsing, query planning, metadata management, etc.
    ✌ Backend (BE): responsible for data storage and the execution of query plans.
    Both types of processes scale horizontally and guarantee high service availability and high data reliability through consistency protocols. The storage engine is based on columnar storage.
    Data models
    🔘 Duplicate Key model: used for ad-hoc queries.
    🔘 Unique Key model: for uniqueness constraints on the data. Supports deduplication, multi-stream upserts, and partial column updates.
    🔘 Aggregate Key model: for pre-aggregated data reports.
    Why is Apache Doris a good alternative to Kudu and Impala?
    ✅ Easy to use: SQL
    ✅ High performance: low latency, high throughput, vectorized query engine, materialized views, indexes.
    ✅ Unification: real-time data, interactive data analysis, and offline data processing.
    ✅ Federated queries
    ✅ Growing ecosystem: Spark (r/w), Flink (r/w), Kafka, DBT, BI
    ✅ Query analysis: many optimizations for joins with wide tables, whereas Kudu performs worse.
    ✅ Query times: lower query response times than Kudu.
    ✅ Data: hot data in Apache Doris, and cold data in MinIO and Apache Hudi.
    ⭐ Repository: https://lnkd.in/dt3M7x8J
    ❗ Comparison of Kudu/Impala and Apache Doris in the image
    References
    🔘 Real-time data warehouse in TikTok based on Apache Doris: https://lnkd.in/dU33R5yN
    🔘 ClickHouse & Kudu to Doris: 10X concurrency increased, 70% latency down: https://lnkd.in/ddMA78UN
    🔘 The Efficiency of the data warehouse greatly improved in LY Digital: https://lnkd.in/dBYQSa29
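To make the three data models more concrete, here is a minimal Python simulation of how each model treats rows that share the same key. It is only a sketch of the semantics and does not call Doris; the sample rows and function names are hypothetical, and it assumes SUM as the Aggregate Key function and last-write-wins for the Unique Key model (partial column updates are not modeled).

```python
from collections import defaultdict

# Hypothetical sample rows: (user_id, date) is the key, the integer is a value column.
ROWS = [
    (("u1", "2024-10-01"), 10),
    (("u1", "2024-10-01"), 5),
    (("u2", "2024-10-01"), 7),
]


def duplicate_key(rows):
    """Duplicate Key model: every ingested row is kept; the key only defines sort order."""
    return list(rows)


def unique_key(rows):
    """Unique Key model: one row per key; a later write replaces the earlier one (upsert)."""
    latest = {}
    for key, value in rows:
        latest[key] = value
    return sorted(latest.items())


def aggregate_key(rows):
    """Aggregate Key model: rows with the same key are pre-aggregated (SUM assumed here)."""
    totals = defaultdict(int)
    for key, value in rows:
        totals[key] += value
    return sorted(totals.items())


if __name__ == "__main__":
    print(len(duplicate_key(ROWS)))  # all three rows survive
    print(unique_key(ROWS))          # u1 keeps only its last value
    print(aggregate_key(ROWS))       # u1's values are summed
```

The trade-off this illustrates: Duplicate Key keeps raw detail for ad-hoc queries, Unique Key trades history for deduplicated upserts, and Aggregate Key trades detail for smaller, pre-aggregated reports.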

  • Apache Doris reposted this

    VeloDB

    Compute Clusters in VeloDB 🌟
    💡 Question 4️⃣ 💡 How to implement access control and resource isolation?
    Within a VeloDB warehouse, compute clusters are completely isolated from each other, each with independent computing resources. How do we prevent users from mistakenly using the wrong cluster and causing interference? And how do we ensure that one cluster's access to shared storage does not disrupt other clusters?
    VeloDB keeps its multi-cluster architecture running in an orderly way through comprehensive access control and resource isolation mechanisms:
    Only users granted permissions on a specific cluster can access that cluster, which prevents misuse.
    For access to storage resources, VeloDB supports bandwidth and IOPS throttling based on cluster specifications. When a cluster exceeds its limit, its storage access requests are queued, so clusters do not interfere with each other.
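The post does not describe how VeloDB implements throttling. As a rough illustration of "queue storage requests once a cluster exceeds its limit", here is a minimal token-bucket sketch in Python, assuming one bucket per compute cluster sized from a hypothetical `iops_limit`; the class and method names are invented for illustration.

```python
from collections import deque


class ClusterThrottle:
    """Hypothetical per-cluster IOPS throttle: a token bucket plus a FIFO wait queue."""

    def __init__(self, iops_limit: float):
        self.rate = iops_limit       # tokens (I/O requests) replenished per second
        self.capacity = iops_limit   # allow at most one second's worth of burst
        self.tokens = iops_limit
        self.last = 0.0
        self.queue = deque()         # requests waiting because the limit was exceeded

    def _refill(self, now: float) -> None:
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now

    def submit(self, request, now: float) -> list:
        """Enqueue a storage request and return whatever can be dispatched at time `now`."""
        self.queue.append(request)
        return self.drain(now)

    def drain(self, now: float) -> list:
        """Dispatch queued requests in FIFO order while tokens remain."""
        self._refill(now)
        dispatched = []
        while self.queue and self.tokens >= 1:
            self.tokens -= 1
            dispatched.append(self.queue.popleft())
        return dispatched


if __name__ == "__main__":
    # Each cluster gets its own bucket, so a busy cluster only delays its own requests.
    etl, bi = ClusterThrottle(iops_limit=2), ClusterThrottle(iops_limit=2)
    print([etl.submit(f"read-{i}", now=0.0) for i in range(4)])  # last two are queued
    print(bi.submit("bi-read", now=0.0))                          # unaffected by etl
    print(etl.drain(now=1.0))                                     # queued reads proceed
```

Keeping a separate bucket and queue per cluster is the design point: a cluster that overruns its quota waits on its own backlog instead of consuming shared storage bandwidth that other clusters need.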

  • Apache Doris reposted this

    VeloDB

    Compute Clusters in VeloDB 🌟
    💡 Question 3️⃣ 💡 How to achieve flexible caching?
    In a compute-storage decoupled architecture, where object storage or HDFS typically serves as the remote shared storage, initial I/O requests often have slow response times. How can we ensure high performance in this situation, and in multi-cluster scenarios as well?
    VeloDB addresses these challenges with a well-designed cache management mechanism:
    For a single compute cluster, VeloDB uses an LRU caching strategy by default. When the cache is large enough to hold all hot data, it delivers the same performance as a compute-storage coupled architecture at a much lower storage cost. VeloDB also offers manual cache control, allowing users to prioritize certain tables for caching. During cluster scaling, VeloDB automatically pre-warms or migrates caches based on statistics, so query service stays smooth despite the changes.
    For multiple compute clusters, VeloDB provides cross-cluster cache synchronization to accelerate queries, with partition-level control over what is synchronized. The cache of each compute cluster operates independently, so users can size each cache as needed.
    #database #cloudcompute #dataengineer
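The single-cluster policy described above (LRU by default, with certain tables prioritized) can be sketched in a few lines of Python. This is not VeloDB's implementation: the `pinned_tables` set is a hypothetical stand-in for the manual "prioritize certain tables" control, capacity is counted in blocks rather than bytes, and `fetch_remote` represents a slow read from shared storage.

```python
from collections import OrderedDict


class TableAwareLRUCache:
    """Hypothetical block cache: LRU eviction, but blocks of pinned tables are evicted last."""

    def __init__(self, capacity: int, pinned_tables=()):
        self.capacity = capacity
        self.pinned_tables = set(pinned_tables)
        self.blocks = OrderedDict()  # (table, block_id) -> data, least recently used first

    def get(self, table: str, block_id: int, fetch_remote):
        key = (table, block_id)
        if key in self.blocks:
            self.blocks.move_to_end(key)  # cache hit: mark as most recently used
            return self.blocks[key]
        data = fetch_remote(table, block_id)  # cache miss: slow read from shared storage
        self.blocks[key] = data
        if len(self.blocks) > self.capacity:
            self._evict()
        return data

    def _evict(self) -> None:
        # Prefer the least recently used block of an unpinned table; fall back to plain LRU.
        for key in self.blocks:
            if key[0] not in self.pinned_tables:
                del self.blocks[key]
                return  # stop iterating right after the deletion
        self.blocks.popitem(last=False)


def fetch_from_object_storage(table, block_id):
    """Hypothetical remote read; a real system would issue an object-storage or HDFS request."""
    return f"{table}:{block_id}"


if __name__ == "__main__":
    cache = TableAwareLRUCache(capacity=2, pinned_tables={"dim_user"})
    cache.get("dim_user", 0, fetch_from_object_storage)
    cache.get("fact_log", 0, fetch_from_object_storage)
    cache.get("fact_log", 1, fetch_from_object_storage)  # evicts fact_log block 0, not dim_user
    print(list(cache.blocks))
```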

  • Apache Doris

    Real-time data processing is more challenging than offline batch processing because it involves complex operations such as multi-stream JOINs and dimension table changes. It demands more development and maintenance effort, and the need to guarantee system stability often results in redundant, wasted resources. We are excited to invite TikTok's data platform team to talk about how they use Apache Doris in their real-time data architecture and how they benefit from it, which could serve as a model for effective real-time data warehousing. https://lnkd.in/gRRpqvkg
    TikTok also has job openings for engineers familiar with Apache Doris: https://lnkd.in/gdjxFbmw
    #TikTok #dataplatform #realtime #dataprocessing #datawarehouse #livestream #ecommerce

  • Apache Doris reposted this

    VeloDB

    Compute Clusters in VeloDB 🌟
    💡 Question 2️⃣ 💡 How to allow multiple nodes to process data writes simultaneously?
    After years of exploration, most relational databases built on shared storage adopt a single-writer, multi-reader architecture, in which only one cluster is allowed to write data into the shared storage. VeloDB, however, allows multiple clusters to write simultaneously.
    VeloDB relies on a Multi-Version Concurrency Control (MVCC) mechanism and a shared metadata center for transaction coordination. Data is first transformed and processed on multiple clusters in parallel, and distributed coordination happens only during the metadata update phase: the cluster that obtains the lock first completes its write, while the other clusters retry. Because most of the write overhead lies in the transformation step, this distributed coordination mechanism and optimistic locking design enable multi-reader, multi-writer operation, while multiple clusters can be used to further increase concurrent write throughput.
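The coordination pattern described above, where expensive transformation runs in parallel and only a short metadata commit is serialized with losers retrying, can be sketched in Python as an optimistic version-checked commit. This is an illustration under stated assumptions, not VeloDB's code: `SharedMetadata`, `compare_and_commit`, and the data-file names are hypothetical, and a `threading.Lock` stands in for whatever atomic update the real metadata center provides.

```python
import threading


class SharedMetadata:
    """Hypothetical shared metadata center: a version number plus committed data files."""

    def __init__(self):
        self._lock = threading.Lock()  # stands in for the metadata service's atomic update
        self.version = 0
        self.files = []

    def current_version(self) -> int:
        with self._lock:
            return self.version

    def compare_and_commit(self, expected_version: int, new_files) -> bool:
        """Commit only if no other cluster has committed since `expected_version` was read."""
        with self._lock:
            if self.version != expected_version:
                return False  # another cluster committed first; the caller must retry
            self.files.extend(new_files)
            self.version += 1
            return True


def write_from_cluster(meta, cluster, rows, max_retries=10):
    """Transform rows once, then retry only the cheap metadata commit on conflict."""
    # Expensive step, run in parallel on each cluster: build a data file (name is hypothetical).
    data_file = f"{cluster}-{len(rows)}rows.dat"
    for attempt in range(max_retries):
        seen = meta.current_version()
        if meta.compare_and_commit(seen, [data_file]):
            return attempt  # number of retries this writer needed
    raise RuntimeError(f"{cluster}: gave up after {max_retries} conflicting commits")


if __name__ == "__main__":
    meta = SharedMetadata()
    writers = [
        threading.Thread(target=write_from_cluster, args=(meta, f"cluster{i}", [1, 2, 3]))
        for i in range(4)
    ]
    for t in writers:
        t.start()
    for t in writers:
        t.join()
    print(meta.version, sorted(meta.files))
```

Note that a retry repeats only the metadata commit, not the transformation, which is why this optimistic approach stays cheap when most of the cost is in transforming data. For simplicity the sketch treats any concurrent commit as a conflict.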

  • Apache Doris

    📢 Apache Doris 2.1.6 is released! This version brings optimizations and new features in data lakehousing, semi-structured data management, query execution, and more. Highlights include, but are not limited to:
    ☑ Data writeback to #Iceberg tables
    ☑ Wider support for transparent rewriting with async materialized views
    ☑ More flexible ingestion, conversion, and processing of semi-structured data
    https://lnkd.in/gvaNhgrG
    #database #datalakehouse #opensource

    Apache Doris 2.1.6 just released - Apache Doris

    doris.apache.org

  • Apache Doris reposted this

    VeloDB

    Compute Clusters in VeloDB 🌟
    The multi-compute-cluster architecture of VeloDB is designed to provide read-write isolation and query workload isolation. Implementing such an architecture may not seem challenging in a cloud-native solution with decoupled compute and storage. From a product perspective, however, many key aspects still require carefully crafted design. We will answer a series of questions about it in a few posts.
    💡 Question 1️⃣ 💡 How to ensure data consistency across compute clusters?
    With compute and storage decoupled, data lives in shared storage accessible to multiple compute clusters. VeloDB has undergone in-depth refactoring to achieve shared metadata. After data is written to shared storage, the shared metadata is updated first, and only then is the write result returned. Other clusters access the shared metadata center to retrieve the latest data.
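The ordering described above (write data, then update shared metadata, then acknowledge; readers go through metadata) can be illustrated with a toy in-memory model. This Python sketch is an assumption-laden simplification, not VeloDB's design: `SharedStorage`, `MetadataCenter`, and `ComputeCluster` are hypothetical names, and dictionaries stand in for object storage and the metadata service.

```python
class SharedStorage:
    """Hypothetical object store shared by every compute cluster."""

    def __init__(self):
        self.objects = {}  # object key -> list of rows


class MetadataCenter:
    """Hypothetical shared metadata: the single source of truth for visible data."""

    def __init__(self):
        self.committed = {}  # table name -> object keys that readers may see


class ComputeCluster:
    def __init__(self, name, storage, meta):
        self.name = name
        self.storage = storage
        self.meta = meta

    def write(self, table, rows):
        key = f"{table}/{self.name}/{len(self.storage.objects)}"
        self.storage.objects[key] = list(rows)                 # 1. write data to shared storage
        self.meta.committed.setdefault(table, []).append(key)  # 2. update shared metadata
        return "OK"                                            # 3. only then report success

    def read(self, table):
        # Readers resolve data through the shared metadata, so objects that were
        # written but never committed (e.g. after a crash) stay invisible.
        rows = []
        for key in self.meta.committed.get(table, []):
            rows.extend(self.storage.objects[key])
        return rows


if __name__ == "__main__":
    storage, meta = SharedStorage(), MetadataCenter()
    etl = ComputeCluster("etl", storage, meta)
    bi = ComputeCluster("bi", storage, meta)
    etl.write("orders", [1, 2])
    storage.objects["orders/crashed/99"] = [3]  # simulated write that never reached step 2
    print(bi.read("orders"))
```

Because a write is acknowledged only after step 2, any cluster that reads after receiving the "OK" sees the new data, and a write interrupted between steps 1 and 2 leaves at most an unreferenced object rather than a partially visible result.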
