Our Criblpedia glossary pages provide explanations of technical and industry-specific terms, offering a valuable high-level introduction to these concepts.
At its most basic level, a data pipeline can be seen as an aggregator, or even a manifold, that takes data from multiple sources and distributes it to multiple destinations, eliminating the need for multiple bespoke systems. As data transits the pipeline, it may also be acted upon, shaped to fit organizational needs or the requirements of a receiving system.
The internals of a data pipeline can be viewed as a series of steps or processes that shape the data in motion as it travels from source to destination. These tools and techniques perform an ETL-style (extraction, transformation, and loading) function on the raw data, shaping it into a format suitable for analysis.
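To make that flow concrete, here is a minimal sketch of the three ETL steps chained together in Python; the file names and the shape of the events are illustrative assumptions, not any vendor's implementation.

```python
# Minimal ETL-style pipeline sketch: extract raw records, transform them,
# and load the results. The file paths are hypothetical placeholders.
import json

def extract(path):
    """Pull raw lines from a source (here: a local log file)."""
    with open(path) as f:
        for line in f:
            yield line.rstrip("\n")

def transform(lines):
    """Shape raw lines into structured events suitable for analysis."""
    for line in lines:
        if not line:
            continue  # drop empty lines
        yield {"message": line, "length": len(line)}

def load(events, out_path):
    """Deliver shaped events to a destination (here: newline-delimited JSON)."""
    with open(out_path, "w") as out:
        for event in events:
            out.write(json.dumps(event) + "\n")

if __name__ == "__main__":
    load(transform(extract("app.log")), "shaped_events.ndjson")
```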
Data pipelines are built from a combination of software tools, technologies, and scripts. Many vendors offer data or observability pipelines; they share common features such as routing, filtering, and shaping, but each also brings some unique capabilities. Rather than buying a data pipeline solution, some organizations use open-source tools to build their own, either to save cost or to address specific issues in their enterprise. However, once it’s built, they have to maintain it indefinitely, which can prove more expensive and complex than an off-the-shelf solution.
Data pipelines come in a variety of types: some are designed for a specific purpose, while others support a range of functionalities. Understanding these types is crucial to optimizing a data processing strategy, because it enables enterprises to apply the right approach for their specific needs and objectives. Let’s explore these types in more depth.
Batch Processing
This type of pipeline is designed to process large volumes of data in batches at scheduled intervals. It excels at handling large datasets that do not require real-time analysis; by moving data in batches, it optimizes efficiency and resource utilization.
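In practice, a batch pipeline is often little more than a scheduled job that sweeps up whatever has accumulated since the last run. The sketch below assumes hypothetical input and archive directories and a one-hour interval.

```python
# Batch-processing sketch: on a fixed schedule, process everything that has
# accumulated in an input directory since the last run. Paths and the
# one-hour interval are illustrative assumptions.
import glob
import os
import shutil
import time

INPUT_DIR = "incoming"       # where upstream systems drop files (assumed)
PROCESSED_DIR = "processed"  # where handled files are archived (assumed)
INTERVAL_SECONDS = 3600      # run once an hour

def process_batch():
    os.makedirs(PROCESSED_DIR, exist_ok=True)
    files = sorted(glob.glob(os.path.join(INPUT_DIR, "*.log")))
    total_lines = 0
    for path in files:
        with open(path) as f:
            total_lines += sum(1 for _ in f)   # stand-in for real analysis
        shutil.move(path, os.path.join(PROCESSED_DIR, os.path.basename(path)))
    print(f"Processed {len(files)} files, {total_lines} lines in this batch.")

while True:
    process_batch()
    time.sleep(INTERVAL_SECONDS)
```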
Streaming Data
As the name suggests, this type of pipeline is designed to handle streaming data in real time. It is particularly useful for applications that require immediate analysis and response, such as fraud detection or monitoring system performance. Processing data on arrival enables fast decision-making and proactive action.
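A streaming pipeline instead evaluates each event the moment it arrives. The sketch below uses a made-up failed-login rule to illustrate the kind of immediate decision this enables.

```python
# Streaming sketch: evaluate each event as it arrives instead of waiting for a
# batch. The "too many failed logins" rule is an invented example of the kind
# of immediate decision a streaming pipeline enables.
from collections import Counter

failed_logins = Counter()

def on_event(event):
    """Called for every event the moment it is received."""
    if event.get("action") == "login_failed":
        failed_logins[event["user"]] += 1
        if failed_logins[event["user"]] >= 3:
            print(f"ALERT: possible brute force against {event['user']}")

# Events arriving one at a time:
stream = [
    {"user": "alice", "action": "login_failed"},
    {"user": "alice", "action": "login_failed"},
    {"user": "alice", "action": "login_failed"},
]
for event in stream:
    on_event(event)
```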
Hybrid Data Pipeline
Most data pipelines can support both approaches to some degree, combining elements of each to handle real-time and batch-processing needs. This flexibility allows companies to efficiently manage diverse data processing requirements, delivering both immediate insights and comprehensive analysis.
Deployment Modes
Data pipelines are available as both cloud (SaaS) and on-premises (software) solutions. The choice of deployment model is user-specific and may depend on security concerns and on the location of data sources and destinations. Some vendors offer hybrid solutions that combine cloud and on-premises components.
The architecture of a data pipeline can vary significantly, depending on the specific needs and complexities involved in managing the data. Some common components typically included in a data pipeline are:
Data Source
This encompasses the wide range of sources from which raw data is collected, including databases, files, web APIs, data stores, and other data repositories. These diverse sources provide the varied pool of information that serves as the foundation for analysis and decision-making. Think of it like this: before you can ingest data, you must attach a source to the data pipeline.
Agent / Extractor
This component is actually external to the pipeline. Typically located on the data source itself, or between the source and the pipeline, it is responsible for seamlessly retrieving data from its designated source and efficiently collecting and transferring it into the pipeline, playing an integral role in getting the right data in.
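A bare-bones agent might simply tail a log file on the source host and forward each new line to the pipeline's intake endpoint. The endpoint URL and log path below are hypothetical placeholders.

```python
# Agent/extractor sketch: tail a log file on the source host and forward each
# new line to the pipeline's intake endpoint. The endpoint URL and file path
# are hypothetical placeholders.
import json
import time
import urllib.request

PIPELINE_ENDPOINT = "http://pipeline.example.com/ingest"  # assumed endpoint
LOG_PATH = "/var/log/app.log"                             # assumed source file

def forward(line):
    payload = json.dumps({"raw": line}).encode("utf-8")
    req = urllib.request.Request(
        PIPELINE_ENDPOINT, data=payload,
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)

with open(LOG_PATH) as f:
    f.seek(0, 2)                 # start at the end: only ship new data
    while True:
        line = f.readline()
        if line:
            forward(line.rstrip("\n"))
        else:
            time.sleep(1)        # wait for the application to write more
```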
Pre-Processing / Transformer
This is the first stage when raw data enters the pipeline. Here, the data is filtered, cleaned, and transformed into a more usable format for analysis. Meticulous data preparation ensures accuracy, consistency, and reliability, laying the foundation for meaningful insights and informed decisions.
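That first stage usually amounts to dropping what cannot be parsed and normalizing what can. The sketch below assumes a hypothetical tab-separated log format with a timestamp, level, and message.

```python
# Pre-processing sketch: clean raw lines and normalize them into structured
# events. The "timestamp<TAB>level<TAB>message" format is an assumed example.
from datetime import datetime, timezone

def preprocess(raw_lines):
    for line in raw_lines:
        parts = line.strip().split("\t")
        if len(parts) != 3:
            continue  # filter out malformed records instead of passing them on
        ts, level, message = parts
        try:
            parsed = datetime.fromisoformat(ts)
        except ValueError:
            continue  # drop records with unparseable timestamps as well
        yield {
            "time": parsed.astimezone(timezone.utc).isoformat(),
            "level": level.upper(),     # normalize casing
            "message": message.strip(),
        }

events = list(preprocess([
    "2024-05-01T12:00:00+02:00\tinfo\tservice started",
    "garbage line with no structure",
]))
print(events)  # only the well-formed record survives
```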
Routes / Loader
This component’s primary role is to forward pre-processed data along its designated path. Systems typically apply a set of filters to identify a subset of received events and deliver that data to a specific pipeline for processing.
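One common way to express those filters is as a list of predicates tried in order, where the first match decides which pipeline receives the event. The route names and filter rules below are invented for illustration.

```python
# Routing sketch: each route pairs a filter predicate with the name of the
# pipeline that should process matching events. Route names and filters are
# invented for illustration; the first matching route wins.
ROUTES = [
    ("security", lambda e: e.get("sourcetype") == "firewall"),
    ("metrics",  lambda e: e.get("kind") == "metric"),
    ("default",  lambda e: True),   # catch-all so nothing is dropped silently
]

def route(event):
    for pipeline_name, matches in ROUTES:
        if matches(event):
            return pipeline_name

print(route({"sourcetype": "firewall", "msg": "deny tcp"}))  # -> security
print(route({"kind": "metric", "name": "cpu"}))              # -> metrics
print(route({"msg": "hello"}))                               # -> default
```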
Processor
Data matched by a given Route is delivered to a logical Pipeline. Pipelines are the heart of data processing and are composed of individual functions that operate on the data they receive. When events enter a Pipeline, they’re processed by a series of Functions.
At its core, a Function is code that executes on an event. The term “processing” covers a variety of possible operations: string replacement, obfuscation, encryption, event-to-metrics conversion, and so on. For example, a Pipeline can be composed of several Functions – one that replaces the term “foo” with “bar,” another that hashes “bar,” and a final one that adds a field (say, dc=jfk-42) to any event that matches source==’us-nyc-application.log’.
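The sketch below mirrors that example with three plain Python functions chained in order: a string replacement, a hash, and a conditional field enrichment. It is a generic illustration of the pattern, not any product's Function syntax.

```python
# Pipeline-of-functions sketch mirroring the example above: replace "foo" with
# "bar", hash "bar", then add dc=jfk-42 to events whose source matches
# us-nyc-application.log. Generic illustration only.
import hashlib

def replace_foo_with_bar(event):
    event["message"] = event["message"].replace("foo", "bar")
    return event

def hash_bar(event):
    event["message"] = event["message"].replace(
        "bar", hashlib.sha256(b"bar").hexdigest()
    )
    return event

def add_dc_field(event):
    if event.get("source") == "us-nyc-application.log":
        event["dc"] = "jfk-42"
    return event

PIPELINE = [replace_foo_with_bar, hash_bar, add_dc_field]

def process(event):
    for function in PIPELINE:
        event = function(event)
    return event

print(process({"source": "us-nyc-application.log", "message": "foo happened"}))
```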
Destinations
The final stage of pipeline processing forwards the data to its destination. This can be a data store, a system of analysis, or any number of other receivers.
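Delivering to more than one destination often just means fanning each processed event out to several sinks. The archive file and analytics endpoint below are hypothetical placeholders.

```python
# Destination sketch: fan each processed event out to more than one sink.
# The archive path and analytics endpoint are hypothetical placeholders.
import json
import urllib.request

ARCHIVE_PATH = "archive.ndjson"
ANALYTICS_ENDPOINT = "http://analytics.example.com/events"  # assumed endpoint

def deliver(event):
    record = json.dumps(event)
    # Sink 1: low-cost local archive
    with open(ARCHIVE_PATH, "a") as archive:
        archive.write(record + "\n")
    # Sink 2: system of analysis
    req = urllib.request.Request(
        ANALYTICS_ENDPOINT, data=record.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)

deliver({"message": "service started", "level": "INFO"})
```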
Data pipelines let administrators process machine data – logs, instrumentation data, application data, metrics, and more – in real time and deliver it to the analysis platform of their choice. A data pipeline allows you to:
Route data to multiple destinations
Enrich data events with business or service context
Reduce the size of data
Shape data to optimize its value
Redact or mask sensitive data
Replay data from low-cost storage