Insights: rapidsai/cudf
Overview
38 Pull requests merged by 21 people
remove WheelHelpers.cmake
#17276 merged
Nov 8, 2024 -
Make constructor of DeviceMemoryBufferView public
#17265 merged
Nov 8, 2024 -
Wrap custom iterator result
#17251 merged
Nov 8, 2024 -
Implement inequality joins by translation to conditional joins
#17000 merged
Nov 8, 2024 -
Rewrite Java API `Table.readJSON` to return the output from libcudf `read_json` directly
#17180 merged
Nov 8, 2024 -
Add IWYU to CI
#17078 merged
Nov 8, 2024 -
Plumb pylibcudf datetime APIs through cudf python
#17275 merged
Nov 8, 2024 -
Fix data_type ctor call in JSON_TEST
#17273 merged
Nov 8, 2024 -
Allow generating large strings in benchmarks
#17224 merged
Nov 8, 2024 -
Add optional column_order in JSON reader
#17029 merged
Nov 8, 2024 -
Use pylibcudf.search APIs in cudf python
#17271 merged
Nov 7, 2024 -
Use `pylibcudf.strings.convert.convert_integers.is_integer` in cudf python
#17270 merged
Nov 7, 2024 -
Move strings to date/time types benchmarks to nvbench
#17229 merged
Nov 7, 2024 -
Process parquet bools with microkernels
#17157 merged
Nov 7, 2024 -
Add support for `pyarrow-18`
#17256 merged
Nov 7, 2024 -
Add io.text APIs to pylibcudf
#17232 merged
Nov 7, 2024 -
AWS S3 IO through KvikIO
#16499 merged
Nov 7, 2024 -
Refactor gather/scatter benchmarks for strings
#17223 merged
Nov 7, 2024 -
Fix extract-datetime deprecation warning in ndsh benchmark
#17254 merged
Nov 7, 2024 -
`cudf-polars` string/numeric casting
#17076 merged
Nov 7, 2024 -
Added ast tree to simplify expression lifetime management
#17156 merged
Nov 7, 2024 -
Move strings/numeric convert benchmarks to nvbench
#17255 merged
Nov 7, 2024 -
Fix the example in documentation for `get_dremel_data()`
#17242 merged
Nov 7, 2024 -
Put a ceiling on cuda-python
#17264 merged
Nov 7, 2024 -
KvikIO shared library
#17239 merged
Nov 7, 2024 -
Disallow cuda-python 12.6.1 and 11.8.4
#17253 merged
Nov 6, 2024 -
Search for kvikio with lowercase
#17243 merged
Nov 6, 2024 -
Deprecate single component extraction methods in libcudf
#17221 merged
Nov 5, 2024 -
Separate evaluation logic from `IR` objects in cudf-polars
#17175 merged
Nov 5, 2024 -
Refactor Dask cuDF legacy code
#17205 merged
Nov 4, 2024 -
Fix discoverability of submodules inside `pd.util`
#17215 merged
Nov 4, 2024 -
Use more pylibcudf.io.types enums in cudf._libs
#17237 merged
Nov 4, 2024 -
Expose mixed and conditional joins in pylibcudf
#17235 merged
Nov 4, 2024 -
Make HostMemoryBuffer call into the DefaultHostMemoryAllocator
#17204 merged
Nov 4, 2024 -
Expose stream-ordering in subword tokenizer API
#17206 merged
Nov 4, 2024 -
Add `num_iterations` axis to the multi-threaded Parquet benchmarks
#17231 merged
Nov 2, 2024 -
Change default KvikIO parameters in cuDF: set the thread pool size to 4, and compatibility mode to ON
#17185 merged
Nov 1, 2024
24 Pull requests opened by 14 people
Add read_parquet_metadata to pylibcudf
#17245 opened
Nov 5, 2024 -
Migrate contiguous split APIs to pylibcudf
#17246 opened
Nov 5, 2024 -
Add breaking change workflow trigger
#17248 opened
Nov 5, 2024 -
POC: Implement `HOST_UDF` aggregations
#17249 opened
Nov 5, 2024 -
[WIP] Add new ``dask_cudf.read_parquet`` API
#17250 opened
Nov 5, 2024 -
Add write_parquet to pylibcudf
#17252 opened
Nov 6, 2024 -
Expose streams in public quantile APIs
#17257 opened
Nov 6, 2024 -
Add type stubs for pylibcudf
#17258 opened
Nov 6, 2024 -
Always prefer `device_read`s when kvikio is enabled
#17260 opened
Nov 6, 2024 -
[DNM][WIP] Single-partition Dask executor for cuDF-Polars
#17262 opened
Nov 7, 2024 -
Add write_parquet to pylibcudf
#17263 opened
Nov 7, 2024 -
Expose delimiter character in JSON reader options to JSON reader APIs
#17266 opened
Nov 7, 2024 -
Add `catboost` to the third-party integration tests
#17267 opened
Nov 7, 2024 -
Raise errors on specific types of fallback in `cudf.pandas`
#17268 opened
Nov 7, 2024 -
Move strings filter benchmarks to nvbench
#17269 opened
Nov 7, 2024 -
Follow up making Python tests more deterministic
#17272 opened
Nov 7, 2024 -
Add `cudf::calendrical_month_sequence` to pylibcudf
#17277 opened
Nov 7, 2024 -
Optimize distinct inner join to use set `find` instead of `retrieve`
#17278 opened
Nov 8, 2024 -
Add compute_column_expression to pylibcudf for transform.compute_column
#17279 opened
Nov 8, 2024 -
Use numba-cuda<0.0.18
#17280 opened
Nov 8, 2024 -
Java JNI for Multiple contains
#17281 opened
Nov 8, 2024 -
test PR
#17283 opened
Nov 8, 2024 -
WIP: [DO NOT MERGE] enforce wheel size limits, README formatting in CI
#17284 opened
Nov 8, 2024 -
Switch to using `TaskSpec`
#17285 opened
Nov 8, 2024
23 Issues closed by 7 people
[FEA] Check why we need `__iter__` special overrides for `cudf.pandas`
#14481 closed
Nov 8, 2024 -
[BUG] Incorrect dtype when iterating over dtypes in cudf.pandas
#17165 closed
Nov 8, 2024 -
[FEA] Support conditional joins in cudf-polars
#16926 closed
Nov 8, 2024 -
[FEA] Implement a better JNI function to assemble the output columns from `cudf::read_json`
#17002 closed
Nov 8, 2024 -
Compile times are growing significantly
#581 closed
Nov 8, 2024 -
[BUG] `cudf::io::read_json` does not verify output column structures with the input schema
#16799 closed
Nov 8, 2024 -
[FEA] `read_json` should output all-nulls columns for the schema columns that do not exist in the input
#17091 closed
Nov 8, 2024 -
[FEA] The output columns of `read_json` need to follow depth-first-search order as in the input schema
#17090 closed
Nov 8, 2024 -
[BUG] JSON parser still returns outdated schema structure for strings column
#17240 closed
Nov 8, 2024 -
[BUG] Chunked parquet reader incorrect results for large string columns
#17158 closed
Nov 7, 2024 -
[BUG] The read_json() method of cudf can't parse the string like "5-0"
#14670 closed
Nov 7, 2024 -
The example in the documentation for `get_dremel_data()` seems incorrect at line#1764
#11396 closed
Nov 7, 2024 -
Reduce IO when `byte_range` option is used in `read_json`
#15185 closed
Nov 5, 2024 -
Memory mapped datasource does not allow reading data beyond the mapped range
#15186 closed
Nov 5, 2024 -
[FEA] [Proposal] Separate IR evaluation logic from the IR object in cudf-polars
#17127 closed
Nov 5, 2024 -
[FEA] Support datetime64[D]
#16803 closed
Nov 5, 2024 -
[BUG] Series.plot method populates pd.util.version when run under cudf.pandas
#17166 closed
Nov 4, 2024 -
[FEA] support max and min aggregations for nested structs
#8974 closed
Nov 4, 2024 -
[FEA] Add nested struct support for comparison operations
#8964 closed
Nov 4, 2024 -
[BUG] new parquet writer code checks for `nullable` not `has_nulls`
#7654 closed
Nov 4, 2024 -
[FEA] Adjust libcudf to not load cuFile by default
#16512 closed
Nov 1, 2024 -
[BUG] `cudf::io::json::detail::get_token_stream` does not respect `normalize_single_quotes` option
#17230 closed
Nov 1, 2024
9 Issues opened by 7 people
[BUG] pd.api.interchange.from_dataframe fails with simple cuDF dataframe
#17282 opened
Nov 8, 2024 -
[BUG] AST_TEST TransformTest.DeeplyNestedArithmeticLogicalExpression fails when run in a debug build
#17274 opened
Nov 7, 2024 -
[FEA] Adjust libcudf to use kvikIO for small host reads
#17259 opened
Nov 6, 2024 -
[FEA] Compare performance of decompression engine (HW) versus decompression kernels (SW) on Blackwell
#17247 opened
Nov 5, 2024 -
[FEA] Support `strict=False` casting in `cudf-polars`
#17244 opened
Nov 4, 2024 -
Support Hash-based group by aggregations for min/max with nesting
#17241 opened
Nov 4, 2024 -
[FEA] read_csv optimizations with streaming multiprocessors and MGPU
#17238 opened
Nov 2, 2024 -
[FEA] Add function for "deduplicate map" to libcudf
#17236 opened
Nov 1, 2024
37 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
Add cudf::strings::contains_multiple
#16900 commented on
Nov 8, 2024 • 19 new comments -
[FEA] Report all unsupported operations for a query in cudf.polars
#16960 commented on
Nov 8, 2024 • 15 new comments -
Add new nvtext minhash_permuted API
#16756 commented on
Nov 8, 2024 • 13 new comments -
Reading multi-source compressed JSONL files
#17161 commented on
Nov 8, 2024 • 7 new comments -
Migrate CSV writer to pylibcudf
#17163 commented on
Nov 7, 2024 • 6 new comments -
Improve the performance of low cardinality groupby
#16619 commented on
Nov 8, 2024 • 5 new comments -
Polars: DataFrame Serialization
#17062 commented on
Nov 8, 2024 • 4 new comments -
Benchmarking JSON reader for compressed inputs
#17219 commented on
Nov 7, 2024 • 2 new comments -
Support hyper log log plus plus(HLL )
#17133 commented on
Nov 6, 2024 • 2 new comments -
Implement `cudf-polars` chunked parquet reading
#16944 commented on
Nov 8, 2024 • 1 new comment -
add telemetry setup to test
#16924 commented on
Nov 4, 2024 • 0 new comments -
Use block per string for super long strings in cudf::strings::find()
#17011 commented on
Nov 6, 2024 • 0 new comments -
Use `libcudf_exception_handler` throughout `pylibcudf.libcudf`
#17109 commented on
Nov 7, 2024 • 0 new comments -
WIP: Support decimal dtypes
#17113 commented on
Nov 7, 2024 • 0 new comments -
Improve cudf::io::datasource::create().
#17115 commented on
Nov 3, 2024 • 0 new comments -
[DO NOT MERGE/REVIEW] GDS debugging
#17116 commented on
Nov 5, 2024 • 0 new comments -
[test] compression benchmarks
#17145 commented on
Nov 8, 2024 • 0 new comments -
Added Arrow Interop Benchmarks
#17194 commented on
Nov 7, 2024 • 0 new comments -
Fix `Dataframe.__setitem__` slow-downs
#17222 commented on
Nov 8, 2024 • 0 new comments -
Precompute AST arity
#17234 commented on
Nov 5, 2024 • 0 new comments -
[FEA] Support COUNT_VALID and COUNT_ALL for scan and group by scan
#8710 commented on
Nov 4, 2024 • 0 new comments -
[FEA] aggregation each list in a column to a single value using a user supplied function
#8020 commented on
Nov 4, 2024 • 0 new comments -
[FEA] Full coverage of datetime methods in cudf-polars
#16481 commented on
Nov 5, 2024 • 0 new comments -
[FEA] Confirm correct and efficient dictionary processing in libcudf
#15199 commented on
Nov 5, 2024 • 0 new comments -
[FEA] Use bloom filters in Parquet reader to filter row groups with equality predicates
#17164 commented on
Nov 5, 2024 • 0 new comments -
[Story] Enable multithreading in cuIO and libcudf
#17119 commented on
Nov 5, 2024 • 0 new comments -
[FEA] Global thread pool in benchmarks
#16801 commented on
Nov 5, 2024 • 0 new comments -
[FEA] Add cudf-polars to test.yaml
#16383 commented on
Nov 5, 2024 • 0 new comments -
[QST] cudf.pandas prefer using CPU over GPU in some cases
#14500 commented on
Nov 5, 2024 • 0 new comments -
[FEA] Offer more control over CPU fallback in cudf.pandas
#14975 commented on
Nov 5, 2024 • 0 new comments -
[FEA] Replace internal usage of std::string with std::string_view
#15907 commented on
Nov 6, 2024 • 0 new comments -
[FEA] Provide type stubs for pylibcudf package
#15190 commented on
Nov 6, 2024 • 0 new comments -
[FEA] Expose public stream-ordered C APIs
#13744 commented on
Nov 6, 2024 • 0 new comments -
[BUG] read_json fails when reading multiple files but works with the individual files
#14268 commented on
Nov 7, 2024 • 0 new comments -
[QST] Does the read_json() method support GPU acceleration?
#14669 commented on
Nov 7, 2024 • 0 new comments -
[FEA] Implement all libcudf modules required by cuDF Python in pylibcudf
#15162 commented on
Nov 8, 2024 • 0 new comments -
Fix reading of single-row unterminated CSV files
#16055 commented on
Nov 8, 2024 • 0 new comments