Skip to content

Bodo 2020.12 Release (Date: 12/30/2020)

This release includes many new features, bug fixes and performance improvements. Overall, 60 code patches were merged since the last release.

New Features and Improvements

  • Bodo is updated to use Numba 0.52 (latest)

  • Support for reading CSV and Parquet from Azure Data Lake Storage (ADLS)

  • Improved support for UDFs
    • More robust user function handling
    • Improved support for date/time data types in UDFs
  • Improved support for rolling window functions
    • Support raw argument of apply()
    • Support column selection from rolling objects
    • Support for nullable int values
  • Pandas coverage:
    • Support for groupby.apply
    • Support for groupby rolling functions
    • Improved support for dataframe indexing using df.loc/iloc
    • Improve dtype handling in read_csv
    • Support for Series.mask
    • Improved robustness for highly skewed string data (e.g. most of string data is on a few processes due to uneven data distribution)
    • Support for dataframes with repeated column names
    • Support for datetime.date arrays as Index in pivot_table and as argument to pd.DatetimeIndex
    • Improved error checking in Pandas implementations
    • Unroll constant loops for type stability in more cases
  • Numpy coverage:
    • Support for np.hstack
  • Scikit-learn:
    • Support for sklearn.preprocessing.StandardScaler inside jit functions.