MoleculeNet benchmark dataset & MolMapNet dataset
-
Updated
Mar 29, 2022 - HTML
MoleculeNet benchmark dataset & MolMapNet dataset
A Python toolkit for file processing, text cleaning and data splitting. 文件处理,文本清洗和数据划分的python工具包。
Data-Splitter is a Python script designed to split a large CSV file containing data into three different formats: JSON, a database table, and another CSV file. The script ensures a random distribution of data across the three output formats based on custom-defined ratios.
Comparative Analysis of Data Protection Mechanisms in Public Clouds
splitting image dataset into train, val, test sets
ML model for Crop Detection
Python Preprocessing for Sales Project Notebook
In this project, I have used logistic regression, a supervised machine learning algorithm, to predict whether a person has diabetes or not based on various features such as age, blood pressure, glucose level, body mass index, etc. I have used Python and popular libraries such as Pandas, Scikit-Learn, and Matplotlib to perfom model building
A basic Python script to split a .dat file into individual sample files.
This project focuses on cleaning and analyzing a loan application dataset to gain insights into the factors influencing loan defaults. Through systematic data cleaning, visualization, and merging with previous application data, it provides a robust foundation for further predictive modeling.
Julia package for "FDR Control via Data Splitting for Testing-after-Clustering (arXiv: 2410.06451)"
Predicting company bankruptcy using various machine learning models. The dataset is sourced from Kaggle: Company Bankruptcy Prediction.
As Tensorflow Kennard-Stone algorithmin uses euclidean distances, the need for an adaptation arrises when dealing with a big vector space that has unknown correlations between its variables, it may improve a lot neural networks performance.
Apply DUPLEX data split to the given dataset and return training and test datasets. REF: Snee, R. D. (1977). Validation of regression models: methods and examples. Technometrics, 19(4), 415-428.
Analyzed customer churn using transaction data. Built ML model to predict lapses. Dataset includes customer status, collection/redemption info, and program tenure. Delivered business presentation outlining modeling approach, findings, and churn reduction strategies.
Focus on selecting datasets suitable for a machine learning experiment, with an emphasis on data cleaning, encoding, and transformation steps necessary to prepare the data.
Utilizing Apache Spark & PySpark to analyze a movie dataset. Tasks include data exploration, identifying top-rated movies, training a linear regression model, and experimenting with Airflow.
Final project program DBA mitra Ruangguru X Studi Independen Bersertifikat Kampus Merdeka batch 2
As Tensorflow Kennard-Stone algorithmin uses euclidean distances, the need for an adaptation arrises when dealing with a big vector space that has unknown correlations between its variables, it may improve a lot neural networks performance.
A sample model for predicting the systolic level of an individual by providing the age,cholesterol and blood pressure
Add a description, image, and links to the data-splitting topic page so that developers can more easily learn about it.
To associate your repository with the data-splitting topic, visit your repo's landing page and select "manage topics."