- Stochastic Gradient Descent
- Batch Gradient Descent
- Momentum Based Gradient Descent
- Nesterov's Accelerated Gradient Descent
- Mean Squared Error
- Categorical Cross-Entropy
- Sigmoid_Layer
- Softmax_Layer
- Relu_Layer
- numpy
- pandas
- matplotlib
- pickle (optional: used only for saving the model object)
- Layers:
model.add_Dense(neuron_type,num_neurons) # neuron_type: Sigmoid_Layer, Softmax_Layer, etc.; num_neurons: number of neurons in the layer (see the usage sketch below)
- Training:
model.fit(x_train,y_train,x_test,y_test,n_epoch,batch_size) # used for training the model
- Validation:
model.validate(x,y) # find the accuracy on the validation dataset
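A minimal end-to-end sketch of the calls above. The module and class names (`neural_network`, `Model`) and the data shapes are assumptions for illustration; `add_Dense`, `fit`, and `validate` are the calls documented above.

```python
import numpy as np
# Hypothetical import: the actual module and model class names in this repo may differ.
from neural_network import Model, Relu_Layer, Softmax_Layer

# Toy data: 1000 samples, 784 features, 10 one-hot classes (shapes are illustrative).
x = np.random.rand(1000, 784)
y = np.eye(10)[np.random.randint(0, 10, 1000)]
x_train, y_train, x_test, y_test = x[:800], y[:800], x[800:], y[800:]

model = Model()
model.add_Dense(Relu_Layer, 128)      # hidden layer with 128 neurons
model.add_Dense(Softmax_Layer, 10)    # output layer, one neuron per class

model.fit(x_train, y_train, x_test, y_test, n_epoch=40, batch_size=32)
print(model.validate(x_test, y_test))  # accuracy on the held-out split
```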
- Data Preparation (a combined pipeline sketch follows the Predict item below):
get_data(path) # separate the feature vectors and labels from the csv file
get_test_data(path) # get the test data
get_submission_csv(path,model_obj,submission_file_path) # generate a Kaggle submission file for the given test file (path)
- Test Train Split:
test_train_split(frac,x,y) # get the train and test data in the required fraction
- Predict:
get_predictions(dataset, model_obj) # dataset: the test dataset; model_obj: object of the model class used
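A sketch tying the helper functions above together, assuming `model` is a trained model object as in the earlier sketch and that `test_train_split` returns the splits in the order shown; the CSV paths are placeholders.

```python
# The helpers below are the ones documented above; their module is not named here.
x, y = get_data("train.csv")                                   # features and labels from the csv
x_train, y_train, x_val, y_val = test_train_split(0.8, x, y)   # 80/20 split (return order assumed)

test_data = get_test_data("test.csv")
predictions = get_predictions(test_data, model)                # model: trained model object

get_submission_csv("test.csv", model, "submission.csv")        # writes a Kaggle submission file
```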
- Sigmoid in the last layer does not work well with the categorical cross-entropy loss function.
- A small dataset requires a small neural network, so 3 layers seem to be enough.
- Note: accuracy does not remain the same when the same neural network is trained again.
- Softmax with categorical cross-entropy gives the best accuracy. Links to the various runs, along with their names, can be found in the wandb database.
- The sigmoid layer at the output performs very well compared to the softmax layer.
- Momentum-based gradient descent with momentum 0.1 reaches the same accuracy as the plain model (90, 88) in just 20 epochs, versus the plain model's 40 epochs, with an error of 0.0094, and gave 92.5% accuracy on Kaggle with 40 epochs (see the update-rule sketch after this list).
- True divide does not work in the softmax function while np.divide() works. Why?
- Stochastic gradient descent works better than batch gradient descent.
- Removing one layer performs better, but the model still suffers from bias.
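For reference, a minimal NumPy sketch of the momentum update referred to in the observation above (the repo's implementation details may differ; the 0.1 coefficient matches the run described):

```python
import numpy as np

def momentum_step(w, grad, velocity, lr=0.01, momentum=0.1):
    """One momentum-based gradient descent update: the velocity accumulates a
    decaying history of past gradients, which speeds up convergence."""
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity

# Usage: keep one velocity array per weight matrix and update it every batch.
w = np.random.randn(784, 128)
v = np.zeros_like(w)
grad = np.random.randn(784, 128)   # stand-in for a backpropagated gradient
w, v = momentum_step(w, grad, v)
```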
- Libraries used: numpy, pandas, matplotlib, sklearn (for creating datasets)
- Fit the data set and train the clustering algorithm:
model.fit(list_of_training_data_points)
- Return a list of all the neighbors of the point satisfying the epsilon conditions
model.get_neighbors(pt_whose_neighbors_are_to_be_found)
- Return the number of clusters identified so far
model.number_of_clusters()
- Class Point: contains 2 attributes:
pt.value # gives the coordinate vector of the point
pt.cluster # gives the cluster number to which the point belongs
The algorithm above works like a depth-first or breadth-first search on a graph: the number of disjoint connected components gives the number of clusters, and all points that do not satisfy the conditions for core points or edge points are left out and classified as noise points.
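A compact sketch of that traversal, assuming Euclidean distance and illustrative `eps`/`min_pts` parameters. This is not the repo's exact code; `model.fit` presumably wraps a similar loop and stores the result in each point's `cluster` attribute.

```python
import numpy as np
from collections import deque

def dbscan(points, eps=0.5, min_pts=5):
    """BFS over the epsilon-neighbourhood graph: each connected component of
    core points (plus their edge points) becomes a cluster; the rest is noise."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    labels = np.full(n, -1)   # -1 marks noise / unvisited
    neighbors = [np.where(np.linalg.norm(points - p, axis=1) <= eps)[0] for p in points]
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or len(neighbors[i]) < min_pts:
            continue                              # already clustered, or not a core point
        labels[i] = cluster
        queue = deque(neighbors[i])
        while queue:                              # breadth-first expansion of the cluster
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster
                if len(neighbors[j]) >= min_pts:  # only core points expand further
                    queue.extend(neighbors[j])
        cluster += 1
    return labels

# Usage on a toy dataset (sklearn is already a listed dependency for datasets).
from sklearn.datasets import make_blobs
X, _ = make_blobs(n_samples=200, centers=3, cluster_std=0.4, random_state=0)
print(dbscan(X, eps=0.5, min_pts=5))
```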
- Libraries used: numpy, matplotlib, sklearn (for datasets), pandas
- Used the Iris dataset to compare sklearn's KMeans with my implementation
- Train:
model.fit(X) # X - list of training data points
- Return the centers of various clusters identified
model.cluster_centers()
- Return a list of the predicted labels for the datapoints
model.labels(X) # X - list of data points to predict upon
- Return the list of all the elements in the specified cluster
model.get_cluster_elements(X,cluster_number)
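A usage sketch of the calls above on sklearn's Iris data. The class name `Kmeans` and its constructor argument are assumptions; `fit`, `cluster_centers`, `labels`, and `get_cluster_elements` are the methods documented above.

```python
from sklearn.datasets import load_iris
# Hypothetical import: the actual module and class names in this repo may differ.
from kmeans import Kmeans

X = load_iris().data                 # 150 samples, 4 features

model = Kmeans(n_clusters=3)         # constructor arguments are an assumption
model.fit(X)

print(model.cluster_centers())               # centers of the identified clusters
print(model.labels(X)[:10])                  # predicted cluster index per point
print(len(model.get_cluster_elements(X, 0))) # members of cluster 0
```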
- Initialization is not random; it follows the structure of the data.
- The Minkowski distance metric has been used rather than Euclidean distance so that per-feature weights can be incorporated.
- Weighted KMeans has been used, which gives more weight to important features and less weight to less important features in a feature vector (see the distance sketch below).
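A short sketch of the weighted Minkowski distance those two notes refer to; the function name, weight values, and order `p` are illustrative, not taken from the repo.

```python
import numpy as np

def weighted_minkowski(x, c, w, p=2):
    """Weighted Minkowski distance between a point x and a center c.
    With p=2 and uniform weights this reduces to Euclidean distance;
    a larger weight makes that feature count more in cluster assignment."""
    return np.sum(w * np.abs(x - c) ** p) ** (1.0 / p)

x = np.array([5.1, 3.5, 1.4, 0.2])      # an Iris-like feature vector
c = np.array([5.0, 3.4, 1.5, 0.2])      # a cluster center
w = np.array([1.0, 1.0, 2.0, 0.5])      # illustrative feature importances
print(weighted_minkowski(x, c, w))
```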