The Data Engineering subteam of Cornell Data Science
Updated Oct 8, 2024 - XSLT
Apache Spark is an open source distributed general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
Graphical recommender system in Scala using personalized PageRank
This application calculates daily product revenue using Spark and MySQL, displaying dates in ascending order and revenue in descending order. It also demonstrates how to reduce the number of stages and tasks in Spark using broadcast variables.
Predicting the fare of taxi rides (with Spark-Scala)
This is a Spark/Scala-based Next Best Offer recommendation model that uses the Random Forest classifier algorithm.
Twitter Sentiment Analysis using Kafka Connect, Spark Streaming, Apache Avro, MLlib and Stanford CoreNLP
Academic project to build a profile-based flight recommendation system that predicts flight rates for a particular destination, so that the user can book a flight at the cheapest rate.
This is a Spark/Scala-based mobile telecommunication customer churn prediction model developed using the Random Forest algorithm.
This is a Spark/Scala-based mobile telecommunication customer value-added services recommendation model that uses the collaborative-filtering-based Alternating Least Squares (ALS) algorithm.
Spark DStream application to detect emerging topics on Twitter
Spark with Scala, including RDDs, transformations, actions, HDFS, Spark SQL, DataFrames, and MLlib
Created by Matei Zaharia
Released May 26, 2014