Maarten Grootendorst
Tilburg, North Brabant, Netherlands
16K followers
500+ connections
About
▪️ Author of the upcoming "Hands-On Large Language Models" book.
▪️ Open source developer and author of BERTopic, KeyBERT, and PolyFuzz (>5 million downloads)
▪️ Author of the "Exploring Language Models" Newsletter
My path to this point has not been conventional, transitioning from psychology to data science, but it has left me with a strong desire to create data-driven solutions that make the world a slightly better place.
Overview of interesting links:
▪️ Book -- learning.oreilly.com/library/view/hands-on-large-language/9781098150952/
▪️ GitHub -- github.com/MaartenGr
▪️ Newsletter -- maartengrootendorst.substack.com
▪️ YouTube -- youtube.com/@MaartenGrootendorst
▪️ Medium -- medium.com/@maartengrootendorst
▪️ Personal Website -- maartengrootendorst.com
(Data Science) Languages & Frameworks:
▪️ Python (NumPy, pandas, scikit-learn, PyTorch, etc.)
▪️ Docker, Git (CI/CD), Flask, Uvicorn, Graylog, MLflow, etc.
▪️ NLP (LLMs, Transformers, topic modeling, BERT, etc.)
▪️ Graph-based (Graphviz, NetworkX)
▪️ Reinforcement Learning (PER-D3QN, PPO, A2C, etc.)
▪️ SQL, Qlik, SPSS
Activity
-
Just received my copy of "Hands-On Large Language Models" by Jay Alammar and Maarten Grootendorst in the mail – the second book I've had the pleasure…
Liked by Maarten Grootendorst
-
Excited to see the LLM-book.com repo at 1K stars! It contains a Jupyter notebook per chapter containing all the code examples in the book. Check it…
Liked by Maarten Grootendorst
-
This is an amazing book by Jay Alammar and Maarten Grootendorst I really loved the depths of the book. I am a beginner to LLMs but I am able to…
Liked by Maarten Grootendorst
Experience
Education
-
-
My thesis was accepted for publication in the European Conference on Machine Learning:
Grootendorst, M., & Vanschoren, J. (2019, September). Beyond Bag-of-Concepts: Vectors of Locally Aggregated Concepts. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases (pp. 681-696). Springer, Cham.
-
-
Thesis: Influence of Emotional Valence in Dual Tasking on the Impact of Distressing Memories
-
-
Thesis: The Effects of Team Building on Performance Outcomes
-
-
Thesis: The Effects of Anger on Prosocial Behavior
Licenses and certifications
Volunteer experience
-
Data Analyst
GGzE
- 5 months
Health
The Relaxation Space is a test environment used by GGzE and developed by Philips. Within this Relaxation Space, patients were given a relaxing experience through the use of smart (ambient) technology. It was my job to help patients use the Relaxation Space. Moreover, I helped analyze data from experiments using SPSS.
-
Intake Volunteer
Mozaiek Welzijnsdiensten
- 5 months
Health
I helped with the recruitment, selection, and placement of new volunteers. Many new volunteers had a physical and/or psychological handicap, which required specialized help in finding what was right for them.
-
Volunteer
Rode Kruis Tilburg
- 1 year 4 months
Health
I was responsible for the promotion of the Rode Kruis in Tilburg. Moreover, I helped with the selection, placement and recruitment of new volunteers.
Publications
-
Beyond Bag-of-Concepts: Vectors of Locally Aggregated Concepts
Grootendorst, M., & Vanschoren, J. (2020). Beyond Bag-of-Concepts: Vectors of Locally Aggregated Concepts. In Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2019, Würzburg, Germany, September 16–20, 2019, Proceedings
Bag-of-Concepts, a model that counts the frequency of clustered word embeddings (i.e., concepts) in a document, has demonstrated the feasibility of leveraging clustered word embeddings to create features for document representation. However, information is lost as the word embeddings themselves are not used in the resulting feature vector. This paper presents a novel text representation method, Vectors of Locally Aggregated Concepts (VLAC). Like Bag-of-Concepts, it clusters word embeddings for its feature generation. However, instead of counting the frequency of clustered word embeddings, VLAC takes each cluster’s sum of residuals with respect to its centroid and concatenates those to create a feature vector. The resulting feature vectors contain more discriminative information than Bag-of-Concepts due to the additional inclusion of these first-order statistics.
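The aggregation step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the published implementation: the function name `vlac_features` is hypothetical, the centroids are assumed to come from an earlier clustering step (for example k-means over all word embeddings), and assigning each word to its nearest centroid by Euclidean distance is an assumption.

```python
import numpy as np


def vlac_features(word_vectors, centroids):
    """Concatenate, per cluster, the sum of residuals (word vector - centroid).

    word_vectors: array of shape (n_words, dim), the embeddings of one document.
    centroids:    array of shape (k, dim), assumed to be precomputed.
    Returns a flat feature vector of length k * dim.
    """
    word_vectors = np.asarray(word_vectors, dtype=float)
    centroids = np.asarray(centroids, dtype=float)

    # Assumption: each word belongs to its nearest centroid (Euclidean distance).
    distances = np.linalg.norm(
        word_vectors[:, None, :] - centroids[None, :, :], axis=2
    )
    assignments = distances.argmin(axis=1)

    k, dim = centroids.shape
    features = np.zeros((k, dim))
    for cluster in range(k):
        members = word_vectors[assignments == cluster]
        if len(members) > 0:
            # Sum of residuals with respect to this cluster's centroid.
            features[cluster] = (members - centroids[cluster]).sum(axis=0)

    # Concatenate the per-cluster sums into one document vector.
    return features.ravel()
```

A Bag-of-Concepts vector for the same document would hold only the number of words per cluster (length k), whereas this sketch returns k * dim values, which is where the additional first-order information comes from.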
Projects
-
BERTopic: Topic Modeling with Transformer Models
-
BERTopic is a topic modeling technique that leverages BERT embeddings and c-TF-IDF to create dense clusters allowing for easily interpretable topics whilst keeping important words in the topic descriptions.
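To give a feel for the class-based TF-IDF idea, here is a small pure-Python sketch. It is not BERTopic's code: the function name `c_tf_idf` is hypothetical, the embedding and clustering steps are skipped (documents are assumed to be already grouped into classes), and the weighting used, term frequency within a class times log(1 + average words per class / total frequency of the term), is one formulation assumed here for illustration.

```python
import math
from collections import Counter


def c_tf_idf(docs_per_class):
    """Score terms per class with a class-based TF-IDF variant.

    docs_per_class: dict mapping a class label to a list of tokenized
                    documents (each document is a list of strings).
    Returns a dict: class label -> {term: score}.
    """
    # Treat all documents of a class as one large bag of words.
    class_counts = {
        label: Counter(token for doc in docs for token in doc)
        for label, docs in docs_per_class.items()
    }

    # Frequency of each term across all classes.
    total_per_term = Counter()
    for counts in class_counts.values():
        total_per_term.update(counts)

    # Average number of words per class.
    avg_words = sum(total_per_term.values()) / len(class_counts)

    # Assumed weighting: tf within the class * log(1 + avg_words / total tf).
    return {
        label: {
            term: tf * math.log(1 + avg_words / total_per_term[term])
            for term, tf in counts.items()
        }
        for label, counts in class_counts.items()
    }
```

Terms that occur in many classes receive a smaller log factor than terms concentrated in one class, so the highest-scoring terms of a class tend to be the ones that distinguish it from the others.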
-
KeyBERT: Keyword Extraction with BERT
-
KeyBERT is a minimal and easy-to-use keyword extraction technique that leverages BERT embeddings to create keywords and keyphrases that are most similar to a document.
-
PolyFuzz: Fuzzy string matching, grouping, and evaluation in one
-
PolyFuzz performs fuzzy string matching, string grouping, and contains extensive evaluation functions. PolyFuzz is meant to bring fuzzy string matching techniques together within a single framework.
Currently, methods include a variety of edit distance measures, a character-based n-gram TF-IDF, word embedding techniques such as FastText and GloVe, and 🤗 transformers embeddings.
-
ReinLife: Artificial Life with Reinforcement Learning
-
Although Evolutionary Algorithms have been shown to produce interesting behavior, they focus on learning across generations, whereas behavior can also be learned during one's lifetime. This is where Reinforcement Learning comes in: it learns through a reward/punishment system, which allows an entity to learn new behavior during its lifetime. Using Reinforcement Learning, entities learn to survive, reproduce, and maximize the fitness of their kin.
-
Reviewer: Character Popularity
-
Reviewer can be used to scrape user reviews from IMDb, generate word clouds based on a custom class-based TF-IDF, and extract popular characters/actors from reviews using a combination of Named Entity Recognition and Sentiment Analysis.
-
SoAn: Analyzing WhatsApp Messages
-
Created a package that allows in-depth analyses of WhatsApp conversations. Analyses were initially done on WhatsApp messages between me and my fiancée to surprise her at our wedding. Visualizations were designed to make sense to someone not familiar with data science. Methods: Sentiment Analysis, TF-IDF, Topic Modeling, Wordclouds, etc.
-
VLAC: Vectors of Locally Aggregated Concepts
-
VLAC leverages clusters of word embeddings (i.e., concepts) to create features from a collection of documents allowing for classification of documents. Inspiration was drawn from VLAD, which is a feature generation method for image classification. Results and data are included in the repo.
More activity by Maarten
-
Check out this great course by Maarten Grootendorst. Once more he does a great job at explaining one of the new architectures in LLMs. If you're…
Liked by Maarten Grootendorst
-
Tokens and Embeddings: Chapter 2 of the LLM-book.com lays the foundation to understanding LLMs by breaking down two of their fundamental concepts…
Liked by Maarten Grootendorst