Maarten Grootendorst

Tilburg, North Brabant, Netherlands
16K followers, 500+ connections

About

▪️ Author of the upcoming "Hands-On Large Language Models" book.
▪️ Open source developer and author of BERTopic, KeyBERT, and PolyFuzz (>5 million downloads)
▪️ Author of the "Exploring Language Models" Newsletter

My path to this point has not been conventional, transitioning from psychology to data science, but it has left me with a strong desire to create data-driven solutions that make the world a slightly better place.

Overview of interesting links:
▪️ Book -- learning.oreilly.com/library/view/hands-on-large-language/9781098150952/
▪️ GitHub -- github.com/MaartenGr
▪️ Newsletter -- maartengrootendorst.substack.com
▪️ YouTube -- youtube.com/@MaartenGrootendorst
▪️ Medium -- medium.com/@maartengrootendorst
▪️ Personal Website -- maartengrootendorst.com

(Data Science) Languages & Frameworks:
▪️ Python (NumPy, Pandas, scikit-learn, PyTorch, etc.)
▪️ Docker, Git (CI/CD), Flask, Uvicorn, Graylog, MLFlow, etc.
▪️ NLP (LLMs, Transformers, topic modeling, BERT, etc.)
▪️ Graph-based (Graphviz, Networkx)
▪️ Reinforcement Learning (PER-D3QN, PPO, A2C, etc.)
▪️ SQL, Qlik, SPSS

Experience

  • O'Reilly
  • -

    Eindhoven, North Brabant, Netherlands

  • -

  • -

    Eindhoven, North Brabant, Netherlands

  • -

    Tilburg

  • -

    Tilburg

  • -

    Amsterdam Area, Netherlands

  • -

  • -

  • -

  • -

Education

  • -

    My thesis was accepted for publication in the European Conference on Machine Learning:

    Grootendorst, M., & Vanschoren, J. (2019, September). Beyond Bag-of-Concepts: Vectors of Locally Aggregated Concepts. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases (pp. 681-696). Springer, Cham.

  • -

    Thesis: Influence of Emotional Valence in Dual Tasking on the Impact of Distressing Memories

  • -

    Thesis: The Effects of Team Building on Performance Outcomes

  • -

    Thesis: The Effects of Anger on Prosocial Behavior

Volunteer experience

  • GGzE

    Data Analyst

    GGzE

    - 5 months

    Health

    The Relaxation Space is a test environment developed by Philips and used by GGzE. Within the Relaxation Space, patients were given a relaxing experience through smart (ambient) technology. My job was to help patients use the Relaxation Space. I also helped analyze data from experiments using SPSS.

  • Mozaiek Welzijnsdiensten

    Intake Volunteer

    Mozaiek Welzijnsdiensten

    - 5 months

    Health

    I helped with the recruitment, selection, and placement of new volunteers. Many new volunteers had a physical and/or psychological disability and required specialized help in finding a placement that was right for them.

  • Volunteer

    Rode Kruis Tilburg

    - 1 year 4 months

    Health

    I was responsible for the promotion of the Rode Kruis in Tilburg. Moreover, I helped with the selection, placement and recruitment of new volunteers.

Publications

  • Beyond Bag-of-Concepts: Vectors of Locally Aggregated Concepts

    Grootendorst, M., & Vanschoren, J. (2020). Beyond Bag-of-Concepts: Vectors of Locally Aggregated Concepts. In Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2019, Würzburg, Germany, September 16–20, 2019, Proceedings

    Bag-of-Concepts, a model that counts the frequency of clustered word embeddings (i.e., concepts) in a document, has demonstrated the feasibility of leveraging clustered word embeddings to create features
    for document representation. However, information is lost as the word embeddings themselves are not used in the resulting feature vector. This paper presents a novel text representation method, Vectors of Locally Aggregated Concepts (VLAC). Like Bag-of-Concepts, it clusters word embeddings for its feature generation. However, instead of counting the frequency of clustered word embeddings, VLAC takes each cluster’s sum of residuals with respect to its centroid and concatenates those to create a feature vector. The resulting feature vectors contain more discriminative information than Bag-of-Concepts due to the additional inclusion of these first-order statistics.
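The contrast between Bag-of-Concepts and VLAC described above can be illustrated with a minimal sketch. It assumes the concept centroids are already given (for example, from k-means over word embeddings) and that each word is assigned to its nearest centroid by Euclidean distance; the function names are hypothetical, and any normalization steps used in the paper are omitted.

```python
import math


def nearest_centroid(vector, centroids):
    """Return the index of the centroid closest to `vector` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda k: math.dist(vector, centroids[k]))


def bag_of_concepts(word_vectors, centroids):
    """Bag-of-Concepts style feature: count how many words fall into each concept."""
    counts = [0] * len(centroids)
    for vec in word_vectors:
        counts[nearest_centroid(vec, centroids)] += 1
    return counts


def vlac(word_vectors, centroids):
    """VLAC style feature: sum residuals (word vector minus centroid) per concept."""
    dim = len(centroids[0])
    residual_sums = [[0.0] * dim for _ in centroids]
    for vec in word_vectors:
        k = nearest_centroid(vec, centroids)
        for d in range(dim):
            residual_sums[k][d] += vec[d] - centroids[k][d]
    # Concatenate the per-concept residual sums into one document feature vector
    return [value for cluster_sum in residual_sums for value in cluster_sum]


# Toy example with two hypothetical 2-D concepts and three word embeddings
centroids = [[0.0, 0.0], [1.0, 1.0]]
words = [[0.1, 0.2], [0.9, 1.0], [1.2, 0.8]]
print(bag_of_concepts(words, centroids))  # counts per concept
print(vlac(words, centroids))             # concatenated residual sums
```

The count vector has one entry per concept, while the VLAC vector has (number of concepts × embedding dimension) entries, which is where the additional first-order information comes from.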

Projects

  • BERTopic: Topic Modeling with Transformer Models

    -

    BERTopic is a topic modeling technique that leverages BERT embeddings and c-TF-IDF to create dense clusters allowing for easily interpretable topics whilst keeping important words in the topic descriptions.
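The c-TF-IDF step mentioned above treats all documents in a cluster as a single "class document" and weights terms per class. The sketch below assumes the weighting W(t, c) = tf(t, c) × log(1 + A / f(t)) as described in the BERTopic documentation, with tf normalized per class, A the average number of words per class, and f(t) the frequency of term t across all classes. It uses naive whitespace tokenization and is not BERTopic's actual implementation.

```python
import math
from collections import Counter


def c_tf_idf(docs_per_class):
    """Weight terms per class with tf(t, c) * log(1 + A / f(t)).

    docs_per_class maps a class (topic) label to a list of documents;
    all documents of a class are joined into one "class document".
    """
    # Term counts per class after joining each class's documents
    class_counts = {
        label: Counter(" ".join(docs).lower().split())
        for label, docs in docs_per_class.items()
    }
    # f(t): frequency of each term summed over all classes
    total_counts = Counter()
    for counts in class_counts.values():
        total_counts.update(counts)
    # A: average number of words per class
    avg_words = sum(total_counts.values()) / len(class_counts)

    weights = {}
    for label, counts in class_counts.items():
        n_words = sum(counts.values())
        weights[label] = {
            # Normalized term frequency times the class-based IDF
            term: (count / n_words) * math.log(1 + avg_words / total_counts[term])
            for term, count in counts.items()
        }
    return weights


topics = {
    "sports": ["the match was great", "the team won the match"],
    "food": ["the pasta was great", "fresh pasta sauce"],
}
weights = c_tf_idf(topics)
for label, scores in weights.items():
    top = sorted(scores, key=scores.get, reverse=True)[:3]
    print(label, top)
```

Terms that are frequent within one class but rare across the other classes receive the highest weights, which is what makes the resulting topic descriptions readable.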

  • KeyBERT: Keyword Extraction with BERT

    -

    KeyBERT is a minimal and easy-to-use keyword extraction technique that leverages BERT embeddings to create keywords and keyphrases that are most similar to a document.
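The core idea of ranking candidate keywords by their similarity to the document can be sketched with cosine similarity. In practice the embeddings would come from a BERT-like model and the candidates would be n-grams taken from the document; here, the vectors are small hypothetical values chosen only for illustration, and `rank_keywords` is not part of KeyBERT's API.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def rank_keywords(doc_embedding, candidate_embeddings, top_n=2):
    """Return the top_n candidates whose embeddings are most similar to the document."""
    scored = [
        (candidate, cosine(doc_embedding, emb))
        for candidate, emb in candidate_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Hypothetical toy embeddings standing in for model output
doc = [0.9, 0.1, 0.3]
candidates = {
    "topic modeling": [0.8, 0.2, 0.3],
    "keyword extraction": [0.7, 0.0, 0.5],
    "cooking": [0.0, 1.0, 0.0],
}
print(rank_keywords(doc, candidates))
```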

  • PolyFuzz: Fuzzy string matching, grouping, and evaluation in one

    -

    PolyFuzz performs fuzzy string matching, string grouping, and contains extensive evaluation functions. PolyFuzz is meant to bring fuzzy string matching techniques together within a single framework.

    Currently, methods include a variety of edit distance measures, a character-based n-gram TF-IDF, word embedding techniques such as FastText and GloVe, and 🤗 transformers embeddings.
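A character-based n-gram TF-IDF matcher, one of the methods listed above, can be sketched in plain Python: each string is split into overlapping character trigrams, weighted with TF-IDF, and matched to its most similar counterpart by cosine similarity. The smoothed IDF formula (as used by scikit-learn) and the helper names are assumptions for illustration; this is not PolyFuzz's API.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Split a string into overlapping character n-grams, padded with spaces."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def tfidf_vectors(strings, n=3):
    """Build sparse TF-IDF vectors (dicts) over character n-grams."""
    grams = [Counter(char_ngrams(s, n)) for s in strings]
    doc_freq = Counter(g for counts in grams for g in counts)
    n_docs = len(strings)
    # Smoothed IDF so that n-grams shared by every string keep a small weight
    return [
        {g: tf * (math.log((1 + n_docs) / (1 + doc_freq[g])) + 1) for g, tf in counts.items()}
        for counts in grams
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(g, 0.0) for g, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def best_matches(from_list, to_list, n=3):
    """Match each string in from_list to its most similar string in to_list."""
    vectors = tfidf_vectors(from_list + to_list, n)
    from_vecs, to_vecs = vectors[:len(from_list)], vectors[len(from_list):]
    matches = []
    for s, u in zip(from_list, from_vecs):
        scores = [cosine(u, v) for v in to_vecs]
        best = max(range(len(to_list)), key=scores.__getitem__)
        matches.append((s, to_list[best], scores[best]))
    return matches


for source, target, score in best_matches(["apple", "appl", "pear"], ["apples", "pears", "banana"]):
    print(f"{source} -> {target} ({score:.2f})")
```

Because trigrams tolerate small spelling differences, "appl" still shares most of its n-grams with "apples", which is what makes this approach useful for fuzzy matching.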

  • ReinLife: Artificial Life with Reinforcement Learning

    -

    Although Evolutionary Algorithms have been shown to produce interesting behavior, they focus on learning across generations, whereas behavior can also be learned during an individual's lifetime. This is where Reinforcement Learning comes in: it learns through a reward/punishment system that allows entities to acquire new behavior during their lifetime. Using Reinforcement Learning, entities learn to survive, reproduce, and maximize the fitness of their kin.

  • Reviewer: Character Popularity

    -

    Reviewer can be used to scrape user reviews from IMDB, generate word clouds based on a custom class-based TF-IDF, and extract popular characters/actors from reviews using a combination of Named Entity Recognition and Sentiment Analysis.

  • SoAn: Analyzing WhatsApp Messages

    -

    Created a package for in-depth analyses of WhatsApp conversations. The analyses were initially done on WhatsApp messages between me and my fiancée to surprise her with at our wedding.
    The visualizations were designed to make sense to someone unfamiliar with data science. Methods: Sentiment Analysis, TF-IDF, Topic Modeling, Wordclouds, etc.

  • VLAC: Vectors of Locally Aggregated Concepts

    -

    VLAC leverages clusters of word embeddings (i.e., concepts) to create features from a collection of documents allowing for classification of documents. Inspiration was drawn from VLAD, which is a feature generation method for image classification. Results and data are included in the repo.

