Technical & Research AI

  • The AI revolution drove frenzied investment in both private and public companies and captured the public’s imagination in 2023. Transformational consumer products like ChatGPT are powered by Large Language Models (LLMs) that excel at modeling sequences of tokens that represent words or parts of words [2]. Amazingly, structural understanding emerges from learning next-token prediction, and […]
  • A deep dive into biases in machine learning, with a focus on historical (or social) biases. Humans are biased. To anyone who has had to deal with bigoted individuals, unfair bosses, or oppressive systems (in other words, all of us), this is no surprise. We should thus welcome machine learning models that can help us to make more objective […]
  • Customized backend; GCP deployment; data versioning with GCS integration. Contents: Introduction · Overview (Goal; Why semiautomatic?; Entering Label Studio; 1 frontend + 2 backends) · Implementation (Local): 1. Install Git and Docker and download the backend code; 2. Set up the frontend to get an access token; 3. Set up backend containers […]
  • Dive into the “Curse of Dimensionality” concept and understand the math behind the surprising phenomena that arise in high dimensions. In the realm of machine learning, handling high-dimensional vectors is not just common; it’s essential. This is illustrated by the architecture of popular models like Transformers. For instance, BERT uses 768-dimensional vectors to encode […]
  • Learn how you can use a tiny large language model, fine-tune it, and achieve high performance. Continue reading on Towards Data Science »
  • Now that you know how to code and visualize data, what’s next?
  • Running the 7B and 22B models in Google Colab
  • Learn how to reduce model latency when deploying Meta* Llama 3 on CPUs. The much-anticipated release of Meta’s third generation of Llama models is here, and I want to make sure you know how to deploy this state-of-the-art (SoTA) LLM optimally. In this tutorial, we will focus on performing weight-only quantization (WOQ) to compress the 8B […]
  • Do I really sleep worse after drinking alcohol? I first heard of N-of-1 trials in 2018 as a master’s student studying epidemiology. I was in my Intermediate Epidemiologic and Clinical Research Methods class, and we had a guest lecture from Dr. Eric Daza on N-of-1 study design. The N-of-1 study can be thought […]
  • How to do exploratory data analysis of a time series. One of the most popular types of data is a time series. Videos, images, pixels, signals: literally anything with a time component can be turned into one. Formally, a time series is a sequence of historical measurements of an observable variable at equal time intervals. In this […]
  • Part 1: Task-specific approaches for scenario forecasting. In product analytics, we quite often get "what-if" questions. Our teams are constantly inventing different ways to improve the product and want to understand how these changes could affect our KPIs or other metrics. Let's look at some examples: imagine we're in the fintech industry and facing new regulations requiring us to […]
  • How to Implement Knowledge Graphs and Large Language Models (LLMs) Together at the Enterprise Level: a survey of the current methods of integration. Large Language Models (LLMs) and Knowledge Graphs (KGs) are different ways of giving more people access to data. KGs use semantics to connect datasets via their meaning, i.e., the entities they represent. […]
  • Prompting ChatGPT and other chat-based language AI, and why you should (not) care about it. This article sheds some light on how to “talk” to Large Language Models (LLMs) that are designed to interact conversationally, like ChatGPT, Claude, and others, so that the answers you get from them are as useful as possible […]
  • It just might be worth you contributing too
  • How can we use clustering techniques to combine and refactor a large number of disparate dashboards? Organizations generate large volumes of data every day. Dashboards are built to analyze this data, derive meaningful business insights, and track KPIs. Over time, we find ourselves with hundreds (or […]
  • Calculating the consumption based on meter data looks easy. However, complex situations can be challenging. Let’s see how we can solve…
  • Since the introduction of mass production in 1913, assembly lines have remained mostly human; humanoids might change this.
  • A cheaper and faster unified fine-tuning technique. ORPO is an exciting new fine-tuning technique that combines the traditional supervised fine-tuning and preference alignment stages into a single process. This reduces the computational resources and time required for training. Moreover, empirical results demonstrate that ORPO outperforms other alignment methods on various model […]
  • Reflections from a humbling journey trying to find a job in 2023. 2023 was a turbulent year for many job seekers. At least for me, it felt like quite a journey. Over the 11 months between January and November, I had 107 career-related conversations and applied to 80 positions, resulting in 2 offers (I took a “break” […]
  • Assigning code owners, hiring analytics engineers, and creating flywheels

The following permanent links are provided because of the high submission rate of preprint and postprint manuscripts on the topics of Artificial Intelligence and Machine Learning. Be prepared to go down the rabbit hole!