Latest

A blog about AI, coding and tech

Published on
November 23, 2024
MLE-Bench: Benchmarking AI Agents in Machine Learning Engineering
AI-Benchmarking Machine-Learning-Engineering Kaggle-Competitions AIDE-Framework OpenAI-Research AI-Agents
MLE-Bench introduces a new benchmark to evaluate AI agents on real-world ML engineering tasks using Kaggle competitions. This post highlights key findings, including resource scaling effects, debugging challenges, and the performance of different agent frameworks.
Read more →
Published on
November 21, 2024
Understanding Scaled Dot-Product Attention in Neural Networks for Causal Discovery
Causal-Discovery Transformer-Neural-Network Attention-Mechanism Layer-Normalization Deep-Learning ADIA-Lab
This post explores a neural network model designed for the Causal Discovery Challenge organized by ADIA Lab. It highlights the use of a Transformer-based architecture with two layers of scaled dot-product attention and layer normalization, achieving a multi-balanced accuracy of 47.986%.
Read more →
Published on
October 21, 2023
Optimizing Language Model Prompts with Gradient-Based Tuning
Prompt-Optimization Reverse-Engineered-Prompt-Attack Large-Language-Model BLEU-score Transformers Weights-and-biases
This post presents my solution to the Trojan Detection Challenge 2023 (LLM Edition), a NeurIPS 2023 competition that aims to advance our understanding of, and methods for detecting, hidden functionality in large language models (LLMs). The primary task is to reverse-engineer the trigger prompts associated with a given target string.
Read more →
Published on
October 7, 2023
Evaluating Generated Text Quality using BLEU Score in Natural Language Processing
BLEU-score Natural-Language-Processing
The BLEU (Bilingual Evaluation Understudy) score is a metric used in Natural Language Processing (NLP) to evaluate the quality of machine-generated text, such as translations. It is a precision-based metric that measures the overlap of words or n-grams between the generated text and a reference text.
Read more →
Published on
October 6, 2023
How to Save and Load Timestamped Model Predictions in JSON Format to Google Drive from Google Colab
Google-Colab Google-Drive Python JSON
Working in Google Colab and storing files on Google Drive offers a streamlined workflow for managing and sharing data. In this post, I will guide you through saving model predictions to Google Drive in JSON format and reading them back, all from Google Colab.
Read more →

