[D] Monthly Who's Hiring and Who Wants to Be Hired?
For job postings, please use this template: Hiring: [Location], Salary: [], [Remote | Relocation], [Full Time | Contract | Part Time], and [Brief overview, what you're looking for]. For those looking for jobs...
[D] Self-Promotion Thread
Please post your personal projects, startups, product placements, collaboration needs, blogs, etc. Please mention the payment and pricing requirements for products and services. Please do not post link...
[D] Dynamic patch weighting in ViTs
Has anyone explored weighting non-overlapping patches in images using ViTs? The weights would be learnable parameters. For instance, the background patches are sometimes useless for an image...
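A minimal sketch of one way this could look (PyTorch; module name, shapes, and the sigmoid gating are illustrative assumptions, not from the post): a small head predicts a weight per patch token and rescales the tokens before the transformer encoder.

```python
# Hedged sketch: content-dependent patch gating with learnable parameters.
import torch
import torch.nn as nn

class PatchGate(nn.Module):
    """Predicts a weight in (0, 1) per patch token and rescales the tokens."""
    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)  # learnable gating parameters

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, num_patches, dim)
        weights = torch.sigmoid(self.score(tokens))  # (batch, num_patches, 1)
        return tokens * weights                      # down-weight e.g. background patches

# Usage with hypothetical ViT-B patch embeddings (14x14 patches of a 224x224 image)
tokens = torch.randn(2, 196, 768)
print(PatchGate(768)(tokens).shape)  # torch.Size([2, 196, 768])
```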
[P] [R] [D] I built a biomedical GNN + LLM pipeline (XplainMD) for...
Hi everyone, I'm an independent researcher and recently finished building XplainMD, an end-to-end explainable AI pipeline for biomedical knowledge graphs. It's designed to predict and explain multiple...
[D] Thoughts about ICASSP 2025
There were a lot of visa issues, so half of the poster boards were empty, and two sessions I attended were just videos playing. Why are there visa issues at conferences? I got my paper into CVPR 23, but...
[D] Is research on discrete sampling / MCMC useful in industry? Feeling unsure.
Hi all, I'm currently a second-year PhD student in CS at a top-20 school. My research focuses on discrete sampling — designing MCMC-based algorithms for inference and generation over discrete spaces. While...
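For readers unfamiliar with the area, a toy illustration of discrete-space MCMC (a random-walk Metropolis sampler over binary vectors; the energy function is an arbitrary assumption, not the poster's research):

```python
# Hedged illustration: Metropolis sampling over a discrete state space.
import random
import math

def energy(x):
    # Toy Ising-like energy: favor configurations where neighboring bits agree.
    return -sum(1.0 if x[i] == x[i + 1] else -1.0 for i in range(len(x) - 1))

def metropolis_discrete(dim=16, steps=5000, temp=1.0, seed=0):
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(dim)]
    for _ in range(steps):
        i = rng.randrange(dim)           # propose flipping one bit
        y = list(x)
        y[i] ^= 1
        delta = energy(y) - energy(x)
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            x = y                        # accept the proposal
    return x

print(metropolis_discrete())
```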
Previewing parquet directly from the OS [Discussion]
Hi! I've worked with Parquet for years at this point, and it's my favorite format by far for data work. Nothing beats it. It compresses super well, is fast as hell, maintains a schema, and doesn't corrupt...
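As a hedged aside, a minimal way to preview a Parquet file's schema and first rows without loading it fully, using pyarrow (a generic sketch, not the poster's tool):

```python
# Sketch: print schema and first rows of a Parquet file from the command line.
import sys
import pyarrow.parquet as pq

def preview(path: str, n: int = 10) -> None:
    pf = pq.ParquetFile(path)
    print(pf.schema_arrow)                       # column names and types
    first_batch = next(pf.iter_batches(batch_size=n))
    print(first_batch.to_pandas())               # first n rows, no full load

if __name__ == "__main__":
    preview(sys.argv[1])
```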
[P] A slop forensics toolkit for LLMs: computing over-represented lexical...
Releasing a few tools around LLM slop (over-represented words & phrases). It uses stylometric analysis to surface repetitive words & n-grams which occur more often in LLM output compared to...
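A rough sketch of the underlying idea (not the released toolkit itself): compare smoothed word frequencies in LLM output against a human-written reference corpus and rank by ratio.

```python
# Hedged sketch: surface words over-represented in LLM text vs. a human reference.
from collections import Counter
import re

def word_counts(text: str) -> Counter:
    return Counter(re.findall(r"[a-z']+", text.lower()))

def overrepresented(llm_text: str, human_text: str, top_k: int = 10, alpha: float = 1.0):
    llm, human = word_counts(llm_text), word_counts(human_text)
    llm_total, human_total = sum(llm.values()), sum(human.values())
    ratios = {
        w: ((llm[w] + alpha) / (llm_total + alpha)) /
           ((human[w] + alpha) / (human_total + alpha))
        for w in llm
    }
    return sorted(ratios.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

print(overrepresented("we delve into a rich tapestry of insights, delve deeper",
                      "we look at some results and discuss what they mean"))
```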
[P] B200 vs H100 Benchmarks: Early Tests Show Up to 57% Faster Training...
We at Lightly AI recently got early access to Nvidia B200 GPUs in Europe and ran some independent benchmarks comparing them against H100s, focusing on computer vision model training workloads. We...
[D] Yann LeCun: Auto-Regressive LLMs Are Doomed
Yann LeCun at the Josiah Willard Gibbs Lecture (2025). Not sure who else agrees, but I think Yann LeCun raises an interesting point here. Curious to hear other opinions on this! Lecture link:...
[D] Need open-source TTS
For the past week I've been working on a TTS script. I need it to support multiple accents (English only) and to run on CPU rather than GPU, while keeping inference time as low as possible for...
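One hedged possibility, assuming Coqui TTS is acceptable (the model and speaker names below are assumptions and may differ across library versions); its multi-speaker English VITS model runs on CPU by default:

```python
# Hedged sketch: CPU synthesis with a multi-speaker English model via Coqui TTS.
from TTS.api import TTS

tts = TTS(model_name="tts_models/en/vctk/vits")   # multi-speaker English model (assumed name)
tts.tts_to_file(
    text="Testing CPU-only synthesis with different English accents.",
    speaker="p225",                               # VCTK speaker id; accent varies by speaker
    file_path="sample.wav",
)
```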
[Project] I created a crop generator that you might want to use.
Hello everyone, I created a Python-based crop generator that helps me with my image datasets: https://github.com/fegarza7/CropGenerator. I am training SDXL models to recognize features and concepts, and I...
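For context, a minimal random-crop sketch with Pillow (a generic illustration, not the linked CropGenerator):

```python
# Sketch: generate random square crops from a folder of images.
import random
from pathlib import Path
from PIL import Image

def random_crops(src_dir: str, dst_dir: str, size: int = 512, per_image: int = 4, seed: int = 0):
    rng = random.Random(seed)
    out = Path(dst_dir)
    out.mkdir(parents=True, exist_ok=True)
    for path in Path(src_dir).glob("*.jpg"):
        img = Image.open(path).convert("RGB")
        w, h = img.size
        if w < size or h < size:
            continue                                  # skip images smaller than the crop
        for i in range(per_image):
            left = rng.randint(0, w - size)
            top = rng.randint(0, h - size)
            crop = img.crop((left, top, left + size, top + size))
            crop.save(out / f"{path.stem}_crop{i}.jpg")

# random_crops("raw_images", "crops")
```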
[D] Anyone having experience working with GRF (Google Research Football)...
I'm facing severe issues while working with GRF and was wondering if someone experienced could guide me through them. submitted by /u/Anonymous_Life17
[P] Building a Classifier for Time Series Forecasting
Hey everyone! I want to build a classifier that can automatically select the best forecasting model for a given univariate time series, based on which one results in the lowest MAPE (Mean Absolute...
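A hedged sketch of one version of that recipe (the toy forecasters and synthetic data are assumptions): label each training series with the candidate model that achieves the lowest holdout MAPE, extract simple features, and fit a classifier to predict that label.

```python
# Sketch: model-selection classifier supervised by per-series lowest MAPE.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def mape(y_true, y_pred):
    return np.mean(np.abs((y_true - y_pred) / np.clip(np.abs(y_true), 1e-8, None)))

# Toy candidate forecasters standing in for real models (e.g. ETS, ARIMA).
def naive_forecast(train, horizon):
    return np.repeat(train[-1], horizon)

def mean_forecast(train, horizon):
    return np.repeat(train.mean(), horizon)

CANDIDATES = [naive_forecast, mean_forecast]

def features(series):
    return [series.mean(), series.std(), np.corrcoef(series[:-1], series[1:])[0, 1]]

def best_model_label(series, horizon=12):
    train, test = series[:-horizon], series[-horizon:]
    errors = [mape(test, f(train, horizon)) for f in CANDIDATES]
    return int(np.argmin(errors))                      # index of lowest-MAPE model

rng = np.random.default_rng(0)
dataset = [rng.normal(size=100).cumsum() + 50 for _ in range(200)]   # synthetic series
X = np.array([features(s) for s in dataset])
y = np.array([best_model_label(s) for s in dataset])
clf = RandomForestClassifier(random_state=0).fit(X, y)
print(clf.predict(X[:3]))
```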
[P] Sub-2s cold starts for 13B+ LLMs + 50+ models per GPU — curious how...
We’re experimenting with an AI-native runtime that snapshot-loads LLMs (e.g., 13B–65B) in under 2–5 seconds and dynamically runs 50+ models per GPU — without keeping them always resident in...
[P] We built an OS-like runtime for LLMs — curious if anyone else is doing...
We’re experimenting with an AI-native runtime that snapshot-loads LLMs (e.g., 13B–65B) in under 2–5 seconds and dynamically runs 50+ models per GPU — without keeping them always resident in...
[D] Fine-tuned BART for product title & category normalization – still not...
Hi everyone, I’m building a price comparison website for products from various online stores in Moldova. I fine-tuned a BART model on a custom dataset of around 20,000 manually normalized product...
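For reference, a minimal inference sketch with Hugging Face transformers (facebook/bart-base stands in for the poster's fine-tuned checkpoint; the example title is invented):

```python
# Sketch: normalize a raw product title with a fine-tuned BART seq2seq model.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

checkpoint = "facebook/bart-base"  # replace with the fine-tuned checkpoint path
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

raw_title = "Telefon mobil Samsung Galaxy A54 128gb dual sim negru"
inputs = tokenizer(raw_title, return_tensors="pt", truncation=True)
outputs = model.generate(**inputs, max_new_tokens=64, num_beams=4)  # beam search for cleaner titles
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```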
[D] Adding new vocab tokens + fine-tuning LLMs to follow instructions is...
I've been experimenting with instruction-tuning LLMs and VLMs, either adding new specialized tokens to their corresponding tokenizer/processor or not. The setup is typical: mask the...
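A minimal sketch of the token-addition side of that setup (a common pattern, assumed here rather than quoted from the post): register the new special tokens and resize the embedding matrix before fine-tuning.

```python
# Sketch: add specialized tokens and resize embeddings before instruction-tuning.
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "gpt2"  # placeholder; the post does not name a model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

new_tokens = ["<tool_call>", "<tool_result>"]           # hypothetical specialized tokens
num_added = tokenizer.add_special_tokens({"additional_special_tokens": new_tokens})
model.resize_token_embeddings(len(tokenizer))           # new rows are randomly initialized

print(f"Added {num_added} tokens; vocab size is now {len(tokenizer)}")
```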
[P] A lightweight open-source model for generating manga
I posted this on r/StableDiffusion (see some nice discussion) and someone recommended it'd also fit here. TL;DR: I finetuned Pixart-Sigma on 20 million manga images, and I'm making the model weights...
[R] CAT: Circular-Convolutional Attention for Sub-Quadratic Transformers
https://arxiv.org/abs/2504.06704 CAT achieves O(N log N) computation, requires fewer learnable parameters by streamlining fully-connected layers, and introduces no heavier operations, resulting in...
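A hedged illustration of the core primitive rather than the paper's implementation: a circular convolution along the sequence dimension computed with FFTs, which is where an O(N log N) cost comes from.

```python
# Sketch: circular convolution over the sequence dimension via FFT, O(N log N).
import torch

def circular_convolution(x: torch.Tensor, kernel: torch.Tensor) -> torch.Tensor:
    # x: (batch, seq_len, dim), kernel: (seq_len, dim) — circular mixing along seq_len.
    n = x.shape[1]
    x_f = torch.fft.rfft(x, n=n, dim=1)               # FFT along the sequence axis
    k_f = torch.fft.rfft(kernel, n=n, dim=0)
    return torch.fft.irfft(x_f * k_f.unsqueeze(0), n=n, dim=1)

x = torch.randn(2, 128, 64)
kernel = torch.randn(128, 64)                          # one filter per channel (assumed learnable)
print(circular_convolution(x, kernel).shape)           # torch.Size([2, 128, 64])
```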
[P] What if you could run 50+ LLMs per GPU — without keeping them in memory?
We've been experimenting with an AI-native runtime that snapshot-loads LLMs (13B–65B) in 2–5 seconds and dynamically runs 50+ models per GPU — without keeping them always resident in memory. Instead of...
[D] Will traditional machine learning algorithms (such as neural nets,...
Dear colleagues, I'm curious to hear from practitioners across industries about how large language models (LLMs) are reshaping your roles and evolving your workflows. Below, I've outlined a few emerging...
[P] Simple standalone TFRecords dataset reader with Random Access and...
Hi, at work we use TFRecords to store most of our datasets. However, from time to time we need to inspect the data to better understand our models' predictions, e.g. to find examples of...
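For comparison, the standard TensorFlow way to iterate and parse a TFRecord file for inspection (feature names below are placeholder assumptions, not the poster's schema):

```python
# Sketch: read and parse the first few records of a TFRecord file.
import tensorflow as tf

feature_spec = {
    "image": tf.io.FixedLenFeature([], tf.string),   # assumed serialized image bytes
    "label": tf.io.FixedLenFeature([], tf.int64),
}

def inspect_records(path: str, n: int = 5) -> None:
    dataset = tf.data.TFRecordDataset(path)
    for raw in dataset.take(n):
        example = tf.io.parse_single_example(raw, feature_spec)
        print(int(example["label"]), len(example["image"].numpy()), "bytes")

# inspect_records("train-00000-of-00010.tfrecord")
```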
[P] Harmonic Activations: Periodic and Monotonic Function Extensions for...
Hey folks! I’ve recently released a preprint proposing a new family of activation functions designed for normalization-free deep networks. I’m an independent researcher working on expressive...
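For context only, an existing example of a periodic yet monotonic activation, the Snake function x + sin²(ax)/a; this is not the preprint's proposed family, just an illustration of the kind of function involved.

```python
# Sketch: Snake activation with a learnable frequency parameter.
import torch
import torch.nn as nn

class Snake(nn.Module):
    def __init__(self, alpha: float = 1.0):
        super().__init__()
        self.alpha = nn.Parameter(torch.tensor(alpha))  # learnable frequency

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Derivative is 1 + sin(2*alpha*x) >= 0, so the function is monotone
        # non-decreasing while its slope oscillates periodically.
        return x + torch.sin(self.alpha * x) ** 2 / self.alpha

x = torch.linspace(-3, 3, 7)
print(Snake()(x))
```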
[D] “Reasoning Models Don’t Always Say What They Think” – Anyone Got Prompts?
Has anyone here tried replicating the results from the “Reasoning Models Don’t Always Say What They Think” paper using their own prompts? I'm working on reproducing these outputs. If you’ve...
[R] d1: Scaling Reasoning in Diffusion Large Language Models via...
Recent large language models (LLMs) have demonstrated strong reasoning capabilities that benefit from online reinforcement learning (RL). These capabilities have primarily been demonstrated within the...
[N] Google open to letting enterprises self-host SOTA models
From a major player, this sounds like a big shift and would mostly offer enterprises an interesting perspective on data privacy. Mistral is already doing this a lot while OpenAI and Anthropic maintain...