Generative modelling in latent space 🤖

🦾 Plus: 💻 Wikipedia gives away data to AI developers

Hey folks! Let’s get into Big Data and AI craziness…

In today's edition: What's Shaping the Future of Data?

  • ☁️ Green Cloud Computing – The Sustainable Way to Use the Cloud

  • 📘 An Introduction to Stochastic Calculus

  • 🚀 A Field Guide to Rapidly Improving AI Products

  • 🔍 Migrating a large codebase to Polars

  • 💻 Wikipedia gives away data to AI developers

  • 📺 Watch Sam Altman at TED 2025

  • 🔍 ChatGPT memory now personalizes web searches

  • 🎓 One AI Premium plan offered free to US students

  • 💡 AI Tutorial: Turn your Google Sheets into a website with no-code

  • 🤖 AI Tools and Data Tools to check out

Most contemporary generative models of images, sound and video do not operate directly on pixels or waveforms. They consist of two stages: first, a compact, higher-level latent representation is extracted, and then an iterative generative process operates on this representation instead. How does this work, and why is this approach so popular?
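To make the two stages concrete, here is a toy sketch in plain Python. It is not taken from the linked post: block-averaging stands in for the learned encoder, value repetition for the decoder, and a small Langevin-style loop over a fitted Gaussian for the iterative generative model. All function names, the compression factor, and the toy dataset are assumptions made for illustration; real systems use a trained autoencoder and a diffusion or autoregressive model instead.

```python
import random

FACTOR = 4  # toy compression ratio (an assumption for this sketch)


def encode(signal):
    """Stage 1: squeeze the signal into a shorter latent by block-averaging."""
    return [sum(signal[i:i + FACTOR]) / FACTOR
            for i in range(0, len(signal), FACTOR)]


def decode(latent):
    """Map a latent back to signal space by repeating each value."""
    return [z for z in latent for _ in range(FACTOR)]


def fit_latent_gaussian(latents):
    """Fit an independent Gaussian (mean, variance) to each latent dimension."""
    n = len(latents)
    dims = range(len(latents[0]))
    means = [sum(z[d] for z in latents) / n for d in dims]
    variances = [sum((z[d] - means[d]) ** 2 for z in latents) / n for d in dims]
    return means, variances


def sample_latent(means, variances, rng, steps=200, eps=0.05):
    """Stage 2: start from noise and refine it with many small noisy steps
    (a Langevin-style update that drifts towards the fitted Gaussian)."""
    z = [rng.gauss(0.0, 1.0) for _ in means]
    for _ in range(steps):
        z = [zi + eps * (m - zi) + (2 * eps * v) ** 0.5 * rng.gauss(0.0, 1.0)
             for zi, m, v in zip(z, means, variances)]
    return z


rng = random.Random(0)
# Toy "dataset": 16-sample signals that hover around a random level.
levels = [rng.gauss(3.0, 1.0) for _ in range(200)]
train = [[level + rng.gauss(0.0, 0.1) for _ in range(16)] for level in levels]

# The generative model only ever sees the 4-value latents, never the raw signals.
means, variances = fit_latent_gaussian([encode(x) for x in train])
new_signal = decode(sample_latent(means, variances, rng))
print(len(encode(train[0])), len(new_signal))  # latent length, signal length
```

The point of the split is visible even at this scale: the iterative loop runs over 4 numbers instead of 16, and the decoder is called only once at the end.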

Did you know that the World Wide Web was born in Geneva, Switzerland? Indeed, Tim Berners-Lee first proposed it at CERN in 1989. Today the world-renowned center is home to the largest particle accelerator and to the CERN Science Gateway – a must-see hub for science enthusiasts that features hands-on exhibits, immersive virtual reality experiences, and live demonstrations.

Energy-efficient solutions are necessary to minimize the impact of cloud computing on the environment. Green cloud computing, also known as green information technology, is a potential way to help reduce energy consumption.

This post is about stochastic calculus, an extension of regular calculus to stochastic processes. It's not immediately obvious, but the rigour needed to properly understand some of the key ideas requires going back to the measure-theoretic definition of probability, so that's where I start in the background.
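One quick way to see why regular calculus needs extending is to simulate a Brownian path and add up its squared increments. For a smooth function that sum shrinks to zero as the grid gets finer; for Brownian motion it settles near the elapsed time, which is where the extra term in Itô's formula comes from. The sketch below is not from the linked post; the function names, the horizon of 1.0, and the grid sizes are assumptions chosen for illustration.

```python
import random


def brownian_increments(n_steps, horizon=1.0, rng=random):
    """Independent Gaussian increments of a Brownian path on [0, horizon]."""
    dt = horizon / n_steps
    return [rng.gauss(0.0, dt ** 0.5) for _ in range(n_steps)]


def quadratic_variation(increments):
    """Sum of squared increments; close to `horizon` on a fine grid."""
    return sum(dw * dw for dw in increments)


def total_variation(increments):
    """Sum of absolute increments; keeps growing as the grid is refined."""
    return sum(abs(dw) for dw in increments)


rng = random.Random(42)
for n in (100, 10_000):
    dW = brownian_increments(n, rng=rng)
    print(n, round(quadratic_variation(dW), 3), round(total_variation(dW), 1))
```

On a finer grid the first number stays near 1.0 while the second keeps growing, so the path is too rough for ordinary integration rules to apply.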

In this post, I’ll show you exactly how these successful teams operate. While every situation is unique, you’ll see patterns that apply regardless of your domain or team size. Let’s start by examining the most common mistake I see teams make: one that derails AI projects before they even begin…

In this community talk, Jeroen Janssens and Thijs Nieuwdorp share their experiences and best practices for migrating a large pandas codebase to Polars at one of the largest utility companies in the Netherlands. By implementing Polars, they achieved a 98% cost reduction. Watch the video to learn how you can start migrating your own codebase.

HubSpot’s AI-powered ecosystem presents a global opportunity projected to reach $10.2 billion by 2028. To capitalize on that growth potential, we are opening up our platform further, starting with expanded APIs, customizable app UI, and tools that better support a unified data strategy.

👨‍💻 Data Tools, Libraries

migrate-ai
A CLI tool designed to assist in migrating code from various frameworks and languages, such as Vue 2 to Vue 3 or JavaScript to TypeScript. It uses OpenAI to help perform these migrations and includes features for formatting code and managing configurations.

lsp-ai
An open-source language server that serves as a backend for AI-powered functionality, designed to assist and empower software engineers, not replace them.

Omakub
Opinionated Ubuntu Setup.

AI News:

As part of Ai2’s commitment to openness, and to empower open exploration of these questions, today we release DataDecide: a suite of models we pretrain on 25 corpora with differing sources, deduplication, and filtering up to 100B tokens, over 14 different model sizes ranging from 4M parameters up to 1B parameters (more than 30k model checkpoints in total).

Most hearing aids have one processor. These bad boys have two. They process speech and noise separately. What does this mean? It means speech gets clearer and crisper – more than ever before. Conversations and listening become effortless. Oh, and they’re so tiny, they’re practically invisible. No wonder over 425,000 customers love them.

Wikipedia is trying to reduce the strain caused by AI bots scraping its content by releasing a machine-learning-friendly dataset in partnership with Kaggle. This new beta dataset, available in English and French, offers structured, machine-readable Wikipedia content (summaries, infoboxes, and article sections, excluding references and media files) and is openly licensed.

OpenAI is upgrading ChatGPT’s “memory” again. In a changelog and support pages on OpenAI’s website Thursday, the company quietly announced “Memory with Search,” a feature that lets ChatGPT draw on memories (details from past conversations, such as your favorite foods) to inform queries when the bot searches the web.

Google is offering US college students free access to its $20/month One AI Premium plan until June 30, 2026. The plan includes 2TB cloud storage and tools like Gemini Advanced (powered by Gemini 2.5 Pro), NotebookLM Plus, the Veo 2 text-to-video model, and Whisk for mixed media prompts. Students must register with a .edu email by June 30, 2025.

At TED 2025, OpenAI CEO Sam Altman discussed the company’s explosive growth to 800 million weekly users, the infrastructure challenges caused by high demand, and the growing scrutiny surrounding AI’s societal impact. He acknowledged OpenAI’s evolution from a nonprofit to a $300 billion tech giant and addressed criticisms about power consolidation and safety risks, especially with autonomous AI agents.

Through Squarespace’s cutting-edge features that combine automation, design presets, creative guidance, and generative AI, Design Intelligence makes it easy to build a beautiful and impactful website. With just a few pieces of information, Blueprint AI generates an entire website customized to your brand’s goals, name, and personality. It’s AI speed, with Squarespace’s 20+ years of design expertise in website building.

AI Tutorial

Turn your Google Sheets into a website with no-code

  1. Create a Google Sheet and fill it with your content: names, descriptions, prices, etc.

  2. Go to the SpreadSimple website and sign in.

  3. In the dashboard, click the + button.

  4. Copy your Google Sheet link and paste it into the designated field.

*Note: Make sure your Google Sheet is set to public view so that SpreadSimple can access it to read and display the data. 

  5. Click Continue, and within a few moments, a website will be created for you.

  6. You can now customize the design and content presentation, change the domain, and adjust other settings.

This guide is your go-to resource for streamlining payments, improving cash flow, and keeping your business running smoothly.

What’s inside:

✔️ An actionable 8-step framework to create a seamless payment process

✔️ Expert strategies to reduce late payments and enhance your professional image

A well-structured payment system leads to smoother operations, happier clients, and long-term financial success.

🔥 Top AI tools to increase productivity:

  1. DOO: The leap in your team’s evolution. With DOO, your team doesn’t just grow in numbers but in capabilities too.

  2. Interview Solver is an AI Copilot that helps you pass your live coding and system design interviews.

  3. Language Atlas is a freemium platform where people can learn languages with AI

  4. BlogFox is an AI-powered blogging tool that simplifies the creation of high-quality, SEO-optimized content.

  5. ProJourney allows you to use Midjourney without having to go through Discord.

  6. Moemate is an AI Studio which lets anyone create and chat with AI characters

View our database of all the best AI tools for your needs: aitoolsup.com

Have cool resources to share? Submit AI tool

A.I. Generated Image of the Day

👀 Heralds of the Latent Empyrean

AI Tools Up Newsletter: Receive a weekly email with updates on new AI tools, helpful prompts, and the latest AI developments. Join over 10,000 professionals from Google, OpenAI, Notion, Apple, and more.

SPONSOR US

Get your product in front of Big Data & AI enthusiasts

Our newsletter is read by thousands of tech professionals, investors, engineers, managers, and business owners around the world.

Interested in sponsoring the Big Data News Weekly Newsletter? Get in touch today

What did you think of today's email?

Your feedback helps me create better emails for you!