this post was submitted on 28 Apr 2025

41 points (97.7% liked)

LocalLLaMA

2939 readers

15 users here now

Welcome to LocalLLaMA! Here we discuss running and developing machine learning models at home. Lets explore cutting edge open source neural network technology together.

Get support from the community! Ask questions, share prompts, discuss benchmarks, get hyped at the latest and greatest model releases! Enjoy talking about our awesome hobby.

As ambassadors of the self-hosting machine learning community, we strive to support each other and share our enthusiasm in a positive constructive way.

founded 2 years ago

MODERATORS

SkySyrup@sh.itjust.works

pax@sh.itjust.works

noneabove1182@sh.itjust.works

Smokeydope@lemmy.world

MonsterBug@sh.itjust.works

Qwen3 officially released (qwenlm.github.io)

submitted 1 week ago by General_Effort@lemmy.world to c/localllama@sh.itjust.works

18 comments fedilink hide all child comments

https://huggingface.co/collections/Qwen/qwen3-67dd247413f0e2e4f653967f

https://github.com/QwenLM/Qwen3

https://modelscope.cn/collections/Qwen3-9743180bdc6b48

https://discord.gg/yPEP2vHTu4

https://www.kaggle.com/models/qwen-lm/qwen-3

you are viewing a single comment's thread
view the rest of the comments

[–] rcbrk@lemmy.ml 3 points 6 days ago

Qwen3-30B-A3B-Q6_K:

Amt: 8003, Process:10.67s (17.52T/s), Generate:1408.00s (5.68T/s), Total: 1418.67s

Qwen3-30B-A3B-Q6_K - Thoughts (29081 characters)

Okay, I need to write a Python script that sorts all .gguf files in the current directory into subdirectories based on their publisher. The publishers are determined from URLs stored in files named dl{YYYYMMDD}.txt. Each line in these files has a URL, and some might be commented out with a #. The publisher is the first part after the domain in the URL. Let me think through how to approach this.

First, I need to find all the .gguf files in the current directory. Then, for each of them, I need to determine their publisher. But how do I get the URLs associated with each .gguf file? Wait, the problem says that the URLs are in the dl{YYYYMMDD}.txt files. But how do I know which URL corresponds to which .gguf file? Hmm, maybe the URLs in the txt files are the ones that were used to download the .gguf files. So each .gguf file was downloaded via a URL listed in one of these txt files. But how do I associate each .gguf file with its URL?

Wait, the problem statement says that the URLs are in the files named dl{YYYYMMDD}.txt. But the .gguf files themselves don't have any metadata linking them to the URLs. So maybe the assumption is that each .gguf file was downloaded using one of the URLs in these txt files. But how to determine which URL corresponds to which .gguf file? That's a problem. Unless there's some naming convention I'm missing.

Wait, maybe the URLs in the txt files are the ones that were used to download the .gguf files. For example, if a URL is "https://huggingface.co/unsloth/.../file.gguf", then the corresponding .gguf file would be named something like "file.gguf". But the problem says that the .gguf files are in the current directory. So perhaps the script needs to look through all the URLs in the txt files, extract the publisher, and then for each .gguf file, find which URL it came from. But without any direct link, how?

Wait, maybe the .gguf files are named after the file part of the URL. For example, the URL ends with "Qwen3-30B-A3B-Q6_K.gguf?download=true", so the .gguf file would be named "Qwen3-30B-A3B-Q6_K.gguf". If that's the case, then for each .gguf file, we can check if there's a URL in the txt files that has the same filename. But that might not be reliable. Alternatively, maybe the script is supposed to process all the URLs in the txt files, extract their publishers, and then for each .gguf file, determine which publisher it belongs to based on the URL that was used to download it. However, without knowing which URL corresponds to which .gguf file, this seems impossible.

Wait, maybe the problem is that the user expects that each .gguf file was downloaded via one of the URLs in the txt files. So the script should process all the URLs in the txt files, extract the publisher, and then for each .gguf file, find which URL (if any) it corresponds to. But how?

Alternatively, perhaps the script is supposed to take each .gguf file, and for each of them, look through all the URLs in the txt files to find which one matches. But how? For example, if the .gguf file is named "Qwen3-30B-A3B-Q6_K.gguf", then the URL would have that as the filename. So perhaps the script can check for each .gguf file, whether there's a URL in the txt files that has that filename. But that's a bit of a stretch. However, given the problem statement, perhaps that's the intended approach.

Alternatively, maybe the problem is that the URLs in the txt files are the ones that were used to download the .gguf files, and each .gguf file's name is derived from the URL. So the script can process all the URLs, extract the publisher, and then for each .gguf file, determine which URL it came from by matching the filename. But this requires that the .gguf files are named exactly as the file part of the URL.

Assuming that's the case, here's the plan:

Find all .gguf files in the current directory.
For each .gguf file, extract the filename (without the path).
Search through all the URLs in the dl{YYYYMMDD}.txt files to find which URL has a filename that matches the .gguf file's name.
Once found, extract the publisher from that URL.
Create a subdirectory under the publisher's name (if it doesn't exist) and move the .gguf file there.

But how to handle cases where multiple URLs have the same filename? Or if no URL matches? The problem statement says that the URLs are in the txt files, but maybe not all .gguf files are accounted for. But the script should process as many as possible.

Alternatively, perhaps the script is supposed to process all URLs in the txt files, extract the publisher, and then create directories for each publisher, but how does that relate to the .gguf files? Maybe the .gguf files are already in the directory, and the script is supposed to move them into the publisher's directory based on the URLs in the txt files. But without knowing which .gguf file corresponds to which URL, this seems impossible.

Wait, maybe the problem is that the .gguf files were downloaded using the URLs in the txt files, and the script needs to associate each .gguf file with the URL that was used to download it. But how? Unless there's some other metadata. But the problem statement doesn't mention that.

This is a bit confusing. Let me re-read the problem statement.

The user says: "The URLs are in files in the current directory named in the format dl{YYYYMMDD}.txt, with one url per line. Some url lines may be commented out with a single # character, but they are still a valid source of info. An example url is ... and the publisher is always the first node after the domain, in this case unsloth."

So the URLs are stored in th [...trimmed because lemmy won't handle >10,000 chars]

Qwen3-30B-A3B-Q6_K - Answer

Here's a well-documented Python script that fulfills the requirements of sorting .gguf files into directories based on their source publishers, as extracted from URLs stored in dl{YYYYMMDD}.txt files.

✅ What This Script Does

Finds all .gguf files in the current working directory.
Reads all dl{YYYYMMDD}.txt files in the current directory.
Parses each URL in those files, including lines that are commented out (i.e., starting with #) but still considered valid.
Extracts the file name from the URL path.
Checks if the file exists in the current directory.
Determines the publisher by parsing the URL path.
Creates a directory for the publisher if it doesn't exist.
Moves the .gguf file into the corresponding publisher directory.

📁 Sample Input (from `dl20231010.txt`)

https://huggingface.co/unsloth/Qwen3-30B-A3B-GGUF/resolve/main/Qwen3-30B-A3B-Q6_K.gguf?download=true
# https://huggingface.co/another_publisher/model.gguf

📦 Python Script

import os
import shutil
from urllib.parse import urlparse

def main():
    # Step 1: Find all dl{YYYYMMDD}.txt files in the current directory
    txt_files = [f for f in os.listdir() if f.startswith('dl') and f.endswith('.txt')]

    for txt_file in txt_files:
        print(f"Processing {txt_file}...")
        try:
            with open(txt_file, 'r', encoding='utf-8') as f:
                lines = f.readlines()
        except Exception as e:
            print(f"Error reading {txt_file}: {e}")
            continue

        for line in lines:
            line = line.strip()

            # Step 2: Remove any leading '#' and whitespace
            if line.startswith('#'):
                line = line[1:].strip()
            if not line:
                continue

            try:
                # Step 3: Parse the URL and extract the filename
                parsed = urlparse(line)
                path = parsed.path
                filename = os.path.basename(path)

                # Skip non-.gguf files
                if not filename.endswith('.gguf'):
                    continue

                # Step 4: Check if the file exists in the current directory
                if not os.path.exists(filename):
                    print(f"File '{filename}' not found for URL '{line}'")
                    continue

                # Step 5: Extract publisher from the URL path
                path_parts = parsed.path.split('/')
                if len(path_parts) < 3:
                    print(f"Invalid URL format: '{line}'")
                    continue
                publisher = path_parts[1]

                # Step 6: Create the publisher directory if it doesn't exist
                publisher_dir = os.path.join(os.getcwd(), publisher)
                os.makedirs(publisher_dir, exist_ok=True)

                # Step 7: Move the file
                src_path = os.path.join(os.getcwd(), filename)
                dst_path = os.path.join(publisher_dir, filename)
                shutil.move(src_path, dst_path)
                print(f"Moved '{filename}' to '{publisher_dir}'")

            except Exception as e:
                print(f"Error processing URL '{line}': {e}")

if __name__ == "__main__":
    main()

📝 Notes

The script handles both commented and un-commented lines in dl{YYYYMMDD}.txt.
It assumes that .gguf files are named exactly as per the URL's file part (e.g., Qwen3-30B-A3B-Q6_K.gguf).
If a .gguf file is referenced by multiple URLs, it will be moved only once (after that, it will no longer exist in the current directory).
The script uses shutil.move, which is safe and works across file systems.

🧪 Example Output

Processing dl20231010.txt...
Moved 'Qwen3-30B-A3B-Q6_K.gguf' to 'unsloth'
File 'model.gguf' not found for URL 'https://huggingface.co/another_publisher/model.gguf'

This script is designed to be robust and user-friendly, ensuring all .gguf files are properly organized based on their source publishers.