Message boards : Questions and problems : Please Help Me if You Can
Joined: 3 Nov 21, Posts: 3
Greetings. I need to set up a script that fetches credit information for the users in a team on Rosetta@home and then rewards points in a "game" proportionally to how many credits each participant has earned through their Rosetta@home activity. This is a completely non-profit effort on my part to bring more people in to help, and in that spirit I have started trying to write a web scraper. Honestly I'm struggling, since I'm no more than an amateur dev, and I don't even know whether this code generates the payout (reward) data properly. Does anyone know if there is a Rosetta@home API that could help, or, even better, a web scraper for the Rosetta@home website? Something similar to this project would be great: https://github.com/stuckatsixpm/fah_scraper

Here is what I have managed to write so far (it probably doesn't work, or works badly). If anyone has a few minutes to spare and can help me out, I would be forever thankful:

```python
import argparse
import datetime
import json
import logging
import sqlite3
from collections import namedtuple

import requests
from bs4 import BeautifulSoup

UserStats = namedtuple("UserStats", ["name", "credit", "recent_average_credit"])


def init_db(db_file="./folding_data.db"):
    """Open (and if necessary create) the local stats database."""
    logging.info("Initializing database.")
    con = sqlite3.connect(db_file)
    cur = con.cursor()
    cur.execute(
        "CREATE TABLE IF NOT EXISTS folding_data ("
        "name TEXT PRIMARY KEY, credit INTEGER, recent_average_credit INTEGER, "
        "credit_delta INTEGER, date_delta INTEGER, date INTEGER)"
    )
    con.commit()
    return con


def fetch_stats(teamid=30157):
    """Download the HTML of the team members page."""
    url = "https://boinc.bakerlab.org/rosetta/team_display.php?teamid={}".format(teamid)
    r = requests.get(url)
    if r.ok:
        return r.text
    raise Exception("Failed to fetch: {}".format(url))


def log_stats(db, user_stats):
    """Store the latest credit values for one user and compute the deltas."""
    logging.debug("Logging stats for user: {}".format(user_stats.name))
    cur = db.cursor()
    # Parameterised queries avoid quoting problems (and SQL injection) in user names.
    row = cur.execute(
        "SELECT credit, date FROM folding_data WHERE name = ?", (user_stats.name,)
    ).fetchone()
    prev_credit, prev_date = row if row else (0, 0)

    date = int(datetime.datetime.utcnow().timestamp())
    credit_delta = user_stats.credit - prev_credit
    date_delta = date - prev_date

    cur.execute("DELETE FROM folding_data WHERE name = ?", (user_stats.name,))
    cur.execute(
        "INSERT INTO folding_data VALUES (?, ?, ?, ?, ?, ?)",
        (
            user_stats.name,
            user_stats.credit,
            user_stats.recent_average_credit,
            credit_delta,
            date_delta,
            date,
        ),
    )


def create_snapshot(db, teamid=30157):
    """Scrape the team page and record one snapshot per member."""
    logging.info("Creating snapshot.")
    stats = fetch_stats(teamid=teamid)
    soup = BeautifulSoup(stats, "html.parser")
    members = soup.find_all("table", {"class": "members"})
    rows = members[0].find_all("tr")
    for row in rows[1:]:  # skip the header row
        try:
            _, _, name, credit, recent_average_credit = [
                item.text for item in row.find_all("td")
            ]
            # Credit values on the page may contain thousands separators and decimals.
            user_stats = UserStats(
                name,
                int(float(credit.replace(",", ""))),
                int(float(recent_average_credit.replace(",", ""))),
            )
            log_stats(db, user_stats)
        except Exception as e:
            logging.error("Failed to log data for user: {}".format(e))
    db.commit()


def save_snapshot(db, output="./folding_data.json"):
    """Write the credit/time deltas of everyone who gained credit to a JSON file."""
    logging.info("Saving snapshot as JSON.")
    cur = db.cursor()
    names = cur.execute("SELECT name FROM folding_data").fetchall()
    snapshot = {}
    for (name,) in names:
        credit, time = cur.execute(
            "SELECT credit_delta, date_delta FROM folding_data WHERE name = ?", (name,)
        ).fetchone()
        if credit > 0:
            snapshot[name] = {"credit": credit, "time": time}
    with open(output, "w") as outfile:
        json.dump(snapshot, outfile)


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("team", help="The team ID to scrape. Ex. 234980")
    parser.add_argument(
        "--db-file",
        default="./folding_data.db",
        help="The path to the local stats DB. This DB will be created if it doesn't exist.",
    )
    parser.add_argument(
        "--json-file",
        default="./folding_data.json",
        help="The path to the output JSON file containing the credit and time deltas.",
    )
    parser.add_argument("--verbose", action="store_true", help="Print debug logs.")
    args = parser.parse_args()

    logging.basicConfig(level=logging.DEBUG if args.verbose else logging.INFO)

    db = init_db(db_file=args.db_file)
    create_snapshot(db, teamid=args.team)
    save_snapshot(db, output=args.json_file)


if __name__ == "__main__":
    main()
```
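On the question of whether the script produces usable payout data: with the fixes above, folding_data.json maps each member name to the credit gained (and seconds elapsed) since the previous run. A small sketch of reading that file back for the game, assuming that layout:

```python
import json

# Layout written by save_snapshot() above:
# {"some_user": {"credit": 1234, "time": 86400}, ...}
with open("folding_data.json") as f:
    deltas = json.load(f)

# Rank participants by the credit they gained since the last snapshot.
for name, entry in sorted(deltas.items(), key=lambda kv: kv[1]["credit"], reverse=True):
    print("{}: {} credits in {} seconds".format(name, entry["credit"], entry["time"]))
```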
Joined: 29 Aug 05, Posts: 15618
> a web scraper for the Rosetta@home website

If everyone goes scraping the projects' web pages, the projects' web pages go down. Most projects will ban you if they find you scraping their pages without asking permission.

All BOINC projects export their statistics. Rosetta does that via https://boinc.bakerlab.org/rosetta/stats/ (notice there is no .html or .php extension on that link!), where you can download the data daily and search through it locally.
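For anyone picking this thread up later, a minimal sketch of that approach, downloading the daily export once and filtering it locally, could look like the code below. The file name user.gz and the field names (name, total_credit, expavg_credit, teamid) are assumptions based on the usual BOINC stats export layout, not something confirmed in this thread, so check the directory listing and a few records of the real dump before relying on them.

```python
import gzip
import xml.etree.ElementTree as ET

import requests

# Assumed location of the daily user export; verify against the directory
# listing at https://boinc.bakerlab.org/rosetta/stats/ first.
USER_DUMP_URL = "https://boinc.bakerlab.org/rosetta/stats/user.gz"


def download_dump(url=USER_DUMP_URL, dest="user.gz"):
    """Fetch the compressed dump once per day and save it locally."""
    with requests.get(url, stream=True, timeout=60) as r:
        r.raise_for_status()
        with open(dest, "wb") as out:
            for chunk in r.iter_content(chunk_size=1 << 20):
                out.write(chunk)


def team_credits(teamid, dump_path="user.gz"):
    """Yield (name, total_credit, expavg_credit) for every user in one team.

    Field names assume the usual BOINC export schema; adjust after inspecting
    a few records of the real file.
    """
    with gzip.open(dump_path, "rb") as f:
        for _, elem in ET.iterparse(f, events=("end",)):
            if elem.tag == "user":
                if elem.findtext("teamid") == str(teamid):
                    yield (
                        elem.findtext("name"),
                        float(elem.findtext("total_credit") or 0),
                        float(elem.findtext("expavg_credit") or 0),
                    )
                elem.clear()  # drop the parsed record so memory use stays low


if __name__ == "__main__":
    download_dump()
    for name, total, avg in team_credits(30157):
        print(name, total, avg)
```

iterparse is used so the dump never has to be fully unpacked into memory; only one user record is held at a time.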
Joined: 3 Nov 21, Posts: 3
I did not know that, so thanks for the info. As I said, I'm an amateur on the matter, but I need the data, and the devs I asked basically all said I would have to write a web scraper. So how can I get the data from rosetta/stats/? Would I need a script that downloads the user.gz list daily and picks the information out of it one entry at a time? Would you be able to help me?
Joined: 29 Aug 05, Posts: 15618
> So how can I get the data from rosetta/stats/? Would I need a script that downloads the user.gz list daily and picks the information out of it one entry at a time? Would you be able to help me?

I'm not in the script-writing business, so I cannot help you there. Perhaps someone else here can, or you can find something on the interwebs.
Joined: 5 Oct 06, Posts: 5149
The exported stats are in a compressed XML format. Your favorite database probably has an XML import tool. Make sure you have plenty of disk space available. |
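If a database import is the route taken, SQLite is enough for this, and the compressed dump can be streamed straight into it without unpacking it to disk first. A rough sketch, again assuming the usual BOINC user.gz layout (verify the field names against the real file):

```python
import gzip
import sqlite3
import xml.etree.ElementTree as ET


def import_users(dump_path="user.gz", db_path="rosetta_stats.db"):
    """Stream the gzipped user export into a SQLite table."""
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS users ("
        "id INTEGER PRIMARY KEY, name TEXT, teamid INTEGER, "
        "total_credit REAL, expavg_credit REAL)"
    )
    rows = []
    with gzip.open(dump_path, "rb") as f:
        for _, elem in ET.iterparse(f, events=("end",)):
            if elem.tag != "user":
                continue
            rows.append(
                (
                    int(elem.findtext("id") or 0),
                    elem.findtext("name"),
                    int(elem.findtext("teamid") or 0),
                    float(elem.findtext("total_credit") or 0),
                    float(elem.findtext("expavg_credit") or 0),
                )
            )
            elem.clear()
            if len(rows) >= 10000:  # insert in batches to keep memory use low
                con.executemany("INSERT OR REPLACE INTO users VALUES (?, ?, ?, ?, ?)", rows)
                rows = []
    if rows:
        con.executemany("INSERT OR REPLACE INTO users VALUES (?, ?, ?, ?, ?)", rows)
    con.commit()
    con.close()
```

Batching the inserts keeps memory use flat even though the unpacked XML is several hundred megabytes.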
Joined: 28 Jun 10, Posts: 2822
> Make sure you have plenty of disk space available.

And Richard does mean plenty!
Joined: 29 Aug 05, Posts: 15618
It's reasonably small. I downloaded and unpacked it: 412,003 KB (roughly 400 MB).
Joined: 3 Nov 21, Posts: 3
> I'm not in the script-writing business, so I cannot help you there. Perhaps someone else here can, or you can find something on the interwebs.

Thanks. I asked around here because it would definitely bring many new users into Rosetta@home and consequently into BOINC. I hope a good soul with some spare time appears who can help me with a script that downloads the user list daily and pulls out the credit information for rewarding.
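To round the thread off, the "download the list daily and reward proportionally" part is small once the credit totals are available: keep yesterday's totals, subtract them from today's, and split a points pool in proportion to the difference. A self-contained sketch with made-up numbers (the totals would come from the daily stats export described above):

```python
def credit_deltas(yesterday, today):
    """Credit gained per user between two daily snapshots of total credit."""
    return {
        name: max(total - yesterday.get(name, 0), 0)
        for name, total in today.items()
    }


def proportional_rewards(deltas, reward_pool):
    """Split reward_pool between users in proportion to the credit they gained."""
    total = sum(deltas.values())
    if total <= 0:
        return {name: 0.0 for name in deltas}
    return {name: reward_pool * gained / total for name, gained in deltas.items()}


# Made-up numbers: totals taken from two daily downloads of the stats export.
yesterday = {"alice": 10000, "bob": 7500, "carol": 500}
today = {"alice": 15000, "bob": 10000, "carol": 500}

deltas = credit_deltas(yesterday, today)   # {'alice': 5000, 'bob': 2500, 'carol': 0}
print(proportional_rewards(deltas, 1000))  # alice ~666.7, bob ~333.3, carol 0.0
```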
Copyright © 2025 University of California.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License,
Version 1.2 or any later version published by the Free Software Foundation.