1
0
Fork 0
mirror of https://github.com/mealie-recipes/mealie.git synced 2025-07-19 21:29:40 +02:00
mealie/dev/scripts/scrape_recipe.py

28 lines
865 B
Python
Raw Normal View History

"""
Helper script to download raw recipe data from a URL and dump it to disk.
The resulting files can be used as test input data.
"""
Release v0.1.0 Candidate (#85) * Changed uvicorn port to 80 * Changed port in docker-compose to match dockerfile * Readded environment variables in docker-compose * production image rework * Use opengraph metadata to make basic recipe cards when full recipe metadata is not available * fixed instrucitons on parse * add last_recipe * automated testing * roadmap update * Sqlite (#75) * file structure * auto-test * take 2 * refactor ap scheduler and startup process * fixed scraper error * database abstraction * database abstraction * port recipes over to new schema * meal migration * start settings migration * finale mongo port * backup improvements * migration imports to new DB structure * unused import cleanup * docs strings * settings and theme import logic * cleanup * fixed tinydb error * requirements * fuzzy search * remove scratch file * sqlalchemy models * improved search ui * recipe models almost done * sql modal population * del scratch * rewrite database model mixins * mostly grabage * recipe updates * working sqllite * remove old files and reorganize * final cleanup Co-authored-by: Hayden <hay-kot@pm.me> * Backup card (#78) * backup / import dialog * upgrade to new tag method * New import card * rename settings.py to app_config.py * migrate to poetry for development * fix failing test Co-authored-by: Hayden <hay-kot@pm.me> * added mkdocs to docker-compose * Translations (#72) * Translations + danish * changed back proxy target to use ENV * Resolved more merge conflicts * Removed test in translation * Documentation of translations * Updated translations * removed old packages Co-authored-by: Hayden <64056131+hay-kot@users.noreply.github.com> * fail to start bug fixes * feature: prep/cook/total time slots (#80) Co-authored-by: Hayden <hay-kot@pm.me> * missing bind attributes * Bug fixes (#81) * fix: url remains after succesful import * docs: changelog + update todos * arm image * arm compose * compose updates * update poetry * arm support Co-authored-by: Hayden <hay-kot@pm.me> * dockerfile hotfix * dockerfile hotfix * Version Release Final Touches (#84) * Remove slim * bug: opacity issues * bug: startup failure with no database * ci/cd on dev branch * formatting * v0.1.0 documentation Co-authored-by: Hayden <hay-kot@pm.me> * db init hotfix * bug: fix crash in mongo * fix mongo bug * fixed version notifier * finale changelog Co-authored-by: kentora <=> Co-authored-by: Hayden <hay-kot@pm.me> Co-authored-by: Richard Mitic <richard.h.mitic@gmail.com> Co-authored-by: kentora <kentora@kentora.dk>
2021-01-17 22:22:54 -09:00
import sys, json, pprint
import requests
import extruct
from scrape_schema_recipe import scrape_url
Release v0.1.0 Candidate (#85) * Changed uvicorn port to 80 * Changed port in docker-compose to match dockerfile * Readded environment variables in docker-compose * production image rework * Use opengraph metadata to make basic recipe cards when full recipe metadata is not available * fixed instrucitons on parse * add last_recipe * automated testing * roadmap update * Sqlite (#75) * file structure * auto-test * take 2 * refactor ap scheduler and startup process * fixed scraper error * database abstraction * database abstraction * port recipes over to new schema * meal migration * start settings migration * finale mongo port * backup improvements * migration imports to new DB structure * unused import cleanup * docs strings * settings and theme import logic * cleanup * fixed tinydb error * requirements * fuzzy search * remove scratch file * sqlalchemy models * improved search ui * recipe models almost done * sql modal population * del scratch * rewrite database model mixins * mostly grabage * recipe updates * working sqllite * remove old files and reorganize * final cleanup Co-authored-by: Hayden <hay-kot@pm.me> * Backup card (#78) * backup / import dialog * upgrade to new tag method * New import card * rename settings.py to app_config.py * migrate to poetry for development * fix failing test Co-authored-by: Hayden <hay-kot@pm.me> * added mkdocs to docker-compose * Translations (#72) * Translations + danish * changed back proxy target to use ENV * Resolved more merge conflicts * Removed test in translation * Documentation of translations * Updated translations * removed old packages Co-authored-by: Hayden <64056131+hay-kot@users.noreply.github.com> * fail to start bug fixes * feature: prep/cook/total time slots (#80) Co-authored-by: Hayden <hay-kot@pm.me> * missing bind attributes * Bug fixes (#81) * fix: url remains after succesful import * docs: changelog + update todos * arm image * arm compose * compose updates * update poetry * arm support Co-authored-by: Hayden <hay-kot@pm.me> * dockerfile hotfix * dockerfile hotfix * Version Release Final Touches (#84) * Remove slim * bug: opacity issues * bug: startup failure with no database * ci/cd on dev branch * formatting * v0.1.0 documentation Co-authored-by: Hayden <hay-kot@pm.me> * db init hotfix * bug: fix crash in mongo * fix mongo bug * fixed version notifier * finale changelog Co-authored-by: kentora <=> Co-authored-by: Hayden <hay-kot@pm.me> Co-authored-by: Richard Mitic <richard.h.mitic@gmail.com> Co-authored-by: kentora <kentora@kentora.dk>
2021-01-17 22:22:54 -09:00
from w3lib.html import get_base_url
for url in sys.argv[1:]:
try:
data = scrape_url(url)[0]
slug = list(filter(None, url.split("/")))[-1]
filename = f"{slug}.json"
with open(filename, "w") as f:
json.dump(data, f, indent=4, default=str)
print(f"Saved {filename}")
except Exception as e:
print(f"Error for {url}: {e}")
Release v0.1.0 Candidate (#85) * Changed uvicorn port to 80 * Changed port in docker-compose to match dockerfile * Readded environment variables in docker-compose * production image rework * Use opengraph metadata to make basic recipe cards when full recipe metadata is not available * fixed instrucitons on parse * add last_recipe * automated testing * roadmap update * Sqlite (#75) * file structure * auto-test * take 2 * refactor ap scheduler and startup process * fixed scraper error * database abstraction * database abstraction * port recipes over to new schema * meal migration * start settings migration * finale mongo port * backup improvements * migration imports to new DB structure * unused import cleanup * docs strings * settings and theme import logic * cleanup * fixed tinydb error * requirements * fuzzy search * remove scratch file * sqlalchemy models * improved search ui * recipe models almost done * sql modal population * del scratch * rewrite database model mixins * mostly grabage * recipe updates * working sqllite * remove old files and reorganize * final cleanup Co-authored-by: Hayden <hay-kot@pm.me> * Backup card (#78) * backup / import dialog * upgrade to new tag method * New import card * rename settings.py to app_config.py * migrate to poetry for development * fix failing test Co-authored-by: Hayden <hay-kot@pm.me> * added mkdocs to docker-compose * Translations (#72) * Translations + danish * changed back proxy target to use ENV * Resolved more merge conflicts * Removed test in translation * Documentation of translations * Updated translations * removed old packages Co-authored-by: Hayden <64056131+hay-kot@users.noreply.github.com> * fail to start bug fixes * feature: prep/cook/total time slots (#80) Co-authored-by: Hayden <hay-kot@pm.me> * missing bind attributes * Bug fixes (#81) * fix: url remains after succesful import * docs: changelog + update todos * arm image * arm compose * compose updates * update poetry * arm support Co-authored-by: Hayden <hay-kot@pm.me> * dockerfile hotfix * dockerfile hotfix * Version Release Final Touches (#84) * Remove slim * bug: opacity issues * bug: startup failure with no database * ci/cd on dev branch * formatting * v0.1.0 documentation Co-authored-by: Hayden <hay-kot@pm.me> * db init hotfix * bug: fix crash in mongo * fix mongo bug * fixed version notifier * finale changelog Co-authored-by: kentora <=> Co-authored-by: Hayden <hay-kot@pm.me> Co-authored-by: Richard Mitic <richard.h.mitic@gmail.com> Co-authored-by: kentora <kentora@kentora.dk>
2021-01-17 22:22:54 -09:00
print("Trying extruct instead")
pp = pprint.PrettyPrinter(indent=2)
r = requests.get(url)
base_url = get_base_url(r.text, r.url)
data = extruct.extract(r.text, base_url=base_url)
pp.pprint(data)