awesome-fastapi-projects/app/scrape.py

277 lines
9.1 KiB
Python
Raw Normal View History

Web App (#25) * Set up the web project dependencies - Add linters, pre-commit, and GitHub Actions - Add a Makefile - Add a pyproject.toml * Fix pyupgrade job - Remove continue_on_error everywhere * Remove old code * Rename a GithubActions job * Change README * Adjust pre-commit and GitHub actions * Add tables and set up alembic * Set up tests * Extend tests * Add coverage config * Adjust the GithubActions workflow * Fix GithubActions workflow * Try fixing pyproject-fmt config * Fix formatting of pyproject.toml * Fix formatting of pyproject.toml * Add coverage report * Test listing the repositories * Add a working prototype of SourceGraph client * Add parsing of the SourceGraph SSE data * Fix tests * Ged rid of packages replaced by ruff * Fix waits in the SourceGraph client * Refactor the models and add a mapper - A new mapper allows to create database repositories from the SourceGraph data * Add mypy * Try fixing mypy action * Remove redundant configs * Exclude tests from type checking * Fix mypy pre-commit and GitHub action * Ignore factories * Make upserting possible for source graph data * Add logic for parsing the dependencies and populating the database * Add a database and a cron GitHub Action job * Try manually trigger a workflow * Bring back the old config * Add ReadTimeout for errors to retry for in SourceGraph client * Add typer * Adjust the docstrings * Update the database * Refactor and optimize scraping and dependencies parsing * Make scraping run on push for now * Add a unique constraint for the repo url and source graph repo id * Change the index columns in on_conflict statement for repo creation * Optimize dependencies parsing - Do not parse dependencies for a repo when revision did not change * Scraped repositories from Source Graph * Refactor scraping * Set up frontend * Scraped repositories from Source Graph * Add TODOs * Skip scraping when testing * Fix a test with updating the repos * Scraped repositories from Source Graph * Add some more TODOs * Scraped repositories from Source Graph * Add some more TODO comments * Add chadcn/ui * Scraped repositories from Source Graph * Create index.json * Scraped repositories from Source Graph * Add a draft of data table and display all the repos * Scraped repositories from Source Graph * Implement stars badges and description with overflow * Format the links to Github repos * Fix link clicking * Scraped repositories from Source Graph * Add simple pagination and stars column sorting * Scraped repositories from Source Graph * Implement basic searching * Scraped repositories from Source Graph * Implement a multiselect for dependencies * Scraped repositories from Source Graph * Implement actual filtering by dependencies * Scraped repositories from Source Graph * Add a workflow to deploy nextjs on github pages * Try fixing the deployment job * Enable static exports for app router * Fix uploading arifacts for nextjs job * Set base path to properly load JS and CSS * Fix the base path * Scraped repositories from Source Graph * Add header * Remove language version * Scraped repositories from Source Graph * Add some more TODOs * Scraped repositories from Source Graph * Adjust the pre-commit config * Fix pipelines * Scraped repositories from Source Graph * Add a footer * Create the indexes * Scraped repositories from Source Graph * Add more TODOs * Introduce minor footer adjustments * Adjust the scraping actions * Scraped repositories from Source Graph, parsed the dependencies, and generated the indexes * Implement query params state * Scraped repositories from Source Graph, parsed the dependencies, and generated the indexes * Do not commit query state on unmount * Scraped repositories from Source Graph, parsed the dependencies, and generated the indexes * Hopefully fix query states and multiselect input * Scraped repositories from Source Graph, parsed the dependencies, and generated the indexes * Extend the Makefile * Resolve most of TODOs * Resolve the conflicts with anyio version, bring back httpx * Adjust the Makefile and README.md * Fix a typo in README.md * Adjust readme * Fix the Makefile * Fix some stuff * Make some adjustments * Possibly fix failing scraping jobs * Load the repo project URL from env --------- Co-authored-by: vladfedoriuk <vladfedoriuk@users.noreply.github.com> Co-authored-by: Vladyslav Fedoriuk <vladyslav.fedoriuk@deployed.pl>
2023-10-28 19:39:02 +00:00
"""The logic for scraping the source graph data processing it."""
import asyncio
import sqlalchemy.dialects.sqlite
import typer
from loguru import logger
from sqlalchemy.ext.asyncio import AsyncSession
2023-11-19 22:17:57 +00:00
from app.database import Dependency, Repo, RepoDependency, async_session_maker
Web App (#25) * Set up the web project dependencies - Add linters, pre-commit, and GitHub Actions - Add a Makefile - Add a pyproject.toml * Fix pyupgrade job - Remove continue_on_error everywhere * Remove old code * Rename a GithubActions job * Change README * Adjust pre-commit and GitHub actions * Add tables and set up alembic * Set up tests * Extend tests * Add coverage config * Adjust the GithubActions workflow * Fix GithubActions workflow * Try fixing pyproject-fmt config * Fix formatting of pyproject.toml * Fix formatting of pyproject.toml * Add coverage report * Test listing the repositories * Add a working prototype of SourceGraph client * Add parsing of the SourceGraph SSE data * Fix tests * Ged rid of packages replaced by ruff * Fix waits in the SourceGraph client * Refactor the models and add a mapper - A new mapper allows to create database repositories from the SourceGraph data * Add mypy * Try fixing mypy action * Remove redundant configs * Exclude tests from type checking * Fix mypy pre-commit and GitHub action * Ignore factories * Make upserting possible for source graph data * Add logic for parsing the dependencies and populating the database * Add a database and a cron GitHub Action job * Try manually trigger a workflow * Bring back the old config * Add ReadTimeout for errors to retry for in SourceGraph client * Add typer * Adjust the docstrings * Update the database * Refactor and optimize scraping and dependencies parsing * Make scraping run on push for now * Add a unique constraint for the repo url and source graph repo id * Change the index columns in on_conflict statement for repo creation * Optimize dependencies parsing - Do not parse dependencies for a repo when revision did not change * Scraped repositories from Source Graph * Refactor scraping * Set up frontend * Scraped repositories from Source Graph * Add TODOs * Skip scraping when testing * Fix a test with updating the repos * Scraped repositories from Source Graph * Add some more TODOs * Scraped repositories from Source Graph * Add some more TODO comments * Add chadcn/ui * Scraped repositories from Source Graph * Create index.json * Scraped repositories from Source Graph * Add a draft of data table and display all the repos * Scraped repositories from Source Graph * Implement stars badges and description with overflow * Format the links to Github repos * Fix link clicking * Scraped repositories from Source Graph * Add simple pagination and stars column sorting * Scraped repositories from Source Graph * Implement basic searching * Scraped repositories from Source Graph * Implement a multiselect for dependencies * Scraped repositories from Source Graph * Implement actual filtering by dependencies * Scraped repositories from Source Graph * Add a workflow to deploy nextjs on github pages * Try fixing the deployment job * Enable static exports for app router * Fix uploading arifacts for nextjs job * Set base path to properly load JS and CSS * Fix the base path * Scraped repositories from Source Graph * Add header * Remove language version * Scraped repositories from Source Graph * Add some more TODOs * Scraped repositories from Source Graph * Adjust the pre-commit config * Fix pipelines * Scraped repositories from Source Graph * Add a footer * Create the indexes * Scraped repositories from Source Graph * Add more TODOs * Introduce minor footer adjustments * Adjust the scraping actions * Scraped repositories from Source Graph, parsed the dependencies, and generated the indexes * Implement query params state * Scraped repositories from Source Graph, parsed the dependencies, and generated the indexes * Do not commit query state on unmount * Scraped repositories from Source Graph, parsed the dependencies, and generated the indexes * Hopefully fix query states and multiselect input * Scraped repositories from Source Graph, parsed the dependencies, and generated the indexes * Extend the Makefile * Resolve most of TODOs * Resolve the conflicts with anyio version, bring back httpx * Adjust the Makefile and README.md * Fix a typo in README.md * Adjust readme * Fix the Makefile * Fix some stuff * Make some adjustments * Possibly fix failing scraping jobs * Load the repo project URL from env --------- Co-authored-by: vladfedoriuk <vladfedoriuk@users.noreply.github.com> Co-authored-by: Vladyslav Fedoriuk <vladyslav.fedoriuk@deployed.pl>
2023-10-28 19:39:02 +00:00
from app.dependencies import acquire_dependencies_data_for_repository
from app.source_graph.client import AsyncSourceGraphSSEClient
from app.source_graph.mapper import create_or_update_repos_from_source_graph_repos_data
2023-11-19 22:17:57 +00:00
from app.source_graph.models import SourceGraphRepoData
Web App (#25) * Set up the web project dependencies - Add linters, pre-commit, and GitHub Actions - Add a Makefile - Add a pyproject.toml * Fix pyupgrade job - Remove continue_on_error everywhere * Remove old code * Rename a GithubActions job * Change README * Adjust pre-commit and GitHub actions * Add tables and set up alembic * Set up tests * Extend tests * Add coverage config * Adjust the GithubActions workflow * Fix GithubActions workflow * Try fixing pyproject-fmt config * Fix formatting of pyproject.toml * Fix formatting of pyproject.toml * Add coverage report * Test listing the repositories * Add a working prototype of SourceGraph client * Add parsing of the SourceGraph SSE data * Fix tests * Ged rid of packages replaced by ruff * Fix waits in the SourceGraph client * Refactor the models and add a mapper - A new mapper allows to create database repositories from the SourceGraph data * Add mypy * Try fixing mypy action * Remove redundant configs * Exclude tests from type checking * Fix mypy pre-commit and GitHub action * Ignore factories * Make upserting possible for source graph data * Add logic for parsing the dependencies and populating the database * Add a database and a cron GitHub Action job * Try manually trigger a workflow * Bring back the old config * Add ReadTimeout for errors to retry for in SourceGraph client * Add typer * Adjust the docstrings * Update the database * Refactor and optimize scraping and dependencies parsing * Make scraping run on push for now * Add a unique constraint for the repo url and source graph repo id * Change the index columns in on_conflict statement for repo creation * Optimize dependencies parsing - Do not parse dependencies for a repo when revision did not change * Scraped repositories from Source Graph * Refactor scraping * Set up frontend * Scraped repositories from Source Graph * Add TODOs * Skip scraping when testing * Fix a test with updating the repos * Scraped repositories from Source Graph * Add some more TODOs * Scraped repositories from Source Graph * Add some more TODO comments * Add chadcn/ui * Scraped repositories from Source Graph * Create index.json * Scraped repositories from Source Graph * Add a draft of data table and display all the repos * Scraped repositories from Source Graph * Implement stars badges and description with overflow * Format the links to Github repos * Fix link clicking * Scraped repositories from Source Graph * Add simple pagination and stars column sorting * Scraped repositories from Source Graph * Implement basic searching * Scraped repositories from Source Graph * Implement a multiselect for dependencies * Scraped repositories from Source Graph * Implement actual filtering by dependencies * Scraped repositories from Source Graph * Add a workflow to deploy nextjs on github pages * Try fixing the deployment job * Enable static exports for app router * Fix uploading arifacts for nextjs job * Set base path to properly load JS and CSS * Fix the base path * Scraped repositories from Source Graph * Add header * Remove language version * Scraped repositories from Source Graph * Add some more TODOs * Scraped repositories from Source Graph * Adjust the pre-commit config * Fix pipelines * Scraped repositories from Source Graph * Add a footer * Create the indexes * Scraped repositories from Source Graph * Add more TODOs * Introduce minor footer adjustments * Adjust the scraping actions * Scraped repositories from Source Graph, parsed the dependencies, and generated the indexes * Implement query params state * Scraped repositories from Source Graph, parsed the dependencies, and generated the indexes * Do not commit query state on unmount * Scraped repositories from Source Graph, parsed the dependencies, and generated the indexes * Hopefully fix query states and multiselect input * Scraped repositories from Source Graph, parsed the dependencies, and generated the indexes * Extend the Makefile * Resolve most of TODOs * Resolve the conflicts with anyio version, bring back httpx * Adjust the Makefile and README.md * Fix a typo in README.md * Adjust readme * Fix the Makefile * Fix some stuff * Make some adjustments * Possibly fix failing scraping jobs * Load the repo project URL from env --------- Co-authored-by: vladfedoriuk <vladfedoriuk@users.noreply.github.com> Co-authored-by: Vladyslav Fedoriuk <vladyslav.fedoriuk@deployed.pl>
2023-10-28 19:39:02 +00:00
from app.uow import async_session_uow
async def _create_dependencies_for_repo(session: AsyncSession, repo: Repo) -> None:
"""
Create dependencies for a repo.
For each parsed dependency, creates a new record in the database, if such a
dependency does not exist.
Then, assigns the dependencies to the given repo.
:param session: An asynchronous session object
:param repo: A repo for which to create and assign the dependencies
"""
# Acquire the dependencies data for the repo
logger.info(
"Acquiring the dependencies data for the repo with id {repo_id}.",
repo_id=repo.id,
enqueue=True,
)
try:
(
revision,
dependencies_create_data,
) = await acquire_dependencies_data_for_repository(repo)
except RuntimeError:
# If the parsing fails,
# just skip creating the dependencies
logger.error(
"Failed to acquire the dependencies data for the repo with id {repo_id}.",
repo_id=repo.id,
enqueue=True,
)
return
if repo.last_checked_revision == revision:
# If the repo has already been updated,
# just skip creating the dependencies
logger.info(
"The repo with id {repo_id} has fresh dependencies.",
repo_id=repo.id,
enqueue=True,
)
return
if not dependencies_create_data:
# If there are no dependencies,
# just skip creating the dependencies
logger.info(
"The repo with id {repo_id} has no dependencies.",
repo_id=repo.id,
enqueue=True,
)
return
# Update the repo with the revision hash
logger.info(
"Updating the repo with id {repo_id} with the revision hash {revision}.",
repo_id=repo.id,
revision=revision,
enqueue=True,
)
update_repo_statement = (
sqlalchemy.update(Repo)
.where(Repo.id == repo.id)
.values(last_checked_revision=revision)
)
await session.execute(update_repo_statement)
# Create dependencies - on conflict do nothing.
# This is to avoid creating duplicate dependencies.
logger.info(
"Creating the dependencies for the repo with id {repo_id}.",
repo_id=repo.id,
enqueue=True,
)
insert_dependencies_statement = sqlalchemy.dialects.sqlite.insert(
Dependency
).on_conflict_do_nothing(index_elements=[Dependency.name])
await session.execute(
insert_dependencies_statement.returning(Dependency),
[
{
"name": dependency_data.name,
}
for dependency_data in dependencies_create_data
],
)
# Re-fetch the dependencies from the database
dependencies = (
await session.scalars(
sqlalchemy.select(Dependency).where(
Dependency.name.in_(
[
dependency_data.name
for dependency_data in dependencies_create_data
]
)
)
)
).all()
# Add the dependencies to the repo
insert_repo_dependencies_statement = sqlalchemy.dialects.sqlite.insert(
RepoDependency
).on_conflict_do_nothing([RepoDependency.repo_id, RepoDependency.dependency_id])
await session.execute(
insert_repo_dependencies_statement,
[
{
"repo_id": repo.id,
"dependency_id": dependency.id,
}
for dependency in dependencies
],
)
2023-11-19 22:17:57 +00:00
async def _save_scraped_repos_from_source_graph_repos_data(
source_graph_repos_data: list[SourceGraphRepoData],
) -> None:
"""
Save the scraped repos from the source graph repos data.
.. note::
This function is meant to be used in a task group.
From the SQLAlchemy documentation:
::
https://docs.sqlalchemy.org/en/20/orm/extensions/asyncio.html#using-asyncsession-with-concurrent-tasks
The AsyncSession object is a mutable, stateful object
which represents a single, stateful database
transaction in progress. Using concurrent tasks with asyncio,
with APIs such as asyncio.gather() for example, should use
a separate AsyncSession per individual task.
:param source_graph_repos_data: The source graph repos data.
:return: None
""" # noqa: E501
async with async_session_maker() as session, async_session_uow(session):
saved_repos = await create_or_update_repos_from_source_graph_repos_data(
session=session,
source_graph_repos_data=source_graph_repos_data,
)
logger.info(
"Saving {count} repos.",
count=len(saved_repos),
enqueue=True,
)
await session.commit()
Web App (#25) * Set up the web project dependencies - Add linters, pre-commit, and GitHub Actions - Add a Makefile - Add a pyproject.toml * Fix pyupgrade job - Remove continue_on_error everywhere * Remove old code * Rename a GithubActions job * Change README * Adjust pre-commit and GitHub actions * Add tables and set up alembic * Set up tests * Extend tests * Add coverage config * Adjust the GithubActions workflow * Fix GithubActions workflow * Try fixing pyproject-fmt config * Fix formatting of pyproject.toml * Fix formatting of pyproject.toml * Add coverage report * Test listing the repositories * Add a working prototype of SourceGraph client * Add parsing of the SourceGraph SSE data * Fix tests * Ged rid of packages replaced by ruff * Fix waits in the SourceGraph client * Refactor the models and add a mapper - A new mapper allows to create database repositories from the SourceGraph data * Add mypy * Try fixing mypy action * Remove redundant configs * Exclude tests from type checking * Fix mypy pre-commit and GitHub action * Ignore factories * Make upserting possible for source graph data * Add logic for parsing the dependencies and populating the database * Add a database and a cron GitHub Action job * Try manually trigger a workflow * Bring back the old config * Add ReadTimeout for errors to retry for in SourceGraph client * Add typer * Adjust the docstrings * Update the database * Refactor and optimize scraping and dependencies parsing * Make scraping run on push for now * Add a unique constraint for the repo url and source graph repo id * Change the index columns in on_conflict statement for repo creation * Optimize dependencies parsing - Do not parse dependencies for a repo when revision did not change * Scraped repositories from Source Graph * Refactor scraping * Set up frontend * Scraped repositories from Source Graph * Add TODOs * Skip scraping when testing * Fix a test with updating the repos * Scraped repositories from Source Graph * Add some more TODOs * Scraped repositories from Source Graph * Add some more TODO comments * Add chadcn/ui * Scraped repositories from Source Graph * Create index.json * Scraped repositories from Source Graph * Add a draft of data table and display all the repos * Scraped repositories from Source Graph * Implement stars badges and description with overflow * Format the links to Github repos * Fix link clicking * Scraped repositories from Source Graph * Add simple pagination and stars column sorting * Scraped repositories from Source Graph * Implement basic searching * Scraped repositories from Source Graph * Implement a multiselect for dependencies * Scraped repositories from Source Graph * Implement actual filtering by dependencies * Scraped repositories from Source Graph * Add a workflow to deploy nextjs on github pages * Try fixing the deployment job * Enable static exports for app router * Fix uploading arifacts for nextjs job * Set base path to properly load JS and CSS * Fix the base path * Scraped repositories from Source Graph * Add header * Remove language version * Scraped repositories from Source Graph * Add some more TODOs * Scraped repositories from Source Graph * Adjust the pre-commit config * Fix pipelines * Scraped repositories from Source Graph * Add a footer * Create the indexes * Scraped repositories from Source Graph * Add more TODOs * Introduce minor footer adjustments * Adjust the scraping actions * Scraped repositories from Source Graph, parsed the dependencies, and generated the indexes * Implement query params state * Scraped repositories from Source Graph, parsed the dependencies, and generated the indexes * Do not commit query state on unmount * Scraped repositories from Source Graph, parsed the dependencies, and generated the indexes * Hopefully fix query states and multiselect input * Scraped repositories from Source Graph, parsed the dependencies, and generated the indexes * Extend the Makefile * Resolve most of TODOs * Resolve the conflicts with anyio version, bring back httpx * Adjust the Makefile and README.md * Fix a typo in README.md * Adjust readme * Fix the Makefile * Fix some stuff * Make some adjustments * Possibly fix failing scraping jobs * Load the repo project URL from env --------- Co-authored-by: vladfedoriuk <vladfedoriuk@users.noreply.github.com> Co-authored-by: Vladyslav Fedoriuk <vladyslav.fedoriuk@deployed.pl>
2023-10-28 19:39:02 +00:00
async def scrape_source_graph_repos() -> None:
"""
Iterate over the source graph repos and create or update them in the database.
:return: None
"""
2023-11-19 22:17:57 +00:00
async with AsyncSourceGraphSSEClient() as sg_client, asyncio.TaskGroup() as tg:
logger.info(
"Creating or updating repos from source graph repos data.",
enqueue=True,
)
async for sg_repos_data in sg_client.aiter_fastapi_repos():
logger.info(
"Received {count} repos.",
count=len(sg_repos_data),
enqueue=True,
)
tg.create_task(
_save_scraped_repos_from_source_graph_repos_data(
source_graph_repos_data=sg_repos_data
Web App (#25) * Set up the web project dependencies - Add linters, pre-commit, and GitHub Actions - Add a Makefile - Add a pyproject.toml * Fix pyupgrade job - Remove continue_on_error everywhere * Remove old code * Rename a GithubActions job * Change README * Adjust pre-commit and GitHub actions * Add tables and set up alembic * Set up tests * Extend tests * Add coverage config * Adjust the GithubActions workflow * Fix GithubActions workflow * Try fixing pyproject-fmt config * Fix formatting of pyproject.toml * Fix formatting of pyproject.toml * Add coverage report * Test listing the repositories * Add a working prototype of SourceGraph client * Add parsing of the SourceGraph SSE data * Fix tests * Ged rid of packages replaced by ruff * Fix waits in the SourceGraph client * Refactor the models and add a mapper - A new mapper allows to create database repositories from the SourceGraph data * Add mypy * Try fixing mypy action * Remove redundant configs * Exclude tests from type checking * Fix mypy pre-commit and GitHub action * Ignore factories * Make upserting possible for source graph data * Add logic for parsing the dependencies and populating the database * Add a database and a cron GitHub Action job * Try manually trigger a workflow * Bring back the old config * Add ReadTimeout for errors to retry for in SourceGraph client * Add typer * Adjust the docstrings * Update the database * Refactor and optimize scraping and dependencies parsing * Make scraping run on push for now * Add a unique constraint for the repo url and source graph repo id * Change the index columns in on_conflict statement for repo creation * Optimize dependencies parsing - Do not parse dependencies for a repo when revision did not change * Scraped repositories from Source Graph * Refactor scraping * Set up frontend * Scraped repositories from Source Graph * Add TODOs * Skip scraping when testing * Fix a test with updating the repos * Scraped repositories from Source Graph * Add some more TODOs * Scraped repositories from Source Graph * Add some more TODO comments * Add chadcn/ui * Scraped repositories from Source Graph * Create index.json * Scraped repositories from Source Graph * Add a draft of data table and display all the repos * Scraped repositories from Source Graph * Implement stars badges and description with overflow * Format the links to Github repos * Fix link clicking * Scraped repositories from Source Graph * Add simple pagination and stars column sorting * Scraped repositories from Source Graph * Implement basic searching * Scraped repositories from Source Graph * Implement a multiselect for dependencies * Scraped repositories from Source Graph * Implement actual filtering by dependencies * Scraped repositories from Source Graph * Add a workflow to deploy nextjs on github pages * Try fixing the deployment job * Enable static exports for app router * Fix uploading arifacts for nextjs job * Set base path to properly load JS and CSS * Fix the base path * Scraped repositories from Source Graph * Add header * Remove language version * Scraped repositories from Source Graph * Add some more TODOs * Scraped repositories from Source Graph * Adjust the pre-commit config * Fix pipelines * Scraped repositories from Source Graph * Add a footer * Create the indexes * Scraped repositories from Source Graph * Add more TODOs * Introduce minor footer adjustments * Adjust the scraping actions * Scraped repositories from Source Graph, parsed the dependencies, and generated the indexes * Implement query params state * Scraped repositories from Source Graph, parsed the dependencies, and generated the indexes * Do not commit query state on unmount * Scraped repositories from Source Graph, parsed the dependencies, and generated the indexes * Hopefully fix query states and multiselect input * Scraped repositories from Source Graph, parsed the dependencies, and generated the indexes * Extend the Makefile * Resolve most of TODOs * Resolve the conflicts with anyio version, bring back httpx * Adjust the Makefile and README.md * Fix a typo in README.md * Adjust readme * Fix the Makefile * Fix some stuff * Make some adjustments * Possibly fix failing scraping jobs * Load the repo project URL from env --------- Co-authored-by: vladfedoriuk <vladfedoriuk@users.noreply.github.com> Co-authored-by: Vladyslav Fedoriuk <vladyslav.fedoriuk@deployed.pl>
2023-10-28 19:39:02 +00:00
)
2023-11-19 22:17:57 +00:00
)
async def parse_dependencies_for_repo(semaphore: asyncio.Semaphore, repo: Repo) -> None:
Web App (#25) * Set up the web project dependencies - Add linters, pre-commit, and GitHub Actions - Add a Makefile - Add a pyproject.toml * Fix pyupgrade job - Remove continue_on_error everywhere * Remove old code * Rename a GithubActions job * Change README * Adjust pre-commit and GitHub actions * Add tables and set up alembic * Set up tests * Extend tests * Add coverage config * Adjust the GithubActions workflow * Fix GithubActions workflow * Try fixing pyproject-fmt config * Fix formatting of pyproject.toml * Fix formatting of pyproject.toml * Add coverage report * Test listing the repositories * Add a working prototype of SourceGraph client * Add parsing of the SourceGraph SSE data * Fix tests * Ged rid of packages replaced by ruff * Fix waits in the SourceGraph client * Refactor the models and add a mapper - A new mapper allows to create database repositories from the SourceGraph data * Add mypy * Try fixing mypy action * Remove redundant configs * Exclude tests from type checking * Fix mypy pre-commit and GitHub action * Ignore factories * Make upserting possible for source graph data * Add logic for parsing the dependencies and populating the database * Add a database and a cron GitHub Action job * Try manually trigger a workflow * Bring back the old config * Add ReadTimeout for errors to retry for in SourceGraph client * Add typer * Adjust the docstrings * Update the database * Refactor and optimize scraping and dependencies parsing * Make scraping run on push for now * Add a unique constraint for the repo url and source graph repo id * Change the index columns in on_conflict statement for repo creation * Optimize dependencies parsing - Do not parse dependencies for a repo when revision did not change * Scraped repositories from Source Graph * Refactor scraping * Set up frontend * Scraped repositories from Source Graph * Add TODOs * Skip scraping when testing * Fix a test with updating the repos * Scraped repositories from Source Graph * Add some more TODOs * Scraped repositories from Source Graph * Add some more TODO comments * Add chadcn/ui * Scraped repositories from Source Graph * Create index.json * Scraped repositories from Source Graph * Add a draft of data table and display all the repos * Scraped repositories from Source Graph * Implement stars badges and description with overflow * Format the links to Github repos * Fix link clicking * Scraped repositories from Source Graph * Add simple pagination and stars column sorting * Scraped repositories from Source Graph * Implement basic searching * Scraped repositories from Source Graph * Implement a multiselect for dependencies * Scraped repositories from Source Graph * Implement actual filtering by dependencies * Scraped repositories from Source Graph * Add a workflow to deploy nextjs on github pages * Try fixing the deployment job * Enable static exports for app router * Fix uploading arifacts for nextjs job * Set base path to properly load JS and CSS * Fix the base path * Scraped repositories from Source Graph * Add header * Remove language version * Scraped repositories from Source Graph * Add some more TODOs * Scraped repositories from Source Graph * Adjust the pre-commit config * Fix pipelines * Scraped repositories from Source Graph * Add a footer * Create the indexes * Scraped repositories from Source Graph * Add more TODOs * Introduce minor footer adjustments * Adjust the scraping actions * Scraped repositories from Source Graph, parsed the dependencies, and generated the indexes * Implement query params state * Scraped repositories from Source Graph, parsed the dependencies, and generated the indexes * Do not commit query state on unmount * Scraped repositories from Source Graph, parsed the dependencies, and generated the indexes * Hopefully fix query states and multiselect input * Scraped repositories from Source Graph, parsed the dependencies, and generated the indexes * Extend the Makefile * Resolve most of TODOs * Resolve the conflicts with anyio version, bring back httpx * Adjust the Makefile and README.md * Fix a typo in README.md * Adjust readme * Fix the Makefile * Fix some stuff * Make some adjustments * Possibly fix failing scraping jobs * Load the repo project URL from env --------- Co-authored-by: vladfedoriuk <vladfedoriuk@users.noreply.github.com> Co-authored-by: Vladyslav Fedoriuk <vladyslav.fedoriuk@deployed.pl>
2023-10-28 19:39:02 +00:00
"""
Parse the dependencies for a given repo and create them in the database.
2023-11-19 22:17:57 +00:00
.. note::
This function is meant to be used in a task group.
From the SQLAlchemy documentation:
::
https://docs.sqlalchemy.org/en/20/orm/extensions/asyncio.html#using-asyncsession-with-concurrent-tasks
The AsyncSession object is a mutable, stateful object
which represents a single, stateful database
transaction in progress. Using concurrent tasks with asyncio,
with APIs such as asyncio.gather() for example, should use
a separate AsyncSession per individual task.
Web App (#25) * Set up the web project dependencies - Add linters, pre-commit, and GitHub Actions - Add a Makefile - Add a pyproject.toml * Fix pyupgrade job - Remove continue_on_error everywhere * Remove old code * Rename a GithubActions job * Change README * Adjust pre-commit and GitHub actions * Add tables and set up alembic * Set up tests * Extend tests * Add coverage config * Adjust the GithubActions workflow * Fix GithubActions workflow * Try fixing pyproject-fmt config * Fix formatting of pyproject.toml * Fix formatting of pyproject.toml * Add coverage report * Test listing the repositories * Add a working prototype of SourceGraph client * Add parsing of the SourceGraph SSE data * Fix tests * Ged rid of packages replaced by ruff * Fix waits in the SourceGraph client * Refactor the models and add a mapper - A new mapper allows to create database repositories from the SourceGraph data * Add mypy * Try fixing mypy action * Remove redundant configs * Exclude tests from type checking * Fix mypy pre-commit and GitHub action * Ignore factories * Make upserting possible for source graph data * Add logic for parsing the dependencies and populating the database * Add a database and a cron GitHub Action job * Try manually trigger a workflow * Bring back the old config * Add ReadTimeout for errors to retry for in SourceGraph client * Add typer * Adjust the docstrings * Update the database * Refactor and optimize scraping and dependencies parsing * Make scraping run on push for now * Add a unique constraint for the repo url and source graph repo id * Change the index columns in on_conflict statement for repo creation * Optimize dependencies parsing - Do not parse dependencies for a repo when revision did not change * Scraped repositories from Source Graph * Refactor scraping * Set up frontend * Scraped repositories from Source Graph * Add TODOs * Skip scraping when testing * Fix a test with updating the repos * Scraped repositories from Source Graph * Add some more TODOs * Scraped repositories from Source Graph * Add some more TODO comments * Add chadcn/ui * Scraped repositories from Source Graph * Create index.json * Scraped repositories from Source Graph * Add a draft of data table and display all the repos * Scraped repositories from Source Graph * Implement stars badges and description with overflow * Format the links to Github repos * Fix link clicking * Scraped repositories from Source Graph * Add simple pagination and stars column sorting * Scraped repositories from Source Graph * Implement basic searching * Scraped repositories from Source Graph * Implement a multiselect for dependencies * Scraped repositories from Source Graph * Implement actual filtering by dependencies * Scraped repositories from Source Graph * Add a workflow to deploy nextjs on github pages * Try fixing the deployment job * Enable static exports for app router * Fix uploading arifacts for nextjs job * Set base path to properly load JS and CSS * Fix the base path * Scraped repositories from Source Graph * Add header * Remove language version * Scraped repositories from Source Graph * Add some more TODOs * Scraped repositories from Source Graph * Adjust the pre-commit config * Fix pipelines * Scraped repositories from Source Graph * Add a footer * Create the indexes * Scraped repositories from Source Graph * Add more TODOs * Introduce minor footer adjustments * Adjust the scraping actions * Scraped repositories from Source Graph, parsed the dependencies, and generated the indexes * Implement query params state * Scraped repositories from Source Graph, parsed the dependencies, and generated the indexes * Do not commit query state on unmount * Scraped repositories from Source Graph, parsed the dependencies, and generated the indexes * Hopefully fix query states and multiselect input * Scraped repositories from Source Graph, parsed the dependencies, and generated the indexes * Extend the Makefile * Resolve most of TODOs * Resolve the conflicts with anyio version, bring back httpx * Adjust the Makefile and README.md * Fix a typo in README.md * Adjust readme * Fix the Makefile * Fix some stuff * Make some adjustments * Possibly fix failing scraping jobs * Load the repo project URL from env --------- Co-authored-by: vladfedoriuk <vladfedoriuk@users.noreply.github.com> Co-authored-by: Vladyslav Fedoriuk <vladyslav.fedoriuk@deployed.pl>
2023-10-28 19:39:02 +00:00
:param semaphore: A semaphore to limit the number of concurrent requests
2023-11-19 22:17:57 +00:00
:param repo: A repo for which to create and assign the dependencies
Web App (#25) * Set up the web project dependencies - Add linters, pre-commit, and GitHub Actions - Add a Makefile - Add a pyproject.toml * Fix pyupgrade job - Remove continue_on_error everywhere * Remove old code * Rename a GithubActions job * Change README * Adjust pre-commit and GitHub actions * Add tables and set up alembic * Set up tests * Extend tests * Add coverage config * Adjust the GithubActions workflow * Fix GithubActions workflow * Try fixing pyproject-fmt config * Fix formatting of pyproject.toml * Fix formatting of pyproject.toml * Add coverage report * Test listing the repositories * Add a working prototype of SourceGraph client * Add parsing of the SourceGraph SSE data * Fix tests * Ged rid of packages replaced by ruff * Fix waits in the SourceGraph client * Refactor the models and add a mapper - A new mapper allows to create database repositories from the SourceGraph data * Add mypy * Try fixing mypy action * Remove redundant configs * Exclude tests from type checking * Fix mypy pre-commit and GitHub action * Ignore factories * Make upserting possible for source graph data * Add logic for parsing the dependencies and populating the database * Add a database and a cron GitHub Action job * Try manually trigger a workflow * Bring back the old config * Add ReadTimeout for errors to retry for in SourceGraph client * Add typer * Adjust the docstrings * Update the database * Refactor and optimize scraping and dependencies parsing * Make scraping run on push for now * Add a unique constraint for the repo url and source graph repo id * Change the index columns in on_conflict statement for repo creation * Optimize dependencies parsing - Do not parse dependencies for a repo when revision did not change * Scraped repositories from Source Graph * Refactor scraping * Set up frontend * Scraped repositories from Source Graph * Add TODOs * Skip scraping when testing * Fix a test with updating the repos * Scraped repositories from Source Graph * Add some more TODOs * Scraped repositories from Source Graph * Add some more TODO comments * Add chadcn/ui * Scraped repositories from Source Graph * Create index.json * Scraped repositories from Source Graph * Add a draft of data table and display all the repos * Scraped repositories from Source Graph * Implement stars badges and description with overflow * Format the links to Github repos * Fix link clicking * Scraped repositories from Source Graph * Add simple pagination and stars column sorting * Scraped repositories from Source Graph * Implement basic searching * Scraped repositories from Source Graph * Implement a multiselect for dependencies * Scraped repositories from Source Graph * Implement actual filtering by dependencies * Scraped repositories from Source Graph * Add a workflow to deploy nextjs on github pages * Try fixing the deployment job * Enable static exports for app router * Fix uploading arifacts for nextjs job * Set base path to properly load JS and CSS * Fix the base path * Scraped repositories from Source Graph * Add header * Remove language version * Scraped repositories from Source Graph * Add some more TODOs * Scraped repositories from Source Graph * Adjust the pre-commit config * Fix pipelines * Scraped repositories from Source Graph * Add a footer * Create the indexes * Scraped repositories from Source Graph * Add more TODOs * Introduce minor footer adjustments * Adjust the scraping actions * Scraped repositories from Source Graph, parsed the dependencies, and generated the indexes * Implement query params state * Scraped repositories from Source Graph, parsed the dependencies, and generated the indexes * Do not commit query state on unmount * Scraped repositories from Source Graph, parsed the dependencies, and generated the indexes * Hopefully fix query states and multiselect input * Scraped repositories from Source Graph, parsed the dependencies, and generated the indexes * Extend the Makefile * Resolve most of TODOs * Resolve the conflicts with anyio version, bring back httpx * Adjust the Makefile and README.md * Fix a typo in README.md * Adjust readme * Fix the Makefile * Fix some stuff * Make some adjustments * Possibly fix failing scraping jobs * Load the repo project URL from env --------- Co-authored-by: vladfedoriuk <vladfedoriuk@users.noreply.github.com> Co-authored-by: Vladyslav Fedoriuk <vladyslav.fedoriuk@deployed.pl>
2023-10-28 19:39:02 +00:00
:return: None
2023-11-19 22:17:57 +00:00
""" # noqa: E501
async with semaphore, async_session_maker() as session, async_session_uow(session):
# Associate the repo object with a fresh session instance
repo = await session.merge(repo)
Web App (#25) * Set up the web project dependencies - Add linters, pre-commit, and GitHub Actions - Add a Makefile - Add a pyproject.toml * Fix pyupgrade job - Remove continue_on_error everywhere * Remove old code * Rename a GithubActions job * Change README * Adjust pre-commit and GitHub actions * Add tables and set up alembic * Set up tests * Extend tests * Add coverage config * Adjust the GithubActions workflow * Fix GithubActions workflow * Try fixing pyproject-fmt config * Fix formatting of pyproject.toml * Fix formatting of pyproject.toml * Add coverage report * Test listing the repositories * Add a working prototype of SourceGraph client * Add parsing of the SourceGraph SSE data * Fix tests * Ged rid of packages replaced by ruff * Fix waits in the SourceGraph client * Refactor the models and add a mapper - A new mapper allows to create database repositories from the SourceGraph data * Add mypy * Try fixing mypy action * Remove redundant configs * Exclude tests from type checking * Fix mypy pre-commit and GitHub action * Ignore factories * Make upserting possible for source graph data * Add logic for parsing the dependencies and populating the database * Add a database and a cron GitHub Action job * Try manually trigger a workflow * Bring back the old config * Add ReadTimeout for errors to retry for in SourceGraph client * Add typer * Adjust the docstrings * Update the database * Refactor and optimize scraping and dependencies parsing * Make scraping run on push for now * Add a unique constraint for the repo url and source graph repo id * Change the index columns in on_conflict statement for repo creation * Optimize dependencies parsing - Do not parse dependencies for a repo when revision did not change * Scraped repositories from Source Graph * Refactor scraping * Set up frontend * Scraped repositories from Source Graph * Add TODOs * Skip scraping when testing * Fix a test with updating the repos * Scraped repositories from Source Graph * Add some more TODOs * Scraped repositories from Source Graph * Add some more TODO comments * Add chadcn/ui * Scraped repositories from Source Graph * Create index.json * Scraped repositories from Source Graph * Add a draft of data table and display all the repos * Scraped repositories from Source Graph * Implement stars badges and description with overflow * Format the links to Github repos * Fix link clicking * Scraped repositories from Source Graph * Add simple pagination and stars column sorting * Scraped repositories from Source Graph * Implement basic searching * Scraped repositories from Source Graph * Implement a multiselect for dependencies * Scraped repositories from Source Graph * Implement actual filtering by dependencies * Scraped repositories from Source Graph * Add a workflow to deploy nextjs on github pages * Try fixing the deployment job * Enable static exports for app router * Fix uploading arifacts for nextjs job * Set base path to properly load JS and CSS * Fix the base path * Scraped repositories from Source Graph * Add header * Remove language version * Scraped repositories from Source Graph * Add some more TODOs * Scraped repositories from Source Graph * Adjust the pre-commit config * Fix pipelines * Scraped repositories from Source Graph * Add a footer * Create the indexes * Scraped repositories from Source Graph * Add more TODOs * Introduce minor footer adjustments * Adjust the scraping actions * Scraped repositories from Source Graph, parsed the dependencies, and generated the indexes * Implement query params state * Scraped repositories from Source Graph, parsed the dependencies, and generated the indexes * Do not commit query state on unmount * Scraped repositories from Source Graph, parsed the dependencies, and generated the indexes * Hopefully fix query states and multiselect input * Scraped repositories from Source Graph, parsed the dependencies, and generated the indexes * Extend the Makefile * Resolve most of TODOs * Resolve the conflicts with anyio version, bring back httpx * Adjust the Makefile and README.md * Fix a typo in README.md * Adjust readme * Fix the Makefile * Fix some stuff * Make some adjustments * Possibly fix failing scraping jobs * Load the repo project URL from env --------- Co-authored-by: vladfedoriuk <vladfedoriuk@users.noreply.github.com> Co-authored-by: Vladyslav Fedoriuk <vladyslav.fedoriuk@deployed.pl>
2023-10-28 19:39:02 +00:00
# Create the dependencies for the repo
logger.info(
"Creating the dependencies for the repo with id {repo_id}.",
2023-11-19 22:17:57 +00:00
repo_id=repo.id,
Web App (#25) * Set up the web project dependencies - Add linters, pre-commit, and GitHub Actions - Add a Makefile - Add a pyproject.toml * Fix pyupgrade job - Remove continue_on_error everywhere * Remove old code * Rename a GithubActions job * Change README * Adjust pre-commit and GitHub actions * Add tables and set up alembic * Set up tests * Extend tests * Add coverage config * Adjust the GithubActions workflow * Fix GithubActions workflow * Try fixing pyproject-fmt config * Fix formatting of pyproject.toml * Fix formatting of pyproject.toml * Add coverage report * Test listing the repositories * Add a working prototype of SourceGraph client * Add parsing of the SourceGraph SSE data * Fix tests * Ged rid of packages replaced by ruff * Fix waits in the SourceGraph client * Refactor the models and add a mapper - A new mapper allows to create database repositories from the SourceGraph data * Add mypy * Try fixing mypy action * Remove redundant configs * Exclude tests from type checking * Fix mypy pre-commit and GitHub action * Ignore factories * Make upserting possible for source graph data * Add logic for parsing the dependencies and populating the database * Add a database and a cron GitHub Action job * Try manually trigger a workflow * Bring back the old config * Add ReadTimeout for errors to retry for in SourceGraph client * Add typer * Adjust the docstrings * Update the database * Refactor and optimize scraping and dependencies parsing * Make scraping run on push for now * Add a unique constraint for the repo url and source graph repo id * Change the index columns in on_conflict statement for repo creation * Optimize dependencies parsing - Do not parse dependencies for a repo when revision did not change * Scraped repositories from Source Graph * Refactor scraping * Set up frontend * Scraped repositories from Source Graph * Add TODOs * Skip scraping when testing * Fix a test with updating the repos * Scraped repositories from Source Graph * Add some more TODOs * Scraped repositories from Source Graph * Add some more TODO comments * Add chadcn/ui * Scraped repositories from Source Graph * Create index.json * Scraped repositories from Source Graph * Add a draft of data table and display all the repos * Scraped repositories from Source Graph * Implement stars badges and description with overflow * Format the links to Github repos * Fix link clicking * Scraped repositories from Source Graph * Add simple pagination and stars column sorting * Scraped repositories from Source Graph * Implement basic searching * Scraped repositories from Source Graph * Implement a multiselect for dependencies * Scraped repositories from Source Graph * Implement actual filtering by dependencies * Scraped repositories from Source Graph * Add a workflow to deploy nextjs on github pages * Try fixing the deployment job * Enable static exports for app router * Fix uploading arifacts for nextjs job * Set base path to properly load JS and CSS * Fix the base path * Scraped repositories from Source Graph * Add header * Remove language version * Scraped repositories from Source Graph * Add some more TODOs * Scraped repositories from Source Graph * Adjust the pre-commit config * Fix pipelines * Scraped repositories from Source Graph * Add a footer * Create the indexes * Scraped repositories from Source Graph * Add more TODOs * Introduce minor footer adjustments * Adjust the scraping actions * Scraped repositories from Source Graph, parsed the dependencies, and generated the indexes * Implement query params state * Scraped repositories from Source Graph, parsed the dependencies, and generated the indexes * Do not commit query state on unmount * Scraped repositories from Source Graph, parsed the dependencies, and generated the indexes * Hopefully fix query states and multiselect input * Scraped repositories from Source Graph, parsed the dependencies, and generated the indexes * Extend the Makefile * Resolve most of TODOs * Resolve the conflicts with anyio version, bring back httpx * Adjust the Makefile and README.md * Fix a typo in README.md * Adjust readme * Fix the Makefile * Fix some stuff * Make some adjustments * Possibly fix failing scraping jobs * Load the repo project URL from env --------- Co-authored-by: vladfedoriuk <vladfedoriuk@users.noreply.github.com> Co-authored-by: Vladyslav Fedoriuk <vladyslav.fedoriuk@deployed.pl>
2023-10-28 19:39:02 +00:00
enqueue=True,
)
await _create_dependencies_for_repo(session=session, repo=repo)
await session.commit()
async def parse_dependencies_for_repos() -> None:
"""
Parse the dependencies for all the repos in the database.
:return: None.
"""
logger.info("Fetching the repos from the database.", enqueue=True)
2023-11-19 22:17:57 +00:00
async with async_session_maker() as session:
repos = (
Web App (#25) * Set up the web project dependencies - Add linters, pre-commit, and GitHub Actions - Add a Makefile - Add a pyproject.toml * Fix pyupgrade job - Remove continue_on_error everywhere * Remove old code * Rename a GithubActions job * Change README * Adjust pre-commit and GitHub actions * Add tables and set up alembic * Set up tests * Extend tests * Add coverage config * Adjust the GithubActions workflow * Fix GithubActions workflow * Try fixing pyproject-fmt config * Fix formatting of pyproject.toml * Fix formatting of pyproject.toml * Add coverage report * Test listing the repositories * Add a working prototype of SourceGraph client * Add parsing of the SourceGraph SSE data * Fix tests * Ged rid of packages replaced by ruff * Fix waits in the SourceGraph client * Refactor the models and add a mapper - A new mapper allows to create database repositories from the SourceGraph data * Add mypy * Try fixing mypy action * Remove redundant configs * Exclude tests from type checking * Fix mypy pre-commit and GitHub action * Ignore factories * Make upserting possible for source graph data * Add logic for parsing the dependencies and populating the database * Add a database and a cron GitHub Action job * Try manually trigger a workflow * Bring back the old config * Add ReadTimeout for errors to retry for in SourceGraph client * Add typer * Adjust the docstrings * Update the database * Refactor and optimize scraping and dependencies parsing * Make scraping run on push for now * Add a unique constraint for the repo url and source graph repo id * Change the index columns in on_conflict statement for repo creation * Optimize dependencies parsing - Do not parse dependencies for a repo when revision did not change * Scraped repositories from Source Graph * Refactor scraping * Set up frontend * Scraped repositories from Source Graph * Add TODOs * Skip scraping when testing * Fix a test with updating the repos * Scraped repositories from Source Graph * Add some more TODOs * Scraped repositories from Source Graph * Add some more TODO comments * Add chadcn/ui * Scraped repositories from Source Graph * Create index.json * Scraped repositories from Source Graph * Add a draft of data table and display all the repos * Scraped repositories from Source Graph * Implement stars badges and description with overflow * Format the links to Github repos * Fix link clicking * Scraped repositories from Source Graph * Add simple pagination and stars column sorting * Scraped repositories from Source Graph * Implement basic searching * Scraped repositories from Source Graph * Implement a multiselect for dependencies * Scraped repositories from Source Graph * Implement actual filtering by dependencies * Scraped repositories from Source Graph * Add a workflow to deploy nextjs on github pages * Try fixing the deployment job * Enable static exports for app router * Fix uploading arifacts for nextjs job * Set base path to properly load JS and CSS * Fix the base path * Scraped repositories from Source Graph * Add header * Remove language version * Scraped repositories from Source Graph * Add some more TODOs * Scraped repositories from Source Graph * Adjust the pre-commit config * Fix pipelines * Scraped repositories from Source Graph * Add a footer * Create the indexes * Scraped repositories from Source Graph * Add more TODOs * Introduce minor footer adjustments * Adjust the scraping actions * Scraped repositories from Source Graph, parsed the dependencies, and generated the indexes * Implement query params state * Scraped repositories from Source Graph, parsed the dependencies, and generated the indexes * Do not commit query state on unmount * Scraped repositories from Source Graph, parsed the dependencies, and generated the indexes * Hopefully fix query states and multiselect input * Scraped repositories from Source Graph, parsed the dependencies, and generated the indexes * Extend the Makefile * Resolve most of TODOs * Resolve the conflicts with anyio version, bring back httpx * Adjust the Makefile and README.md * Fix a typo in README.md * Adjust readme * Fix the Makefile * Fix some stuff * Make some adjustments * Possibly fix failing scraping jobs * Load the repo project URL from env --------- Co-authored-by: vladfedoriuk <vladfedoriuk@users.noreply.github.com> Co-authored-by: Vladyslav Fedoriuk <vladyslav.fedoriuk@deployed.pl>
2023-10-28 19:39:02 +00:00
await session.scalars(
2023-11-19 22:17:57 +00:00
sqlalchemy.select(Repo).order_by(
Web App (#25) * Set up the web project dependencies - Add linters, pre-commit, and GitHub Actions - Add a Makefile - Add a pyproject.toml * Fix pyupgrade job - Remove continue_on_error everywhere * Remove old code * Rename a GithubActions job * Change README * Adjust pre-commit and GitHub actions * Add tables and set up alembic * Set up tests * Extend tests * Add coverage config * Adjust the GithubActions workflow * Fix GithubActions workflow * Try fixing pyproject-fmt config * Fix formatting of pyproject.toml * Fix formatting of pyproject.toml * Add coverage report * Test listing the repositories * Add a working prototype of SourceGraph client * Add parsing of the SourceGraph SSE data * Fix tests * Ged rid of packages replaced by ruff * Fix waits in the SourceGraph client * Refactor the models and add a mapper - A new mapper allows to create database repositories from the SourceGraph data * Add mypy * Try fixing mypy action * Remove redundant configs * Exclude tests from type checking * Fix mypy pre-commit and GitHub action * Ignore factories * Make upserting possible for source graph data * Add logic for parsing the dependencies and populating the database * Add a database and a cron GitHub Action job * Try manually trigger a workflow * Bring back the old config * Add ReadTimeout for errors to retry for in SourceGraph client * Add typer * Adjust the docstrings * Update the database * Refactor and optimize scraping and dependencies parsing * Make scraping run on push for now * Add a unique constraint for the repo url and source graph repo id * Change the index columns in on_conflict statement for repo creation * Optimize dependencies parsing - Do not parse dependencies for a repo when revision did not change * Scraped repositories from Source Graph * Refactor scraping * Set up frontend * Scraped repositories from Source Graph * Add TODOs * Skip scraping when testing * Fix a test with updating the repos * Scraped repositories from Source Graph * Add some more TODOs * Scraped repositories from Source Graph * Add some more TODO comments * Add chadcn/ui * Scraped repositories from Source Graph * Create index.json * Scraped repositories from Source Graph * Add a draft of data table and display all the repos * Scraped repositories from Source Graph * Implement stars badges and description with overflow * Format the links to Github repos * Fix link clicking * Scraped repositories from Source Graph * Add simple pagination and stars column sorting * Scraped repositories from Source Graph * Implement basic searching * Scraped repositories from Source Graph * Implement a multiselect for dependencies * Scraped repositories from Source Graph * Implement actual filtering by dependencies * Scraped repositories from Source Graph * Add a workflow to deploy nextjs on github pages * Try fixing the deployment job * Enable static exports for app router * Fix uploading arifacts for nextjs job * Set base path to properly load JS and CSS * Fix the base path * Scraped repositories from Source Graph * Add header * Remove language version * Scraped repositories from Source Graph * Add some more TODOs * Scraped repositories from Source Graph * Adjust the pre-commit config * Fix pipelines * Scraped repositories from Source Graph * Add a footer * Create the indexes * Scraped repositories from Source Graph * Add more TODOs * Introduce minor footer adjustments * Adjust the scraping actions * Scraped repositories from Source Graph, parsed the dependencies, and generated the indexes * Implement query params state * Scraped repositories from Source Graph, parsed the dependencies, and generated the indexes * Do not commit query state on unmount * Scraped repositories from Source Graph, parsed the dependencies, and generated the indexes * Hopefully fix query states and multiselect input * Scraped repositories from Source Graph, parsed the dependencies, and generated the indexes * Extend the Makefile * Resolve most of TODOs * Resolve the conflicts with anyio version, bring back httpx * Adjust the Makefile and README.md * Fix a typo in README.md * Adjust readme * Fix the Makefile * Fix some stuff * Make some adjustments * Possibly fix failing scraping jobs * Load the repo project URL from env --------- Co-authored-by: vladfedoriuk <vladfedoriuk@users.noreply.github.com> Co-authored-by: Vladyslav Fedoriuk <vladyslav.fedoriuk@deployed.pl>
2023-10-28 19:39:02 +00:00
Repo.last_checked_revision.is_(None).desc()
)
)
).all()
2023-11-19 22:17:57 +00:00
logger.info("Fetched {count} repos.", count=len(repos), enqueue=True)
Web App (#25) * Set up the web project dependencies - Add linters, pre-commit, and GitHub Actions - Add a Makefile - Add a pyproject.toml * Fix pyupgrade job - Remove continue_on_error everywhere * Remove old code * Rename a GithubActions job * Change README * Adjust pre-commit and GitHub actions * Add tables and set up alembic * Set up tests * Extend tests * Add coverage config * Adjust the GithubActions workflow * Fix GithubActions workflow * Try fixing pyproject-fmt config * Fix formatting of pyproject.toml * Fix formatting of pyproject.toml * Add coverage report * Test listing the repositories * Add a working prototype of SourceGraph client * Add parsing of the SourceGraph SSE data * Fix tests * Ged rid of packages replaced by ruff * Fix waits in the SourceGraph client * Refactor the models and add a mapper - A new mapper allows to create database repositories from the SourceGraph data * Add mypy * Try fixing mypy action * Remove redundant configs * Exclude tests from type checking * Fix mypy pre-commit and GitHub action * Ignore factories * Make upserting possible for source graph data * Add logic for parsing the dependencies and populating the database * Add a database and a cron GitHub Action job * Try manually trigger a workflow * Bring back the old config * Add ReadTimeout for errors to retry for in SourceGraph client * Add typer * Adjust the docstrings * Update the database * Refactor and optimize scraping and dependencies parsing * Make scraping run on push for now * Add a unique constraint for the repo url and source graph repo id * Change the index columns in on_conflict statement for repo creation * Optimize dependencies parsing - Do not parse dependencies for a repo when revision did not change * Scraped repositories from Source Graph * Refactor scraping * Set up frontend * Scraped repositories from Source Graph * Add TODOs * Skip scraping when testing * Fix a test with updating the repos * Scraped repositories from Source Graph * Add some more TODOs * Scraped repositories from Source Graph * Add some more TODO comments * Add chadcn/ui * Scraped repositories from Source Graph * Create index.json * Scraped repositories from Source Graph * Add a draft of data table and display all the repos * Scraped repositories from Source Graph * Implement stars badges and description with overflow * Format the links to Github repos * Fix link clicking * Scraped repositories from Source Graph * Add simple pagination and stars column sorting * Scraped repositories from Source Graph * Implement basic searching * Scraped repositories from Source Graph * Implement a multiselect for dependencies * Scraped repositories from Source Graph * Implement actual filtering by dependencies * Scraped repositories from Source Graph * Add a workflow to deploy nextjs on github pages * Try fixing the deployment job * Enable static exports for app router * Fix uploading arifacts for nextjs job * Set base path to properly load JS and CSS * Fix the base path * Scraped repositories from Source Graph * Add header * Remove language version * Scraped repositories from Source Graph * Add some more TODOs * Scraped repositories from Source Graph * Adjust the pre-commit config * Fix pipelines * Scraped repositories from Source Graph * Add a footer * Create the indexes * Scraped repositories from Source Graph * Add more TODOs * Introduce minor footer adjustments * Adjust the scraping actions * Scraped repositories from Source Graph, parsed the dependencies, and generated the indexes * Implement query params state * Scraped repositories from Source Graph, parsed the dependencies, and generated the indexes * Do not commit query state on unmount * Scraped repositories from Source Graph, parsed the dependencies, and generated the indexes * Hopefully fix query states and multiselect input * Scraped repositories from Source Graph, parsed the dependencies, and generated the indexes * Extend the Makefile * Resolve most of TODOs * Resolve the conflicts with anyio version, bring back httpx * Adjust the Makefile and README.md * Fix a typo in README.md * Adjust readme * Fix the Makefile * Fix some stuff * Make some adjustments * Possibly fix failing scraping jobs * Load the repo project URL from env --------- Co-authored-by: vladfedoriuk <vladfedoriuk@users.noreply.github.com> Co-authored-by: Vladyslav Fedoriuk <vladyslav.fedoriuk@deployed.pl>
2023-10-28 19:39:02 +00:00
logger.info("Parsing the dependencies for the repos.", enqueue=True)
semaphore = asyncio.Semaphore(10)
async with asyncio.TaskGroup() as tg:
2023-11-19 22:17:57 +00:00
for repo in repos:
Web App (#25) * Set up the web project dependencies - Add linters, pre-commit, and GitHub Actions - Add a Makefile - Add a pyproject.toml * Fix pyupgrade job - Remove continue_on_error everywhere * Remove old code * Rename a GithubActions job * Change README * Adjust pre-commit and GitHub actions * Add tables and set up alembic * Set up tests * Extend tests * Add coverage config * Adjust the GithubActions workflow * Fix GithubActions workflow * Try fixing pyproject-fmt config * Fix formatting of pyproject.toml * Fix formatting of pyproject.toml * Add coverage report * Test listing the repositories * Add a working prototype of SourceGraph client * Add parsing of the SourceGraph SSE data * Fix tests * Ged rid of packages replaced by ruff * Fix waits in the SourceGraph client * Refactor the models and add a mapper - A new mapper allows to create database repositories from the SourceGraph data * Add mypy * Try fixing mypy action * Remove redundant configs * Exclude tests from type checking * Fix mypy pre-commit and GitHub action * Ignore factories * Make upserting possible for source graph data * Add logic for parsing the dependencies and populating the database * Add a database and a cron GitHub Action job * Try manually trigger a workflow * Bring back the old config * Add ReadTimeout for errors to retry for in SourceGraph client * Add typer * Adjust the docstrings * Update the database * Refactor and optimize scraping and dependencies parsing * Make scraping run on push for now * Add a unique constraint for the repo url and source graph repo id * Change the index columns in on_conflict statement for repo creation * Optimize dependencies parsing - Do not parse dependencies for a repo when revision did not change * Scraped repositories from Source Graph * Refactor scraping * Set up frontend * Scraped repositories from Source Graph * Add TODOs * Skip scraping when testing * Fix a test with updating the repos * Scraped repositories from Source Graph * Add some more TODOs * Scraped repositories from Source Graph * Add some more TODO comments * Add chadcn/ui * Scraped repositories from Source Graph * Create index.json * Scraped repositories from Source Graph * Add a draft of data table and display all the repos * Scraped repositories from Source Graph * Implement stars badges and description with overflow * Format the links to Github repos * Fix link clicking * Scraped repositories from Source Graph * Add simple pagination and stars column sorting * Scraped repositories from Source Graph * Implement basic searching * Scraped repositories from Source Graph * Implement a multiselect for dependencies * Scraped repositories from Source Graph * Implement actual filtering by dependencies * Scraped repositories from Source Graph * Add a workflow to deploy nextjs on github pages * Try fixing the deployment job * Enable static exports for app router * Fix uploading arifacts for nextjs job * Set base path to properly load JS and CSS * Fix the base path * Scraped repositories from Source Graph * Add header * Remove language version * Scraped repositories from Source Graph * Add some more TODOs * Scraped repositories from Source Graph * Adjust the pre-commit config * Fix pipelines * Scraped repositories from Source Graph * Add a footer * Create the indexes * Scraped repositories from Source Graph * Add more TODOs * Introduce minor footer adjustments * Adjust the scraping actions * Scraped repositories from Source Graph, parsed the dependencies, and generated the indexes * Implement query params state * Scraped repositories from Source Graph, parsed the dependencies, and generated the indexes * Do not commit query state on unmount * Scraped repositories from Source Graph, parsed the dependencies, and generated the indexes * Hopefully fix query states and multiselect input * Scraped repositories from Source Graph, parsed the dependencies, and generated the indexes * Extend the Makefile * Resolve most of TODOs * Resolve the conflicts with anyio version, bring back httpx * Adjust the Makefile and README.md * Fix a typo in README.md * Adjust readme * Fix the Makefile * Fix some stuff * Make some adjustments * Possibly fix failing scraping jobs * Load the repo project URL from env --------- Co-authored-by: vladfedoriuk <vladfedoriuk@users.noreply.github.com> Co-authored-by: Vladyslav Fedoriuk <vladyslav.fedoriuk@deployed.pl>
2023-10-28 19:39:02 +00:00
logger.info(
"Parsing the dependencies for repo {repo_id}.",
2023-11-19 22:17:57 +00:00
repo_id=repo.id,
Web App (#25) * Set up the web project dependencies - Add linters, pre-commit, and GitHub Actions - Add a Makefile - Add a pyproject.toml * Fix pyupgrade job - Remove continue_on_error everywhere * Remove old code * Rename a GithubActions job * Change README * Adjust pre-commit and GitHub actions * Add tables and set up alembic * Set up tests * Extend tests * Add coverage config * Adjust the GithubActions workflow * Fix GithubActions workflow * Try fixing pyproject-fmt config * Fix formatting of pyproject.toml * Fix formatting of pyproject.toml * Add coverage report * Test listing the repositories * Add a working prototype of SourceGraph client * Add parsing of the SourceGraph SSE data * Fix tests * Ged rid of packages replaced by ruff * Fix waits in the SourceGraph client * Refactor the models and add a mapper - A new mapper allows to create database repositories from the SourceGraph data * Add mypy * Try fixing mypy action * Remove redundant configs * Exclude tests from type checking * Fix mypy pre-commit and GitHub action * Ignore factories * Make upserting possible for source graph data * Add logic for parsing the dependencies and populating the database * Add a database and a cron GitHub Action job * Try manually trigger a workflow * Bring back the old config * Add ReadTimeout for errors to retry for in SourceGraph client * Add typer * Adjust the docstrings * Update the database * Refactor and optimize scraping and dependencies parsing * Make scraping run on push for now * Add a unique constraint for the repo url and source graph repo id * Change the index columns in on_conflict statement for repo creation * Optimize dependencies parsing - Do not parse dependencies for a repo when revision did not change * Scraped repositories from Source Graph * Refactor scraping * Set up frontend * Scraped repositories from Source Graph * Add TODOs * Skip scraping when testing * Fix a test with updating the repos * Scraped repositories from Source Graph * Add some more TODOs * Scraped repositories from Source Graph * Add some more TODO comments * Add chadcn/ui * Scraped repositories from Source Graph * Create index.json * Scraped repositories from Source Graph * Add a draft of data table and display all the repos * Scraped repositories from Source Graph * Implement stars badges and description with overflow * Format the links to Github repos * Fix link clicking * Scraped repositories from Source Graph * Add simple pagination and stars column sorting * Scraped repositories from Source Graph * Implement basic searching * Scraped repositories from Source Graph * Implement a multiselect for dependencies * Scraped repositories from Source Graph * Implement actual filtering by dependencies * Scraped repositories from Source Graph * Add a workflow to deploy nextjs on github pages * Try fixing the deployment job * Enable static exports for app router * Fix uploading arifacts for nextjs job * Set base path to properly load JS and CSS * Fix the base path * Scraped repositories from Source Graph * Add header * Remove language version * Scraped repositories from Source Graph * Add some more TODOs * Scraped repositories from Source Graph * Adjust the pre-commit config * Fix pipelines * Scraped repositories from Source Graph * Add a footer * Create the indexes * Scraped repositories from Source Graph * Add more TODOs * Introduce minor footer adjustments * Adjust the scraping actions * Scraped repositories from Source Graph, parsed the dependencies, and generated the indexes * Implement query params state * Scraped repositories from Source Graph, parsed the dependencies, and generated the indexes * Do not commit query state on unmount * Scraped repositories from Source Graph, parsed the dependencies, and generated the indexes * Hopefully fix query states and multiselect input * Scraped repositories from Source Graph, parsed the dependencies, and generated the indexes * Extend the Makefile * Resolve most of TODOs * Resolve the conflicts with anyio version, bring back httpx * Adjust the Makefile and README.md * Fix a typo in README.md * Adjust readme * Fix the Makefile * Fix some stuff * Make some adjustments * Possibly fix failing scraping jobs * Load the repo project URL from env --------- Co-authored-by: vladfedoriuk <vladfedoriuk@users.noreply.github.com> Co-authored-by: Vladyslav Fedoriuk <vladyslav.fedoriuk@deployed.pl>
2023-10-28 19:39:02 +00:00
enqueue=True,
)
2023-11-19 22:17:57 +00:00
tg.create_task(parse_dependencies_for_repo(semaphore=semaphore, repo=repo))
Web App (#25) * Set up the web project dependencies - Add linters, pre-commit, and GitHub Actions - Add a Makefile - Add a pyproject.toml * Fix pyupgrade job - Remove continue_on_error everywhere * Remove old code * Rename a GithubActions job * Change README * Adjust pre-commit and GitHub actions * Add tables and set up alembic * Set up tests * Extend tests * Add coverage config * Adjust the GithubActions workflow * Fix GithubActions workflow * Try fixing pyproject-fmt config * Fix formatting of pyproject.toml * Fix formatting of pyproject.toml * Add coverage report * Test listing the repositories * Add a working prototype of SourceGraph client * Add parsing of the SourceGraph SSE data * Fix tests * Ged rid of packages replaced by ruff * Fix waits in the SourceGraph client * Refactor the models and add a mapper - A new mapper allows to create database repositories from the SourceGraph data * Add mypy * Try fixing mypy action * Remove redundant configs * Exclude tests from type checking * Fix mypy pre-commit and GitHub action * Ignore factories * Make upserting possible for source graph data * Add logic for parsing the dependencies and populating the database * Add a database and a cron GitHub Action job * Try manually trigger a workflow * Bring back the old config * Add ReadTimeout for errors to retry for in SourceGraph client * Add typer * Adjust the docstrings * Update the database * Refactor and optimize scraping and dependencies parsing * Make scraping run on push for now * Add a unique constraint for the repo url and source graph repo id * Change the index columns in on_conflict statement for repo creation * Optimize dependencies parsing - Do not parse dependencies for a repo when revision did not change * Scraped repositories from Source Graph * Refactor scraping * Set up frontend * Scraped repositories from Source Graph * Add TODOs * Skip scraping when testing * Fix a test with updating the repos * Scraped repositories from Source Graph * Add some more TODOs * Scraped repositories from Source Graph * Add some more TODO comments * Add chadcn/ui * Scraped repositories from Source Graph * Create index.json * Scraped repositories from Source Graph * Add a draft of data table and display all the repos * Scraped repositories from Source Graph * Implement stars badges and description with overflow * Format the links to Github repos * Fix link clicking * Scraped repositories from Source Graph * Add simple pagination and stars column sorting * Scraped repositories from Source Graph * Implement basic searching * Scraped repositories from Source Graph * Implement a multiselect for dependencies * Scraped repositories from Source Graph * Implement actual filtering by dependencies * Scraped repositories from Source Graph * Add a workflow to deploy nextjs on github pages * Try fixing the deployment job * Enable static exports for app router * Fix uploading arifacts for nextjs job * Set base path to properly load JS and CSS * Fix the base path * Scraped repositories from Source Graph * Add header * Remove language version * Scraped repositories from Source Graph * Add some more TODOs * Scraped repositories from Source Graph * Adjust the pre-commit config * Fix pipelines * Scraped repositories from Source Graph * Add a footer * Create the indexes * Scraped repositories from Source Graph * Add more TODOs * Introduce minor footer adjustments * Adjust the scraping actions * Scraped repositories from Source Graph, parsed the dependencies, and generated the indexes * Implement query params state * Scraped repositories from Source Graph, parsed the dependencies, and generated the indexes * Do not commit query state on unmount * Scraped repositories from Source Graph, parsed the dependencies, and generated the indexes * Hopefully fix query states and multiselect input * Scraped repositories from Source Graph, parsed the dependencies, and generated the indexes * Extend the Makefile * Resolve most of TODOs * Resolve the conflicts with anyio version, bring back httpx * Adjust the Makefile and README.md * Fix a typo in README.md * Adjust readme * Fix the Makefile * Fix some stuff * Make some adjustments * Possibly fix failing scraping jobs * Load the repo project URL from env --------- Co-authored-by: vladfedoriuk <vladfedoriuk@users.noreply.github.com> Co-authored-by: Vladyslav Fedoriuk <vladyslav.fedoriuk@deployed.pl>
2023-10-28 19:39:02 +00:00
app = typer.Typer()
@app.command()
def scrape_repos() -> None:
"""
Scrape the FastAPI-related repositories utilizing the source graph API.
:return: None
"""
logger.info("Scraping the source graph repos.", enqueue=True)
asyncio.run(scrape_source_graph_repos())
@app.command()
def parse_dependencies() -> None:
"""
Parse the dependencies for all the repos in the database.
:return: None.
"""
logger.info(
"Parsing the dependencies for all the repos in the database.", enqueue=True
)
asyncio.run(parse_dependencies_for_repos())
if __name__ == "__main__":
app()