Python/web_programming/crawl_google_results.py
Christian Clauss a2fa32c7ad
Lukazlim: Replace dependency requests with httpx (#12744)
* Replace dependency `requests` with `httpx`

Fixes #12742
Signed-off-by: Lim, Lukaz Wei Hwang <lukaz.wei.hwang.lim@intel.com>

* updating DIRECTORY.md

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Lim, Lukaz Wei Hwang <lukaz.wei.hwang.lim@intel.com>
Co-authored-by: Lim, Lukaz Wei Hwang <lukaz.wei.hwang.lim@intel.com>
Co-authored-by: cclauss <cclauss@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2025-05-14 04:42:11 +03:00

39 lines
1002 B
Python

# /// script
# requires-python = ">=3.13"
# dependencies = [
# "beautifulsoup4",
# "fake-useragent",
# "httpx",
# ]
# ///
import sys
import webbrowser
import httpx
from bs4 import BeautifulSoup
from fake_useragent import UserAgent
if __name__ == "__main__":
print("Googling.....")
url = "https://www.google.com/search?q=" + " ".join(sys.argv[1:])
res = httpx.get(
url,
headers={"UserAgent": UserAgent().random},
timeout=10,
follow_redirects=True,
)
# res.raise_for_status()
with open("project1a.html", "wb") as out_file: # only for knowing the class
for data in res.iter_content(10000):
out_file.write(data)
soup = BeautifulSoup(res.text, "html.parser")
links = list(soup.select(".eZt8xd"))[:5]
print(len(links))
for link in links:
if link.text == "Maps":
webbrowser.open(link.get("href"))
else:
webbrowser.open(f"https://google.com{link.get('href')}")