Channel: Active questions tagged mongodb-atlas - Stack Overflow

Why can't I connect to MongoDB Atlas with GitHub Actions?


I want to create a workflow that automatically scrapes reviews of an app from the Google Play Store at a scheduled time every day and stores them in my collection in MongoDB Atlas. So first, I created a Python script called scraping_daily.py that scrapes the 5,000 newest reviews and filters out any that were previously collected. When I tested it by running it manually, the script worked perfectly fine. Here's what the script looks like:

    # Import libraries
    import numpy as np
    import pandas as pd
    from google_play_scraper import Sort, reviews, reviews_all, app
    from pymongo import MongoClient

    # Create a connection to MongoDB
    client = MongoClient("mongodb+srv://<MY_USERNAME>:<MY_PASSWORD>@project1.lpu4kvx.mongodb.net/?retryWrites=true&w=majority")
    db = client["vidio"]
    collection = db["google_play_store_reviews"]

    # Load the data from MongoDB
    df = pd.DataFrame(list(collection.find()))
    df = df.drop("_id", axis=1)
    df = df.sort_values("at", ascending=False)

    # Collect 5000 new reviews
    result = reviews(
        "com.vidio.android",
        lang="id",
        country="id",
        sort=Sort.NEWEST,
        count=5000
    )
    new_reviews = pd.DataFrame(result[0])
    new_reviews = new_reviews.fillna("empty")

    # Filter the scraped reviews to exclude any that were previously collected
    common = new_reviews.merge(df, on=["reviewId", "userName"])
    new_reviews_sliced = new_reviews[(~new_reviews.reviewId.isin(common.reviewId)) & (~new_reviews.userName.isin(common.userName))]

    # Update MongoDB with any new reviews that were not previously scraped
    if len(new_reviews_sliced) > 0:
        new_reviews_sliced_dict = new_reviews_sliced.to_dict("records")
        batch_size = 1_000
        num_records = len(new_reviews_sliced_dict)
        num_batches = num_records // batch_size
        if num_records % batch_size != 0:
            num_batches += 1
        for i in range(num_batches):
            start_idx = i * batch_size
            end_idx = min(start_idx + batch_size, num_records)
            batch = new_reviews_sliced_dict[start_idx:end_idx]
            if batch:
                collection.insert_many(batch)
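For readers who want to see the filter-then-batch idea in isolation, here is a minimal sketch in plain Python with no pandas or MongoDB involved. It is not the script above: the helper names `drop_known_reviews` and `iter_batches` and the sample data are made up for illustration, and it filters on `reviewId` alone, on the assumption that `reviewId` uniquely identifies a review (the script also compares `userName`).

```python
from typing import Iterator


def drop_known_reviews(new_reviews: list[dict], known_ids: set[str]) -> list[dict]:
    """Keep only reviews whose reviewId has not been stored yet."""
    return [r for r in new_reviews if r["reviewId"] not in known_ids]


def iter_batches(records: list[dict], batch_size: int = 1_000) -> Iterator[list[dict]]:
    """Yield consecutive slices of at most batch_size records."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


# Made-up sample data standing in for scraped reviews (hypothetical values).
scraped = [{"reviewId": f"r{i}", "userName": f"user{i}"} for i in range(5)]
already_stored = {"r0", "r3"}

fresh = drop_known_reviews(scraped, already_stored)
batches = list(iter_batches(fresh, batch_size=2))
```

In the real script, each batch yielded this way would be passed to `collection.insert_many(batch)`; stepping through `range(0, len(records), batch_size)` means there is no need to compute the number of batches separately.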

Next, I want to schedule my script using GitHub Actions. Following some YouTube tutorials, I created an actions.yml file in the .github/workflows folder. Here's what the YAML file looks like:

    name: Scraping Google Play Reviews
    on:
      schedule:
        - cron: 50 16 * * * # At 16:50 every day
    jobs:
      build:
        runs-on: ubuntu-latest
        steps:
          - name: check out the repository content
            uses: actions/checkout@v2
          - name: set up python
            uses: actions/setup-python@v4
            with:
              python-version: '3.10'
          - name: install requirements
            run:
              python -m pip install --upgrade pip
              pip install -r requirements.txt
          - name: execute the script
            run: python -m scraping_daily.py
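A side note on credentials: when a script like this runs on a hosted runner, a common pattern is to read the connection string from an environment variable rather than embedding it in the source. The sketch below assumes a hypothetical variable named `MONGODB_URI`, which the workflow would have to populate itself (for example from a repository secret); neither that name nor the `get_mongodb_uri` helper appears in the original setup. It is unrelated to the connection error itself.

```python
import os
from typing import Mapping


def get_mongodb_uri(env: Mapping[str, str] = os.environ) -> str:
    """Return the connection string from the (hypothetical) MONGODB_URI variable.

    Raises a clear error if the variable is missing or empty, so the job
    fails immediately instead of failing later inside the driver.
    """
    uri = env.get("MONGODB_URI")
    if not uri:
        raise RuntimeError("MONGODB_URI is not set; cannot connect to MongoDB")
    return uri


# Intended use inside the script (assumes pymongo is installed):
#     client = MongoClient(get_mongodb_uri())
```

Passing the environment as a parameter keeps the helper easy to test with a plain dictionary.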

However, the workflow always fails with an error when it executes my script. The error message is:

    Traceback (most recent call last):
      File "/opt/hostedtoolcache/Python/3.10.10/x64/lib/python3.10/runpy.py", line 187, in _run_module_as_main
        mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
      File "/opt/hostedtoolcache/Python/3.10.10/x64/lib/python3.10/runpy.py", line 110, in _get_module_details
        __import__(pkg_name)
      File "/home/runner/work/vidio_google_play_store_reviews/vidio_google_play_store_reviews/scraping_daily.py", line 16, in <module>
        df = pd.DataFrame(list(collection.find()))
      File "/opt/hostedtoolcache/Python/3.10.10/x64/lib/python3.10/site-packages/pymongo/cursor.py", line 1248, in next
        if len(self.__data) or self._refresh():
      File "/opt/hostedtoolcache/Python/3.10.10/x64/lib/python3.10/site-packages/pymongo/cursor.py", line 1139, in _refresh
        self.__session = self.__collection.database.client._ensure_session()
      File "/opt/hostedtoolcache/Python/3.10.10/x64/lib/python3.10/site-packages/pymongo/mongo_client.py", line 1740, in _ensure_session
        return self.__start_session(True, causal_consistency=False)
      File "/opt/hostedtoolcache/Python/3.10.10/x64/lib/python3.10/site-packages/pymongo/mongo_client.py", line 1685, in __start_session
        self._topology._check_implicit_session_support()
      File "/opt/hostedtoolcache/Python/3.10.10/x64/lib/python3.10/site-packages/pymongo/topology.py", line 538, in _check_implicit_session_support
        self._check_session_support()
      File "/opt/hostedtoolcache/Python/3.10.10/x64/lib/python3.10/site-packages/pymongo/topology.py", line 554, in _check_session_support
        self._select_servers_loop(
      File "/opt/hostedtoolcache/Python/3.10.10/x64/lib/python3.10/site-packages/pymongo/topology.py", line 238, in _select_servers_loop
        raise ServerSelectionTimeoutError(
    pymongo.errors.ServerSelectionTimeoutError: ac-dc8axn9-shard-00-01.lpu4kvx.mongodb.net:27017: connection closed,ac-dc8axn9-shard-00-02.lpu4kvx.mongodb.net:27017: connection closed,ac-dc8axn9-shard-00-00.lpu4kvx.mongodb.net:27017: connection closed, Timeout: 300.0s, Topology Description: <TopologyDescription id: 641dd5b78e0efba394e00ffc, topology_type: ReplicaSetNoPrimary, servers: [<ServerDescription ('ac-dc8axn9-shard-00-00.lpu4kvx.mongodb.net', 27017) server_type: Unknown, rtt: None, error=AutoReconnect('ac-dc8axn9-shard-00-00.lpu4kvx.mongodb.net:27017: connection closed')>, <ServerDescription ('ac-dc8axn9-shard-00-01.lpu4kvx.mongodb.net', 27017) server_type: Unknown, rtt: None, error=AutoReconnect('ac-dc8axn9-shard-00-01.lpu4kvx.mongodb.net:27017: connection closed')>, <ServerDescription ('ac-dc8axn9-shard-00-02.lpu4kvx.mongodb.net', 27017) server_type: Unknown, rtt: None, error=AutoReconnect('ac-dc8axn9-shard-00-02.lpu4kvx.mongodb.net:27017: connection closed')>]>
    Error: Process completed with exit code 1.

I tried to increase the timeout setting by adding serverSelectionTimeoutMS=300000 inside MongoClient(), but it still gave me the same error. How can I solve this?

By the way, I'm using a Windows machine (I'm not sure if it's useful information though).

