Try/except inside for loop not behaving as expected

Question:

CODE :

def ValidateProxy(LIST_PROXIES):
    '''
    Checks if scraped proxies allow HTTPS connection
    '''

    for proxy in LIST_PROXIES:

        print('using', proxy)

        host, port = str(proxy).split(":")

        try:
            resp = requests.get('https://amazon.com', 
                                proxies=dict(https=f'socks5://{host}:{port}'),
                                timeout=6)

        except ConnectionError:
            print(proxy, 'REMOVED')
            LIST_PROXIES.remove(proxy)


    print(len(LIST_PROXIES), 'PROXIES GATHERED')

    if len(LIST_PROXIES) != 0:
        return LIST_PROXIES
    else:
        return None

INPUT :

['46.4.96.137:1080', '138.197.157.32:1080', '138.68.240.218:1080'.....] #15 proxies

OUTPUT :

using 46.4.96.137:1080
46.4.96.137:1080 REMOVED
using 138.68.240.218:1080
138.68.240.218:1080 REMOVED
using 207.154.231.213:1080
207.154.231.213:1080 REMOVED
using 198.199.120.102:1080
198.199.120.102:1080 REMOVED
using 88.198.24.108:1080
88.198.24.108:1080 REMOVED
using 188.226.141.211:1080
188.226.141.211:1080 REMOVED
using 92.222.180.156:1080
92.222.180.156:1080 REMOVED
using 183.233.183.70:1081
183.233.183.70:1081 REMOVED
7 PROXIES GATHERED # len(LIST_PROXIES) == 7, so 8 are removed which are printed above

MY DOUBTS :

  1. Why is print('using', proxy) not executed every time? (The input list has 15 items, but this line is printed only 8 times.)

  2. Are both the try and except blocks executed every time? REMOVED is printed on the console every time.

  3. I want it to run print('using', proxy) for every proxy and, on a ConnectionError, print(proxy, 'REMOVED') and remove that proxy from the list.

EDIT : FULL INPUT

['46.4.96.137:1080', '138.197.157.32:1080', '138.68.240.218:1080', '162.243.108.129:1080', '207.154.231.213:1080', '176.9.119.170:1080', '198.199.120.102:1080', '176.9.75.42:1080', '88.198.24.108:1080', '188.226.141.61:1080', '188.226.141.211:1080', '125.124.185.167:38801', '92.222.180.156:1080', '188.166.83.17:1080', '183.233.183.70:1081']
Asked By: hack3r-0m

Answers:

The issue is that you are mutating the list while you are still looping over it, in this line:

LIST_PROXIES.remove(proxy)

Each time an item is removed, every item after it shifts one position to the left. The for loop then advances to the next index, so the item that just moved into the current position is skipped completely.
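
The skipping can be reproduced without any network access. This is a minimal sketch that uses placeholder strings (an assumption for illustration) in place of the proxies; every item is removed as soon as it is visited, mirroring the run in the question where every tested proxy failed:

```python
# Placeholder items standing in for the proxy strings.
items = ['a', 'b', 'c', 'd', 'e', 'f']
visited = []

for item in items:
    visited.append(item)   # corresponds to print('using', proxy)
    items.remove(item)     # corresponds to LIST_PROXIES.remove(proxy)

print(visited)  # ['a', 'c', 'e']  -> every second item was never visited
print(items)    # ['b', 'd', 'f']  -> the skipped items are what is left
```

With 15 items the same pattern visits 8 and leaves 7, which matches the output in the question. The except block only ran for the proxies that were actually tested; the 7 remaining proxies were never tried at all.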

Check out this question/answer:
strange result when removing item from a list

Answered By: SimonN

You are removing items from a list you are iterating over. NOT GOOD. Iterate over a copy of the list instead, which leaves you free to modify the original. Simply replace for proxy in LIST_PROXIES: with for proxy in list(LIST_PROXIES):
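
A rough sketch of that change follows. To keep it runnable offline, the network request is replaced by a check function passed in as an argument; remove_failing and fake_check are hypothetical names introduced here and are not part of the question's code:

```python
def remove_failing(proxies, is_working):
    """Remove, in place, every proxy for which is_working(proxy) is false."""
    for proxy in list(proxies):      # iterate over a snapshot of the list
        print('using', proxy)
        if not is_working(proxy):
            print(proxy, 'REMOVED')
            proxies.remove(proxy)    # safe: the snapshot is not affected
    return proxies


def fake_check(proxy):
    # Hypothetical stand-in for the real request; treats port 1081 as working.
    return proxy.endswith(':1081')


proxies = ['46.4.96.137:1080', '138.197.157.32:1080', '183.233.183.70:1081']
result = remove_failing(proxies, fake_check)
print(len(result), 'PROXIES GATHERED')
```

Every proxy is now visited exactly once, because removals from the original list do not shift anything in the copy being iterated.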

Answered By: Sy Ker

Edit 2022-08-09

I would separate the logic into two functions. Also, please follow PEP-8 (I did not point that out in the original answer).

from typing import Iterable

import requests

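def is_valid_proxy(proxy: str) -> bool:
    try:
        requests.get(
            'https://amazon.com',
            proxies={'https': f'socks5://{proxy}'},
            timeout=6,
        )
        return True
    # requests raises its own ConnectionError, which is not a subclass of
    # the built-in ConnectionError, so it has to be named explicitly.
    except requests.exceptions.ConnectionError:
        return False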


def get_valid_proxies(proxies: Iterable[str]) -> list[str]:
    return [proxy for proxy in proxies if is_valid_proxy(proxy)]

Instead of printing to stdout, you could use the logging module.
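
As a possible sketch of the logging variant: the function below takes the validity check as a parameter (for example a function like is_valid_proxy), so it can be tried without a network connection. The name filter_proxies, the log format and the chosen log levels are assumptions made for illustration:

```python
import logging

logging.basicConfig(level=logging.INFO, format='%(levelname)s %(message)s')
logger = logging.getLogger(__name__)


def filter_proxies(proxies, is_valid):
    """Return a new list with the proxies for which is_valid(proxy) is true."""
    valid = []
    for proxy in proxies:
        logger.info('using %s', proxy)
        if is_valid(proxy):
            valid.append(proxy)
        else:
            logger.warning('%s REMOVED', proxy)
    logger.info('%d PROXIES GATHERED', len(valid))
    return valid
```

Because a new list is built instead of the input being mutated, the skipping problem cannot occur, and the messages can be silenced or redirected through the logging configuration rather than by editing the function.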

Original Answer

The problem is you are iterating over the LIST_PROXIES and removing elements from it at the same time.

If you only want to iterate over the LIST_PROXIES once, something like this could work:

def ValidateProxy(LIST_PROXIES):
    index = 0 
    for i in range(len(LIST_PROXIES)):
        proxy = LIST_PROXIES[index]
        print('using', proxy)
        host, port = str(proxy).split(":")
        try:
            resp = requests.get('https://amazon.com', 
                                proxies=dict(https=f'socks5://{host}:{port}'),
                                timeout=6)
            index += 1
        except ConnectionError:
            print(proxy, 'REMOVED')
            LIST_PROXIES.pop(index) # Index is not incremented
    print(len(LIST_PROXIES), 'PROXIES GATHERED')
    if len(LIST_PROXIES) != 0:
        return LIST_PROXIES
    else:
        return None

However, if iterating over the list twice is not a problem, you can just make a copy of the list, as Sy Ker pointed out.

Answered By: Miguel Alorda