Commit 3e1a120

Merge pull request #8 from proxymesh/feature/cloudscraper-extension
Add cloudscraper extension for proxy header support
2 parents 7a92676 + bf8c175 commit 3e1a120

4 files changed

Lines changed: 397 additions & 0 deletions

File tree

docs/cloudscraper.rst

Lines changed: 126 additions & 0 deletions
@@ -0,0 +1,126 @@
CloudScraper
============

The ``cloudscraper_proxy`` module provides proxy header support for CloudScraper.

Installation
------------

First, install CloudScraper::

    pip install cloudscraper

Then you can use the proxy header extension.

Usage
-----

Using create_scraper()
~~~~~~~~~~~~~~~~~~~~~~

The ``create_scraper()`` function is a drop-in replacement for ``cloudscraper.create_scraper()``
that adds proxy header capabilities:

.. code-block:: python

    from python_proxy_headers.cloudscraper_proxy import create_scraper

    # Create a scraper with proxy headers
    scraper = create_scraper(
        proxy_headers={'X-ProxyMesh-Country': 'US'},
        browser='chrome'
    )

    # Set proxy
    scraper.proxies = {'https': 'http://user:pass@proxy.example.com:8080'}

    # Make requests - proxy headers are automatically sent
    response = scraper.get('https://httpbin.org/ip')
    print(response.text)

Using ProxyCloudScraper Class
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You can also use the ``ProxyCloudScraper`` class directly:

.. code-block:: python

    from python_proxy_headers.cloudscraper_proxy import ProxyCloudScraper

    scraper = ProxyCloudScraper(
        proxy_headers={'X-Custom-Header': 'value'},
        enable_stealth=True
    )

    scraper.proxies = {'https': 'http://proxy.example.com:8080'}
    response = scraper.get('https://example.com')

Updating Proxy Headers
~~~~~~~~~~~~~~~~~~~~~~

You can update proxy headers after creating the scraper:

.. code-block:: python

    scraper = create_scraper(proxy_headers={'X-Header': 'initial'})

    # Later, update headers
    scraper.set_proxy_headers({'X-Header': 'updated', 'X-New': 'value'})

All CloudScraper Features Preserved
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The extension preserves all CloudScraper features:

- Cloudflare bypass (v1, v2, v3, Turnstile)
- Browser emulation and user agent handling
- Cipher suite customization
- Proxy rotation
- Stealth mode
- Session management

.. code-block:: python

    scraper = create_scraper(
        proxy_headers={'X-ProxyMesh-Country': 'US'},
        browser='chrome',
        enable_stealth=True,
        stealth_options={
            'min_delay': 1.0,
            'max_delay': 3.0,
            'human_like_delays': True
        }
    )

API Reference
-------------

create_scraper()
~~~~~~~~~~~~~~~~

.. py:function:: create_scraper(proxy_headers=None, sess=None, **kwargs)

   Create a CloudScraper with proxy header support.

   :param proxy_headers: Dict of headers to send to proxy servers
   :param sess: Existing session to copy attributes from
   :param kwargs: All other arguments passed to CloudScraper
   :returns: ProxyCloudScraper instance

ProxyCloudScraper Class
~~~~~~~~~~~~~~~~~~~~~~~

.. py:class:: ProxyCloudScraper(proxy_headers=None, **kwargs)

   CloudScraper subclass with proxy header support.

   Inherits all methods and attributes from ``cloudscraper.CloudScraper``.

   :param proxy_headers: Dict of headers to send to proxy servers
   :param kwargs: All other arguments passed to CloudScraper

   .. py:method:: set_proxy_headers(proxy_headers)

      Update the proxy headers and remount adapters.

      :param proxy_headers: New proxy headers to use
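One implementation detail worth knowing when choosing header names: the custom ``proxy_headers`` are merged on top of the standard proxy headers (such as a ``Proxy-Authorization`` derived from the proxy URL), so a custom header with the same name takes precedence. A minimal sketch of that merge with plain dicts (the header values below are made up for illustration):

```python
# Sketch of the header-merge step inside the adapter's proxy_manager_for():
# start from the standard proxy headers, then apply the custom headers on
# top, so custom values win on any name conflict.
standard = {'Proxy-Authorization': 'Basic dXNlcjpwYXNz'}
custom = {'X-ProxyMesh-Country': 'US', 'Proxy-Authorization': 'Basic b3RoZXI='}

merged = dict(standard)
merged.update(custom)

print(merged['Proxy-Authorization'])  # the custom value overrides
print(merged['X-ProxyMesh-Country'])
```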

docs/index.rst

Lines changed: 2 additions & 0 deletions

@@ -15,6 +15,7 @@ We currently provide extensions to the following packages:

 * :doc:`aiohttp <aiohttp>` - Async HTTP client/server framework
 * :doc:`httpx <httpx>` - Modern HTTP client library
 * :doc:`pycurl <pycurl>` - Python interface to libcurl
+* :doc:`cloudscraper <cloudscraper>` - Cloudflare bypass library
 * :doc:`autoscraper <autoscraper>` - Smart automatic web scraper

 Purpose

@@ -53,6 +54,7 @@ Contents

    aiohttp
    httpx
    pycurl
+   cloudscraper
    autoscraper

 Indices and tables
python_proxy_headers/cloudscraper_proxy.py

Lines changed: 213 additions & 0 deletions

@@ -0,0 +1,213 @@
"""
CloudScraper extension for sending and receiving proxy headers.

This module provides a CloudScraper subclass that enables:
1. Sending custom headers to proxy servers during CONNECT
2. Capturing response headers from proxy servers

Example usage:
    from python_proxy_headers.cloudscraper_proxy import create_scraper

    scraper = create_scraper(proxy_headers={'X-ProxyMesh-Country': 'US'})
    scraper.proxies = {'https': 'http://proxy:8080'}
    response = scraper.get('https://example.com')

    # Access proxy response headers (stored on the response object)
    print(response.proxy_headers)
"""

from typing import Any, Dict, Optional

try:
    import cloudscraper
    from cloudscraper import CipherSuiteAdapter
except ImportError:
    raise ImportError(
        "cloudscraper is required for this module. "
        "Install it with: pip install cloudscraper"
    )

from .urllib3_proxy_manager import proxy_from_url


class CipherSuiteProxyHeaderAdapter(CipherSuiteAdapter):
    """
    Combines CloudScraper's CipherSuiteAdapter with proxy header support.

    This adapter:
    - Maintains CloudScraper's TLS/cipher suite customization
    - Adds the ability to send custom headers to proxy servers
    - Uses our custom ProxyManager that captures proxy response headers
    """

    def __init__(self, proxy_headers: Optional[Dict[str, str]] = None, **kwargs):
        self._proxy_headers = proxy_headers or {}
        super().__init__(**kwargs)

    def proxy_manager_for(self, proxy, **proxy_kwargs):
        """
        Return a ProxyManager for the given proxy with custom header support.

        Overrides the default proxy_manager_for to use our custom ProxyManager
        that supports sending and receiving proxy headers.
        """
        if proxy in self.proxy_manager:
            manager = self.proxy_manager[proxy]
        elif proxy.lower().startswith("socks"):
            # SOCKS proxies don't support custom headers
            return super().proxy_manager_for(proxy, **proxy_kwargs)
        else:
            # Get standard proxy headers (e.g., Proxy-Authorization)
            _proxy_headers = self.proxy_headers(proxy)

            # Merge in our custom proxy headers; custom values win on conflict
            if self._proxy_headers:
                _proxy_headers.update(self._proxy_headers)

            # Pass SSL context and source address if available
            if getattr(self, 'ssl_context', None):
                proxy_kwargs['ssl_context'] = self.ssl_context
            if getattr(self, 'source_address', None):
                proxy_kwargs['source_address'] = self.source_address

            manager = self.proxy_manager[proxy] = proxy_from_url(
                proxy,
                proxy_headers=_proxy_headers,
                num_pools=self._pool_connections,
                maxsize=self._pool_maxsize,
                block=self._pool_block,
                **proxy_kwargs,
            )

        return manager


class ProxyCloudScraper(cloudscraper.CloudScraper):
    """
    CloudScraper with proxy header support.

    This class extends CloudScraper to add the ability to:
    - Send custom headers to proxy servers during CONNECT tunneling
    - Receive and access headers from proxy server responses

    Args:
        proxy_headers: Dict of headers to send to proxy servers
        **kwargs: All other arguments passed to CloudScraper

    Example:
        scraper = ProxyCloudScraper(proxy_headers={'X-ProxyMesh-Country': 'US'})
        scraper.proxies = {'https': 'http://proxy.example.com:8080'}
        response = scraper.get('https://httpbin.org/ip')
        print(response.proxy_headers)  # Headers from proxy CONNECT response
    """

    def __init__(self, proxy_headers: Optional[Dict[str, str]] = None, **kwargs):
        self._proxy_headers = proxy_headers or {}
        super().__init__(**kwargs)

        # Replace the default adapters with proxy-header-aware versions,
        # preserving the cipher suite settings from the parent.
        self._mount_proxy_adapters()

    def _mount_proxy_adapters(self):
        """
        Mount proxy-header-aware adapters for both schemes.

        Proxy headers mainly matter for HTTPS CONNECT tunneling, but an
        adapter is mounted for plain HTTP as well for consistency.
        """
        for prefix in ('https://', 'http://'):
            self.mount(
                prefix,
                CipherSuiteProxyHeaderAdapter(
                    proxy_headers=self._proxy_headers,
                    cipherSuite=self.cipherSuite,
                    ecdhCurve=getattr(self, 'ecdhCurve', 'prime256v1'),
                    server_hostname=getattr(self, 'server_hostname', None),
                    source_address=getattr(self, 'source_address', None),
                    ssl_context=getattr(self, 'ssl_context', None),
                ),
            )

    def set_proxy_headers(self, proxy_headers: Dict[str, str]):
        """
        Update the proxy headers and remount adapters.

        Args:
            proxy_headers: New proxy headers to use
        """
        self._proxy_headers = proxy_headers
        # Remount adapters so new connections use the updated headers
        self._mount_proxy_adapters()


def create_scraper(
    proxy_headers: Optional[Dict[str, str]] = None,
    sess: Optional[Any] = None,
    **kwargs
) -> ProxyCloudScraper:
    """
    Create a CloudScraper with proxy header support.

    This is a drop-in replacement for cloudscraper.create_scraper() that
    adds proxy header capabilities.

    Args:
        proxy_headers: Dict of headers to send to proxy servers
        sess: Existing session to copy attributes from
        **kwargs: All other arguments passed to CloudScraper

    Returns:
        ProxyCloudScraper instance

    Example:
        from python_proxy_headers.cloudscraper_proxy import create_scraper

        scraper = create_scraper(
            proxy_headers={'X-ProxyMesh-Country': 'US'},
            browser='chrome'
        )
        scraper.proxies = {'https': 'http://proxy:8080'}
        response = scraper.get('https://example.com')
    """
    scraper = ProxyCloudScraper(proxy_headers=proxy_headers, **kwargs)

    if sess:
        for attr in ['auth', 'cert', 'cookies', 'headers', 'hooks', 'params', 'proxies', 'data']:
            val = getattr(sess, attr, None)
            if val is not None:
                setattr(scraper, attr, val)

    return scraper


# Convenience alias
session = create_scraper
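The ``sess`` argument to ``create_scraper()`` copies a fixed list of attributes from an existing session, skipping any that are missing or ``None``. A sketch of that copy loop with stand-in objects (the ``_OldSession``/``_NewScraper`` classes are hypothetical, not part of the module):

```python
# Sketch of the attribute copy performed by create_scraper(sess=...):
# only attributes that exist and are not None are carried over.
class _OldSession:
    headers = {'User-Agent': 'example'}
    cookies = None  # explicitly unset; will not be copied

class _NewScraper:
    pass

old, new = _OldSession(), _NewScraper()
for attr in ['auth', 'cert', 'cookies', 'headers', 'hooks', 'params', 'proxies', 'data']:
    val = getattr(old, attr, None)
    if val is not None:
        setattr(new, attr, val)

print(new.headers)              # the headers dict was copied
print(hasattr(new, 'cookies'))  # False: None values are skipped
```

This means a ``None``-valued attribute on the old session leaves the new scraper's default in place rather than overwriting it.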
