Common Encryption and Decryption Algorithm Feature Collection for Crawlers
Introduction
When developing web crawlers, encountering encrypted data is inevitable. Modern websites use various encryption algorithms to protect their data, making reverse engineering an essential skill for crawler developers. This article provides a comprehensive collection of common encryption algorithm features and their identification methods.
Why Understanding Encryption Matters
Encryption algorithms are widely used in web applications for:
- Data Protection: Securing sensitive information during transmission
- Anti-Crawling: Preventing automated data extraction
- Authentication: Verifying user identity and requests
- Data Integrity: Ensuring data hasn't been tampered with
As a crawler developer, understanding these algorithms helps you:
- Identify encryption types quickly
- Implement appropriate decryption methods
- Debug and troubleshoot encrypted requests
- Build more robust crawling solutions
Common Encryption Algorithms and Their Features
1. Base64 Encoding
Characteristics:
- Output contains only A-Z, a-z, 0-9, +, /, and = (padding)
- Length is always a multiple of 4
- Ends with 0-2 padding characters (=)
Python Implementation:
import base64
# Encoding
def encode_base64(text):
"""Encode text to Base64"""
encoded = base64.b64encode(text.encode('utf-8'))
return encoded.decode('utf-8')
# Decoding
def decode_base64(encoded_text):
"""Decode Base64 to text"""
decoded = base64.b64decode(encoded_text)
return decoded.decode('utf-8')
# Example
original = "Hello, Crawler!"
encoded = encode_base64(original)
print(f"Encoded: {encoded}") # SGVsbG8sIENyYXdsZXIh
decoded = decode_base64(encoded)
print(f"Decoded: {decoded}") # Hello, Crawler!Identification Tips:
- Look for strings ending with
=or== - Check if characters match Base64 alphabet
- Length divisible by 4
2. MD5 Hash
Characteristics:
- Always produces 32-character hexadecimal string
- Fixed output length (128 bits)
- One-way function (cannot be reversed)
Python Implementation:
import hashlib
def md5_hash(text):
"""Generate MD5 hash"""
return hashlib.md5(text.encode('utf-8')).hexdigest()
# Example
text = "crawler_secret_key"
hash_value = md5_hash(text)
print(f"MD5 Hash: {hash_value}") # 32-character hex stringIdentification Tips:
- 32-character hexadecimal string
- Lowercase letters and numbers (0-9, a-f)
- Used for checksums, signatures, and password hashing
3. SHA-256 Hash
Characteristics:
- 64-character hexadecimal string
- Fixed output length (256 bits)
- More secure than MD5
Python Implementation:
import hashlib
def sha256_hash(text):
"""Generate SHA-256 hash"""
return hashlib.sha256(text.encode('utf-8')).hexdigest()
# Example
text = "crawler_data"
hash_value = sha256_hash(text)
print(f"SHA-256 Hash: {hash_value}") # 64-character hex stringIdentification Tips:
- 64-character hexadecimal string
- Commonly used in modern applications
- More secure alternative to MD5
4. AES Encryption
Characteristics:
- Block cipher (128-bit blocks)
- Key sizes: 128, 192, or 256 bits
- Output is binary (often Base64 encoded)
Python Implementation:
from Crypto.Cipher import AES
from Crypto.Util.Padding import pad, unpad
import base64
def aes_encrypt(plaintext, key):
"""Encrypt using AES-256"""
cipher = AES.new(key, AES.MODE_CBC)
padded = pad(plaintext.encode('utf-8'), AES.block_size)
ciphertext = cipher.encrypt(padded)
iv = cipher.iv
return base64.b64encode(iv + ciphertext).decode('utf-8')
def aes_decrypt(ciphertext, key):
"""Decrypt AES-256 encrypted data"""
data = base64.b64decode(ciphertext)
iv = data[:16]
ciphertext = data[16:]
cipher = AES.new(key, AES.MODE_CBC, iv)
plaintext = unpad(cipher.decrypt(ciphertext), AES.block_size)
return plaintext.decode('utf-8')
# Example
key = b'12345678901234567890123456789012' # 32 bytes for AES-256
plaintext = "Crawler secret data"
encrypted = aes_encrypt(plaintext, key)
print(f"Encrypted: {encrypted}")
decrypted = aes_decrypt(encrypted, key)
print(f"Decrypted: {decrypted}")Identification Tips:
- Often Base64 encoded
- Requires key and IV (initialization vector)
- Block size is 16 bytes (128 bits)
5. RSA Encryption
Characteristics:
- Asymmetric encryption (public/private key pair)
- Variable output length
- Commonly used for key exchange
Python Implementation:
from Crypto.PublicKey import RSA
from Crypto.Cipher import PKCS1_v1_5
import base64
def rsa_encrypt(plaintext, public_key_pem):
"""Encrypt using RSA public key"""
key = RSA.import_key(public_key_pem)
cipher = PKCS1_v1_5.new(key)
ciphertext = cipher.encrypt(plaintext.encode('utf-8'))
return base64.b64encode(ciphertext).decode('utf-8')
def rsa_decrypt(ciphertext, private_key_pem):
"""Decrypt using RSA private key"""
key = RSA.import_key(private_key_pem)
cipher = PKCS1_v1_5.new(key)
data = base64.b64decode(ciphertext)
plaintext = cipher.decrypt(data, None)
return plaintext.decode('utf-8') if plaintext else NoneIdentification Tips:
- Requires public/private key pair
- Used for secure key exchange
- Often combined with symmetric encryption
Practical Crawler Examples
Example 1: Decoding Base64 API Responses
import requests
import base64
import json
def crawl_encrypted_api(url):
"""Crawl API with Base64 encoded responses"""
response = requests.get(url)
data = response.json()
# Decode Base64 encoded content
if 'encrypted_data' in data:
decoded = base64.b64decode(data['encrypted_data'])
content = json.loads(decoded.decode('utf-8'))
return content
return dataExample 2: Handling MD5 Signed Requests
import requests
import hashlib
import time
def create_signed_request(url, params, secret_key):
"""Create request with MD5 signature"""
# Add timestamp
params['timestamp'] = int(time.time())
# Sort parameters
sorted_params = sorted(params.items())
# Create signature string
sign_str = '&'.join([f"{k}={v}" for k, v in sorted_params])
sign_str += f"&key={secret_key}"
# Generate MD5 signature
signature = hashlib.md5(sign_str.encode('utf-8')).hexdigest()
params['sign'] = signature
# Make request
response = requests.get(url, params=params)
return response.json()Algorithm Identification Checklist
When analyzing encrypted data in crawlers:
Check Length:
- Base64: Multiple of 4
- MD5: 32 characters
- SHA-256: 64 characters
Check Character Set:
- Base64: A-Z, a-z, 0-9, +, /, =
- Hex: 0-9, a-f (lowercase)
Check Patterns:
- Padding characters (=)
- Fixed length outputs
- Encoding indicators
Use Tools:
- Browser DevTools (Network tab)
- Python
hashlibandbase64modules - Online decoders (for testing)
Best Practices
- Always Test Decryption: Verify your decryption works before implementing
- Handle Errors Gracefully: Encryption can fail - add error handling
- Respect Rate Limits: Don't abuse decryption endpoints
- Cache Results: Store decrypted data to avoid repeated operations
- Use Libraries: Leverage established libraries like
pycryptodome
Related Articles
- Reverse Engineering JS Webpack Tips for Crawlers - Advanced JavaScript reverse engineering
- Python Coroutines for Async Crawling - Efficient async crawling patterns
- Building RAG System with Golang - Data processing techniques
Conclusion
Understanding encryption algorithms is crucial for modern web crawlers. By recognizing algorithm features and implementing appropriate decryption methods, you can build more robust and effective crawling solutions. Remember to always respect website terms of service and use these techniques responsibly.
Key Takeaways:
- Base64 is encoding, not encryption
- MD5 and SHA are one-way hashes
- AES is symmetric encryption (needs key)
- RSA is asymmetric encryption (needs key pair)
- Always test your decryption implementations
"The best crawler is one that can adapt to different encryption schemes." — PFinal南丞

