timwhitez starred fake-useragent
2024-7-4 12:56:23 Author: github.com

Up-to-date simple useragent faker with real world database.

Features

  • Data is pre-downloaded from https://user-agents.net/ and shipped as part of the package
  • The data consists of the current browser versions or one version lower
  • Retrieves user-agent strings locally (both desktop and mobile UAs)
  • Retrieves the user-agent as a Python dictionary
  • Supports Python 3.x

Installation

pip install fake-useragent

Or if you have multiple Python / pip versions installed, use pip3:

pip3 install fake-useragent

Usage

Simple usage examples are shown below. See the next chapters in this readme for more advanced usage:

from fake_useragent import UserAgent
ua = UserAgent()

# Get a random browser user-agent string
print(ua.random)

# Or get user-agent string from a specific browser
print(ua.chrome)
# Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36
print(ua.google)
# Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_4) AppleWebKit/537.13 (KHTML, like Gecko) Chrome/24.0.1290.1 Safari/537.13
print(ua['google chrome'])
# Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36
print(ua.firefox)
# Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0
print(ua.ff)
# Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Firefox/102.0
print(ua.safari)
# Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.2 Safari/605.1.15
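
A typical use for these strings is setting the User-Agent header on an outgoing HTTP request. The following is a minimal sketch using only the standard library. The helper name build_request and the example URL are illustrative assumptions and are not part of fake-useragent. In real use, the string would come from ua.random or one of the browser properties above.

```python
from urllib.request import Request


def build_request(url: str, user_agent: str) -> Request:
    """Return a Request whose User-Agent header is set to the given string."""
    return Request(url, headers={"User-Agent": user_agent})


# In real use: user_agent = UserAgent().random
# A fixed string from the examples above keeps this sketch self-contained.
sample_ua = "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"
request = build_request("https://example.com/", sample_ua)

# urllib stores header names capitalized, so the key is "User-agent".
print(request.get_header("User-agent"))
```

The request object can then be passed to urllib.request.urlopen. No network call is made in the sketch itself.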

Additional usage

Additional features that fake-useragent offers since v1.2.0.

If you want to specify your own browser list, you can do that via the browsers argument (default is: ["chrome", "edge", "firefox", "safari"]).
This example will only return random user-agents from Edge and Chrome:

from fake_useragent import UserAgent
ua = UserAgent(browsers=['edge', 'chrome'])
ua.random

Note: fake-useragent knows about Chrome, Edge, Firefox and Safari. Other browsers are not popular enough and are not part of the dataset we use.


If you want to specify your own operating systems, you can do that via the os argument (default is: ["windows", "macos", "linux"]).
In this example you will only get Linux user-agents back:

from fake_useragent import UserAgent
ua = UserAgent(os='linux')
ua.random

You can also specify the types of platform you want to use via the platforms argument (default is: ["pc", "mobile", "tablet"]). This example will only return random user-agents from a mobile device:

from fake_useragent import UserAgent
ua = UserAgent(platforms='mobile')
ua.random

If you want to return more recent user-agent strings, you can play with the min_version argument (default is: 0.0, meaning all user agents will match).
In this example you get only user agents that have a minimum version of 120.0:

from fake_useragent import UserAgent
ua = UserAgent(min_version=120.0)
ua.random

For backwards compatibility, a minimum usage percentage can still be specified with the min_percentage argument. However, the current list of user agents does not contain this statistic. Therefore all of the user-agents will match.


Hint: Of course, you can combine all those arguments to your liking!
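
To make the combined effect of these filters concrete, here is a rough sketch of how such filtering could work on a list of records. It is not the library's actual implementation. The function name filter_records, the RECORDS list and the placeholder user-agent strings are all hypothetical; only the key names ('browser', 'os', 'version', 'useragent') are taken from the dictionary output shown in this readme. The sketch uses exact matching on the os value, whereas the library maps 'windows' to several values.

```python
# Hypothetical records, shaped like the dictionaries shown in this readme.
RECORDS = [
    {"useragent": "UA-chrome-linux", "browser": "chrome", "version": 121.0, "os": "linux"},
    {"useragent": "UA-edge-win", "browser": "edge", "version": 116.0, "os": "win10"},
    {"useragent": "UA-firefox-linux", "browser": "firefox", "version": 122.0, "os": "linux"},
    {"useragent": "UA-chrome-mac", "browser": "chrome", "version": 119.0, "os": "macos"},
]


def filter_records(records, browsers=None, os=None, min_version=0.0):
    """Keep records that match every given filter; None means no restriction."""
    # Accept either a single string or a list for browsers and os.
    if isinstance(browsers, str):
        browsers = [browsers]
    if isinstance(os, str):
        os = [os]
    return [
        r for r in records
        if (browsers is None or r["browser"] in browsers)
        and (os is None or r["os"] in os)
        and r["version"] >= min_version
    ]


matches = filter_records(RECORDS, browsers=["edge", "chrome"], os="linux", min_version=120.0)
print([r["useragent"] for r in matches])  # ['UA-chrome-linux']
```

Each filter narrows the candidate set independently, so combining arguments can only reduce the number of matching user agents.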

User-agent Python Dictionary

Since version 1.3.0 we also offer the following "get" properties, which return the whole Python dictionary of the UA instead of only the user-agent string:

Warning: Raw JSON objects (as Python dictionaries) are returned "as is". This means the data structure could change in the future!

Be aware that the "get" properties below might not return the same key/value pairs in the future. Use ua.random or the like, as mentioned above, if you want a stable interface.

from fake_useragent import UserAgent
ua = UserAgent()

# Random user-agent dictionary (object)
ua.getRandom
# {'percent': 0.8, 'useragent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36 Edg/116.0.1938.76', 'system': 'Edge 116.0 Win10', 'browser': 'edge', 'version': 116.0, 'os': 'win10'}

# More get properties:
ua.getFirefox
# {'percent': 0.3, 'useragent': 'Mozilla/5.0 (Windows NT 10.0; rv:109.0) Gecko/20100101 Firefox/118.0', 'system': 'Firefox 118.0 Win10', 'browser': 'firefox', 'version': 118.0, 'os': 'win10'}
ua.getChrome
ua.getSafari
ua.getEdge

# And a method with an argument.
# This is exactly the same as using: ua.getFirefox
ua.getBrowser('firefox')
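
Because the key/value pairs may change, code that consumes these dictionaries may want to read keys defensively. The following is a small sketch under that assumption. The summarize helper is hypothetical, and the sample record is copied from the ua.getRandom output shown above rather than fetched from the library.

```python
# Sample dictionary copied from the ua.getRandom output shown above.
record = {
    "percent": 0.8,
    "useragent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                 "(KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36 Edg/116.0.1938.76",
    "system": "Edge 116.0 Win10",
    "browser": "edge",
    "version": 116.0,
    "os": "win10",
}


def summarize(record: dict) -> str:
    """Build a short label, tolerating keys that may be missing in future releases."""
    browser = record.get("browser", "unknown")
    version = record.get("version")
    os_name = record.get("os", "unknown")
    version_text = "?" if version is None else f"{version:g}"
    return f"{browser} {version_text} on {os_name}"


print(summarize(record))               # edge 116 on win10
print(summarize({"useragent": "x"}))   # unknown ? on unknown
```

Using dict.get with a default avoids a KeyError if a key is renamed or dropped in a later data file.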

Notes

You can override the fallback string using the fallback parameter; it is used in the very rare case that something fails:

import fake_useragent

ua = fake_useragent.UserAgent(fallback='your favorite Browser')
# If something went wrong (again, this is a REALLY rare case), the fallback is returned:
ua.random == 'your favorite Browser'

If you try to get an unknown browser:

from fake_useragent import UserAgent
ua = UserAgent()
print(ua.unknown)
#Error occurred during getting browser: unknown, but was suppressed with fallback.
#Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36 Edg/122.0.0.0

If you need to protect some attributes from being overridden by the __getattr__ method of UserAgent, pass their names via safe_attrs. At the very least, this prevents FakeUserAgentError from being raised when such an attribute is not found.

For example, when using fake-useragent with injections <https://github.com/tailhook/injections>, you need to:

import fake_useragent

ua = fake_useragent.UserAgent(safe_attrs=('__injections__',))

Please do not use this if you don't understand why you need it. This is magic for a rare, extreme case.
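
To illustrate why such protection can matter, here is a toy class showing how a __getattr__ hook with a fallback can swallow arbitrary attribute names, and how a safe_attrs set restores normal behavior for selected names. ToyUserAgent, its KNOWN table and the placeholder strings are hypothetical and only loosely mimic the behavior described above; this is not the real fake-useragent code.

```python
class ToyUserAgent:
    """Toy stand-in for illustration only; not fake-useragent's real code."""

    KNOWN = {"chrome": "UA-chrome", "firefox": "UA-firefox"}  # placeholder strings

    def __init__(self, fallback="UA-fallback", safe_attrs=()):
        self.fallback = fallback
        self.safe_attrs = set(safe_attrs)

    def __getattr__(self, name):
        # Python calls this only when normal attribute lookup has failed.
        if name in self.safe_attrs:
            # Protected names behave like ordinary missing attributes.
            raise AttributeError(name)
        # Any other name is treated as a browser; unknown ones get the fallback.
        return self.KNOWN.get(name, self.fallback)


plain = ToyUserAgent()
print(plain.chrome)           # UA-chrome
print(plain.__injections__)   # UA-fallback: the name was treated as a browser

guarded = ToyUserAgent(safe_attrs=("__injections__",))
print(hasattr(guarded, "__injections__"))  # False: lookup now fails normally
```

A third-party library that probes objects with hasattr or getattr would see a misleading value on the unguarded instance, which is the kind of situation safe_attrs is meant to avoid.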

Experiencing issues?

Make sure that you are using the latest version!

pip install --upgrade fake-useragent

Or, if that isn't working, try to install the latest package version like this (1.5.1 is an example; check what the latest version is on PyPI):

pip install fake-useragent==1.5.1

Check version via the Python console:

import fake_useragent

print(fake_useragent.VERSION)
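
As an alternative that does not import the package itself, the installed version can also be read from the package metadata with the standard library (Python 3.8 or newer). This is a sketch; the helper name installed_version is an assumption, and it returns None when the distribution is not installed.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution: str):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


print(installed_version("fake-useragent"))
```

This can be handy when filing an issue, since it reports the version that pip actually installed.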

And you are always welcome to post issues.

Please do not forget to mention the version that you are using.

For Developers

Since GitHub Actions is unable to reach willshouse.com, which has Cloudflare protection, we can run the script below to automatically scrape the user-agent strings from the external data source. The script will copy the JSONlines file to the src/fake_useragent/data directory. Execute:

The data JSON file is part of the Python package, see pyproject.toml. Read more about Data files support.

Python Virtual Environment

We encourage you to use a Python virtual environment before installing pip packages, like so:

python -m virtualenv env
source env/bin/activate

Tests

pip install -r requirements.txt
tox

Linting

To fix imports using ruff:

pip install -r requirements.txt
ruff check --select="I" --fix .

Fix black code formatting errors:

pip install -r requirements.txt
black .

Note: When ruff v1.0 is released, we will most likely move fully to ruff instead of black.

Changelog

  • 1.5.1 March 16, 2024

    • Remove trailing spaces in user agent strings
  • 1.5.0 March 8, 2024

    • Migrated to new user-agent data source (thanks @BoudewijnZwart), backwards compatible API.
    • Update all pip package dependencies to latest stable versions
  • 1.4.0 November 24, 2023

    • Update all PIP packages
    • Support Python 3.12 (thanks @vladkens)
    • Fix package conflict in cache scraper
    • Improve ruff CLI calls
  • 1.3.0 October 2, 2023

    • Introducing new ua.getRandom, ua.getFirefox, ua.getChrome, ua.getSafari, and a generic method: ua.getBrowser(..) (e.g. getBrowser('firefox'))
      • These new properties allow you to retrieve the whole raw Python dictionary, instead of only the UA string.
      • These properties might return different key/value pairs in the future!
    • Fix the os argument 'windows' to check for both win10 and win7 values (previously only checking win10), thus returning more UAs
    • Improved user-agent scraper (now also containing Safari browser again)
    • Updated browsers.json data file
  • 1.2.1 August 2, 2023

    • Small improvements in the min_percentage check
    • Update all Pip package dependencies
  • 1.2.0 August 2, 2023

    • Updated browser useragent data
    • Allow filters on browser, OS and usage percentage
    • Update the cache scraper to scrape the new data source for user-agent strings
    • Adapted the code to work with the new JSON data format
    • Parameters use_external_data=True and verify_ssl are removed. If you use those parameters, just remove them from your code!
  • 1.1.3 March 20, 2023

    • Update dependencies
  • 1.1.2 February 8, 2023

    • Security fixes
  • 1.1.1 December 4, 2022

    • Remove whitespaces from user agent strings, this is a patch release
  • 1.1.0 November 26, 2022

    • Add pkg_resource as fallback mechanism in trying to retrieve the local JSON data file
  • 1.0.1 November 10, 2022

    • Add importlib-metadata & importlib-resources as dependencies
    • Check on specific Python version regarding the importlib resources (python v3.10 or higher) in order to have files() working
    • importlib_metadata should now also work on Python versions before 3.8
    • Remove obsolete MANIFEST.in file
  • 1.0.0 November 17, 2022

    • Make the JSON Lines data file part of the Python package, data is retrieved locally
      • Extend the pyproject.toml file with package-data support
    • Remove centralized caching server implementation
    • Make real unit tests which should run reliably, fast, independently and without an Internet connection
  • 0.1.14 November 5, 2022

    • Improve code quality standards using modern Python >=3.7 syntax
    • Migrated to pyproject.toml build system format + syntax check
    • Add additional classifiers to the toml file
    • Improved tox.ini file
    • Improved GitHub Actions job using pip cache
    • And various small fixes
  • 0.1.13 October 21, 2022

    • Implement browsers argument, allowing you to override the browser names you want to use
    • Fix browser listing of Internet Explorer and Edge
    • Don't depend on w3schools.com anymore
    • Clean-up data (temp) file format
    • Update fallback cache server URL / use JSON Lines as file format
    • Move to GitHub Actions instead of Travis
    • Using black Python formatter in favour of Flake
  • 0.1.12 March 31, 2022

    • forked
  • 0.1.11 October 4, 2018

    • moved s3 + cloudfront fallback to heroku.com, cuz someone from Florida did ~25M requests last month
  • 0.1.10 February 11, 2018

    • Minor fix docs cloudfront url
  • 0.1.9 February 11, 2018

    • fix w3schools.com renamed IE/Edge to Edge/IE
    • moved heroku.com fallback to s3 + cloudfront
    • stop testing Python3.3 and pypy
  • 0.1.8 November 2, 2017

    • fix useragentstring.com Can't connect to local MySQL server through socket
  • 0.1.7 April 2, 2017

    • fix broken README.rst
  • 0.1.6 April 2, 2017

    • fixed a bug where use_cache_server did not affect anything
    • w3schools.com <https://www.w3schools.com/browsers/browsers_stats.asp> moved to https
    • verify_ssl options added, by default it is True (urllib.urlopen ssl context for Python 2.7.9- and 3.4.3- is not supported)
  • 0.1.5 February 28, 2017

    • added ua.edge alias to Internet Explorer
    • w3schools.com starts displaying Edge statistic
    • Python 2.6 is not tested anymore
    • use_cache_server option added
    • Increased fake_useragent.settings.HTTP_TIMEOUT to 5 seconds
  • 0.1.4 December 14, 2016

    • Added custom data file location support
    • Added fallback browser support, in case of unavailable data sources
    • Added alias fake_useragent.FakeUserAgent for fake_useragent.UserAgent
    • Added alias fake_useragent.UserAgentError for fake_useragent.FakeUserAgentError
    • Reduced fake_useragent.settings.HTTP_TIMEOUT to 3 seconds
    • Started migration to new data file format
    • Simplified a lot of code that was 4+ years out of date
    • Better thread/greenlet safety
    • Added verbose logging
    • Added safe_attrs to prevent overriding by __getattr__
  • 0.1.3 November 24, 2016

    • Added hosted data file, for when remote services are unavailable
    • Raises fake_useragent.errors.FakeUserAgentError when there is no way to download data
    • Raises fake_useragent.errors.FakeUserAgentError instead of None in case of unknown browser
    • Added gevent.sleep support in gevent patched environment when trying to download data

Authors

You can visit the authors page.


Source: https://github.com/fake-useragent/fake-useragent