Downloading 1000s of images from Google and preparing them for image dataset generation.

April 9, 2018, 8:20 p.m. By: Kirti Bakshi


This is a command-line program written in Python that searches Google Images for keywords or key phrases and optionally downloads the matching images to your computer. The script can also be invoked from another Python file.

It is a small, ready-to-run program that needs no extra dependencies as long as you download at most 100 images per keyword. To download more than 100 images per keyword, you need to install the Selenium library along with chromedriver. Detailed instructions can be found in the troubleshooting section of the GitHub link mentioned at the end.
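As a rough sketch of calling the downloader from another Python file, the snippet below builds an arguments dictionary and hands it to the library. The `googleimagesdownload` class, its `download` method and the `chromedriver` argument follow the usage described in the project's README at the time of writing, so check them against the version you install. The helper names `build_arguments` and `download_images`, and the up-front check on the 100-image limit, are illustrative assumptions and not part of the project.

```python
def build_arguments(keywords, limit=20, chromedriver=None):
    """Build the arguments dictionary for one download request.

    `keywords` is a list of search phrases; they are joined with commas.
    """
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    if limit > 100 and chromedriver is None:
        # More than 100 images per keyword needs Selenium plus chromedriver,
        # so this sketch asks for the driver path before starting.
        raise ValueError("limits above 100 need a chromedriver path")

    arguments = {"keywords": ",".join(keywords), "limit": limit}
    if chromedriver is not None:
        arguments["chromedriver"] = chromedriver
    return arguments


def download_images(arguments):
    """Run the download; needs the package installed and network access."""
    # Imported lazily so build_arguments works without the package.
    from google_images_download import google_images_download

    downloader = google_images_download.googleimagesdownload()
    return downloader.download(arguments)
```

A typical call would be `download_images(build_arguments(["polar bears", "beaches"], limit=20))`.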

Moving on to compatibility:

The program is compatible with both Python 2.x and Python 3.x (recommended). It is a download-and-run program that needs no changes to the file: all the user has to do is specify the parameters on the command line.

Installation:

To download and use this repository, you can use any one of the following methods.

  • Using pip

  • Manually using CLI

  • Manually using UI

(Complete details on each method can be found at the GitHub link mentioned at the end.)

Arguments:

  • config_file: cf (shorthand): You can pass the arguments inside a config file. This is an alternative to passing arguments on the command line directly.

  • keywords: k (shorthand): Denotes the keywords or key phrases you want to search for. If you pass more than one keyword, wrap them in single quotes.

  • keywords_from_file: kf (shorthand): Denotes the name of the file from which you want to import the keywords. Add only one keyword per line, and keep in mind that blank or empty lines are dropped automatically.

  • prefix_keywords: pk (shorthand): Denotes additional words that are added before the main keyword while making the search query.

  • suffix_keywords: sk (shorthand): Denotes additional words that are added after the main keyword while making the search query.

  • limit: l (shorthand): Denotes the number of images that you want to download. You can specify any integer value here. The program will try to fetch all the images it finds on the Google image search page, up to this number.

  • related_images: ri (shorthand): This argument downloads a large number of images related to the keyword provided.

  • format: f (shorthand): Denotes the format/extension of the image that you want to download. Possible values are jpg, gif, png, bmp, svg, webp, ico.

  • color: co (shorthand): Denotes the colour filter that you want to apply to the images. Possible values are: red, orange, yellow, green, teal, blue, purple, pink, white, gray, black, brown.

  • color_type: ct (shorthand): Denotes the colour type you want to apply to the images. Possible values: full-color, black-and-white, transparent.
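Because the format, color and color_type arguments only accept fixed values, it can help to check them before starting a long download. The following is a minimal sketch of such a check, not code from the project: `ALLOWED_VALUES` and `find_invalid_filters` are hypothetical names, and the American spellings `gray` and `full-color` are assumed to be the ones the tool accepts.

```python
# Allowed values per filter argument, taken from the list above.
# Spellings "gray" and "full-color" are an assumption about the tool.
ALLOWED_VALUES = {
    "format": {"jpg", "gif", "png", "bmp", "svg", "webp", "ico"},
    "color": {
        "red", "orange", "yellow", "green", "teal", "blue",
        "purple", "pink", "white", "gray", "black", "brown",
    },
    "color_type": {"full-color", "black-and-white", "transparent"},
}


def find_invalid_filters(arguments):
    """Return {argument: value} for every filter value that is not allowed.

    Arguments that are absent or set to None are ignored.
    """
    invalid = {}
    for name, allowed in ALLOWED_VALUES.items():
        value = arguments.get(name)
        if value is not None and value not in allowed:
            invalid[name] = value
    return invalid
```

An empty result means every filter that was supplied uses one of the listed values.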

These are only a few of the available arguments. For the full list, see the link mentioned at the end.
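To make the keywords_from_file, prefix_keywords and suffix_keywords descriptions concrete, here is a small sketch of how search queries could be assembled from them. It only mirrors the behaviour described in the argument list; it is not the project's own code, the function names are hypothetical, and the order in which the tool really combines prefixes and suffixes is an assumption.

```python
def read_keywords(lines):
    """Return one keyword per non-blank line, stripped of whitespace."""
    return [line.strip() for line in lines if line.strip()]


def compose_queries(keywords, prefixes=(), suffixes=()):
    """Combine every prefix and every suffix with every keyword.

    With no prefixes or suffixes, the keywords are returned unchanged.
    """
    prefixes = list(prefixes) or [""]
    suffixes = list(suffixes) or [""]
    queries = []
    for prefix in prefixes:
        for keyword in keywords:
            for suffix in suffixes:
                parts = [prefix, keyword, suffix]
                # Skip empty parts so no stray spaces end up in the query.
                queries.append(" ".join(part for part in parts if part))
    return queries
```

An open file object can be passed straight to `read_keywords`, since iterating over a file yields its lines.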

Now, moving on to the structure:

image dataset rule

To contribute:

Anyone who wants to contribute to this script is welcome to do so. If you would like to make a change, open a pull request. For issues and discussion, visit the Issue Tracker.

The ultimate aim of this repo is to keep it simple, stand-alone, backward compatible and free of third-party dependencies.

Keep in mind:

This program lets you download a large number of images from Google. Google Images is merely a search engine that indexes images and helps you find them; it does not produce its own images. So do not download or use any image in violation of its copyright terms, and be very careful about how you use the images you download.

For more information: GitHub