Fetching data living on Google Drive sounds like piece of cake, especially having native Talend connector. NOT!

Google provides a decent way for developers to access their API using step by step wizard. But in case of automating data flow using Talend you have to go against the stream in few points. There are several tutorials in the Internet showing super complicated ways of obtaining tokens for the application, refreshing it, handling errors, reporting etc.. But what I wanted was an easy script to generate a token (which I know will not change in the future) to handle simple task of uploading single file to database without all bells and whistles.

To pull Google Sheet from Google drive in a programmatic way using Talend you will need:

  • access to Google Developer Console
  • client id token (you will only use it once -during setup)
  • refresh token

Creating user (and it’s token)

To get all necessary tokens, you need to create a new project in the Google Developer Console. Go to https://console.developers.google.com/ and create new project …

…and name it in the way you know what it’s about in couple months 😛

Then select the newly created project from the top menu and access the credentials tab on the left

What you have to do right now is to create Oath 2.0 Client id. You should select that option from the Create credentials menu but you will be most likely forced to create OAuth consent screen:

Consent Screen is the typical oath screen you can see in most Google Auth backed applications. This allows app to act on behalf of your Google account. In our case, as we will not show any dialog and rather use hardcoded token, it’s not a big deal for us. However this screen has to be setup for us to move forward. After completing this step, you can continue to create credentials.

Select Web Application (yes, that’s right..) and make sure that for both Authorized JavaScript origins and Authorized redirect URIs you enter:  http://localhost. 

If user creation is sucessful, you will receive your client ID and secret

Ok, we are closer to the end 🙂 let’s create tokens.

Create refresh token

By default Google issues an access token which allows you to download the content (Google Sheet in our case) from it’s repositories. Problem is, this token expires very quickly and requires either manual refresh or couple additional steps in the program to obtain. Fortunately, you can generate refresh token which can be (guess what) refreshed in Talend without unnecessary hassle.

To achieve what we want, we need to run this URL in the browser

https://accounts.google.com/o/oauth2/auth?access_type=offline&client_id=your client id&scope=https://www.googleapis.com/auth/drive&response_type=code&redirect_uri=https://localhost

your client ID is the identifier you received in the OAuth client creation. For example it will look like this:

https://accounts.google.com/o/oauth2/auth?access_type=offline&client_id=695230079990.apps.googleusercontent.com&scope=https://www.googleapis.com/auth/drive&response_type=code&redirect_uri=https://localhost

You will be asked to login to google (consent screen) and after you login, you should see error message saying that page cannot be reached. That’s the localhost we entered during the client id setup. Note the browser url:

Yes, everything after code= is your refresh token you will use in the Talend application. Be aware that this code is show only once so make sure that you have it stored somewhere!

Setup Google Sheet download job

Let’s setup Talend job to make this whole thing running!

As you can see, whole flow is pretty simple and consists of

  • getting access token using refresh token
  • querying and downloading files from GDrive

Getting access token

tRESTClient performs query over the GDrive api to get the access token

you require client id, client secret and refresh token you obtained in the previous steps. Also note the grant type and redirect_uri. Those should be specified as on the screenshot above if you setup API as I did in the this tutorial.

tExtractXMLField is pretty straightforward. We need to extract new access token from the response of the API and then store it in global variable and later on update context. I am pretty sure that adding access token to the global variable can be omitted but I was to lazy to change it 😛

Querying and downloading files from GDrive

Having new access token in context, we can proceed with downloading files. First let’s setup tGoogleDriveConnection with the token. Note the application name. This should match with your API access setup.

And then use this connection to search specific folder

I then used if condition for simple name filtering as my folder might accidentaly contain some random files and I don’t want to parse them

And then I am ready to download the file to desired location

And that’s it. Simple Google Sheet downloading using Talend and Google Drive. Without all those fancy stuff – just working as expected.