aws-glue

How can I access data from a nested dynamic frame to properly format it in PySpark?

Question: I’ve uploaded some semi-structured data into AWS Glue using a dynamic frame. From the dynamic frame I need just the payload element, which I selected by executing the following code in a Glue notebook: df_p = df.select_fields(["payload"]) I’m trying …

Total answers: 1

How to read a .sql file stored in S3 containing multiple SQL statements

Question: I have a .sql file stored in an S3 location in AWS which contains multiple SQL statements separated by semicolons, as below: Query1; _______________ Query2; _______________ Query3; I tried using two methods in an AWS Glue job to read this S3 .sql file but …

Total answers: 1
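A minimal sketch of one way to split such a file into individual statements once its text has been read from S3. The function name split_sql_statements is hypothetical and is not taken from the question or its answer. It assumes statements end with a semicolon and that string literals use single quotes; SQL comments containing semicolons are not handled.

```python
from typing import List


def split_sql_statements(sql_text: str) -> List[str]:
    """Split SQL text on semicolons that sit outside single-quoted strings.

    Hypothetical helper for illustration only; it does not parse SQL comments.
    """
    statements: List[str] = []
    current: List[str] = []
    in_string = False
    for char in sql_text:
        if char == "'":
            # Toggle on every quote; a doubled quote ('') toggles twice,
            # so escaped quotes inside a literal leave the state unchanged.
            in_string = not in_string
        if char == ";" and not in_string:
            statement = "".join(current).strip()
            if statement:
                statements.append(statement)
            current = []
        else:
            current.append(char)
    # Keep a trailing statement that has no closing semicolon.
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)
    return statements


if __name__ == "__main__":
    # In a Glue job the text would come from S3 instead, for example via
    # boto3's s3.get_object(...)["Body"].read().decode("utf-8").
    sample = "SELECT 1;\nSELECT 'a;b';\nSELECT 3"
    for statement in split_sql_statements(sample):
        print(statement)
```

Each returned statement can then be executed one at a time by whichever database client the job uses.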

How to save result of for loop in existing input dataframe

Question: My input data frame df is as follows: external_id sw_1 sw_2 sw_3 Sw_55 and my output data frame output_df should be: external_id : Status sw_1 :Hello Sw_1 sw_2 :Hello sw_2 sw_3 :hello sw_3 Sw_55 :Hello sw_55 So far I have done this. Able to …

Total answers: 1

AWS Glue error – Invalid input provided while running Python shell program

Question: I have a Glue job that runs Python shell code. When I try to run it, I get the error below. Job Name : xxxxx Job Run Id : yyyyyy failed to execute with exception Internal service error : Invalid input provided …

Total answers: 6

AWS Glue Job CloudFormation – Values Set in CloudFormation Not Sticking

Question: The CloudFormation setup below is not behaving as I expected. The following variables are not being set with the template below. When the variables are set manually, the job runs successfully. IAM Role, Type, Language. Description: "AWS Glue Job Test" Resources: MyJobRole: Type: …

Total answers: 2

AWS Glue Python shell job fails with Internal Service error

Question: I am running a Python shell program in AWS Glue, but after running for around 10 minutes it fails with the error "Internal service error". Neither the logs nor the error logs give any information. Most of the time it fails by just saying Internal …

Total answers: 1

Error while using "pd_writer" to write data to snowflake – 'not all arguments converted during string formatting'

Question: I get this error while calling the "pd_writer" method: DatabaseError("Execution failed on sql 'SELECT name FROM sqlite_master WHERE type='table' AND name=?;': not all arguments converted during string formatting",) def sf_To_df(self): sf_data = self.salesForceAuth.query_all(self.queryColumns) sf_data=dict(sf_data) try: cursor=snowflake.connector.connect(user=snowflake_user_path, paccount=snowflake_account, …

Total answers: 1

Get tables from AWS Glue using boto3

Question: I need to harvest table and column names from the AWS Glue crawler metadata catalogue. I used boto3 but consistently get only 100 tables even though there are more. Setting NextToken doesn’t help. Please help if possible. The desired result is a list as follows: lst = [table_one.col_one, …

Total answers: 4
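A hedged sketch of one common way around a fixed page size: let a boto3 paginator follow NextToken instead of passing it by hand. The function name list_qualified_columns and the database name in the usage comment are hypothetical. The code assumes each table entry carries its columns under StorageDescriptor, and it is not necessarily the approach taken in the answers to this question.

```python
from typing import Any, List


def list_qualified_columns(glue_client: Any, database_name: str) -> List[str]:
    """Return 'table.column' names for every table in a Glue database.

    glue_client is expected to be a boto3 Glue client, e.g. boto3.client("glue").
    """
    paginator = glue_client.get_paginator("get_tables")
    names: List[str] = []
    # The paginator requests page after page until no NextToken is returned.
    for page in paginator.paginate(DatabaseName=database_name):
        for table in page.get("TableList", []):
            columns = table.get("StorageDescriptor", {}).get("Columns", [])
            for column in columns:
                names.append(f"{table['Name']}.{column['Name']}")
    return names


# Usage (requires AWS credentials; "my_database" is a placeholder):
#   import boto3
#   lst = list_qualified_columns(boto3.client("glue"), "my_database")
```

Passing the client in as an argument keeps the function easy to exercise with a stub, without touching AWS.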

How to filter out null values in Spark Python

Question: I’m trying to filter out the null values in a column and check whether the count is greater than 1. badRows = df.filter($"_corrupt_record".isNotNull) if badRows.count > 0: logger.error("throwing bad rows exception…") schema_mismatch_exception(None, "cdc", item ) I’m getting a syntax error. I also tried to check using: badRows …

Total answers: 1