hadoop

how to write a regex expression to extract hadoop mr counter data from stderr logfile

how to write a regex expression to extract hadoop mr counter data from stderr logfile Question: How do I write a regex to extract Hadoop MR counter data from a stderr logfile, i.e. find all `\t` lines paired with their `\t\t` lines? I wrote a regular expression (re.findall(r'(\t[a-zA-Z\s]+)\n(.*?)\n\t\w+', text, re.S|re.M)) but it is not correct. This is the stderr …

Total answers: 1
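
A single lazy regex across the whole log is fragile here; a line-by-line walk is easier to get right. The sample stderr layout below is an assumption based on how Hadoop prints counters (group headers after one tab, counter=value pairs after two):

```python
import re

# Hypothetical stderr excerpt in the layout the question describes:
# a group name indented with one tab, counters with two tabs.
stderr = (
    "\tFile System Counters\n"
    "\t\tFILE: Number of bytes read=1000\n"
    "\t\tFILE: Number of bytes written=2000\n"
    "\tJob Counters\n"
    "\t\tLaunched map tasks=2\n"
)

counters = {}
group = None
for line in stderr.splitlines():
    m = re.match(r"\t([^\t].*)$", line)         # one tab -> group header
    if m:
        group = m.group(1).strip()
        counters[group] = {}
        continue
    m = re.match(r"\t\t(.+?)=(\d+)\s*$", line)  # two tabs -> name=value
    if m and group:
        counters[group][m.group(1).strip()] = int(m.group(2))

print(counters["Job Counters"]["Launched map tasks"])  # 2
```

Keeping track of the current group in a variable sidesteps the need for `re.S` backtracking across group boundaries.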

What is the int needed for in map(int, icount) in Pydoop

What is the int needed for in map(int, icount) in Pydoop Question: In the official Pydoop tutorial there is a word count example. I understand how it works, but I am wondering about the inner workings of map(int, icounts). Do I follow correctly that icounts is a list of 1s? Where does the int come …

Total answers: 1
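
The point of the `int` can be shown without Pydoop at all. In the streaming word-count pattern, the counts arrive at the reducer as strings, so `map(int, ...)` converts each one before summing (the sample values below are an assumption mirroring a mapper that emits "1" per word):

```python
# Values as a word-count mapper typically emits them: the string "1"
# for every occurrence of the word.
icounts = ["1", "1", "1"]

# map(int, icounts) applies int() to each element, turning "1" into 1,
# so sum() adds integers instead of failing on strings.
total = sum(map(int, icounts))
print(total)  # 3
```

So yes, `icounts` is effectively a list of 1s, but textual 1s; `int` is the per-element conversion.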

Pyspark cannot export large dataframe to csv. Session setup incorrect?

Pyspark cannot export large dataframe to csv. Session setup incorrect? Question: My session in pyspark 2.3: spark = SparkSession.builder.appName("test_app").config('spark.executor.instances', '4').config('spark.executor.cores', '4').config('spark.executor.memory', '24g').config('spark.driver.maxResultSize', '24g').config('spark.rpc.message.maxSize', '512').config('spark.yarn.executor.memoryOverhead', '10000').enableHiveSupport().getOrCreate() I work on cloudera with a 32GB RAM session and handle dataframes containing approx. 30,000,000 rows and up to 20 columns. …

Total answers: 3
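
A common workaround for driver-side memory blowups with exports this size is to skip collecting to the driver entirely and let the executors write partitioned CSV. This is a sketch, not runnable standalone: `df`, the partition count, and the output path are assumptions.

```python
# Assumes an existing SparkSession `spark` and DataFrame `df`.
# Writing from the executors avoids pulling ~30M rows through the
# driver, so spark.driver.maxResultSize stops being the bottleneck.
(df
 .repartition(32)                    # spread the write across tasks
 .write
 .option("header", True)
 .mode("overwrite")
 .csv("hdfs:///tmp/test_app_out"))   # placeholder output path
```

The result is a directory of part files rather than a single CSV; they can be merged afterwards (e.g. `hdfs dfs -getmerge`) if one file is required.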

How to Create HDFS file

How to Create HDFS file Question: I know it is possible to create a directory on HDFS with Python using snakebite, but I am looking to create a file in an HDFS directory. Asked By: Zak_Stack || Source Answers: You can use touchz to create an empty file on HDFS… I see rename command in the docs, which …

Total answers: 2
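
The `touchz` idea from the answer can be sketched by shelling out to the HDFS CLI, which definitely supports it. The helper name, the path, and the injectable runner are all illustrative assumptions:

```python
import subprocess

def create_hdfs_file(path, run=subprocess.run):
    """Create an empty file on HDFS via `hdfs dfs -touchz <path>`."""
    cmd = ["hdfs", "dfs", "-touchz", path]
    return run(cmd, check=True)

# Inspect the command without a live cluster by injecting a stub runner:
captured = {}
create_hdfs_file("/user/zak/empty.txt",
                 run=lambda cmd, check: captured.update(cmd=cmd))
print(captured["cmd"])  # ['hdfs', 'dfs', '-touchz', '/user/zak/empty.txt']
```

For writing non-empty files, a library with write support (e.g. the WebHDFS-based `hdfs` package or pyarrow) is needed, since snakebite focuses on namenode RPC operations.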

Error "PipeMapRed.waitOutputThreads(): subprocess failed with code 1" when accessing a list of lists by index on hadoop for mapreduce python program

Error "PipeMapRed.waitOutputThreads(): subprocess failed with code 1" when accessing a list of lists by index on hadoop for mapreduce python program Question: I wrote a MapReduce program to compute the matrix operation "X-MN", where M, N, X are matrices with integer values. In order to do that I need a list of lists. For instance: M=[[1,2,3],[4,5,6],[7,8,9]] …

Total answers: 1
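
Exit code 1 from `PipeMapRed.waitOutputThreads()` just means the Python process crashed, and with streaming a frequent cause is indexing a matrix that was never rebuilt: the mapper sees one text line at a time on stdin, not a list of lists. A sketch of reconstructing matrix entries from lines (the `name,row,col,value` format is an assumption, not from the question):

```python
def parse_matrices(lines):
    """Rebuild matrices from streamed lines like 'M,0,0,1'
    (matrix name, row index, column index, value)."""
    mats = {}
    for line in lines:
        name, i, j, v = line.strip().split(",")
        mats.setdefault(name, {})[(int(i), int(j))] = int(v)
    return mats

mats = parse_matrices(["M,0,0,1", "M,0,1,2", "N,0,0,5"])
print(mats["M"][(0, 1)])  # 2
```

Testing the mapper locally first (`cat input.txt | python mapper.py`) surfaces the underlying traceback that Hadoop hides behind the exit code.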

Pycharm doesn't recognise Sqoop libraries

Pycharm doesn't recognise Sqoop libraries Question: I am using PyCharm and trying to run a Sqoop import job to load MySQL data into HDFS. I installed this package in the terminal: pip install pysqoop I tried running this package: from pysqoop.SqoopImport import Sqoop sqoop = Sqoop(help=True) code = sqoop.perform_import() This was the error: /home/amel/PycharmProjects/pythonProject/venv/bin/python /home/amel/PycharmProjects/pythonProject/Hello.py Traceback (most …

Total answers: 2
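
The traceback path shows PyCharm running the project's venv interpreter, so if `pip install pysqoop` used a different interpreter (e.g. the system one), the import fails. A quick diagnostic, runnable in the PyCharm run configuration itself:

```python
import sys
import importlib.util

# Print which interpreter is actually executing, and whether it can
# see the package. find_spec returning None means pysqoop was
# installed into some other environment than this one.
print(sys.executable)
spec = importlib.util.find_spec("pysqoop")
print("pysqoop visible:", spec is not None)
```

If it prints `False`, installing with that exact interpreter's pip (`<sys.executable> -m pip install pysqoop`) puts the package where PyCharm looks.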

Yield both max and min in a single mapreduce

Yield both max and min in a single mapreduce Question: I am a beginner just getting started with writing MapReduce programs in Python using the MRJob library. One of the examples worked out in the video tutorial is finding the max temperature by location_id. Following on from that, I am writing another program to find the min …

Total answers: 3
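
The usual trick is a single reducer that yields both extremes per key instead of one. A plain-Python sketch of that shuffle-then-reduce shape (the sample records and field names are assumptions, not from the tutorial):

```python
from collections import defaultdict

records = [("loc1", 10), ("loc1", 35), ("loc2", -4), ("loc2", 7)]

# "Shuffle": group temperatures by location_id, as MapReduce would.
grouped = defaultdict(list)
for loc, temp in records:
    grouped[loc].append(temp)

# "Reduce": emit (min, max) per key in one pass over the groups.
result = {loc: (min(ts), max(ts)) for loc, ts in grouped.items()}
print(result["loc1"])  # (10, 35)
```

In MRJob terms the reducer would be `yield key, (min(vals), max(vals))` after materializing the values iterator once, since it can only be consumed a single time.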

How to restart a failed task on Airflow

How to restart a failed task on Airflow Question: I am using a LocalExecutor and my dag has 3 tasks, where task(C) is dependent on task(A). Task(B) and task(A) can run in parallel, something like below: A --> C, B. So task(A) has failed but task(B) ran fine. Task(C) is yet to run as task(A) has …

Total answers: 2
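
One way to re-run only the failed task is to clear its state so the scheduler picks it up again, which re-runs downstream dependents too. This sketch builds the Airflow 2.x CLI call; the dag and task ids follow the question's A/B/C example and are placeholders:

```python
import subprocess

def clear_task(dag_id, task_id, run=subprocess.run):
    """Clear a task instance (and its downstream tasks) so the
    scheduler re-queues it: `airflow tasks clear`."""
    cmd = ["airflow", "tasks", "clear", dag_id,
           "--task-regex", task_id, "--downstream", "--yes"]
    return run(cmd, check=True)

# Inspect the command without a scheduler by injecting a stub runner:
seen = {}
clear_task("my_dag", "task_A", run=lambda cmd, check: seen.update(cmd=cmd))
print(seen["cmd"][:4])  # ['airflow', 'tasks', 'clear', 'my_dag']
```

The same clearing is available in the web UI (task instance, "Clear" with the downstream option), which avoids the CLI entirely.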