Sqoop Export a Long Text Field

I have a text field on HDFS that can have a very long value of more than 4000 characters length.

I had to export this to Oracle using Sqoop. In Oracle table, this field is defined with data type VARCHAR2(4000 BYTE).

I get an error when Sqoop’ing.

Caused by: java.sql.SQLException: ORA-01461: can bind a LONG value only for insert into a LONG column

Why?

Oracle treats value longer than the defined limit for that field as LONG. Hence the error. The error is not very informative though.

Solution:

Use a CLOB datatype for that field on Oracle. CLOB can store value longer than 4000 characters.

Don’t forget to add –map-column-java option to Sqoop export command. As there is no COLB type in Java/Hive. Hive has to know how to treat this value. Full command is shown below.

sqoop export --connect jdbc:oracle:thin:@hostname:1521/sid \
   --username user --password passwd \ 
   --table Oracle_Table_Name \
   --columns col1,col2,col3 \
   --map-column-java col3=String \
   --export-dir 'location' \
   --input-fields-terminated-by '\001' --input-lines-terminated-by "\n" \
   --input-null-string '\\N' --input-null-non-string '\\N'
Advertisements

Issues with Sqoop Export to Oracle

Sqoop is a tool used to move bulk data from an RDBMS Database like MySQL, Oracle, Postgres to HDFS (or AWS S3).

Couple of issues I faced with Sqoop export are summarized below.

Timestamp Format Issue

HDFS/S3 record has date and time format ‘2015-03-03T08:28:47.484Z’. This value goes to a column on Oracle transaction_datetime with a data type timestamp(9).

Got an error from Sqoop.

Caused by: java.lang.RuntimeException: Can't parse input data: '2015-03-03T08:28:47.484Z'
Caused by: java.lang.IllegalArgumentException: Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff]

I could not find a direct solution with Sqoop. So what I did is to insert the data into Hive table by converting the date and time to a format expected by Oracle.

INSERT OVERWRITE TABLE hiveTable
SELECT
 id,
 from_unixtime(unix_timestamp(regexp_replace(transaction_datetime,'[TZ]',' ')), 'yyyy-MM-dd HH:mm:ss'),
 .
 .
FROM sourceHiveTable;

Now the date stored on HDFS files is ‘2015-03-03 08:28:47’ that would work for Oracle.

Null Columns Issue

HDFS/Hive stores NULL values as ‘\N’. This would be exported to Oracle as is; and that does not look good on Oracle.

Adding some options to Sqoop utility would help recognize NULLs correctly. The full command is shown below.

sqoop export --connect jdbc:oracle:thin:@dbhost:1521/ORCL --username user --password password \
 --table Table_In_Oracle \
 --columns col1, col2 \
 --export-dir 'src-directory' \
 --input-fields-terminated-by ',' --input-lines-terminated-by "\n" \
 --input-null-string '\\N' --input-null-non-string '\\N'

After running that command, now I see a value of ‘(null)’ for that column on Oracle.

That solves the problem little, but I’m hoping to see an empty value (”) in Oracle. Trying to find a solution…

Kill a Running Sqoop Job

Sometimes I start a Sqoop export job but wanted to kill it in the middle for some reason.

I write the above command in a shell script and run using nohup command.

nohup sqoop_export.sh > /tmp/job.out 2>&1 &

To kill it I get the process id using ‘ps -aef | grep nohup’ and kill using ‘kill <processid>’.

The job is killed but the problem? If you try to truncate the table on Oracle later, you get error like below. I tried to find the PID of that process on Oracle, but could not find what process locked it up, and I had to drop the table to quickly solve the problem.

ERROR at line 1:
ORA-00054: resource busy and acquire with NOWAIT specified or timeout expired

The best way to kill Hadoop job is using ‘hadoop job’ command.

$ hadoop job -list
Total jobs:1
JobId     State     StartTime     UserName       Queue   Priority UsedContainers RsvdContainers UsedMem RsvdMem NeededMem   AM info
job_1452010418661_0082   RUNNING 1452198342153       userid     default     NORMAL               9               0   15360M       0M     15360M ...

$ hadoop job -kill job_1452010418661_008
Killed job job_1452010418661_0082