mysql - PDI or mysqldump to extract data without blocking the database nor getting inconsistent data? -


I have an ETL process that runs periodically. Using Kettle (PDI), I extract data from a source database and copy it to a stage database. I use several transformations with Table Input and Table Output steps. However, I think I could get inconsistent data if the source database is modified during the process, since this way I don't get a snapshot of the data. Furthermore, I don't know whether the source database gets blocked. That would be a problem if the extraction takes several minutes (and it may well take them). The advantage of PDI is that I can select only the necessary columns and use timestamps to fetch only new data.
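The incremental pattern described above can be sketched as a plain SQL extract driven from the shell. This is only an illustration: the table, column names, and the `LAST_LOAD` bookmark are placeholders, not anything from the original post, and PDI's Table Input step would hold an equivalent query with a parameter for the last load time.

```shell
#!/bin/sh
# Sketch of a timestamp-based incremental extract: select only the
# needed columns, and only rows changed since the previous run.
# LAST_LOAD would normally come from the previous run's bookkeeping.
LAST_LOAD='2015-01-01 00:00:00'

QUERY="SELECT id, name, updated_at
FROM source_db.customers
WHERE updated_at > '$LAST_LOAD'"

# Print the query that a Table Input step (or the mysql client) would run:
echo "$QUERY"
# mysql --host=source-host --user=etl_user -p -e "$QUERY"
```

Running the actual `mysql` line is commented out because it needs a reachable server; the point is the shape of the query.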

On the other hand, I think mysqldump with --single-transaction would let me get the data in a consistent way without blocking the source database (all tables are InnoDB). The disadvantage is that it would dump unnecessary data.
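A minimal sketch of that mysqldump invocation, restricted to specific tables so less unnecessary data is dumped (the host, user, database, and table names are placeholders, not from the original post). With InnoDB tables, --single-transaction reads everything from one REPEATABLE READ snapshot, so concurrent writes are neither blocked nor visible to the dump.

```shell
#!/bin/sh
# Build a consistent, non-blocking dump of two specific tables.
# --single-transaction : one InnoDB snapshot, no table locks on writers
# --quick              : stream rows instead of buffering whole tables
# --no-create-info     : data only, skip CREATE TABLE statements
CMD="mysqldump --single-transaction --quick --no-create-info \
--host=source-host --user=etl_user -p source_db customers orders"

# Review the command before pointing it at the real server:
echo "$CMD"
# eval "$CMD" > stage_extract.sql
```

Note that --single-transaction only guarantees a consistent snapshot for transactional (InnoDB) tables, which matches the situation in the question.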

Can I use PDI, or do I need mysqldump?

PS: I need to read specific tables from specific databases, so I think XtraBackup is not an option.

However, I think I could get inconsistent data if the source database is modified during the process, since this way I don't get a snapshot of the data.

I think the "Table Input" step doesn't take into account modifications happening while it is reading. Try a simple experiment:

Take a .ktr file with a single Table Input and Table Output. Start loading the data into the target table. While the data load is in progress, insert a few records into the source table. You will find that those records are not read into the target table. (Note: I tried this with a PostgreSQL DB, and the number of rows read was 1000000.)

Now, to your question: I would suggest using PDI, since it gives you more control over the data in terms of versioning, sequences, SCDs and other DWBI-related activities. PDI also makes it easier to load a stage environment, rather than dumping entire tables.

Hope this helps :)

