Apache Hudi 0.12.1 Released

肖钟城
  • Big Data Tech Stack
  • Hudi

    Release Notes - Apache Hudi - Version 0.12.1

Sub-task

  • [HUDI-4488] - Improve S3 File listing efficiency

Bug

  • [HUDI-1275] - Incremental Timeline Syncing causes compaction to fail with FileNotFound exception
  • [HUDI-2529] - Flaky test: ITTestHoodieFlinkCompactor.testHoodieFlinkCompactor:88
  • [HUDI-2780] - MOR reads the log file and skips the complete block as a bad block, resulting in data loss
  • [HUDI-3391] - Presto and Hive beeline fail to read MOR table w/ 2 or more array fields
  • [HUDI-3861] - 'path' in CatalogTable#properties failed to be updated when renaming table
  • [HUDI-3983] - ClassNotFoundException when using hudi-spark-bundle to write table with hbase index
  • [HUDI-3998] - getCommitsSinceLastCleaning failed when async cleaning
  • [HUDI-4136] - Running a snapshot query in Hive throws 'IOException: java.lang.IllegalArgumentException: HoodieRealtimeRecordReader can only work on RealtimeSplit and not with a empty file' when the compaction plan has not been executed
  • [HUDI-4193] - Fail to compile in osx aarch_64 environment
  • [HUDI-4199] - Clean up row writer path for url encoding, consistent logical timestamp
  • [HUDI-4237] - spark.sql.sources.schema.partCol.0 is non-empty in HiveMetaStore when creating a non-partitioned Hudi table in Spark
  • [HUDI-4256] - Bulk insert of a large dataset with S3 fails w/ timeline server based markers
  • [HUDI-4282] - Throws IOException in method HoodieLogFileReader.isBlockCorrupted()
  • [HUDI-4326] - Hudi Spark datasource error after migrating from 0.8 to 0.11
  • [HUDI-4340] - DeltaStreamer bootstrap fails when metrics are on, caused by DateTimeParseException: Text '00000000000001999' could not be parsed
  • [HUDI-4383] - Make hudi-flink-bundle module compile with the correct flink version
  • [HUDI-4412] - Multiple writers NPE when Insert_overwrite
  • [HUDI-4438] - Fix flaky TestCopyOnWriteActionExecutor.testPartitionMetafileFormat test
  • [HUDI-4451] - Multiple writers using insert_overwrite lose some Hive partitions
  • [HUDI-4485] - Hudi CLI returns empty result for command show fsview all
  • [HUDI-4515] - Savepoints are cleaned under the keep-latest-versions policy
  • [HUDI-4538] - Fix Hive sync losing partitions issue
  • [HUDI-4549] - hive sync bundle causes class loader issue
  • [HUDI-4555] - The behavior of "show fsview all" is confusing
  • [HUDI-4577] - Add more test coverage for Spark SQL, Spark Quickstart guide
  • [HUDI-4584] - SQLConf is not propagated correctly into RDDs
  • [HUDI-4601] - Read error from MOR table after compaction with timestamp partitioning
  • [HUDI-4615] - Fix empty commits being made by deltastreamer with S3EventsSource when there is no data in SQS on starting a new pipeline
  • [HUDI-4619] - The retry mechanism of remotehoodietablefilesystemview needs to be thread safe
  • [HUDI-4620] - No expected exception is thrown when creating a Hudi table without primaryKey
  • [HUDI-4621] - Add validation that bucket index fields should be subset of primary keys
  • [HUDI-4637] - Release thread in RateLimiter is not terminated
  • [HUDI-4720] - HoodieInternalRow returns wrong number of fields when source does not contain meta fields
  • [HUDI-4729] - File group in pending compaction cannot be queried when querying RO table with Spark
  • [HUDI-4730] - Fix batch job being unable to clean old commit files
  • [HUDI-4736] - Fix inflight clean action preventing clean service to continue when multiple cleans are not allowed
  • [HUDI-4739] - Wrong value returned when length equals 1
  • [HUDI-4740] - Add metadata fields for hive catalog #createTable
  • [HUDI-4742] - Fixing AWS Glue partition's location is wrong when updatePartition
  • [HUDI-4757] - Enhance hudi-examples to add pyspark examples
  • [HUDI-4758] - Enhance validations for hudi-examples quick start for spark and pyspark
  • [HUDI-4759] - Fix website Quick start guide to add validations
  • [HUDI-4760] - Clustering results in repeated triggers of clustering execution
  • [HUDI-4762] - Hive sync update schema removes columns
  • [HUDI-4765] - _hoodie_record_key generation logic differs between inserting data via spark-sql and via spark-shell, which might affect data upsert
  • [HUDI-4766] - Fix HoodieFlinkClusteringJob
  • [HUDI-4775] - Incremental source for MOR fails
  • [HUDI-4776] - Missing value for the preCombineField when using MERGE INTO
  • [HUDI-4780] - hoodie.logfile.max.size does not take effect, causing the log file to be too large
  • [HUDI-4793] - Fix ScalaTest not respecting Log4j2 configs
  • [HUDI-4795] - Fix KryoException when bulk inserting into a non-bucket-index Hudi table
  • [HUDI-4806] - Use Avro version from root pom file for Flink bundle
  • [HUDI-4807] - Use correct instant in metadata initialization
  • [HUDI-4808] - HoodieSimpleBucketIndex should also consider bucket num in log file not in base file, which is written by Flink MOR table
  • [HUDI-4810] - Fix Hudi bundles requiring log4j2 on the classpath
  • [HUDI-4813] - Keygen inference does not work on the Spark SQL side
  • [HUDI-4814] - Schedules new clustering plan based on latest clustering instant
  • [HUDI-4817] - Markers are not deleted after bootstrap operation
  • [HUDI-4825] - Commit metadata in Json contains redundant information
  • [HUDI-4830] - testNoGlobalConfFileConfigured throws exception when hudi-defaults.conf is added in DEFAULT_PATH
  • [HUDI-4831] - AWSDMSAvroPayload fails w/ null operation type after 0.10.1
  • [HUDI-4836] - Remove "hbase-default.xml" colliding w/ "hbase-site.xml" in Hudi bundles
  • [HUDI-4841] - Fix BlockLocation array sorting idempotency issue
  • [HUDI-4848] - Fix tooling for deprecated partition
  • [HUDI-4851] - Fix CSI not supporting InSet operator
  • [HUDI-4853] - Get field by name for OverwriteNonDefaultsWithLatestAvroPayload to avoid schema mismatch
  • [HUDI-4856] - Missing option for HoodieCatalogFactory
  • [HUDI-4860] - Presto/Trino Cannot parse partition value '\N' of type 'integer' for partition column
  • [HUDI-4861] - Relax MERGE INTO restrictions to permit casting of the matching condition
  • [HUDI-4879] - MERGE INTO fails when setting "hoodie.datasource.write.payload.class"
  • [HUDI-4883] - Fix delete savepoint for MOR table
  • [HUDI-4885] - docker demo fails w/ ClassNotFound w/ LogicalType in latest master
  • [HUDI-4892] - Fix hudi-spark3-bundle
  • [HUDI-4899] - Fix incompatibility w/ Spark 3.2.2
  • [HUDI-4906] - Fix the local tests for hudi-flink
  • [HUDI-4907] - Single commit multiple instant causing parquet files to be in wrong states
  • [HUDI-4913] - HoodieSnapshotExporter throws IllegalArgumentException: Wrong FS
  • [HUDI-4914] - Managed memory weight should be set when sort clustering is enabled
  • [HUDI-4923] - CI test flaky: TestHoodieReadClient.testReadFilterExistAfterBulkInsertPrepped
  • [HUDI-4924] - Dedup parallelism is not auto tuned based on input
  • [HUDI-4925] - Should force use of ExpressionPayload in MergeIntoTableCommand
  • [HUDI-4934] - Cleaner cleans up files touched by clustering
  • [HUDI-4936] - as.of.instant not recognized as hoodie config
  • [HUDI-4938] - Clean action fails due to IllegalStateException: Duplicate key
  • [HUDI-4951] - An incorrect use of the Long method
  • [HUDI-4957] - Shade JOL in every bundle
  • [HUDI-4992] - Spark Row-writing Bulk Insert produces incorrect Bloom Filter metadata

New Feature

  • [HUDI-1271] - Add utility scripts to perform Restores
  • [HUDI-4782] - Support TIMESTAMP_LTZ type for flink

Improvement

  • [HUDI-74] - Improve compaction support in HoodieDeltaStreamer & CLI
  • [HUDI-3403] - Ensure immutable hudi configurations are set properly and not changed later
  • [HUDI-3425] - Clean up spill path created by Hudi during uneventful shutdown
  • [HUDI-3579] - Add timeline commands in hudi-cli
  • [HUDI-3780] - Improve drop partitions
  • [HUDI-3959] - Rename class name for spark rdd reader
  • [HUDI-3994] - HoodieDeltaStreamer - Spark master shouldn't have a default
  • [HUDI-4010] - DynamoDB lock configs for naming/docs could be improved
  • [HUDI-4342] - Improve handling of 5xx in timeline server
  • [HUDI-4433] - Hudi-CLI repair deduplicate not working with non-partitioned dataset
  • [HUDI-4453] - Support partition pruning for tables Bootstrapped from Source Hive Style partitioned tables
  • [HUDI-4482] - Remove guava from codebase
  • [HUDI-4483] - Fix checkstyle on scala code and integ-test module
  • [HUDI-4493] - Fix handling of corrupt avro files properly
  • [HUDI-4551] - Tweak the default parallelism of flink pipeline to execution env parallelism
  • [HUDI-4582] - Syncing 110,000 partitions to Hive using HiveSyncTool (--sync-mode="hms" and use-jdbc=false) times out
  • [HUDI-4608] - Fix upgrade command in Hudi CLI
  • [HUDI-4609] - Improve usability of upgrade/downgrade commands in Hudi CLI
  • [HUDI-4633] - Add command to trace partition through a range of commits
  • [HUDI-4635] - Update roadmap page based on H2 2022 plan
  • [HUDI-4642] - Add hudi-cli support to repair deprecated partition
  • [HUDI-4648] - Add command to rename partition
  • [HUDI-4649] - Add command to trace file group through a range of commits
  • [HUDI-4650] - Commits Command: Include both active and archive timeline for a given range of instants
  • [HUDI-4661] - Test COW: Hive QL with bootstrap
  • [HUDI-4665] - Flip default for "ignore.failed.batch" for streaming sink
  • [HUDI-4683] - Use enum class value for default value in flink options
  • [HUDI-4686] - Flip option 'write.ignore.failed' to default false
  • [HUDI-4698] - Rename the package 'org.apache.flink.table.data' to avoid conflicts with flink table core
  • [HUDI-4722] - Add support for metrics for locking infra
  • [HUDI-4731] - Shutdown cloud watch reporter on exit
  • [HUDI-4734] - Add table config change validation in deltastreamer
  • [HUDI-4746] - Fix flaky: ITTestDataStreamWrite.testWriteMergeOnReadWithCompaction
  • [HUDI-4747] - Fix flaky: ITTestHoodieFlinkCompactor.testHoodieFlinkCompactorWithPlanSelectStrategy
  • [HUDI-4748] - Add examples of soft deletes in docs
  • [HUDI-4751] - Ensure transaction owner instant is set by all callers of txnManager apis
  • [HUDI-4752] - Add dedup support for MOR table in cli
  • [HUDI-4805] - Update docs for workaround to make HBase working with HDFS on Hadoop 3
  • [HUDI-4833] - Add Postgres Schema Name to Postgres Debezium Source
  • [HUDI-4837] - Stop sleeping where it is not necessary in Kafka source
  • [HUDI-4844] - Skip partition value resolving when the field does not exist for MergeOnReadInputFormat#getReader
  • [HUDI-4865] - Optimize HoodieAvroUtils#isMetadataField to use O(1) complexity
  • [HUDI-4870] - Improve compaction config description
  • [HUDI-4873] - Report number of messages from AvroKafkaSource to be processed via metrics
  • [HUDI-4884] - Fix website docs for default index type in hudi
  • [HUDI-4908] - Fix flaky TestHoodieBackedTableMetadata.testMultiReaderForHoodieBackedTableMetadata
  • [HUDI-5173] - Skip if there is only one file in clusteringGroup

Test

  • [HUDI-3054] - Fix flaky TestHoodieClientMultiWriter.testHoodieClientBasicMultiWriter
  • [HUDI-4713] - Fix flaky ITTestHoodieDataSource#testAppendWrite
  • [HUDI-4721] - Fix thread safety w/ RemoteTableFileSystemView

Task

  • [HUDI-3013] - Docs for Presto and Hudi
  • [HUDI-3524] - Decouple basic and advanced configs in website
  • [HUDI-3961] - Encounter NoClassDefFoundError when using Spark 3.1 bundle and utilities slim bundle
  • [HUDI-4000] - Docs around DBT
  • [HUDI-4327] - TestHoodieDeltaStreamer#testCleanerDeleteReplacedDataWithArchive is flaky
  • [HUDI-4441] - Disable INFO level logs from tests
  • [HUDI-4528] - Diff tool to compare metadata across snapshots in a given time range
  • [HUDI-4529] - Tweak some default config options for flink
  • [HUDI-4563] - Docs writing for 0.12.0: key gen API change and perf improvements
  • [HUDI-4638] - Rename payload clazz and preCombine field options for flink sql
  • [HUDI-4644] - Change default flink profile to 1.15.x
  • [HUDI-4687] - Avoid all illegal reflective access in the code
  • [HUDI-4694] - Analyze the latest UT/FT runtime
  • [HUDI-4695] - Flaky: TestInlineCompaction.testCompactionRetryOnFailureBasedOnTime:308 expected: <4> but was: <5>
  • [HUDI-4696] - Flaky: TestHoodieCombineHiveInputFormat.setUpClass:86 » NullPointer
  • [HUDI-4709] - [RFC-48] Log Compaction Code review
  • [HUDI-4723] - Add document about Hoodie Catalog
  • [HUDI-4786] - Add Flink DataStream API demo in Flink Guide
  • [HUDI-4811] - Fix the checkstyle of hudi flink
  • [HUDI-4821] - Presto query for bootstrapped table fails due to IOException
  • [HUDI-4832] - Hive Sync can potentially drop all partitions
  • [HUDI-4864] - Fix AWSDmsAvroPayload during delete operations with MOR snapshot query
  • [HUDI-4943] - Benchmark JOL ClassLayout based object size estimator
  • [HUDI-5050] - Release note for version 0.12.1