Prepend header record (or a string / a file) to large file in Scala / Java
What is the most efficient (or recommended) way to prepend a string or a file to another large file in Scala, preferably without using external libraries? The large file can be binary.



E.g.



if the prepend string is:
header_information|123.45|xyz\n



and the large file is:



abcdefghijklmnopqrstuvwxyz0123456789
abcdefghijklmnopqrstuvwxyz0123456789
abcdefghijklmnopqrstuvwxyz0123456789
...


I would expect to get:



header_information|123.45|xyz
abcdefghijklmnopqrstuvwxyz0123456789
abcdefghijklmnopqrstuvwxyz0123456789
abcdefghijklmnopqrstuvwxyz0123456789
...









  • Why not plain unix? – erip, Nov 20 '18 at 2:45

  • @erip Because in this case it would be a workaround and, second, it will not necessarily always be a Unix filesystem; it could be AWS S3 or something else. – Andrey Dmitriev, Nov 20 '18 at 9:12

scala io prepend
asked Nov 20 '18 at 1:09









Andrey Dmitriev
1 Answer
I came up with the following solution:




  1. Turn prepend string/file into InputStream

  2. Turn large file into InputStream

  3. "Combine" InputStreams together using java.io.SequenceInputStream


  4. Use java.nio.file.Files.copy to write to target file



    import java.io.{ByteArrayInputStream, FileInputStream, SequenceInputStream}
    import java.nio.file.{Files, Paths, StandardCopyOption}

    object FileAppender {
      def main(args: Array[String]): Unit = {
        val stringToPrepend = new ByteArrayInputStream("header_information|123.45|xyz\n".getBytes)
        val largeFile = new FileInputStream("big_file.dat")
        Files.copy(
          new SequenceInputStream(stringToPrepend, largeFile),
          Paths.get("output_file.dat"),
          StandardCopyOption.REPLACE_EXISTING
        )
        largeFile.close()
      }
    }



Tested on a ~30 GB file: it took ~40 seconds on a MacBook Pro (3.3 GHz, 16 GB).



This approach can also be used, if necessary, to combine multiple partitioned files created by, e.g., the Spark engine.
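For that multi-file case, the same trick generalizes, because `SequenceInputStream` also accepts a `java.util.Enumeration` of streams. A minimal sketch of a helper (the `StreamConcat` name is illustrative, not from the answer; requires Scala 2.13+ for `scala.jdk.CollectionConverters`):

```scala
import java.io.{InputStream, SequenceInputStream}
import scala.jdk.CollectionConverters._

object StreamConcat {
  // Chain any number of InputStreams into a single stream.
  // SequenceInputStream reads each underlying stream in order
  // and closes each one as it is exhausted.
  def concat(streams: Seq[InputStream]): InputStream =
    new SequenceInputStream(streams.iterator.asJavaEnumeration)
}
```

Combined with `Files.copy` as in the answer (one `FileInputStream` per part file, in the desired order), this would concatenate Spark-style output partitions into a single file without loading any of them into memory.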
        answered Nov 20 '18 at 1:43









Andrey Dmitriev